Pandas: pd.Series.isin() performance with set versus array

Learn about the performance of pd.Series.isin() with a set versus an array in Python pandas. By Pranit Sharma Last updated : October 06, 2023

Pandas is a Python library that lets us perform complex manipulations of data effectively and efficiently. In pandas, we mostly work with a dataset in the form of a DataFrame, a 2-dimensional data structure made up of rows, columns, and data.

pd.Series.isin() performance with set versus array

A notable property of the Series isin() method is that it uses an O(1) look-up per element (on average, as with any hash-based look-up). Writing a Cython prototype is a good way to understand how easily this can beat the fastest out-of-the-box solution.
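As a starting point, here is a minimal sketch of calling isin() with both kinds of input. The sample values and the variable names (s, wanted_set, wanted_arr) are made up for illustration; only the results are compared here, not the timings.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only for illustration
s = pd.Series([1, 2, 3, 4, 5])
wanted_set = {2, 4, 6}               # a Python set of Python ints
wanted_arr = np.array([2, 4, 6])     # a NumPy array of raw integers

# isin() accepts a set or a list-like and returns a boolean Series
mask_from_set = s.isin(wanted_set)
mask_from_arr = s.isin(wanted_arr)

print(mask_from_set.tolist())
print(mask_from_arr.tolist())
```

Both calls should give the same boolean mask; the difference discussed in this article is only in how long they take.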

So, in the case of a set with n elements and a series with m elements, the running time would be:

  • T(n,m)=T_preprocess(n)+m*T_lookup(n)

Now the question is what happens when the Series isin() method is given an array. If we skipped the pre-processing and searched the array in linear time for every element, we would get:

  • O(n*m), which is not acceptable.
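To make the O(n*m) bound concrete, the following is a small pure-Python sketch of such a linear search. The function name isin_linear and the comparison counter are hypothetical helpers added for illustration; this is not how pandas implements isin().

```python
def isin_linear(series_values, lookup_values):
    """Naive membership test: scan lookup_values for every series element."""
    comparisons = 0
    mask = []
    for x in series_values:        # m iterations
        found = False
        for v in lookup_values:    # up to n comparisons per element
            comparisons += 1
            if x == v:
                found = True
                break
        mask.append(found)
    return mask, comparisons


# Worst case: no series element is present, so every scan runs to the end
series_values = [10, 11, 12, 13]   # m = 4
lookup_values = [1, 2, 3]          # n = 3
mask, comparisons = isin_linear(series_values, lookup_values)
print(mask, comparisons)           # comparisons equals n * m here
```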

The internal working can be understood with the help of a debugger or a profiler:

  • In a pre-processing step, a hash map is built from the n elements of the array, which takes O(n) time.
  • The m look-ups in the constructed hash map then take O(1) each, i.e. O(m) in total.
  • This results in T(n,m)=O(m)+O(n)
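The two steps above can be sketched in pure Python, using a dict as a stand-in for the hash map. The function name isin_hashed is hypothetical, and the real work in pandas happens in compiled code, so this only illustrates the shape of the algorithm.

```python
def isin_hashed(series_values, lookup_values):
    """Hash-based membership test: pre-process once, then look up m times."""
    # Pre-processing: build the hash map from the n lookup values, O(n)
    table = dict.fromkeys(lookup_values, True)
    # m look-ups, each O(1) on average, O(m) in total
    return [x in table for x in series_values]


result = isin_hashed([1, 2, 3, 4, 5], [2, 4, 6])
print(result)
```

The total work is one pass over the lookup values plus one pass over the series elements, which matches T(n,m)=O(m)+O(n).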

The important point to remember is that the elements of the array are raw integers and not Python objects, unlike the elements of the original set. Hence we cannot use the set as it is: the set of Python objects first has to be converted to a set of raw integers. This conversion also consumes time and space, costing O(n) for a set with n elements.
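The conversion can also be done explicitly before calling isin(). The sketch below assumes the set holds only integers that fit in 64 bits; the names py_set and int_array are made up for illustration, and np.fromiter is just one possible way to do the conversion.

```python
import numpy as np
import pandas as pd

# Hypothetical set of Python int objects
py_set = {3, 1, 4, 15, 9}

# One pass over the set: O(n) time and O(n) extra space
int_array = np.fromiter(py_set, dtype=np.int64, count=len(py_set))

s = pd.Series(np.arange(10, dtype=np.int64))
mask = s.isin(int_array)

print(int_array.dtype)
print(s[mask].tolist())
```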

The basic idea that follows from this comparison is:

  • If n<m: the Series isin() method should be used, because the pre-processing is not that costly.
  • If n>m: the set cannot be used directly, so the conversion step is needed and its cost has to be taken into account.
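To check these two cases on your own machine, a rough timing harness such as the one below may help. The function time_isin, the sizes, and the random data are all assumptions chosen for illustration, and no timings are quoted here because they depend on the pandas version and the hardware.

```python
import timeit

import numpy as np
import pandas as pd


def time_isin(n, m, repeats=3):
    """Return the best-of-repeats time of isin() for array and set input."""
    rng = np.random.default_rng(0)
    upper = 10 * max(n, m)
    s = pd.Series(rng.integers(0, upper, size=m))    # series with m elements
    values_arr = rng.integers(0, upper, size=n)      # array with n elements
    values_set = set(values_arr.tolist())            # may be slightly smaller than n (duplicates)
    t_arr = min(timeit.repeat(lambda: s.isin(values_arr), number=1, repeat=repeats))
    t_set = min(timeit.repeat(lambda: s.isin(values_set), number=1, repeat=repeats))
    return t_arr, t_set


# One case with n < m and one with n > m
for n, m in [(1_000, 100_000), (100_000, 1_000)]:
    t_arr, t_set = time_isin(n, m)
    print(f"n={n:>7} m={m:>7}  array: {t_arr:.6f}s  set: {t_set:.6f}s")
```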

