How to get a list of all the duplicate items using Pandas in Python?
Given a Pandas DataFrame, we have to get a list of all the duplicate items.
Submitted by Pranit Sharma, on June 04, 2022
Duplication in a column of a pandas DataFrame occurs when the same value appears more than once.
Here, we are going to learn how to get a list of duplicates in a pandas DataFrame. For this purpose, we will use the pandas.DataFrame.duplicated() method, which has the following syntax:
DataFrame.duplicated(subset=None, keep='first')
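The pandas.DataFrame.duplicated() method returns a Boolean Series with one entry per row, where True marks a row that duplicates another row. The subset parameter limits the comparison to the given columns, and keep controls which occurrences are flagged: keep='first' (the default) flags every occurrence except the first, keep='last' flags every occurrence except the last, and keep=False flags all occurrences. Indexing the DataFrame with this Series returns the duplicate rows themselves.
To work with pandas, we need to import the pandas package first. Below is the syntax: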
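As a minimal sketch of how the keep parameter changes the result, the snippet below compares the three settings on a small DataFrame. The toy frame and its 'item' column are hypothetical and are not part of the article's example.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the keep parameter
toy = pd.DataFrame({'item': ['a', 'b', 'a', 'c', 'b', 'a']})

# keep='first': every occurrence except the first is flagged
first_mask = toy.duplicated(subset=['item'], keep='first')

# keep='last': every occurrence except the last is flagged
last_mask = toy.duplicated(subset=['item'], keep='last')

# keep=False: all occurrences of a repeated value are flagged
all_mask = toy.duplicated(subset=['item'], keep=False)

print(first_mask.tolist())  # [False, False, True, False, True, True]
print(last_mask.tolist())   # [True, True, True, False, False, False]
print(all_mask.tolist())    # [True, True, True, False, True, True]
```

In each case the method returns a Boolean mask rather than the values, so the mask still has to be used to filter the DataFrame.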
import pandas as pd
Let us understand with the help of an example:
# Importing pandas package
import pandas as pd
# Defining a DataFrame with some repeated product names
df = pd.DataFrame(
    data = {
        'Products': ['Frooti','Krack-jack','Hide&seek','Frooti','Chawanprash','Honey','Hair oil','Honey','Maggie','Kitkat','EveryDay','Crunch']
    })
# Display DataFrame
print("Original DataFrame:\n", df, "\n")
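Output: a single-column DataFrame with 12 rows, indexed 0 to 11, in which 'Frooti' and 'Honey' each appear twice.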
Now, get the duplicate items in the DataFrame:
# Getting the duplicates from the Products column
# (keep='first' leaves the first occurrence out and returns only the repeats)
result = df[df.duplicated(['Products'], keep='first')]
# Display result
print("Duplicates:\n", result)
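Output: the rows at index 3 ('Frooti') and index 7 ('Honey'), which are the second occurrences of the two repeated products.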
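With keep='first', the filter returns only the repeated rows and not the first occurrence of each value. If every occurrence is wanted, or a plain Python list of the duplicated values, one possible approach is sketched below. It rebuilds the same DataFrame so that it runs on its own; the names all_rows and duplicate_items are illustrative and do not come from the original example.

```python
import pandas as pd

# Same data as in the example above
df = pd.DataFrame({
    'Products': ['Frooti', 'Krack-jack', 'Hide&seek', 'Frooti', 'Chawanprash',
                 'Honey', 'Hair oil', 'Honey', 'Maggie', 'Kitkat',
                 'EveryDay', 'Crunch']
})

# keep=False flags every occurrence of a repeated value, including the first
all_rows = df[df.duplicated(['Products'], keep=False)]
print(all_rows.index.tolist())  # [0, 3, 5, 7]

# Collapse the repeated rows into a list of distinct duplicated values
duplicate_items = all_rows['Products'].unique().tolist()
print(duplicate_items)  # ['Frooti', 'Honey']
```

Series.unique() keeps values in order of first appearance, so the list follows the order in which the duplicated products first occur in the column.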