How to get a list of all the duplicate items using Pandas in Python?

Given a Pandas DataFrame, we have to get a list of all the duplicate items.

By Pranit Sharma. Last updated: September 22, 2023

Duplication in a column of a pandas DataFrame occurs when the same value appears more than once.
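As a quick illustration of this definition, the sketch below counts how often each value occurs and keeps only the values that appear more than once. The Series `s` and its contents are hypothetical sample data, not part of the original example.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
s = pd.Series(['a', 'b', 'a', 'c', 'b', 'a'])

# Count how many times each value occurs
counts = s.value_counts()

# Keep only the values that occur more than once
repeated = counts[counts > 1]
print(repeated)
```

Here 'a' and 'b' are reported as duplicated, while 'c' occurs only once and is dropped.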

Getting a list of all the duplicate items using Pandas in Python

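For this purpose, we will use the pandas.DataFrame.duplicated() method. It returns a boolean Series that marks a row as True when it duplicates another row, comparing only the columns given in 'subset' (or all columns by default). The 'keep' parameter controls which occurrences are marked: with the default keep='first', the first occurrence is left unmarked and only the later ones are marked; with keep='last', the last occurrence is left unmarked; with keep=False, every occurrence is marked. Indexing the DataFrame with this boolean Series returns the duplicate rows. The following is the syntax: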

DataFrame.duplicated(subset=None, keep='first')
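To make the effect of the 'keep' parameter concrete, here is a minimal sketch that computes the boolean mask for each of its three values. The DataFrame `df_demo` is a small hypothetical sample, not the one used in the program later in this article.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df_demo = pd.DataFrame({'Products': ['Frooti', 'Honey', 'Frooti', 'Kitkat', 'Honey']})

# keep='first' (default): every occurrence after the first is marked True
mask_first = df_demo.duplicated(subset=['Products'], keep='first')

# keep='last': every occurrence before the last is marked True
mask_last = df_demo.duplicated(subset=['Products'], keep='last')

# keep=False: every occurrence of a repeated value is marked True
mask_all = df_demo.duplicated(subset=['Products'], keep=False)

print(mask_first.tolist())  # [False, False, True, False, True]
print(mask_last.tolist())   # [True, True, False, False, False]
print(mask_all.tolist())    # [True, True, True, False, True]
```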

Note

To work with pandas, we need to import the pandas package first. The syntax is:

import pandas as pd

Let us understand this with the help of an example.

Python program to get a list of all the duplicate items using Pandas

# Importing pandas package
import pandas as pd

# Defining a DataFrame with a single 'Products' column
df = pd.DataFrame(
    data = {
        'Products':['Frooti','Krack-jack','Hide&seek','Frooti','Chawanprash','Honey','Hair oil','Honey','Maggie','Kitkat','EveryDay','Crunch']
    })

# Display DataFrame
print("Original DataFrame:\n",df,"\n")

# Getting the duplicates from the 'Products' column
# (keep='first' marks every occurrence after the first one)
result = df[df.duplicated(['Products'], keep='first')]

# Display result
print("Duplicates:\n",result)


The output of the above program is:

[Output screenshot: Example 1: Get a list of all the duplicate items]
[Output screenshot: Example 2: Get a list of all the duplicate items]
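The program above uses keep='first', so it returns only the repeated occurrences. If every occurrence is wanted, or a plain Python list of the duplicated values, a variant along the following lines may help. This is a sketch rather than the article's original method; the names `all_dupes` and `dupe_list` are introduced here for illustration.

```python
import pandas as pd

# Same data as in the program above
df = pd.DataFrame(
    data = {
        'Products':['Frooti','Krack-jack','Hide&seek','Frooti','Chawanprash','Honey','Hair oil','Honey','Maggie','Kitkat','EveryDay','Crunch']
    })

# keep=False marks every occurrence of a repeated value,
# so the first occurrences are included as well
all_dupes = df[df.duplicated(subset=['Products'], keep=False)]
print("All occurrences of duplicated items:\n", all_dupes, "\n")

# Collapse to a plain Python list with each duplicated value listed once
dupe_list = all_dupes['Products'].unique().tolist()
print("Duplicated values:", dupe_list)
```

With this data, the rows at index 0, 3, 5 and 7 are selected, and the resulting list contains 'Frooti' and 'Honey'.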





