Python - Set difference for pandas

Learn how to drop duplicates and create a set difference for a pandas DataFrame.

By Pranit Sharma Last updated: September 26, 2023

Pandas is a library for performing complex data manipulations effectively and efficiently. In pandas, we mostly work with datasets in the form of a DataFrame, a 2-dimensional data structure made up of rows, columns, and the data stored in them.
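To make those three parts concrete, here is a minimal sketch that builds a small DataFrame (the names and ages are sample values chosen for illustration) and inspects its shape, column labels, and row labels.

```python
import pandas as pd

# A small DataFrame: each dictionary key becomes a column
df = pd.DataFrame({
    'Name': ["Sonu", "Shyam", "Geeta"],
    'Age': [20, 19, 21],
})

# A DataFrame is 2-dimensional: (number of rows, number of columns)
print(df.shape)            # (3, 2)

# Column labels
print(list(df.columns))    # ['Name', 'Age']

# Row labels: a default RangeIndex from 0 to n-1
print(list(df.index))      # [0, 1, 2]
```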

A set stores multiple items, which can be of different types. Just like list, tuple, and dictionary, set is a built-in data type in Python used to store elements. Elements inside a set are unique, that is, each element occurs only once in a set.
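A short sketch of both properties that matter here: converting a list to a set drops repeated elements, and the `-` operator (or the `difference()` method) returns the elements of one set that are not in another. The list values and the `others` set are illustrative only.

```python
names = ["Sonu", "Shyam", "Sonu", "Geeta"]

# Converting to a set keeps a single occurrence of each element
unique_names = set(names)
print(len(names), len(unique_names))   # 4 3

# Set difference: elements of the first set that are not in the second
others = {"Sonu", "Ravi"}              # illustrative set to subtract
print(sorted(unique_names - others))   # ['Geeta', 'Shyam']

# The method form gives the same result
print(sorted(unique_names.difference(others)))  # ['Geeta', 'Shyam']
```

Sets are printed through `sorted()` because a set's own iteration order is not guaranteed.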

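A common way to remove duplicates from a collection is to convert it into a set. Keep in mind, however, that a set is an unordered and unindexed collection: the original order of the elements is not preserved, and elements cannot be accessed by position. The items of a set must also be immutable (hashable), although items can still be added to or removed from the set itself.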
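When the original order matters, a set is not the only option. The sketch below compares `set()` with two order-preserving alternatives: `dict.fromkeys()` (dictionaries keep insertion order from Python 3.7 onward) and pandas' `Series.unique()`, which returns values in order of first appearance. The sample list is illustrative.

```python
import pandas as pd

values = ["Sonu", "Shyam", "Sonu", "Geeta"]

# set(): duplicates removed, but the iteration order is arbitrary
print(set(values))

# dict.fromkeys(): duplicates removed, first-seen order kept
ordered = list(dict.fromkeys(values))
print(ordered)            # ['Sonu', 'Shyam', 'Geeta']

# pandas equivalent for a column: Series.unique() keeps order of appearance
s = pd.Series(values)
print(list(s.unique()))   # ['Sonu', 'Shyam', 'Geeta']
```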

Set difference for pandas

To drop the duplicates, we access each column's values and convert them into a set, which keeps only one occurrence of each value. Sets built this way can then be compared with set operations such as difference.
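Once a column's values are in a set, the difference operator applies directly. The sketch below assumes a hypothetical set of names to exclude (`exclude`); it takes the difference on the column's unique values, then applies the same idea at row level with `Series.isin()` so that the remaining rows keep all their columns.

```python
import pandas as pd

df = pd.DataFrame({'Name': ["Sonu", "Shyam", "Sonu", "Geeta"],
                   'Age':  [20, 19, 20, 21]})

# Hypothetical set of names to subtract (illustrative only)
exclude = {"Shyam"}

# Set difference on the column's unique values
remaining = set(df['Name']) - exclude
print(sorted(remaining))   # ['Geeta', 'Sonu']

# Row-level version: keep rows whose Name is not in `exclude`,
# then drop duplicate rows and renumber the index
result = (df[~df['Name'].isin(exclude)]
          .drop_duplicates()
          .reset_index(drop=True))
print(result)
```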

Let us understand the deduplication step with the help of an example:

Python program for set difference for pandas

# Importing pandas package
import pandas as pd

# Creating a dictionary
d = {
    'Name':["Sonu","Shyam","Sonu","Geeta"],
    'Age':[20,19,20,21]
}

# Creating a DataFrame
df = pd.DataFrame(d)

# Display original DataFrame
print("Original DataFrame:\n",df,"\n")

# Converting each column into a set keeps only one
# occurrence of every value (duplicates are dropped)
set1 = set(df['Name'])
set2 = set(df['Age'])

# Display result
print("Unique Names:\n",set1,"\n")

print("Unique Ages:\n",set2,"\n")

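# Creating a DataFrame without duplicate rows.
# Zipping set1 and set2 back together would not work:
# sets are unordered, so names could be paired with
# the wrong ages, and the two sets may differ in size.
# drop_duplicates() removes repeated rows instead.
df2 = df.drop_duplicates().reset_index(drop=True)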

# Print new dataframe
print("New DataFrame:\n",df2)

Output

The output of the above program is:

[Output screenshot: Example: Set difference for pandas]
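The program above deduplicates within a single DataFrame. When "set difference" means the rows of one DataFrame that do not appear in another, one common approach (a sketch, not the author's method) is a left merge with `indicator=True`, keeping only rows marked `'left_only'`. The frames `df_a` and `df_b` below are hypothetical sample data; a plain-Python version using sets of row tuples is shown for comparison.

```python
import pandas as pd

# Two hypothetical DataFrames with the same columns
df_a = pd.DataFrame({'Name': ["Sonu", "Shyam", "Sonu", "Geeta"],
                     'Age':  [20, 19, 20, 21]})
df_b = pd.DataFrame({'Name': ["Shyam", "Ravi"],
                     'Age':  [19, 22]})

# Drop duplicate rows first so each row behaves like a set element
a = df_a.drop_duplicates()
b = df_b.drop_duplicates()

# Left merge on all shared columns; indicator=True adds a '_merge'
# column whose value is 'both' or 'left_only' for each row of `a`
merged = a.merge(b, how='left', indicator=True)

# Rows of df_a that do not appear in df_b (A - B), in df_a's order
diff = (merged[merged['_merge'] == 'left_only']
        .drop(columns='_merge')
        .reset_index(drop=True))
print(diff)

# The same difference with Python sets of row tuples (order not kept)
rows_a = set(a.itertuples(index=False, name=None))
rows_b = set(b.itertuples(index=False, name=None))
print(rows_a - rows_b)
```

The merge version keeps the result as a DataFrame in the original row order, while the tuple-set version is shorter but loses both order and the DataFrame structure.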


Copyright © 2025 www.includehelp.com. All rights reserved.