Splitting dataframe into multiple dataframes based on column values and naming them with those values

Given a pandas DataFrame, we have to split it into multiple DataFrames based on column values and name them with those values.
Submitted by Pranit Sharma, on November 16, 2022

Pandas is a Python library that allows us to perform complex manipulations of data effectively and efficiently. In pandas, we mostly work with datasets in the form of DataFrames. A DataFrame is a 2-dimensional data structure consisting of rows, columns, and data.

Problem statement

We are given a DataFrame of brands and their products in different regions. We need to split this DataFrame into multiple DataFrames based on the Region column, and we will use the column values to name the new DataFrames.

Splitting dataframe into multiple dataframes

For this purpose, we first apply groupby() on the Region column and then iterate through the groups with a for loop. The groupby() method is a simple but very useful concept in pandas. Using groupby(), we can group the rows that share a value and perform operations on those groups. The groupby() method splits the object, applies an operation to each group, and then combines the results, so computations on large amounts of data can be carried out group by group.
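The split, apply, and combine steps described above can be sketched as follows. This is only a minimal illustration: the small DataFrame and its units column are hypothetical and are not part of the original example.

```python
import pandas as pd

# Hypothetical sample data; the 'units' column is assumed for illustration only
df = pd.DataFrame({
    'brand': ['Nike', 'Nike', 'Puma', 'Puma'],
    'Region': ['A', 'B', 'A', 'B'],
    'units': [10, 20, 30, 40],
})

# Split the rows by Region, apply sum() to each group's 'units',
# and combine the per-group results into a single Series
units_per_region = df.groupby('Region')['units'].sum()

print(units_per_region)
```

Here the result is a Series indexed by the region values, with one total per region.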

In each iteration, we get the group key (the region) together with the subset of the DataFrame that contains only the rows for that region.
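As a small sketch of what each iteration yields, the snippet below loops over a grouped object and also fetches one subset directly by its key with get_group(). The sample data and the variable names grouped and region_a are assumptions made for this illustration.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({
    'brand': ['Nike', 'Nike', 'Puma', 'Puma'],
    'Region': ['A', 'B', 'A', 'B'],
    'product': ['Tshirt', 'Shoes', 'Tshirt', 'Shoes'],
})

grouped = df.groupby('Region')

# Each iteration yields the group key and the matching sub-DataFrame
for region, df_region in grouped:
    print(region, df_region.shape)

# A single subset can also be fetched by its key
region_a = grouped.get_group('A')
print(region_a)
```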

Let us understand with the help of an example.

Python program to split dataframe into multiple dataframes based on column values and naming them with those values

# Importing pandas package
import pandas as pd

# Creating a dictionary
d= {
    'brand':['Nike','Nike','Nike','Puma','Puma','Puma','Reebok','Reebok','Reebok'],
    'Region':['A','B','C','A','B','C','A','B','C'],
    'product':['Tshirt','Shoes','Jacket','Tshirt','Shoes','Jacket','Tshirt','Shoes','Jacket']
}

# Creating DataFrame
df = pd.DataFrame(d)

# Display dataframe
print('Original DataFrame:\n',df,'\n')

# Using groupby and splitting df; enumerate numbers the subsets from 1
for i, (region, df_region) in enumerate(df.groupby('Region'), start=1):
    print("Subset "+str(i)+"\n",df_region,"\n")

Output

The above program prints the original DataFrame followed by three subsets, one for each region (A, B, and C). Each subset contains the three rows that belong to that region.
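To name each subset with its column value, one common approach is to store the subsets in a dictionary keyed by region. The following is a minimal sketch under that assumption; the name frames is hypothetical, and resetting the index of each subset is an optional choice made here for readability.

```python
import pandas as pd

# Same data as in the program above
d = {
    'brand': ['Nike', 'Nike', 'Nike', 'Puma', 'Puma', 'Puma', 'Reebok', 'Reebok', 'Reebok'],
    'Region': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'product': ['Tshirt', 'Shoes', 'Jacket', 'Tshirt', 'Shoes', 'Jacket', 'Tshirt', 'Shoes', 'Jacket']
}
df = pd.DataFrame(d)

# Map each Region value to its own DataFrame
# ('frames' is a hypothetical name chosen for this sketch)
frames = {
    region: df_region.reset_index(drop=True)
    for region, df_region in df.groupby('Region')
}

# Each subset is now reachable by the region value it was split on
print(sorted(frames))
print(frames['A'])
```

A dictionary is usually easier to work with than creating separate variables dynamically, because the keys can be listed and looped over, and frames['A'] reads almost like a variable named after the region.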
