Pandas Correlation Groupby

Here, we are going to learn how to find the correlation between specific columns within each group of a pandas DataFrame. By Pranit Sharma Last updated: October 05, 2023

Pandas is a library that lets us perform complex data manipulations effectively and efficiently. In pandas, we mostly work with datasets in the form of a DataFrame, a two-dimensional data structure made up of rows, columns, and the data they hold.

Problem statement

Suppose that we are given a DataFrame with columns such as ID, value, name, and data, and we need to find the correlation between the value and data columns separately for each ID.

Finding the correlation between some specific columns

Here, the pandas.DataFrame.corr() method is useful, but on its own it computes the pairwise correlation between columns over every row of the DataFrame; it has no notion of groups.
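As a quick illustration, calling corr() directly on the two numeric columns of the same sample data used in the program below returns a single 2×2 matrix computed over all nine rows, so the ID grouping is ignored. Selecting the columns first is a choice made in this sketch to keep the text column name out of the calculation.

```python
import pandas as pd

# Same sample data as in the example program
df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'value': [5, 4, 6, 7, 4, 3, 4, 2, 4],
    'name': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'data': [6, 5, 4, 7, 6, 5, 3, 2, 6],
})

# corr() on the selected columns: one correlation matrix over all rows,
# with no notion of the ID groups
overall = df[['value', 'data']].corr()
print(overall)
```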

So, for this purpose, we first apply the groupby() method to group the rows by the column we are interested in (here, ID), select the columns we want to correlate, and then call corr() on the resulting groupby object. This computes a separate correlation matrix for each group.

The groupby() method is a simple but very useful concept in pandas. It lets us group rows that share certain values and perform operations on each of those groups.

The groupby() method follows a split-apply-combine pattern: it splits the object into groups, applies an operation to each group, and then combines the results. This makes it practical to run computations over large amounts of data group by group.
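To make the split-apply-combine idea concrete, the following sketch performs the three steps by hand on the example data: it splits the rows by ID, applies Series.corr() to the value and data columns of each group, and combines the per-group coefficients into one Series. It only illustrates the pattern; it is not how groupby() is implemented internally, and the Series name value_data_corr is a hypothetical label chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'value': [5, 4, 6, 7, 4, 3, 4, 2, 4],
    'name': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'data': [6, 5, 4, 7, 6, 5, 3, 2, 6],
})

results = {}
# Split: iterating over a groupby object yields (key, sub-DataFrame) pairs
for group_id, group in df.groupby('ID'):
    # Apply: Pearson correlation between the two columns within this group
    results[group_id] = group['value'].corr(group['data'])

# Combine: gather the per-group results into one Series indexed by ID
per_group = pd.Series(results, name='value_data_corr')
per_group.index.name = 'ID'
print(per_group)
```

The dictionary-then-Series approach keeps each step visible; in practice the single groupby(...).corr() call shown in the program below does the same work in one expression.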

Let us understand with the help of an example:

Python program to find the correlation between some specific columns

```
# Importing pandas package
import pandas as pd

# Creating a dictionary
d = {
'ID':[1,1,1,2,2,2,3,3,3],
'value':[5,4,6,7,4,3,4,2,4],
'name':['A','B','C','D','E','F','G','H','I'],
'data':[6,5,4,7,6,5,3,2,6]
}

# Creating DataFrame
df = pd.DataFrame(d)

# Display dataframe
print('Original DataFrame:\n',df,'\n')

# Grouping and finding correlation
res = df.groupby('ID')[['value','data']].corr()

# Display result
print('Result:\n',res,'\n')
```

Output

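The output of the above program is a DataFrame indexed by a two-level MultiIndex (ID, column name) that holds one 2×2 correlation matrix of value and data for each ID. The diagonal entries are 1, and the correlation between value and data is -0.5 for ID 1, about 0.961 for ID 2, and about 0.693 for ID 3.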
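Since the result stacks a full matrix per ID, it can be handy to reduce it to a single coefficient per group. A minimal sketch, assuming only the value/data pair is of interest: DataFrame.xs() selects the 'value' row of every per-ID matrix from the second level of the MultiIndex, and the 'data' column of that selection is the value-data correlation for each ID. The variable name value_vs_data is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'value': [5, 4, 6, 7, 4, 3, 4, 2, 4],
    'name': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'data': [6, 5, 4, 7, 6, 5, 3, 2, 6],
})

# Same call as in the program above: one 2x2 matrix per ID,
# indexed by (ID, column name)
res = df.groupby('ID')[['value', 'data']].corr()

# Take the 'value' row of every per-ID matrix (level 1 of the MultiIndex)
# and keep its 'data' column: one coefficient per ID
value_vs_data = res.xs('value', level=1)['data']
print(value_vs_data)
```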