How to calculate partial correlation in Python?

Python partial correlation calculation: In this tutorial, we will learn what partial correlation is and how to calculate it in Python. By Shivang Yadav Last updated : September 03, 2023

What is partial correlation?

Partial correlation is a statistical measure that quantifies the relationship between two variables while controlling for the influence of one or more other variables. In other words, it assesses the degree of association or correlation between two variables while accounting for the effects of additional variables that may be confounding the relationship.

The partial correlation coefficient, often denoted as "r," indicates how much two variables are correlated after removing the shared variance explained by the other variables in the analysis. It helps researchers and analysts to isolate the specific relationship between the variables of interest while holding constant the potential impact of other factors.
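The idea of "removing the shared variance" can be illustrated with a minimal sketch, which is not part of any library: regress each variable of interest on the control variable(s), then correlate the residuals. The helper name partial_corr_residuals and the synthetic data are assumptions made only for this illustration.

```python
import numpy as np


def partial_corr_residuals(x, y, covariates):
    """Correlate the parts of x and y that the covariates cannot explain.

    Hypothetical helper for illustration; covariates may be a 1-D sequence
    (one control variable) or a 2-D array with one column per control.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: an intercept column plus one column per covariate.
    design = np.column_stack([np.ones(len(x)), np.asarray(covariates, dtype=float)])
    # Residuals of x and y after least-squares regression on the covariates.
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    # Pearson correlation of the two residual series.
    return float(np.corrcoef(res_x, res_y)[0, 1])


# Synthetic, purely illustrative data: z influences both x and y.
rng = np.random.default_rng(0)
z = rng.normal(size=500)
x = z + 0.5 * rng.normal(size=500)
y = z + 0.5 * rng.normal(size=500)

raw_r = float(np.corrcoef(x, y)[0, 1])
partial_r = partial_corr_residuals(x, y, z)
print(f"raw r = {raw_r:.2f}, partial r = {partial_r:.2f}")
```

In this synthetic setup x and y are related only through z, so the raw correlation is strong while the partial correlation should be close to zero.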

Calculation of partial correlation in Python

Partial correlation in Python can be calculated using the partial_corr() function from the pingouin package (an open-source statistical package written in Python 3 and based mostly on Pandas and NumPy). The function returns a pandas DataFrame that contains the correlation coefficient along with related statistics such as the sample size, confidence interval, and p-value.

Syntax:

partial_corr(data, x, y, covar)

Where,

  • data is the pandas DataFrame that contains the variables.
  • x and y are the names of the two columns whose correlation is to be found.
  • covar is the name of the covariate column (or a list of column names) to control for.

Let us understand this with the help of an example.

Python program to calculate the partial correlation

import numpy as np
import pandas as pd
import pingouin as pg

data = {
    "currentGrade": [82, 88, 75, 74, 93, 97, 83, 90, 90, 80],
    "hours": [4, 3, 6, 5, 4, 5, 8, 7, 4, 6],
    "examScore": [88, 85, 76, 70, 92, 94, 89, 85, 90, 93],
}

dataframe = pd.DataFrame(data, columns=["currentGrade", "hours", "examScore"])
print(f"The dataset is {dataframe}")

# Partial correlation between hours and examScore, controlling for currentGrade
partCorrCoeff = pg.partial_corr(data=dataframe, x="hours", y="examScore", covar="currentGrade")
print(f"The partial correlation is {partCorrCoeff}")

Output

The dataset is    currentGrade  hours  examScore
0            82      4         88
1            88      3         85
2            75      6         76
3            74      5         70
4            93      4         92
5            97      5         94
6            83      8         89
7            90      7         85
8            90      4         90
9            80      6         93

The partial correlation is           n      r         CI95%     r2  adj_r2  p-val   BF10  power
pearson  10  0.191  [-0.5, 0.73]  0.036  -0.238  0.598  0.438  0.082
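As a cross-check on the r value reported above, the following sketch computes the same quantity with NumPy alone, using the standard first-order partial correlation formula built from the three pairwise Pearson correlations. The variable names are chosen for this illustration and are not part of pingouin.

```python
import numpy as np

# Same sample values as in the program above.
hours = [4, 3, 6, 5, 4, 5, 8, 7, 4, 6]
exam_score = [88, 85, 76, 70, 92, 94, 89, 85, 90, 93]
current_grade = [82, 88, 75, 74, 93, 97, 83, 90, 90, 80]

# Pairwise Pearson correlations; each row passed to corrcoef is one variable.
corr = np.corrcoef([hours, exam_score, current_grade])
r_xy = corr[0, 1]  # hours vs examScore
r_xz = corr[0, 2]  # hours vs currentGrade
r_yz = corr[1, 2]  # examScore vs currentGrade

# First-order partial correlation of x and y, controlling for z.
r_xy_z = (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))
print(f"The partial correlation is {r_xy_z:.3f}")
```

The printed value should agree with the r column in the output above (0.191). With a p-val of 0.598, that association between hours and examScore is weak and not statistically significant once currentGrade is held constant.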
