How to Center Data in Python (With Examples)

By Shivang Yadav Last updated : November 22, 2023

Centering data

Centering data means subtracting a constant value from each data point in a dataset. This constant is typically the mean (average) of the dataset, so each centered value shows how far, and in which direction, the original value lies from the mean. Centering is useful for simplifying interpretation and analysis, for removing a constant offset shared by all values, and for preparing data for statistical techniques that expect mean-zero inputs, such as principal component analysis (PCA).

Steps/Algorithm

Here are the steps to center a dataset:

  • Step 1: Mean calculation - Calculate the mean (μ) of a dataset x containing n values:
    Mean (μ) = (Σx) / n
    
  • Step 2: Subtract the mean from each value to obtain the centered data point:
    Centered Data Point = Data Point - Mean
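The two steps above can be sketched in plain Python, without any libraries. This is a minimal illustration rather than the article's own program, and the function name `center` is a hypothetical name chosen here.

```python
def center(values):
    """Center a list of numbers by subtracting their mean.

    `center` is a hypothetical helper used only for illustration.
    """
    if not values:
        raise ValueError("cannot center an empty data set")
    # Step 1: mean (μ) = (Σx) / n
    mean = sum(values) / len(values)
    # Step 2: centered data point = data point - mean
    return [x - mean for x in values]


print(center([10, 15, 20, 25, 30]))  # [-10.0, -5.0, 0.0, 5.0, 10.0]
```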
    

Example

Let's work through an example:

Data set: [10, 15, 20, 25, 30]

Mean (μ) = (10 + 15 + 20 + 25 + 30) / 5 = 20
Centered Data Point 1 = 10 - 20 = -10
Centered Data Point 2 = 15 - 20 = -5
Centered Data Point 3 = 20 - 20 = 0
Centered Data Point 4 = 25 - 20 = 5
Centered Data Point 5 = 30 - 20 = 10

The resulting centered data values are: [-10, -5, 0, 5, 10].

By centering the data in this way, you make the mean of the centered data equal to zero and remove any constant offset shared by all values, which can be helpful in various statistical analyses and interpretations.
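A quick check in plain Python, using the same example data, confirms both properties: the centered values average to zero, and adding the same constant to every data point (100 here, an arbitrary choice for illustration) leaves the centered result unchanged, because the constant is absorbed into the mean.

```python
data = [10, 15, 20, 25, 30]

# center the original data
mean = sum(data) / len(data)
centered = [x - mean for x in data]
print(sum(centered) / len(centered))  # 0.0

# shift every value by the same constant, then center again
shifted = [x + 100 for x in data]
shifted_mean = sum(shifted) / len(shifted)
centered_shifted = [x - shifted_mean for x in shifted]
print(centered_shifted == centered)  # True
```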

Now that the basic logic of centering is clear, let's write a program that performs this calculation with the NumPy library.

Python program to center the values of a NumPy array

import numpy as np

# function to return the deviation of every value
# from the mean (its centered value)
def center_function(x, meanVal):
    # NumPy broadcasting subtracts meanVal from each element
    return x - meanVal

# Creating NumPy array and printing the data
dataSet = np.array([10, 15, 20, 25, 30])
print(f"The values of the data set are:\n{dataSet}")

# finding the mean value of the data
meanVal = dataSet.mean()

centerData = center_function(dataSet, meanVal)
print(f"The array of centered data values is:\n{centerData}")

Output

The values of the data set are:
[10 15 20 25 30]
The array of centered data values is:
[-10.  -5.   0.   5.  10.]
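The same subtraction extends to a 2-D NumPy array in which each column is a separate variable. Here is a minimal sketch using a small made-up array: `mean(axis=0)` computes one mean per column, and broadcasting subtracts each column's mean from every row of that column.

```python
import numpy as np

# illustrative data: 3 observations (rows) of 2 variables (columns)
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])

# one mean per column, shape (2,)
col_means = data.mean(axis=0)

# broadcasting subtracts col_means from every row
centered = data - col_means
print(centered)

# every column of the centered array now has mean 0
print(np.allclose(centered.mean(axis=0), 0))  # True
```

To center each row instead, `data - data.mean(axis=1, keepdims=True)` should work: `keepdims=True` keeps the row means as a column of shape (3, 1), so they broadcast across each row.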

The same idea can be applied to each column (variable) of a Pandas DataFrame, with each column centered by its own mean.

Python program to center the columns of a Pandas DataFrame

import pandas as pd

# create DataFrame
dataFr = pd.DataFrame(
    {
        "x": [10, 20, 23, 43, 56, 90],
        "y": [17, 45, 60, 77, 89, 100],
        "z": [3, 13, 13, 16, 18, 29],
    }
)

print(f"The values of the data set are:\n{dataFr}")

# subtract each column's mean from every value in that column
centerData = dataFr.apply(lambda x: x - x.mean())

# view centered DataFrame
print(f"The centered data set is:\n{centerData}")

Output

The values of the data set are:
    x    y   z
0  10   17   3
1  20   45  13
2  23   60  13
3  43   77  16
4  56   89  18
5  90  100  29
The centered data set is:
           x          y          z
0 -30.333333 -47.666667 -12.333333
1 -20.333333 -19.666667  -2.333333
2 -17.333333  -4.666667  -2.333333
3   2.666667  12.333333   0.666667
4  15.666667  24.333333   2.666667
5  49.666667  35.333333  13.666667
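The `apply` call works, but you can get the same result without a lambda. The sketch below rebuilds the same DataFrame. `DataFrame.mean()` returns a Series of column means indexed by column name, and subtracting that Series from the DataFrame aligns on column labels, so each column is centered by its own mean.

```python
import numpy as np
import pandas as pd

dataFr = pd.DataFrame(
    {
        "x": [10, 20, 23, 43, 56, 90],
        "y": [17, 45, 60, 77, 89, 100],
        "z": [3, 13, 13, 16, 18, 29],
    }
)

# column means as a Series indexed by "x", "y", "z"
col_means = dataFr.mean()

# subtraction aligns the Series on column labels
centered_vec = dataFr - col_means

# the apply-based version from the program above, for comparison
centered_apply = dataFr.apply(lambda col: col - col.mean())

print(np.allclose(centered_vec, centered_apply))  # True
print(np.allclose(centered_vec.mean(), 0))  # True
```

The vectorized form avoids calling a Python function once per column. With only a handful of columns, though, the difference is negligible, so either style is reasonable.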

Copyright © 2024 www.includehelp.com. All rights reserved.