How to Calculate Z-Scores in Python?

Python Z-score computation: In this tutorial, we will learn about the Z-Score in statistics and how to write a Python program to calculate Z-score? By Shivang Yadav Last updated : September 03, 2023

Z-Score in Statistics

A Z-score, also known as a standard score, is a statistical measurement that quantifies the number of standard deviations a data point is from the mean (average) of a dataset. It is used to assess how far a particular data point is from the mean and helps in understanding the data's distribution and variability.

Mathematically, the formula to calculate the Z-score of a data point (x) in a dataset with mean (μ) and standard deviation (σ) is:

z-score formula

Calculating Z-Score in Python

To calculate the Z-score in Python, you can use the scipy.stats.zscore() method which is a library method of scipy.stats module. Consider the syntax of this method.

scipy.stats.zscore(a, axis=0, ddof=0, nan_policy='propagate')

Where,

  • a - is the array-like object containing the data
  • axis - is the axis along which to calculate the z-scores. The default is 0.
  • ddof - degrees of freedom correction in the calculation of the standard deviation. The default value is 0.
  • nan_policy - is used for error handling. Values propagate -> nan, raise -> throw error, omit-> ignore values.

Python program to calculate the Z-score

# Importing the libraries
import numpy as np
import scipy as scipy

# Sample dataset
dataset = np.array([1, 2, 3, 4, 5, 30, 6, 7, 8, 9, 10])
print(f"The dataset is {dataset}")

# Calculation of z-score for values...
z_score = scipy.stats.zscore(dataset, axis=0, ddof=0, nan_policy="propagate")
print(f"The z-score in the dataset is {z_score}")

The output of the above program is:

The dataset is [ 1  2  3  4  5 30  6  7  8  9 10]
The z-score in the dataset is [-0.89021047 -0.75788188 -0.6255533  -0.49322472 -0.36089614  2.94731844
 -0.22856755 -0.09623897  0.03608961  0.1684182   0.30074678]

Step by step working of the code

The code works directly by calculating the value using the above-mentioned formula provided in the built-in functions of the Python programming language.

The zscore() method directly returns the value. An alternate or more elaborate way would have been calculating the values of mean and standard deviation in Python using numpy library's methods and then putting them in the formula to return the resultant array.

Python SciPy Programs »


Comments and Discussions!

Load comments ↻






Copyright © 2024 www.includehelp.com. All rights reserved.