How to calculate trimmed mean in Python?

By Shivang Yadav Last updated : November 22, 2023

Trimmed Mean

A trimmed mean is a statistical measure of central tendency calculated by removing a specified percentage of the smallest and largest values from a dataset and then computing the mean (average) of the remaining values. Trimming reduces the impact of outliers or extreme values, which makes the result more robust than the ordinary mean.

Steps/Algorithm

The steps to calculate trimmed mean in Python are:

  1. Sort the data in ascending order.
  2. Decide what percentage of the data to trim from each end. This percentage is typically denoted by p. For instance, to trim 10% of the data from each end, p is 10.
  3. Calculate the number of data points to trim from each end: multiply p by the total number of data points, divide by 100, and round down to a whole number. Let's call this n.
  4. Now, remove the first n data points and the last n data points from the sorted dataset.
  5. Calculate the mean (average) of the remaining data points.
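
The steps above can be sketched in plain Python without any library. This is a minimal illustration, not SciPy's implementation: the function name manual_trimmed_mean is hypothetical, p is taken as a percentage per end, and a fractional trim count is assumed to be rounded down.

```python
def manual_trimmed_mean(data, p):
    """Trimmed mean of `data`, cutting p percent from each end (hypothetical helper)."""
    ordered = sorted(data)                  # step 1: sort ascending
    n = int(len(ordered) * p / 100)         # step 3: points to drop per end, rounded down
    kept = ordered[n:len(ordered) - n]      # step 4: drop the first n and last n values
    if not kept:
        raise ValueError("p is too large: no data left after trimming")
    return sum(kept) / len(kept)            # step 5: mean of what remains


values = [2, 15, 9, 10, 14, 18, 3, 13, 17, 11, 1, 8]
result = manual_trimmed_mean(values, 25)    # 25% of 12 values = 3 dropped per end
print(result)
```

With 12 values and p = 25, three values are dropped from each end, leaving 8, 9, 10, 11, 13 and 14, whose mean is 65/6.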

Trimmed means are useful for datasets that contain outliers or extreme values that would skew the ordinary mean. By removing a fixed portion of the values at each end, the trimmed mean gives a more robust estimate of central tendency. The choice of p depends on the characteristics of your data and on how strongly you want to reduce the influence of outliers. Common choices are 5%, 10%, and 20%, but p should be selected based on the context of your analysis.
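
To get a feel for how the choice of p matters, the sketch below compares the ordinary mean with trimmed means at 5%, 10%, and 20% on a small sample. The data is made up for illustration, with 250 playing the role of an outlier.

```python
from statistics import mean

from scipy import stats

# Made-up sample of 10 values; 250 is a deliberate outlier.
data = [12, 14, 15, 15, 16, 17, 18, 19, 20, 250]

plain = mean(data)
# trim_mean() expects a fraction per end, so 5% is written as 0.05.
trimmed = {p: stats.trim_mean(data, p) for p in (0.05, 0.10, 0.20)}

print(f"ordinary mean: {plain}")
for p, value in trimmed.items():
    print(f"trimmed mean at {p:.0%} per end: {value}")
```

With only 10 values, 5% per end corresponds to less than one data point, so nothing is removed and the result equals the ordinary mean. At 10% and 20% the outlier is dropped and the result moves close to the bulk of the data.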

Calculating Trimmed Mean

To calculate a trimmed mean, use the trim_mean() function from the scipy.stats module.

Syntax

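Below is the syntax of the trim_mean() function:

scipy.stats.trim_mean(a, proportiontocut, axis=0)

Here,

  • a is the input data whose trimmed mean is to be calculated. It can be a one-dimensional array or a multi-dimensional array, such as several columns of a DataFrame.
  • proportiontocut is the fraction of values to cut from each end of the sorted data. It is given as a fraction, not a percentage (for example, 0.1 for 10%).
  • axis is the axis along which the trimmed mean is computed. The default is 0; passing None computes it over the whole array.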
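
As a small illustration of how the axis argument of scipy.stats.trim_mean() behaves on two-dimensional input, the sketch below uses a made-up 5x2 array. The array values and variable names are assumptions chosen for the example.

```python
import numpy as np
from scipy import stats

# Made-up data: 5 rows, 2 columns, with one extreme value in each column.
scores = np.array([
    [1, 50],
    [2, 60],
    [3, 70],
    [4, 80],
    [100, 900],
])

# Default axis=0: one trimmed mean per column (1 of 5 values cut per end).
col_means = stats.trim_mean(scores, 0.2)

# axis=None: the array is flattened first (2 of 10 values cut per end).
overall = stats.trim_mean(scores, 0.2, axis=None)

print(col_means)
print(overall)
```

Per column, the smallest and largest values are dropped, so the extreme entries 100 and 900 do not affect the result.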

Example 1: Calculate trimmed mean of an array

# Python program to calculate the trimmed mean
# of an array

from scipy import stats

meanArray = [2, 15, 9, 10, 14, 18, 3, 13, 17, 11, 1, 8]
print(f"The values of the array are \n{meanArray}")

trimMean = stats.trim_mean(meanArray, 0.25)
print(f"The trimmed mean is \n{trimMean}")

Output

The values of the array are 
[2, 15, 9, 10, 14, 18, 3, 13, 17, 11, 1, 8]
The trimmed mean is 
10.833333333333334

The same function also works on multi-dimensional data, such as several columns of a DataFrame. In that case, it returns one trimmed mean per column.

Example 2: Calculate trimmed mean of multiple arrays

# Python program to perform trimmed mean operation 
# on multiple arrays

from scipy import stats
import pandas as pd

boundaries = pd.DataFrame(
    {"fours": [5, 2, 3, 1, 9, 3, 1, 6], "sixes": [2, 1, 0, 0, 5, 1, 4, 2]}
)
print(f"The values of the array are \n{boundaries}")

trimMean = stats.trim_mean(boundaries[["fours", "sixes"]], 0.05)
print(f"The trimmed mean is \n{trimMean}")

Output

The values of the array are 
   fours  sixes
0      5      2
1      2      1
2      3      0
3      1      0
4      9      5
5      3      1
6      1      4
7      6      2
The trimmed mean is 
[3.75  1.875]
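
One detail worth checking in Example 2: with 8 rows and a fraction of 0.05, the number of values to cut per end works out to less than one, so nothing is actually trimmed and the result is the ordinary column mean. The sketch below, which rebuilds the same values as a NumPy array (an assumption made to keep it self-contained), compares 0.05 with a larger fraction of 0.25.

```python
import numpy as np
from scipy import stats

# Same values as the "fours" and "sixes" columns in Example 2.
fours = [5, 2, 3, 1, 9, 3, 1, 6]
sixes = [2, 1, 0, 0, 5, 1, 4, 2]
boundaries = np.column_stack([fours, sixes])

n_rows = boundaries.shape[0]
results = {}
for fraction in (0.05, 0.25):
    cut = int(fraction * n_rows)  # values dropped from each end, rounded down
    results[fraction] = stats.trim_mean(boundaries, fraction)
    print(f"fraction={fraction}: {cut} value(s) cut per end -> {results[fraction]}")

# For comparison: the ordinary per-column mean.
plain = boundaries.mean(axis=0)
print(f"ordinary mean: {plain}")
```

At 0.05 the output matches the ordinary mean, while at 0.25 two values are cut from each end of every column, giving 3.25 and 1.5.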
