How to perform data binning in Python?

By Shivang Yadav Last updated : November 21, 2023

Data Binning

Data binning (also called discretization or bucketing) is a data preprocessing technique in which continuous values are grouped into a smaller number of discrete bins, or intervals. Binning can reduce the impact of small fluctuations in the data and make it easier to analyze and visualize.
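As a minimal illustration of the idea, assuming fixed, hand-picked interval edges, the sketch below uses pandas' cut() function (a different function from qcut(), which this article focuses on) to map raw values into labelled intervals. The ages data, the edges and the label names are hypothetical.

```python
import pandas as pd

# Hypothetical continuous values to be binned
ages = pd.Series([3, 17, 25, 42, 68, 71])

# Fixed, hand-picked intervals: (0, 18], (18, 65], (65, 100]
groups = pd.cut(ages, bins=[0, 18, 65, 100], labels=["child", "adult", "senior"])

# Each raw value is replaced by the label of the interval it falls into
print(groups.tolist())
```

Six distinct ages collapse into three categories, which is the kind of simplification binning is used for.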

Data Binning in Python

Python is widely used in machine learning and AI, and libraries such as pandas provide efficient methods for preprocessing tasks like this one. To perform data binning in Python, you can use the qcut() method from the pandas library. qcut() discretizes a variable into equal-sized buckets based on rank or on sample quantiles, so each bucket holds roughly the same number of observations.
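Here "equal-sized" refers to the number of observations per bucket, not the width of each interval. The sketch below uses a hypothetical, deliberately skewed series of 12 values and counts how many land in each of four quartile bins.

```python
import pandas as pd

# Hypothetical skewed data: 12 values, one very large outlier
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50, 1000])

# Four quantile-based bins (quartiles)
quartiles = pd.qcut(values, q=4)

# Count observations per bin, keeping the bins in interval order
counts = quartiles.value_counts(sort=False)
print(counts)
```

Every bin receives 3 of the 12 values, even though the last interval stretches from 9.25 up to 1000 while the first ones are only a few units wide.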

Syntax

pandas.qcut(x, q, labels=None, retbins=False, precision=3, duplicates='raise')

Parameters

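  • x: the 1-D array or pandas Series to bin (a single column, not a whole DataFrame).
  • q: either the number of quantiles (for example, 4 for quartiles or 10 for deciles) or a list of quantiles such as [0, 0.25, 0.5, 0.75, 1].
  • labels: an array of labels for the resulting bins, with exactly one label per bin. If False, only integer bin indicators are returned. The default, None, labels each bin with its interval.
  • retbins: optional bool (default False) that states whether to also return the computed bin edges. This is useful when q is given as a number, since the edges are not otherwise visible.
  • precision: optional int (default 3), the precision at which to store and display the bin labels.
  • duplicates: optional, 'raise' (default) or 'drop'; states whether to raise a ValueError or drop the non-unique edges when the bin edges are not unique.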
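The sketch below exercises three of these optional parameters on the runs values used later in this article: labels=False for integer bin codes, retbins=True for the edges, and duplicates="drop" for data with many repeated values. The repeated series is hypothetical.

```python
import pandas as pd

runs = pd.Series([4, 7, 12, 8, 50, 13, 100])

# labels=False -> integer bin codes; retbins=True -> also return the edges
codes, edges = pd.qcut(runs, q=3, labels=False, retbins=True)
print(codes.tolist())
print(edges)

# Hypothetical data where several quartile edges coincide (all equal to 1)
repeated = pd.Series([1, 1, 1, 1, 2, 3])

# With the default duplicates="raise", non-unique edges cause a ValueError
try:
    pd.qcut(repeated, q=4)
    raised = False
except ValueError:
    raised = True

# duplicates="drop" removes the repeated edges, leaving fewer bins
binned = pd.qcut(repeated, q=4, duplicates="drop")
print(raised, len(binned.cat.categories))
```

With duplicates="drop", the four requested quartile bins shrink to two, so always check how many bins you actually got.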

Python program to perform data binning

In this program, we create a DataFrame and divide its runs column into three quantile-based bins using the pandas.qcut() method.

import pandas as pd

# creating a DataFrame
matchData = pd.DataFrame(
    {"runs": [4, 7, 12, 8, 50, 13, 100], "dots": [2, 15, 4, 7, 21, 18, 51]}
)

# perform data binning on the runs column (3 quantile-based bins)
matchData["points_bin"] = pd.qcut(matchData["runs"], q=3)

print("Binned Data is\n", matchData)

Output

The output of the above program is:

Binned Data is
    runs  dots     points_bin
0     4     2   (3.999, 8.0]
1     7    15   (3.999, 8.0]
2    12     4    (8.0, 13.0]
3     8     7   (3.999, 8.0]
4    50    21  (13.0, 100.0]
5    13    18    (8.0, 13.0]
6   100    51  (13.0, 100.0]
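With q=3, qcut() cuts at the 0, 1/3, 2/3 and 1 sample quantiles of runs, which for this data are 4, 8, 13 and 100. The first label reads 3.999 rather than 4 only because pandas widens the lowest interval slightly for display so that the minimum value is included. The sketch below, assuming the default linear interpolation of Series.quantile(), computes those quantiles directly and compares them with the edges returned by retbins=True.

```python
import numpy as np
import pandas as pd

runs = pd.Series([4, 7, 12, 8, 50, 13, 100])

# Edges actually used by qcut(q=3)
_, edges = pd.qcut(runs, q=3, retbins=True)

# The same cut points computed directly as sample quantiles
quantiles = runs.quantile([0, 1 / 3, 2 / 3, 1]).to_numpy()

print(edges)
print(quantiles)
print(np.allclose(edges, quantiles))  # True
```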

Data binning in Python using labels

Instead of a number of bins, we can pass q a list of quantiles and give each resulting bin a name with the labels parameter. In the program below, q=[0, 0.2, 0.4, 0.6, 0.8, 1] splits runs into five quintile bins, labelled A to E.
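A list passed as q with n edges produces n - 1 bins, so labels must contain exactly n - 1 names; otherwise pandas raises a ValueError. The sketch below, reusing the runs values from this article, checks that rule with a deliberately short, hypothetical label list and also shows labels=False, which returns the bin index instead of a name.

```python
import pandas as pd

runs = pd.Series([4, 7, 12, 8, 50, 13, 100])
q = [0, 0.2, 0.4, 0.6, 0.8, 1]  # 6 edges -> 5 bins

# Five labels for five bins is accepted
named = pd.qcut(runs, q=q, labels=["A", "B", "C", "D", "E"])

# Only four labels for five bins is rejected
try:
    pd.qcut(runs, q=q, labels=["A", "B", "C", "D"])
    mismatch_error = None
except ValueError as err:
    mismatch_error = str(err)

# labels=False gives the bin index (0 to 4) instead of a name
codes = pd.qcut(runs, q=q, labels=False)

print(named.tolist())
print(mismatch_error)
print(codes.tolist())
```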

Python program to perform data binning with labels

import pandas as pd

# creating a DataFrame
matchData = pd.DataFrame(
    {"runs": [4, 7, 12, 8, 50, 13, 100], "dots": [2, 15, 4, 7, 21, 18, 51]}
)

# bin the runs column into quintiles, labelled A (lowest) to E (highest)
matchData["points_bin"] = pd.qcut(
    matchData["runs"], q=[0, 0.2, 0.4, 0.6, 0.8, 1], labels=["A", "B", "C", "D", "E"]
)

print("Binned Data is\n", matchData)

Output

The output of the above program is:

Binned Data is
    runs  dots points_bin
0     4     2          A
1     7    15          A
2    12     4          C
3     8     7          B
4    50    21          E
5    13    18          D
6   100    51          E
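The letter labels hide the actual cut points. The sketch below reruns the labelled program with retbins=True to expose them; assuming linear interpolation between the sorted runs values, the quintile edges come out at about 4, 7.2, 9.6, 12.6, 42.6 and 100, which explains why 12 lands in C while 13 lands in D. The per-bin count at the end is an illustrative extra, not part of the original program.

```python
import numpy as np
import pandas as pd

matchData = pd.DataFrame(
    {"runs": [4, 7, 12, 8, 50, 13, 100], "dots": [2, 15, 4, 7, 21, 18, 51]}
)

# retbins=True returns a (binned values, bin edges) tuple
matchData["points_bin"], edges = pd.qcut(
    matchData["runs"],
    q=[0, 0.2, 0.4, 0.6, 0.8, 1],
    labels=["A", "B", "C", "D", "E"],
    retbins=True,
)
print(edges)

# Number of rows that fell into each labelled bin, in label order
counts = matchData["points_bin"].value_counts(sort=False)
print(counts)
```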

Copyright © 2024 www.includehelp.com. All rights reserved.