Stratified Sampling in Pandas

Python Pandas | Stratified Sampling: Learn how to generate stratified samples of size n from a dataset.
By Pranit Sharma Last updated : September 17, 2023

Pandas is a Python library for manipulating data effectively and efficiently. In pandas, we mostly work with a dataset in the form of a DataFrame, a 2-dimensional data structure made up of rows, columns, and data.

Python Pandas | Stratified Sampling

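Stratified random sampling is a method of sampling that divides a population into smaller subgroups known as strata and then draws a random sample from each stratum. When drawing n rows per stratum, a stratum may contain fewer than n rows, so we pass min(len(x), n) as the number of rows to the sample() method. Without this cap, sample() raises an error when it is asked for more rows than the group contains, because it samples without replacement by default.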
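A minimal sketch of why min is used when calling sample(). The one-row DataFrame small_group is made up for illustration and stands in for a stratum that is smaller than the requested sample size.

```python
import pandas as pd

# Hypothetical stratum with a single row (illustration only)
small_group = pd.DataFrame({'A': [3], 'B': [7]})

# Asking for 2 rows from a 1-row group fails, because sample()
# draws without replacement by default
try:
    small_group.sample(n=2)
    raised = False
except ValueError:
    raised = True

# Capping the request at the group size avoids the error
capped = small_group.sample(n=min(len(small_group), 2))

print("Error without the cap:", raised)
print("Rows returned with the cap:", len(capped))
```

With the cap in place, a small stratum simply contributes all of its rows.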

We can use the groupby() method to split the DataFrame into strata and apply a lambda function on the grouped object to draw a sample from each group.

Let us understand with the help of an example,

Python program to demonstrate the example of stratified sampling in pandas

# Importing pandas package
import pandas as pd

# Creating a dictionary
d1 = {'A': [1, 1, 1, 2, 2, 2, 2, 3, 4, 4], 'B': list(range(10))}

# Creating DataFrame
df = pd.DataFrame(d1)

# Display the DataFrame
print("Original DataFrame:\n", df, "\n\n")

# Finding stratified samples: at most 2 random rows from each group of 'A'
res = df.groupby('A', group_keys=False).apply(lambda x: x.sample(n=min(len(x), 2)))

# Display result
print("Result:\n", res)

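The output of the above program is shown below. Because the sampling is random, the selected rows differ between runs. Each value of A contributes at most two rows.

[Image: Example: Stratified Sampling in Pandas]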
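As an alternative sketch, the same kind of sample can be drawn without apply() by shuffling the rows once and keeping the first n rows of each group. This is a swapped-in technique, not the method used in the program above. The helper name stratified_sample and its seed parameter are hypothetical and introduced here only for illustration. Passing a seed makes the result repeatable.

```python
import pandas as pd

def stratified_sample(df, column, n, seed=None):
    """Return up to n randomly chosen rows for each distinct value in column."""
    # Shuffle all rows once; random_state makes the shuffle repeatable
    shuffled = df.sample(frac=1, random_state=seed)
    # head(n) keeps the first n rows of each group, or fewer for small groups
    picked = shuffled.groupby(column).head(n)
    # Restore the original row order
    return picked.sort_index()

# Same data as in the example above
df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2, 2, 3, 4, 4], 'B': list(range(10))})

res = stratified_sample(df, 'A', n=2, seed=0)

# Number of sampled rows per stratum
counts = res['A'].value_counts().sort_index()
print(counts)
```

Because head(n) never returns more rows than a group has, no min() cap is needed here. The group with a single row (A equal to 3) contributes one row, and every other group contributes two.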

