How to get topmost N records within each group of a Pandas DataFrame?

Given a Pandas DataFrame, we have to get the topmost N records within each group.
By Pranit Sharma. Last updated: September 22, 2023

A row in a pandas DataFrame is a horizontal record that holds one value for each column. The values in different rows may be the same or different. Rows are labeled by an index, which defaults to integer positions, but pandas also lets us assign custom index labels as needed.

Problem statement

Given a Pandas DataFrame, we have to get the topmost N records within each group.

Getting topmost N records within each group of a Pandas DataFrame

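For this purpose, we first group the DataFrame with pandas.DataFrame.groupby() and then select the first N rows of each group. head(N) keeps rows in their existing order, so sort the DataFrame first if the topmost records should be ranked by a value. The call applied to the groupby result is: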

head(N).reset_index(drop=True)
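As a minimal sketch of what each half of that call does, the snippet below uses a hypothetical toy DataFrame (the names toy, key, and val are illustrative assumptions, not part of the original example). It suggests that head(N) on a groupby result keeps the original index labels, while reset_index(drop=True) renumbers the rows from 0.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({
    "key": ["x", "x", "x", "y", "y", "y"],
    "val": [1, 2, 3, 4, 5, 6],
})

# head(2) keeps the first 2 rows of each group in their original order
# and preserves the original index labels
first_two = toy.groupby("key").head(2)
kept_labels = first_two.index.tolist()

# reset_index(drop=True) discards those labels and renumbers from 0
renumbered = first_two.reset_index(drop=True)
new_labels = renumbered.index.tolist()

print(kept_labels)
print(new_labels)
```

Without drop=True, the old index would be kept as an extra column instead of being discarded.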

Let us understand with the help of an example.

Python program to get topmost N records within each group of a Pandas DataFrame

# Importing pandas package
import pandas as pd

# Create dictionary
d = {
    'a':['A','A','A','A','B','B'],
    'b':[1,2,1,2,1,2],
    'c':[12,10,16,20,14,10]
}

# Create DataFrame
df = pd.DataFrame(d)

# Display DataFrame
print("Created DataFrame:\n",df)

# Group the rows by column 'a'
result = df.groupby('a', as_index=False)

# Selecting the first 2 rows of each group
final = result.head(2).reset_index(drop=True)

# Display final result
print("Final result:\n",final)

Output

The output of the above program is:

Original DataFrame:

[Figure: Example 1: Get topmost N records within each group]

Topmost N records in groupby result:

[Figure: Example 2: Get topmost N records within each group]
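The program above returns the first 2 rows of each group in their existing order. If "topmost" should instead mean the rows with the largest values in some column, one possible approach is to sort before grouping. The sketch below assumes ranking by column 'c' is what is wanted. The helper name top_n_by is hypothetical and is not part of pandas.

```python
import pandas as pd

def top_n_by(frame, group_col, value_col, n):
    """Hypothetical helper: the n rows with the largest value_col in each group."""
    # Sort by group, then by value in descending order within each group
    ordered = frame.sort_values([group_col, value_col], ascending=[True, False])
    # head(n) now picks the n largest rows of every group
    return ordered.groupby(group_col).head(n).reset_index(drop=True)

# Same data as the example program
df = pd.DataFrame({
    'a': ['A', 'A', 'A', 'A', 'B', 'B'],
    'b': [1, 2, 1, 2, 1, 2],
    'c': [12, 10, 16, 20, 14, 10],
})

top2 = top_n_by(df, 'a', 'c', 2)
print(top2)
```

Sorting first keeps the selection logic identical to the original program, so only the row order fed into head(n) changes.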
