Pandas dataframe select row by max value in group

Learn how to select the row with the maximum value in each group of a pandas DataFrame.
Submitted by Pranit Sharma, on November 24, 2022

Pandas is a Python library for manipulating and analyzing data. In pandas, we mostly work with a dataset in the form of a DataFrame, a 2-dimensional labeled data structure that consists of rows, columns, and data.

Problem statement

Suppose we are given a DataFrame with multiple columns. We need to group its rows by one or more columns and, for each group, return only the single row that holds the maximum value of another column.

Select row by max value in group

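To select the row with the maximum value in each group, we group the DataFrame by the chosen column(s) and call the idxmax() method on the column of interest. This method returns the index label of the maximum value in each group, and passing those labels to loc[] returns the matching rows.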
Let us understand this with the help of an example.
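Before the full program, it may help to see the two steps in isolation. The following is only a minimal sketch: the DataFrame `scores`, its columns `team` and `score`, and the row labels `r1` to `r5` are hypothetical names chosen for illustration. It shows that idxmax() returns index labels (not positions), which is why loc[] is used for the lookup.

```python
import pandas as pd

# Hypothetical data: the names and values are illustrative only
scores = pd.DataFrame(
    {
        "team": ["x", "x", "y", "y", "y"],
        "score": [10, 30, 25, 25, 5],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

# Step 1: for each team, get the index label of the row with the highest score
labels = scores.groupby("team")["score"].idxmax()
print(labels)

# Step 2: pass those labels to loc[] to retrieve the complete rows
top_rows = scores.loc[labels]
print(top_rows)
```

Group "y" contains two rows with the same maximum score. In that case idxmax() reports the first of them, so exactly one row per group is returned.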

Python program to select row by max value in group

# Importing pandas package
import pandas as pd

# Importing numpy package
import numpy as np

# Creating a dictionary
d = {
    'A':[1,2,3,4,5,6],
    'B':[3000,3000,6000,6000,1000,1000],
    'C':[200,np.nan,100,np.nan,500,np.nan]
}

# Creating a DataFrame
df = pd.DataFrame(d)

# Display DataFrame
print("Original DataFrame:\n",df,"\n")

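# Grouping by 'B' and selecting, for each group, the row with the maximum 'C'
# (every value in 'A' is unique, so grouping by 'A' would return all rows;
# idxmax() skips NaN values by default)
res = df.loc[df.groupby('B')['C'].idxmax()]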

# Display result
print("Result:\n",res)

Output

Example: Pandas dataframe select row by max value in group
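Because idxmax() gives one label per group, rows that tie for the maximum are dropped except for the first. If every tied row should be kept, one possible variant (a different technique from the one above, shown here only as a sketch) compares each value against its group's maximum using transform('max'). The DataFrame `scores` and its columns `team` and `score` are hypothetical names used for illustration.

```python
import pandas as pd

# Hypothetical data with a tie for the maximum in group "y"
scores = pd.DataFrame(
    {
        "team": ["x", "x", "y", "y", "y"],
        "score": [10, 30, 25, 25, 5],
    }
)

# Broadcast each group's maximum back onto every row of that group
group_max = scores.groupby("team")["score"].transform("max")

# Keep every row whose score equals its group's maximum (ties included)
all_top = scores[scores["score"] == group_max]
print(all_top)
```

Here both tied rows of group "y" are kept, along with the single top row of group "x".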
