How to filter rows in pandas by regex?
Given a Pandas DataFrame, we have to filter rows by regex.
Submitted by Pranit Sharma, on June 02, 2022
Pandas is a Python library for manipulating data effectively and efficiently. In pandas, we mostly work with datasets in the form of a DataFrame. A DataFrame is a 2-dimensional data structure consisting of rows, columns, and the data.
Here, we are going to learn how to filter rows in pandas using regex. A regex, or regular expression, is a sequence of ordinary and special characters that defines a pattern, which we can use to search and filter pandas DataFrame rows.
Regex (Regular Expression):
A special format string used for searching and filtering in pandas DataFrame rows.
Example:
- 'K.*': It matches all the records that start with the letter 'K'.
- 'A.*': It matches all the records that start with the letter 'A'.
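Once the regex is defined, the following expression returns a Boolean mask (True for each row whose value matches from the start of the string); passing that mask to the DataFrame's indexer filters the rows: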
dataframe.column_name.str.match(regex)
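To make the behaviour of str.match concrete, here is a minimal sketch contrasting it with str.contains. The sample values are hypothetical and chosen only for illustration. str.match tests the pattern at the beginning of each string, while str.contains searches anywhere in it.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
s = pd.Series(['Kerala', 'Sikkim', 'Karnataka', 'Assam'])

# str.match tests the pattern at the start of each string
starts_with_k = s.str.match('K.*')

# str.contains searches for the pattern anywhere in the string
contains_k = s.str.contains('k')

print(starts_with_k.tolist())
print(contains_k.tolist())
```

Because str.match is already anchored at the start, a pattern such as 'K' would select the same rows as 'K.*' here.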
To work with pandas, we first need to import the pandas package. Below is the syntax:
import pandas as pd
Let us understand with the help of an example:
# Importing pandas package
import pandas as pd
# Creating a Dictionary
d = {
    'State': ['MP', 'UP', 'Bihar', 'HP', 'Rajasthan', 'Meghalaya', 'Haryana'],
    'Capital': ['Bhopal', 'Lucknow', 'Patna', 'Shimla', 'Jaipur', 'Shillong', 'Chandigarh']
}
# Creating a DataFrame
df = pd.DataFrame(d)
# Display DataFrame
print("Created DataFrame:\n",df,"\n")
Output: the full DataFrame with seven rows and the columns State and Capital.
Now, use regex filtration to filter DataFrame rows.
# Defining regex
regex = 'M.*'
# Here 'M.*' matches all the records whose State starts with 'M'
# Filtering rows
result = df[df.State.str.match(regex)]
# Display result
print("Records that start with M:\n",result,"\n")
Output: the two rows whose State is MP and Meghalaya.
# Defining regex
regex = 'H.*'
# Here 'H.*' matches all the records whose State starts with 'H'
# Filtering rows
result = df[df.State.str.match(regex)]
# Display result
print("Records that start with H:\n",result,"\n")
Output: the two rows whose State is HP and Haryana.
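The same filtering approach may need small adjustments on real data. The sketch below, which uses a hypothetical DataFrame with a missing value and mixed letter case, shows two optional parameters of str.match: na=False, which treats missing values as non-matches, and case=False, which ignores letter case. Depending on the pandas version and column dtype, omitting na=False can leave a missing value in the mask, which may cause the Boolean indexing step to fail.

```python
import pandas as pd

# Hypothetical data with a missing value, used only for illustration
df = pd.DataFrame({'State': ['MP', None, 'meghalaya', 'HP']})

# na=False treats missing values as non-matches;
# case=False makes the match case-insensitive
mask = df['State'].str.match('M.*', case=False, na=False)

# Keep only the rows where the mask is True
result = df[mask]
print(result)
```

Here the filter keeps the rows for 'MP' and 'meghalaya' and skips the missing entry.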