How to filter rows in pandas by regex?

Given a Pandas DataFrame, we have to filter rows by regex.
Submitted by Pranit Sharma, on June 02, 2022

Pandas is a Python library that allows us to perform complex manipulations of data effectively and efficiently. In pandas, we mostly work with a dataset in the form of a DataFrame. A DataFrame is a 2-dimensional data structure that consists of rows, columns and the data.

Here, we are going to learn how to filter rows in pandas using regex. A regex, or regular expression, is a sequence of ordinary and special characters that defines a search pattern, with the help of which we can search and filter pandas DataFrame rows.

Regex (Regular Expression):

A special format string used for searching and filtering in pandas DataFrame rows.

Example:

  • 'K.*': Used with str.match(), it selects all the records that start with the letter 'K'.
  • 'A.*': Used with str.match(), it selects all the records that start with the letter 'A'.
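To see why such a pattern selects values that start with a given letter, here is a minimal sketch using Python's built-in re module. It is not part of the original example, and the sample names are made up for illustration. re.match() only matches at the beginning of a string, which is also how the pandas str.match() method behaves, whereas re.search() looks anywhere in the string.

```python
import re

# Hypothetical sample values, for illustration only
names = ['Kerala', 'Assam', 'Karnataka', 'Jammu and Kashmir']

pattern = re.compile('K.*')

# re.match anchors the pattern at the start of each string
starts_with_k = [name for name in names if pattern.match(name)]

# re.search finds the pattern anywhere in the string
contains_k = [name for name in names if pattern.search(name)]

print(starts_with_k)
print(contains_k)
```

Because the match is already anchored at the start, the trailing '.*' is optional here: 'K' alone would select the same values with re.match().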

Once the regex is defined, we can use the following piece of code, which returns a boolean mask for filtering DataFrame rows:

dataframe.column_name.str.match(regex)
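As a rough sketch of what this expression produces, the snippet below builds a small DataFrame (the name df_demo and its values are hypothetical, chosen only for illustration) and shows that str.match() returns a boolean Series, which can then be used to index the DataFrame.

```python
import pandas as pd

# Small hypothetical DataFrame, for illustration only
df_demo = pd.DataFrame({'City': ['Kochi', 'Agra', 'Kanpur']})

# str.match returns a boolean Series: True where the value matches from its start
mask = df_demo['City'].str.match('K.*')
print(mask.tolist())

# Indexing the DataFrame with the mask keeps only the rows marked True
filtered = df_demo[mask]
print(filtered['City'].tolist())
```

The bracket form df_demo['City'] is used here instead of attribute access because it also works for column names that contain spaces or clash with DataFrame method names.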

To work with pandas, we need to import the pandas package first. Below is the syntax:

import pandas as pd

Let us understand with the help of an example,

# Importing pandas package
import pandas as pd

# Creating a Dictionary
d = {
    'State':['MP','UP','Bihar','HP','Rajasthan','Meghalaya','Haryana'],
    'Capital':['Bhopal','Lucknow','Patna','Shimla','Jaipur','Shillong','Chandigarh']
}

# Creating a DataFrame
df = pd.DataFrame(d)

# Display DataFrame
print("Created DataFrame:\n",df,"\n")

Output:

Example 1: Filter rows by regex

Now, use the regex to filter the DataFrame rows.

# Defining regex
regex = 'M.*'

# Here 'M.*' means all the records that start with M

# Filtering rows
result = df[df.State.str.match(regex)]

# Display result
print("Records that start with M:\n",result,"\n")

Output:

Example 2: Filter rows by regex

# Defining regex
regex = 'H.*'

# Here 'H.*' means all the records that start with H

# Filtering rows
result = df[df.State.str.match(regex)]

# Display result
print("Records that start with H:\n",result,"\n")

Output:

Example 3: Filter rows by regex
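Beyond the examples above, two situations often come up in practice: columns with missing values, and patterns that should match somewhere other than the start of the string. The following is a minimal sketch and not part of the original example; df_demo and its values are hypothetical. It assumes that missing values should simply be treated as non-matches, which is what na=False does. Without it, the mask can contain NaN, and indexing the DataFrame with such a mask can raise an error.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing value, for illustration only
df_demo = pd.DataFrame({'State': ['MP', np.nan, 'Meghalaya', 'Assam']})

# na=False treats missing values as non-matches, so the mask is purely boolean
mask = df_demo['State'].str.match('M', na=False)
starts_with_m = df_demo[mask]
print(starts_with_m)

# str.contains searches anywhere in the string; '$' anchors the pattern at the end
ends_with_a = df_demo[df_demo['State'].str.contains('a$', na=False)]
print(ends_with_a)
```

Use str.match() when the pattern should be anchored at the start of the value, and str.contains() when it may appear anywhere or needs explicit anchors such as '^' and '$'.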


© https://www.includehelp.com some rights reserved.