Fast punctuation removal with pandas

Learn how to remove punctuation marks from a column of a pandas DataFrame.
Submitted by Pranit Sharma, on February 11, 2023

Pandas is a Python library for manipulating data effectively and efficiently. In pandas, we mostly work with datasets in the form of a DataFrame, a 2-dimensional data structure made up of rows, columns, and data.

A string is a sequence of characters, which may include lowercase letters, uppercase letters, digits, and special characters such as punctuation marks. String is a data type, and the number of characters in a string is known as its length.

Problem statement

Given a pandas DataFrame, we have to remove punctuation marks from one of its columns.

Removing punctuation marks from a DataFrame column

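For this purpose, we will use the str.replace() method on the DataFrame column, df['column_name']. It returns a copy of the column in which every match of a pattern is replaced with another string. Here the pattern is the regular expression [^\w\s]+, which matches any run of characters that are neither word characters nor whitespace, and the replacement is an empty string. Because the pattern is a regular expression, regex=True must be passed; since pandas 2.0, str.replace() treats the pattern as a literal string by default.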
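To see what the pattern [^\w\s]+ matches, here is a minimal sketch that applies it to a single string with Python's built-in re module. The helper name strip_punct and the sample text are hypothetical and used for illustration only.

```python
import re

# Pattern used in this article: one or more characters that are
# neither word characters (\w) nor whitespace (\s).
PUNCT_PATTERN = r'[^\w\s]+'

def strip_punct(text):
    # Hypothetical helper, for illustration only:
    # delete every match of the pattern from the text.
    return re.sub(PUNCT_PATTERN, '', text)

cleaned = strip_punct('Hello, world! (test_1)')
print(cleaned)
```

Note that the underscore survives, because \w covers letters, digits, and the underscore. If underscores should also be removed, the pattern would need to be adjusted.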

Let us understand with the help of an example.

Python program for fast punctuation removal with pandas

# Importing pandas package
import pandas as pd

# Creating a dataframe
df = pd.DataFrame({'col':['////@#$%A', '$#@B','*~~@$!!!C', ')(&^$D']})

# Display Original DataFrame
print(" Original DataFrame:\n",df,"\n")

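# Removing punctuation: regex=True is required because pandas 2.0
# and later treat the pattern as a literal string by default
df['col'] = df['col'].str.replace(r'[^\w\s]+', '', regex=True)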

# Display result
print("Modified DataFrame:\n",df,"\n")
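As an alternative to the regular expression, the sketch below uses str.translate() with a table built from string.punctuation. This is a different technique from the one in the program above, and it is often reported to be faster on large columns, though that is worth measuring on your own data. The variable names table and cleaned are illustrative assumptions.

```python
import string
import pandas as pd

# Same sample data as the program above
df = pd.DataFrame({'col': ['////@#$%A', '$#@B', '*~~@$!!!C', ')(&^$D']})

# Translation table that deletes every character in string.punctuation.
# This covers ASCII punctuation only; non-ASCII marks are left in place.
table = str.maketrans('', '', string.punctuation)

# Apply the table to each string in the column
cleaned = df['col'].str.translate(table)
print(cleaned.tolist())
```

Unlike the pattern [^\w\s]+, this approach also removes underscores, since the underscore is part of string.punctuation.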

Output

The output of the above program is:

[Figure: Example output of fast punctuation removal with a pandas DataFrame]






Copyright © 2024 www.includehelp.com. All rights reserved.