How to one-hot-encode from a pandas column containing a list?

Learn how to one-hot-encode a pandas column containing lists in Python. Submitted by Pranit Sharma, on February 12, 2023

Pandas is a Python library for manipulating and analysing data efficiently. In pandas, we mostly work with a dataset in the form of a DataFrame. A DataFrame is a two-dimensional data structure made up of rows, columns, and the data they hold.

One-hot-encode from a pandas column containing a list

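One-hot encoding is a technique used in machine learning to convert categorical data into binary values of 0 and 1. Categorical data means values such as colour names or fruit names. Each distinct category gets its own column, which holds 1 where the row belongs to that category and 0 otherwise. When a column holds lists, a row can belong to several categories at once, so several indicator columns may be 1 in the same row.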
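The idea is easiest to see on a plain (non-list) categorical column. The following is a minimal sketch using `pd.get_dummies`. The column name `fruit` and its values are made-up example data. `dtype=int` is passed so the indicators print as 0/1 rather than booleans.

```python
import pandas as pd

# Hypothetical example data: one categorical value per row
df = pd.DataFrame({'fruit': ['Apple', 'Banana', 'Apple', 'Grape']})

# One indicator column per distinct category (columns come out sorted);
# dtype=int gives 0/1 instead of True/False
encoded = pd.get_dummies(df['fruit'], dtype=int)
print(encoded)
```

Each row has exactly one 1, in the column named after its fruit.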

NumPy arrays can be indexed with arrays of indices. This makes it easy to convert a NumPy array of non-negative integers into a one-hot-encoded two-dimensional array. The pictorial representation below illustrates the process.

[Figure: one-hot encoding of a NumPy array]

The result is a two-dimensional array with:

- one row for each element of the original array;
- as many columns as the largest element of the original array plus 1.

In each row, the column whose index equals that element is set to 1, and every other column is 0.
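The construction described above can be sketched in NumPy as follows. This is a minimal example that assumes the input holds only non-negative integers. The array values are illustrative.

```python
import numpy as np

# Example input: non-negative integer labels
arr = np.array([1, 0, 3])

# Rows = number of elements, columns = largest element + 1
one_hot = np.zeros((arr.size, arr.max() + 1), dtype=int)

# Fancy indexing: in row i, set column arr[i] to 1
one_hot[np.arange(arr.size), arr] = 1
print(one_hot)
```

The expression `np.eye(arr.max() + 1, dtype=int)[arr]` gives the same result. It works by selecting rows of an identity matrix.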

Let us understand with the help of an example:

Python code to one-hot-encode from a pandas column containing a list

# Import pandas
import pandas as pd

# Import numpy
import numpy as np

# Creating a dataframe whose column C holds lists
df = pd.DataFrame({'A': [2, 4, 4], 'B': [2, 0, 3],
                   'C': [['Apple', 'Orange', 'Banana'], ['Apple', 'Grape'], ['Banana']]})

# Display original dataframe
print("Original DataFrame:\n",df,"\n")

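# Explode the list column so that each list item gets its own row
# (the original row label is repeated for every item in that row's list)
s = df['C'].explode()

# Cross-tabulate row label against item to get one indicator column per item,
# then join these columns back onto the remaining columns A and B
res = df[['A', 'B']].join(pd.crosstab(s.index, s))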

# Display result
print("Result:\n",res,"\n")

Output

The output of the above program is:

[Output screenshot: How to one-hot-encode from a pandas column containing a list?]
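A second way to get the same indicator columns uses pandas string methods instead of `explode` and `crosstab`. The following is a sketch, not the author's method. It assumes that no list item itself contains the `|` character, because that character is used as a temporary separator.

```python
import pandas as pd

# Same example data as the program above
df = pd.DataFrame({'A': [2, 4, 4], 'B': [2, 0, 3],
                   'C': [['Apple', 'Orange', 'Banana'], ['Apple', 'Grape'], ['Banana']]})

# Join each list into one '|'-separated string, then split it back into
# one 0/1 indicator column per distinct item (columns come out sorted)
dummies = df['C'].str.join('|').str.get_dummies(sep='|')

# Attach the indicator columns to the remaining columns
res = df[['A', 'B']].join(dummies)
print(res)
```

The two approaches differ when an item repeats inside a single list. `crosstab` counts occurrences, so a repeated item gives a value of 2. `str.get_dummies` only marks presence, so the value stays 1.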

Copyright © 2024 www.includehelp.com. All rights reserved.