How to one-hot-encode from a pandas column containing a list?
Learn how to one-hot encode a pandas column whose values are lists in Python.
Submitted by Pranit Sharma, on February 12, 2023
Pandas is a Python library for manipulating and analyzing data. Most work in pandas is done on a DataFrame, a two-dimensional labeled data structure made up of rows and columns.
One-hot-encode from a pandas column containing a list
One-hot encoding is a technique used in machine learning to convert categorical data (for example, fruit names or colors) into binary indicator columns. Each distinct category gets its own column, which holds 1 where the category is present and 0 otherwise.
The same idea applies to a NumPy array of non-negative integers, which can be converted into a one-hot encoded two-dimensional array by indexing. A two-dimensional array of zeros is created whose number of rows is equal to the size of the original array and whose number of columns is equal to the largest element of the original array plus 1. Each row then receives a 1 in the column given by the corresponding element.
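The NumPy indexing idea described above can be sketched as follows. This is a minimal illustration, not part of the original program; the array labels and its values are hypothetical sample data, and the approach assumes the labels are non-negative integers.

```python
import numpy as np

# Hypothetical sample data: non-negative integer class labels
labels = np.array([1, 0, 3])

# One row per element, one column per possible value (0 .. largest element)
one_hot = np.zeros((labels.size, labels.max() + 1), dtype=int)

# Place a single 1 in each row, at the column given by that row's label
one_hot[np.arange(labels.size), labels] = 1

print(one_hot)
```

Here the result has 3 rows (one per label) and 4 columns (largest label 3, plus 1).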
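For a pandas column whose cells are lists, the lists can first be flattened with explode(), which gives each element its own row while repeating the original row index, and then cross-tabulated against that index. Let us understand this with the help of an example.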
Python code to one-hot-encode from a pandas column containing a list
# Import pandas
import pandas as pd

# Creating a dataframe whose column 'C' holds a list in every row
df = pd.DataFrame({
    'A': [2, 4, 4],
    'B': [2, 0, 3],
    'C': [['Apple', 'Orange', 'Banana'], ['Apple', 'Grape'], ['Banana']]
})

# Display original dataframe
print("Original DataFrame:\n", df, "\n")

# Explode the lists so that each element gets its own row
# (the original row index is repeated for every element)
s = df['C'].explode()

# Cross-tabulate row index against element to count each value per row,
# then join the resulting columns back to columns 'A' and 'B'
res = df[['A', 'B']].join(pd.crosstab(s.index, s))

# Display result
print("Result:\n", res, "\n")
Output
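The result keeps columns A and B and adds one column per distinct list element (Apple, Banana, Grape, Orange), holding 1 where the element appears in that row's list and 0 otherwise.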
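A different route to the same encoding is sketched below, using pd.get_dummies() on the exploded column instead of pd.crosstab(). It is an alternative illustration rather than the method of the program above; the names exploded, dummies and res_alt are hypothetical.

```python
import pandas as pd

# Same sample data as in the program above
df = pd.DataFrame({
    'A': [2, 4, 4],
    'B': [2, 0, 3],
    'C': [['Apple', 'Orange', 'Banana'], ['Apple', 'Grape'], ['Banana']]
})

# One row per list element; the original row index is repeated
exploded = df['C'].explode()

# One indicator column per distinct value, then collapse the repeated
# index labels back to a single row each by taking the column-wise maximum
dummies = pd.get_dummies(exploded).groupby(level=0).max().astype(int)

# Attach the indicator columns to the remaining columns
res_alt = df[['A', 'B']].join(dummies)

print(res_alt)
```

One difference is worth noting: pd.crosstab() counts occurrences, so a value repeated within a single list would produce 2 or more, whereas taking the maximum of the dummy columns always yields 0 or 1.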