Pandas DataFrame in Python (With Examples)

Python | Pandas DataFrame: In this tutorial, we will learn about the Pandas DataFrame: its syntax, with examples of creating a DataFrame, indexing, accessing data, and more. By Sapna Deraje Radhakrishna, on December 24, 2019

Python | Pandas DataFrame

A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. It can be thought of as a dict-like container for Series objects, where each column is one Series.
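To make the label alignment concrete, here is a minimal sketch. The frames left and right and their labels are made up for illustration and are not part of the tutorial's data. Values are added only where both frames have the same row and column label; every other cell becomes NaN.

```python
import pandas as pd

# Two small frames whose row and column labels only partly overlap
# (hypothetical data, chosen just for this illustration).
left = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])
right = pd.DataFrame({"b": [10, 20], "c": [30, 40]}, index=["y", "z"])

# Addition matches cells by label, not by position.
total = left + right
print(total)

# Only row "y", column "b" exists in both frames: 4 + 10.
print(total.loc["y", "b"])
```

The result covers the union of the labels, so it has three rows and three columns even though each input has only two of each.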

Syntax to Create a DataFrame

The DataFrame constructor has the following signature (recent pandas releases document the default for copy as None rather than False):

class pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
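As a rough illustration of how the data, index, and columns parameters fit together, the sketch below builds a frame from a dictionary of lists. The names people, name, and age are hypothetical. When data is a dict, columns selects and orders the dictionary keys.

```python
import pandas as pd

# Hypothetical data: each dictionary key becomes a column.
data = {"name": ["Ann", "Bob", "Cy"], "age": [31, 25, 47]}

people = pd.DataFrame(
    data=data,
    index=["r1", "r2", "r3"],   # row labels
    columns=["age", "name"],    # pick and order the columns
)
print(people)
print(people.dtypes)
```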

Example 1: Create a Pandas DataFrame

import numpy as np
import pandas as pd
from numpy.random import randn

np.random.seed(101)

df = pd.DataFrame(randn(5, 4), index=['A', 'B', 'C', 'D', 'E'], columns=['W', 'X', 'Y', 'Z'])
print(df)

Output

          W         X         Y         Z
A  2.706850  0.628133  0.907969  0.503826
B  0.651118 -0.319318 -0.848077  0.605965
C -2.018168  0.740122  0.528813 -0.589001
D  0.188695 -0.758872 -0.933237  0.955057
E  0.190794  1.978757  2.605967  0.683509

In the above example, each column is a Series, and all of the columns share the same row index labels (A to E).
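One way to check this claim is to loop over the columns and compare each column's index with the frame's index. The sketch below uses a small made-up frame called scores rather than the random data above.

```python
import pandas as pd

# Hypothetical frame used only for this check.
scores = pd.DataFrame(
    {"math": [90, 75], "art": [60, 88]},
    index=["ann", "bob"],
)

for name in scores.columns:
    column = scores[name]
    # Each column is a Series whose index equals the frame's row index.
    print(name, type(column).__name__, column.index.equals(scores.index))
```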

Example 2: Indexing and Selection in a DataFrame

To select a single column, pass its label in square brackets:

print(df['W'])
'''
Output:
A    2.706850
B    0.651118
C   -2.018168
D    0.188695
E    0.190794
Name: W, dtype: float64
'''

print(type(df['W']))
'''
Output:
<class 'pandas.core.series.Series'>
'''

This shows that a DataFrame is a collection of Series that share common index labels. A column can also be retrieved with attribute (dot) access. This is less preferred, since it works only when the column name is a valid Python identifier:

print(df.W)

'''
Output:
A    2.706850
B    0.651118
C   -2.018168
D    0.188695
E    0.190794
Name: W, dtype: float64
'''
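A short sketch of why bracket access is the safer habit, using a hypothetical frame named items. A column called count collides with the DataFrame.count method, and a column name containing a space cannot be written as an attribute at all.

```python
import pandas as pd

# Hypothetical frame with two awkward column names.
items = pd.DataFrame({"count": [3, 5], "unit price": [1.5, 2.0]})

# Bracket access always returns the column.
print(items["count"])
print(items["unit price"])

# Dot access finds the DataFrame.count method, not the column.
print(callable(items.count))
```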

Example 3: Getting Multiple Columns from a DataFrame

print(df[['W','X']])
'''
Output:
          W         X
A  2.706850  0.628133
B  0.651118 -0.319318
C -2.018168  0.740122
D  0.188695 -0.758872
E  0.190794  1.978757
'''
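# list('WX') expands to ['W', 'X'], so this selects the same two columns
# (the shortcut only works for single-character column names)
print(df[list('WX')])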

'''
Output:
          W         X
A  2.706850  0.628133
B  0.651118 -0.319318
C -2.018168  0.740122
D  0.188695 -0.758872
E  0.190794  1.978757
'''
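The difference between passing one label and passing a list of labels can be sketched as follows. The frame and its values are made up for illustration. A single label returns a Series, while a list returns a DataFrame, even when the list has only one entry.

```python
import pandas as pd

# Hypothetical frame for illustration.
frame = pd.DataFrame({"W": [1, 2], "X": [3, 4], "Y": [5, 6]})

one = frame["W"]           # single label -> Series
also_one = frame[["W"]]    # one-element list -> DataFrame with one column
two = frame[["Y", "W"]]    # column order follows the list

print(type(one).__name__)
print(type(also_one).__name__, also_one.shape)
print(list(two.columns))
```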

Example 4: Create a New Column in the DataFrame

df['new'] = df['X']+df['Y']
print(df)

'''
Output:
          W         X         Y         Z       new
A  2.706850  0.628133  0.907969  0.503826  1.536102
B  0.651118 -0.319318 -0.848077  0.605965 -1.167395
C -2.018168  0.740122  0.528813 -0.589001  1.268936
D  0.188695 -0.758872 -0.933237  0.955057 -1.692109
E  0.190794  1.978757  2.605967  0.683509  4.584725
'''
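Bracket assignment, as used above, changes df itself. As an alternative (not the approach used in this tutorial), DataFrame.assign returns a new frame and leaves the original untouched. The sketch below uses a small hypothetical frame and a hypothetical column name total.

```python
import pandas as pd

# Hypothetical frame for illustration.
frame = pd.DataFrame({"X": [1.0, 2.0], "Y": [0.5, 1.5]})

# assign() returns a copy with the extra column; frame is not modified.
extended = frame.assign(total=frame["X"] + frame["Y"])

print(list(frame.columns))
print(extended)
```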

Example 5: Remove a Column from the DataFrame

# drop() returns a new DataFrame; df itself is not modified
df.drop('W', axis=1) 
print(df)
'''
Output:
          W         X         Y         Z       new
A  2.706850  0.628133  0.907969  0.503826  1.536102
B  0.651118 -0.319318 -0.848077  0.605965 -1.167395
C -2.018168  0.740122  0.528813 -0.589001  1.268936
D  0.188695 -0.758872 -0.933237  0.955057 -1.692109
E  0.190794  1.978757  2.605967  0.683509  4.584725
'''

df = df.drop('W', axis=1)
print(df)
'''
Output:
          X         Y         Z       new
A  0.628133  0.907969  0.503826  1.536102
B -0.319318 -0.848077  0.605965 -1.167395
C  0.740122  0.528813 -0.589001  1.268936
D -0.758872 -0.933237  0.955057 -1.692109
E  1.978757  2.605967  0.683509  4.584725
'''

# use inplace=True to modify df directly instead of returning a new DataFrame
df.drop('X', axis=1, inplace=True)
print(df)
'''
Output:
          Y         Z       new
A  0.907969  0.503826  1.536102
B -0.848077  0.605965 -1.167395
C  0.528813 -0.589001  1.268936
D -0.933237  0.955057 -1.692109
E  2.605967  0.683509  4.584725
'''
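Two details about drop are worth a quick sketch, shown here on a made-up frame. The columns keyword is an equivalent spelling of axis=1 and accepts a list, and a call with inplace=True returns None, so its result should not be assigned back to a variable.

```python
import pandas as pd

# Hypothetical frame for illustration.
frame = pd.DataFrame({"W": [1, 2], "X": [3, 4], "Y": [5, 6]})

# columns=... is equivalent to passing the labels with axis=1.
trimmed = frame.drop(columns=["W", "X"])
print(list(trimmed.columns))
print(list(frame.columns))   # the original is unchanged

# With inplace=True the frame is modified and the call returns None.
scratch = frame.copy()
returned = scratch.drop(columns="Y", inplace=True)
print(returned)
print(list(scratch.columns))
```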

Example 6: Remove a Row from the DataFrame

df.drop('E', axis=0, inplace = True)
print(df)
'''
Output:
          Y         Z       new
A  0.907969  0.503826  1.536102
B -0.848077  0.605965 -1.167395
C  0.528813 -0.589001  1.268936
D -0.933237  0.955057 -1.692109
'''
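Several rows can be removed in one call by passing a list of labels. The sketch below uses the index keyword, an equivalent spelling of axis=0, on a hypothetical frame.

```python
import pandas as pd

# Hypothetical frame with row labels A to D.
frame = pd.DataFrame({"Y": [1, 2, 3, 4]}, index=list("ABCD"))

# index=... is equivalent to passing the labels with axis=0.
kept = frame.drop(index=["B", "D"])
print(kept)
```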

Example 7: Shape of the DataFrame

To understand why the axis argument takes the values 0 and 1, we need to look at the shape of the DataFrame:

print(df)
'''
Output:
          Y         Z       new
A  0.907969  0.503826  1.536102
B -0.848077  0.605965 -1.167395
C  0.528813 -0.589001  1.268936
D -0.933237  0.955057 -1.692109
'''

print(df.shape)
'''
Output:
(4, 3)
'''

shape returns a tuple. In the above example, the element at index 0 of the tuple (4) is the number of rows and the element at index 1 (3) is the number of columns. This is why axis=0 refers to rows and axis=1 refers to columns when deleting a row or a column with drop().
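The link between axis and shape can be checked directly. In this sketch, on a made-up 2 x 3 frame, dropping along axis=0 shrinks the first element of shape and dropping along axis=1 shrinks the second.

```python
import pandas as pd

# Hypothetical 2 x 3 frame for illustration.
frame = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=["r1", "r2"],
    columns=["a", "b", "c"],
)

print(frame.shape)                     # (rows, columns)
print(frame.drop("r1", axis=0).shape)  # one row fewer
print(frame.drop("a", axis=1).shape)   # one column fewer
```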

Example 8: Select Rows from a DataFrame

print(df)
'''
Output:
          Y         Z       new
A  0.907969  0.503826  1.536102
B -0.848077  0.605965 -1.167395
C  0.528813 -0.589001  1.268936
D -0.933237  0.955057 -1.692109
'''
# loc selects by label: the argument is the row's index label
print(df.loc['B'])
'''
Output:
Y     -0.848077
Z      0.605965
new   -1.167395
Name: B, dtype: float64
'''

# iloc selects by integer position: the argument is the zero-based row number
print(df.iloc[1])
'''
Output:
Y     -0.848077
Z      0.605965
new   -1.167395
Name: B, dtype: float64
'''
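The distinction between loc and iloc matters most when the row labels are themselves integers, or when slicing. The sketch below uses two hypothetical frames. In the first, label 0 is not the first row. In the second, a label slice with loc includes its end label, while a position slice with iloc excludes its end position.

```python
import pandas as pd

# Integer labels in a shuffled order (hypothetical data).
numbered = pd.DataFrame({"v": [10, 20, 30]}, index=[2, 0, 1])
print(numbered.loc[0, "v"])   # row labelled 0
print(numbered.iloc[0, 0])    # first row by position

# String labels, to compare slicing behaviour.
letters = pd.DataFrame({"v": [1, 2, 3]}, index=["A", "B", "C"])
print(letters.loc["A":"B"])   # rows A and B: end label included
print(letters.iloc[0:1])      # row A only: end position excluded
```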

Example 9: Select Subsets of Rows and Columns from a DataFrame

print(df)
'''
Output:
          Y         Z       new
A  0.907969  0.503826  1.536102
B -0.848077  0.605965 -1.167395
C  0.528813 -0.589001  1.268936
D -0.933237  0.955057 -1.692109
'''
# row, column
print(df.loc['C','Y'])
'''
Output: 0.5288134940893595
'''
# pass the list of rows and columns to get the subsets
print(df.loc[['B','C'],['Y','Z']])
'''
Output:
          Y         Z
B -0.848077  0.605965
C  0.528813 -0.589001
'''
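The same subsets can be taken by integer position with iloc. The sketch below rebuilds a frame with the same labels as above but with made-up integer values, and checks that the label-based and position-based selections agree.

```python
import pandas as pd

# Same labels as the tutorial's frame, but hypothetical values.
frame = pd.DataFrame(
    {"Y": [1, 2, 3, 4], "Z": [5, 6, 7, 8], "new": [9, 10, 11, 12]},
    index=list("ABCD"),
)

by_label = frame.loc[["B", "C"], ["Y", "Z"]]
by_position = frame.iloc[[1, 2], [0, 1]]   # rows 1-2, columns 0-1

print(by_label)
print(by_label.equals(by_position))

# A single cell by position: row 2 (C), column 0 (Y).
print(frame.iloc[2, 0])
```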
