Pandas DataFrame in Python (With Examples)

Python | Pandas DataFrame: In this tutorial, we will learn about the Pandas DataFrame: its syntax and examples of creating, indexing, and accessing a DataFrame. By Sapna Deraje Radhakrishna, on December 24, 2019

Python | Pandas DataFrame

A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. It can be thought of as a dict-like container for Series objects.
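
To illustrate the label alignment mentioned above, here is a minimal sketch. The frames left and right and their values are made up for illustration and are not part of this tutorial's data.

```python
import pandas as pd

# Two small frames whose row and column labels only partly overlap
# (hypothetical sample data).
left = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])
right = pd.DataFrame({"b": [10, 20], "c": [30, 40]}, index=["y", "z"])

# Addition matches values by row and column label, not by position.
# Only the cell ('y', 'b') exists in both frames, so it is the only
# cell that is not NaN in the result.
total = left + right
print(total)
```

Cells that are missing from either operand come out as NaN, which is why the result covers the union of the labels of both frames.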

Syntax to Create a DataFrame

The DataFrame constructor has the following signature:

class pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)
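
As a rough sketch of how the data, index and columns parameters fit together, the snippet below builds a DataFrame from a dict. The names people, ages_first and the sample values are assumptions made up for this illustration.

```python
import pandas as pd

# Hypothetical sample data: a dict maps column labels to column values.
data = {"name": ["Ann", "Bob", "Cy"], "age": [31, 25, 40]}
row_labels = ["r1", "r2", "r3"]

# `index` supplies the row labels; the dict keys become the column labels.
people = pd.DataFrame(data, index=row_labels)
print(people)

# With dict input, `columns` selects and orders the keys to use.
ages_first = pd.DataFrame(data, index=row_labels, columns=["age", "name"])
print(ages_first)
```

When index is omitted, pandas falls back to a default integer index starting at 0.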

Example 1: Create a Pandas DataFrame

import numpy as np
import pandas as pd
from numpy.random import randn

np.random.seed(101)

df = pd.DataFrame(randn(5, 4), index=['A','B','C','D','E'], columns=['W','X','Y','Z'])
print(df)

Output

          W         X         Y         Z
A  2.706850  0.628133  0.907969  0.503826
B  0.651118 -0.319318 -0.848077  0.605965
C -2.018168  0.740122  0.528813 -0.589001
D  0.188695 -0.758872 -0.933237  0.955057
E  0.190794  1.978757  2.605967  0.683509

In the above example, each column is a Series, and all columns share the same row index labels (A to E).

Example 2: Indexing and Selection in a DataFrame

A single column is selected by passing its label in square brackets:

print(df['W'])
'''
Output:
A    2.706850
B    0.651118
C   -2.018168
D    0.188695
E    0.190794
Name: W, dtype: float64
'''

print(type(df['W']))
'''
Output:
<class 'pandas.core.series.Series'>
'''

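The output above shows that a DataFrame is a collection of Series that share the same index labels. A column can also be retrieved with attribute (dot) access. This is less preferred, because it only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method.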

print(df.W)

'''
Output:
A    2.706850
B    0.651118
C   -2.018168
D    0.188695
E    0.190794
Name: W, dtype: float64
'''
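
A small sketch of why bracket access is the safer choice. The frame sales and its column names are hypothetical, chosen so that one name collides with a DataFrame method and the other contains a space.

```python
import pandas as pd

# Hypothetical frame: "count" is also the name of a DataFrame method,
# and "unit price" is not a valid Python identifier.
sales = pd.DataFrame({"count": [3, 5], "unit price": [1.5, 2.0]})

# Bracket access always returns the column as a Series.
print(sales["count"])
print(sales["unit price"])

# Attribute access resolves to the method DataFrame.count, not the column.
print(callable(sales.count))
```

The column "unit price" cannot be reached with dot access at all, since the name contains a space.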

Example 3: Getting Multiple Columns from a DataFrame

print(df[['W','X']])
'''
Output:
          W         X
A  2.706850  0.628133
B  0.651118 -0.319318
C -2.018168  0.740122
D  0.188695 -0.758872
E  0.190794  1.978757
'''
# list('WX') expands to ['W', 'X']; this shortcut only works for single-character column names
print(df[list('WX')])

'''
Output:
          W         X
A  2.706850  0.628133
B  0.651118 -0.319318
C -2.018168  0.740122
D  0.188695 -0.758872
E  0.190794  1.978757
'''

Example 4: Create a New Column in the DataFrame

df['new'] = df['X']+df['Y']
print(df)

'''
Output:
          W         X         Y         Z       new
A  2.706850  0.628133  0.907969  0.503826  1.536102
B  0.651118 -0.319318 -0.848077  0.605965 -1.167395
C -2.018168  0.740122  0.528813 -0.589001  1.268936
D  0.188695 -0.758872 -0.933237  0.955057 -1.692109
E  0.190794  1.978757  2.605967  0.683509  4.584725
'''
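
The example above adds the column to df in place. As an alternative sketch, DataFrame.assign() returns a new DataFrame and leaves the original untouched. The frame scores and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, separate from the tutorial's df.
scores = pd.DataFrame({"X": [1.0, 2.0], "Y": [0.5, 1.5]}, index=["A", "B"])

# assign() returns a new DataFrame with the extra column;
# `scores` itself is left unchanged.
with_sum = scores.assign(new=scores["X"] + scores["Y"])

print(with_sum)
print("new" in scores.columns)
```

This can be convenient when the original frame is still needed later or when several steps are chained together.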

Example 5: Remove a Column from the DataFrame

# drop() returns a new DataFrame; df itself is not modified
df.drop('W', axis=1)
print(df)
'''
Output:
          W         X         Y         Z       new
A  2.706850  0.628133  0.907969  0.503826  1.536102
B  0.651118 -0.319318 -0.848077  0.605965 -1.167395
C -2.018168  0.740122  0.528813 -0.589001  1.268936
D  0.188695 -0.758872 -0.933237  0.955057 -1.692109
E  0.190794  1.978757  2.605967  0.683509  4.584725
'''

df = df.drop('W', axis=1)
print(df)
'''
Output:
          X         Y         Z       new
A  0.628133  0.907969  0.503826  1.536102
B -0.319318 -0.848077  0.605965 -1.167395
C  0.740122  0.528813 -0.589001  1.268936
D -0.758872 -0.933237  0.955057 -1.692109
E  1.978757  2.605967  0.683509  4.584725
'''

# use inplace=True to modify df directly instead of returning a new DataFrame
df.drop('X', axis=1, inplace=True)
print(df)
'''
Output:
          Y         Z       new
A  0.907969  0.503826  1.536102
B -0.848077  0.605965 -1.167395
C  0.528813 -0.589001  1.268936
D -0.933237  0.955057 -1.692109
E  2.605967  0.683509  4.584725
'''

Example 6: Remove a Row from the DataFrame

df.drop('E', axis=0, inplace = True)
print(df)
'''
Output:
          Y         Z       new
A  0.907969  0.503826  1.536102
B -0.848077  0.605965 -1.167395
C  0.528813 -0.589001  1.268936
D -0.933237  0.955057 -1.692109
'''
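
Examples 5 and 6 pass a label together with axis. As a minimal sketch, drop() also accepts the columns and index keywords, which state the intent without an axis number. The frame table and its values are assumptions for this illustration.

```python
import pandas as pd

# Hypothetical sample data, separate from the tutorial's df.
table = pd.DataFrame(
    {"W": [1, 2, 3], "X": [4, 5, 6], "Y": [7, 8, 9]},
    index=["A", "B", "C"],
)

# columns=... is equivalent to passing the labels with axis=1;
# a list drops several columns at once.
fewer_cols = table.drop(columns=["W", "X"])

# index=... is equivalent to passing the label with axis=0.
fewer_rows = table.drop(index="C")

print(fewer_cols)
print(fewer_rows)
```

Both calls return new DataFrames, so table keeps all three rows and columns.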

Example 7: Shape of the DataFrame

To explain why axis takes the values 0 and 1, we need to look at the shape of the DataFrame:

print(df)
'''
Output:
          Y         Z       new
A  0.907969  0.503826  1.536102
B -0.848077  0.605965 -1.167395
C  0.528813 -0.589001  1.268936
D -0.933237  0.955057 -1.692109
'''

print(df.shape)
'''
Output:
(4, 3)
'''

The shape attribute returns a tuple. In the above example, index 0 of the tuple (4) is the number of rows and index 1 (3) is the number of columns. This is why axis=0 refers to rows and axis=1 refers to columns when dropping a row or column.
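
The same axis numbering applies to other operations as well. The sketch below uses sum() on a made-up frame named grid to show the effect. For reductions, axis names the dimension that is collapsed.

```python
import pandas as pd

# Hypothetical sample data with 4 rows and 2 columns.
grid = pd.DataFrame(
    {"Y": [1, 2, 3, 4], "Z": [10, 20, 30, 40]},
    index=["A", "B", "C", "D"],
)

rows, cols = grid.shape  # position 0 is rows, position 1 is columns

# axis=0 collapses the rows: one total per column.
col_totals = grid.sum(axis=0)

# axis=1 collapses the columns: one total per row.
row_totals = grid.sum(axis=1)

print(rows, cols)
print(col_totals)
print(row_totals)
```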

Example 8: Select Rows from a DataFrame

print(df)
'''
Output:
          Y         Z       new
A  0.907969  0.503826  1.536102
B -0.848077  0.605965 -1.167395
C  0.528813 -0.589001  1.268936
D -0.933237  0.955057 -1.692109
'''
# loc selects a row by its index label
print(df.loc['B'])
'''
Output:
Y     -0.848077
Z      0.605965
new   -1.167395
Name: B, dtype: float64
'''

# iloc selects a row by its integer position (0-based)
print(df.iloc[1])
'''
Output:
Y     -0.848077
Z      0.605965
new   -1.167395
Name: B, dtype: float64
'''
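
One difference between loc and iloc shows up when slicing. In this small sketch (the frame letters is made up for illustration), a label slice includes both endpoints, while a position slice excludes the stop value, as in regular Python slicing.

```python
import pandas as pd

# Hypothetical sample data, separate from the tutorial's df.
letters = pd.DataFrame({"Y": [1, 2, 3, 4]}, index=["A", "B", "C", "D"])

# Label slice: both 'B' and 'C' are included.
by_label = letters.loc["B":"C"]

# Position slice: starts at position 1, stops before position 2.
by_position = letters.iloc[1:2]

print(by_label)
print(by_position)
```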

Example 9: Select Subsets of Rows and Columns from a DataFrame

print(df)
'''
Output:
          Y         Z       new
A  0.907969  0.503826  1.536102
B -0.848077  0.605965 -1.167395
C  0.528813 -0.589001  1.268936
D -0.933237  0.955057 -1.692109
'''
# row, column
print(df.loc['C','Y'])
'''
Output: 0.5288134940893595
'''
# pass the list of rows and columns to get the subsets
print(df.loc[['B','C'],['Y','Z']])
'''
Output:
          Y         Z
B -0.848077  0.605965
C  0.528813 -0.589001
'''
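
The same kind of subset can be taken by integer position with iloc. The sketch below uses a made-up frame named block and checks that both selections return the same data.

```python
import pandas as pd

# Hypothetical sample data, separate from the tutorial's df.
block = pd.DataFrame(
    {"Y": [1, 2, 3, 4], "Z": [5, 6, 7, 8], "new": [9, 10, 11, 12]},
    index=["A", "B", "C", "D"],
)

# Rows 'B' and 'C' are at positions 1 and 2;
# columns 'Y' and 'Z' are at positions 0 and 1.
by_label = block.loc[["B", "C"], ["Y", "Z"]]
by_position = block.iloc[[1, 2], [0, 1]]

print(by_label)
print(by_label.equals(by_position))
```

Label-based selection is usually easier to read, while position-based selection helps when the labels are not known in advance.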






