
Pandas DataFrame in Python

Python | Pandas DataFrame: In this tutorial, we are going to learn about the Pandas DataFrame, with its syntax and examples of creating a DataFrame, indexing, accessing data, etc.
Submitted by Sapna Deraje Radhakrishna, on December 24, 2019

A DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. It can be thought of as a dict-like container for Series objects.
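As a minimal sketch of the last two sentences (the column names and values below are made up for illustration), the example builds a DataFrame from a dict of Series whose indexes only partly overlap. pandas takes the union of the row labels and fills the gaps with NaN. Adding two DataFrames likewise matches values by label, not by position.

```python
import pandas as pd

# Two Series with partly overlapping index labels (illustrative data)
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20], index=['b', 'c'])

# dict keys become column labels; the row indexes are unioned,
# so row 'a' gets NaN in column 'two'
df = pd.DataFrame({'one': s1, 'two': s2})
print(df)

# Arithmetic aligns on row labels: same labels, different order
left = pd.DataFrame({'x': [1, 2]}, index=['r1', 'r2'])
right = pd.DataFrame({'x': [10, 20]}, index=['r2', 'r1'])
print(left + right)  # r1 -> 1 + 20, r2 -> 2 + 10
```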

Syntax:

    class pandas.DataFrame(
        data=None, 
        index=None, 
        columns=None, 
        dtype=None, 
        copy=False
        )
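A short sketch of the constructor called with keyword arguments (the values are illustrative): data holds the values, index and columns supply the row and column labels, and dtype forces one data type on every column.

```python
import pandas as pd

# data given as a list of rows; labels passed explicitly by keyword
df = pd.DataFrame(
    data=[[1, 2], [3, 4], [5, 6]],
    index=['r1', 'r2', 'r3'],
    columns=['a', 'b'],
    dtype='float64',  # every column is stored as float64
)
print(df)
print(df.dtypes)
```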

Example: creating a DataFrame

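import numpy as np
import pandas as pd
from numpy.random import randn

# seed the random generator so the values are reproducible
np.random.seed(101)

# 5 x 4 standard normal values, row labels A-E, column labels W-Z
df = pd.DataFrame(randn(5, 4), index=['A', 'B', 'C', 'D', 'E'], columns=['W', 'X', 'Y', 'Z'])
print(df)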

Output

          W         X         Y         Z
A  2.706850  0.628133  0.907969  0.503826
B  0.651118 -0.319318 -0.848077  0.605965
C -2.018168  0.740122  0.528813 -0.589001
D  0.188695 -0.758872 -0.933237  0.955057
E  0.190794  1.978757  2.605967  0.683509

In the above example, each column is a Series, and all the columns share the same row index labels (A to E).

To select a single column, index the DataFrame with the column label:

print(df['W'])
'''
Output:
A    2.706850
B    0.651118
C   -2.018168
D    0.188695
E    0.190794
Name: W, dtype: float64
'''

print(type(df['W']))
'''
Output:
<class 'pandas.core.series.Series'>
'''

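This shows that a DataFrame is a collection of Series that share a common index. A column can also be retrieved with attribute (dot) access, although this is less preferred: it only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method.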

print(df.W)

'''
Output:
A    2.706850
B    0.651118
C   -2.018168
D    0.188695
E    0.190794
Name: W, dtype: float64
'''
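To illustrate why dot access is less preferred, here is a small sketch with made-up column names: a column called count is shadowed by the DataFrame.count() method, and a name containing a space cannot be written as an attribute at all. Bracket indexing works in both cases.

```python
import pandas as pd

# Illustrative column names chosen to break dot access
df = pd.DataFrame({'count': [1, 2], 'my col': [3, 4]})

# Bracket indexing always returns the column as a Series
print(df['count'])
print(df['my col'])

# Dot access resolves to the existing DataFrame.count method instead
print(callable(df.count))  # True
# df.my col would be a SyntaxError: not a valid Python identifier
```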

To select multiple columns, pass a list of column labels:

print(df[['W','X']])
'''
Output:
          W         X
A  2.706850  0.628133
B  0.651118 -0.319318
C -2.018168  0.740122
D  0.188695 -0.758872
E  0.190794  1.978757
'''
# list('WX') produces ['W', 'X'], so this selects the same two columns
print(df[list('WX')])

'''
Output:
          W         X
A  2.706850  0.628133
B  0.651118 -0.319318
C -2.018168  0.740122
D  0.188695 -0.758872
E  0.190794  1.978757
'''
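A related detail: the brackets decide the return type. A single label returns a Series, while a list, even a one-element list, returns a DataFrame. A minimal sketch, rebuilding the same seeded frame as the example above:

```python
import numpy as np
import pandas as pd

# Same seeded data and labels as the tutorial's example
np.random.seed(101)
df = pd.DataFrame(np.random.randn(5, 4),
                  index=['A', 'B', 'C', 'D', 'E'],
                  columns=['W', 'X', 'Y', 'Z'])

print(type(df['W']))    # <class 'pandas.core.series.Series'>
print(type(df[['W']]))  # <class 'pandas.core.frame.DataFrame'>
print(df[['W']].shape)  # (5, 1)
```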

To create a new column, assign a Series to a new column label:

df['new'] = df['X'] + df['Y']
print(df)

'''
Output:
          W         X         Y         Z       new
A  2.706850  0.628133  0.907969  0.503826  1.536102
B  0.651118 -0.319318 -0.848077  0.605965 -1.167395
C -2.018168  0.740122  0.528813 -0.589001  1.268936
D  0.188695 -0.758872 -0.933237  0.955057 -1.692109
E  0.190794  1.978757  2.605967  0.683509  4.584725
'''
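An alternative, sketched here on a small illustrative frame, is DataFrame.assign(), which returns a new DataFrame with the extra column and leaves the original untouched.

```python
import pandas as pd

# Illustrative data
df = pd.DataFrame({'X': [1, 2, 3], 'Y': [10, 20, 30]})

# assign() returns a copy with the new column; df is not modified
df2 = df.assign(new=df['X'] + df['Y'])
print(df2)
print(list(df.columns))   # ['X', 'Y']
print(list(df2.columns))  # ['X', 'Y', 'new']
```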

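To remove a column from a DataFrame, call drop() with axis=1. By default, drop() returns a new DataFrame and leaves the original unchanged:

# returns a new DataFrame; df itself still contains column W
df.drop('W', axis=1)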
print(df)
'''
Output:
          W         X         Y         Z       new
A  2.706850  0.628133  0.907969  0.503826  1.536102
B  0.651118 -0.319318 -0.848077  0.605965 -1.167395
C -2.018168  0.740122  0.528813 -0.589001  1.268936
D  0.188695 -0.758872 -0.933237  0.955057 -1.692109
E  0.190794  1.978757  2.605967  0.683509  4.584725
'''

# assign the result back to df to keep the change
df = df.drop('W', axis=1)
print(df)
'''
Output:
          X         Y         Z       new
A  0.628133  0.907969  0.503826  1.536102
B -0.319318 -0.848077  0.605965 -1.167395
C  0.740122  0.528813 -0.589001  1.268936
D -0.758872 -0.933237  0.955057 -1.692109
E  1.978757  2.605967  0.683509  4.584725
'''

# alternatively, inplace=True modifies df directly (drop() then returns None)
df.drop('X', axis=1, inplace=True)
print(df)
'''
Output:
          Y         Z       new
A  0.907969  0.503826  1.536102
B -0.848077  0.605965 -1.167395
C  0.528813 -0.589001  1.268936
D -0.933237  0.955057 -1.692109
E  2.605967  0.683509  4.584725
'''
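In pandas 0.21 and later, drop() also accepts a columns= keyword (and index= for rows), which avoids remembering the axis number and can remove several labels at once. A minimal sketch on an illustrative frame:

```python
import pandas as pd

# Illustrative data with the tutorial's column names
df = pd.DataFrame({'W': [1, 2], 'X': [3, 4], 'Y': [5, 6], 'Z': [7, 8]})

# columns=[...] is equivalent to passing the labels with axis=1
trimmed = df.drop(columns=['W', 'X'])
print(trimmed)
print(list(trimmed.columns))  # ['Y', 'Z']
```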

To remove a row from the DataFrame, pass its label with axis=0:

df.drop('E', axis=0, inplace=True)
print(df)
'''
Output:
          Y         Z       new
A  0.907969  0.503826  1.536102
B -0.848077  0.605965 -1.167395
C  0.528813 -0.589001  1.268936
D -0.933237  0.955057 -1.692109
'''
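Because axis=0 is the default for drop(), rows can also be removed by label without naming the axis, and index= does the same thing explicitly. A short sketch with illustrative data:

```python
import pandas as pd

# Illustrative single-column frame with row labels A-D
df = pd.DataFrame({'Y': [1, 2, 3, 4]}, index=['A', 'B', 'C', 'D'])

# axis defaults to 0, so all three calls drop rows A and B
a = df.drop(['A', 'B'])
b = df.drop(['A', 'B'], axis=0)
c = df.drop(index=['A', 'B'])
print(a.equals(b) and b.equals(c))  # True
print(list(a.index))                # ['C', 'D']
```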

To understand why axis=0 refers to rows and axis=1 refers to columns, look at the shape of the DataFrame:

print(df)
'''
Output:
          Y         Z       new
A  0.907969  0.503826  1.536102
B -0.848077  0.605965 -1.167395
C  0.528813 -0.589001  1.268936
D -0.933237  0.955057 -1.692109
'''

print(df.shape)
'''
Output:
(4, 3)
'''

The shape attribute is a tuple. In the above example, element 0 of the tuple (4) is the number of rows and element 1 (3) is the number of columns. The axis argument follows the same numbering, which is why axis=0 is used to delete a row and axis=1 to delete a column.
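The same axis numbering applies to other methods. As a worked sketch with small illustrative integers, sum(axis=0) collapses the rows and returns one total per column, while sum(axis=1) collapses the columns and returns one total per row.

```python
import pandas as pd

# Illustrative 2 x 3 frame
df = pd.DataFrame([[1, 2, 3],
                   [4, 5, 6]],
                  index=['r1', 'r2'],
                  columns=['a', 'b', 'c'])

print(df.shape)        # (2, 3): 2 rows, 3 columns
print(df.sum(axis=0))  # one total per column: a=5, b=7, c=9
print(df.sum(axis=1))  # one total per row: r1=6, r2=15
```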

Selecting rows in a DataFrame

print(df)
'''
Output:
          Y         Z       new
A  0.907969  0.503826  1.536102
B -0.848077  0.605965 -1.167395
C  0.528813 -0.589001  1.268936
D -0.933237  0.955057 -1.692109
'''
# loc selects a row by its index label
print(df.loc['B'])
'''
Output:
Y     -0.848077
Z      0.605965
new   -1.167395
Name: B, dtype: float64
'''

# iloc selects a row by its integer position (0-based), so 1 is the second row
print(df.iloc[1])
'''
Output:
Y     -0.848077
Z      0.605965
new   -1.167395
Name: B, dtype: float64
'''
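One practical difference between the two indexers shows up with slices: a loc slice includes its end label, whereas an iloc slice excludes its end position, like ordinary Python slicing. A sketch on an illustrative frame:

```python
import pandas as pd

# Illustrative frame with row labels A-D
df = pd.DataFrame({'Y': [10, 20, 30, 40]}, index=['A', 'B', 'C', 'D'])

# Label-based slice: the end label 'C' is included
print(df.loc['A':'C'])
# Position-based slice: position 2 (row 'C') is excluded
print(df.iloc[0:2])
```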

Selecting subsets of rows and columns

print(df)
'''
Output:
          Y         Z       new
A  0.907969  0.503826  1.536102
B -0.848077  0.605965 -1.167395
C  0.528813 -0.589001  1.268936
D -0.933237  0.955057 -1.692109
'''
# loc[row_label, column_label] returns a single value
print(df.loc['C','Y'])
'''
Output: 0.5288134940893595
'''
# pass the list of rows and columns to get the subsets
print(df.loc[['B','C'],['Y','Z']])
'''
Output:
          Y         Z
B -0.848077  0.605965
C  0.528813 -0.589001
'''
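The same subsets can be selected by position with iloc, and for a single value the at (label-based) and iat (position-based) accessors are available. A sketch using the tutorial's row and column labels but illustrative round numbers instead of the random values:

```python
import pandas as pd

# Same labels as the tutorial's frame; the values are illustrative
df = pd.DataFrame({'Y': [1.0, 2.0, 3.0, 4.0],
                   'Z': [5.0, 6.0, 7.0, 8.0],
                   'new': [6.0, 8.0, 10.0, 12.0]},
                  index=['A', 'B', 'C', 'D'])

# Rows B, C and columns Y, Z selected by integer position
print(df.iloc[[1, 2], [0, 1]])

# Scalar access: at uses labels, iat uses positions
print(df.at['C', 'Y'])  # 3.0
print(df.iat[2, 0])     # 3.0
```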


© https://www.includehelp.com some rights reserved.