Home » Python

Python Pandas – Operations

Various operations in Python Pandas: In this tutorial, we are going to learn about some of the very useful operations with description and examples.
Submitted by Sapna Deraje Radhakrishna, on February 03, 2020

Pandas support very useful operations which are illustrated below,

Consider the below dataFrame,

import numpy as np
import pandas as pd

df = pd.DataFrame({
  'col1': [1, 2, 3, 4],
  'col2': [444, 555, 666, 444],
  'col3': ['abc', 'def', 'ghi', 'xyz']
})

print(df.head())

'''
Output:
   col1  col2 col3
0     1   444  abc
1     2   555  def
2     3   666  ghi
3     4   444  xyz
'''

Finding unique values in a data frame

In order to find unique values from columns,

# returns numpy array of all unique  values
print(df['col2'].unique() )
# Output: array([444, 555, 666])

# returns length / number of unique values 
# in a numpy array
print(df['col2'].nunique())
# Output: 3

# if we want the table of the unique values
# and how many times they show up
print(df['col2'].value_counts() )
'''
Output:
444    2
555    1
666    1
Name: col2, dtype: int64
'''

Selecting data from a data frame

Consider the dataFrame,

Using the conditional selection, we could select data as follows,

print(df['col1']>2)

'''
Output:
0    False
1    False
2     True
3     True
Name: col1, dtype: bool
'''

print(df[(df['col1']>2)])

'''
Output:
   col1  col2 col3
2     3   666  ghi
3     4   444  xyz
'''

print(df[df['col1']>2 & (df['col2']==44)])

'''
Output:
   col1  col2 col3
0     1   444  abc
1     2   555  def
2     3   666  ghi
3     4   444  xyz
'''

Applied Methods

Consider a simple method,

def times2(x):
  return x*2

We already are aware that we can grab a column and call a built-in function off of it. Such as below,

print(df['col1'].sum())
# Output: 10

Now, in order to apply the custom function, such as one defined above (times2), pandas provide an option to do that as well, as explained below,

print(df['col2'].apply(times2))

'''
Output:
0     888
1    1110
2    1332
3     888
Name: col2, dtype: int64
'''

Apply built-in functions,

print(df['col3'].apply(len))

'''
Output:
0    3
1    3
2    3
3    3
Name: col3, dtype: int64
'''

Apply method will be more powerful, when combined with lambda expressions. For instance,

print(df['col2'].apply(lambda x: x*2))

'''
Output:
0     888
1    1110
2    1332
3     888
Name: col2, dtype: int64
'''

Some more operations

# returns the columns names
print(df.columns) 
# Output: Index(['col1', 'col2', 'col3'], dtype='object')

#since this is a rangeindex, it actually reports 
# start, stop and step values too
print(df.index)
# Output: RangeIndex(start=0, stop=4, step=1)

# sort by column
print(df.sort_values('col2'))

'''
Output:
   col1  col2 col3
0     1   444  abc
3     4   444  xyz
1     2   555  def
2     3   666  ghi
'''

In the above result, note that the index values doesn't change, this is to ensure that the values is retained.

isnull

# isnull
print(df.isnull())

'''
Output
    col1   col2   col3
0  False  False  False
1  False  False  False
2  False  False  False
3  False  False  False
'''

The isnull() will return a dataframe of booleans indicating whether or not the value was null or not. In the above, we get a boolean of all false because we have nulls in our dataframe.

Drop NAN values

print(df.dropna())

'''
Output:
   col1  col2 col3
0     1   444  abc
1     2   555  def
2     3   666  ghi
3     4   444  xyz
'''

Fill NAN values with custom values

df = pd.DataFrame({
  'col1': [1, 2, 3, np.nan],
  'col2': [np.nan, 555, 666, 444],
  'col3': ['abc', 'def', 'ghi', 'xyz']
})

print(df)

'''
Output:
   col1   col2 col3
0   1.0    NaN  abc
1   2.0  555.0  def
2   3.0  666.0  ghi
3   NaN  444.0  xyz
'''

print(df.fillna('FILL'))

'''
Output:
   col1  col2 col3
0     1  FILL  abc
1     2   555  def
2     3   666  ghi
3  FILL   444  xyz
'''

Usage of pivot table

This methodology will be familiar for the Advanced Excel users. Consider a new dataFrame,

data = {
  'A': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'],
  'B': ['one', 'one', 'two', 'two', 'one', 'one'],
  'C': ['x', 'y', 'x', 'y', 'x', 'y'],
  'D': [1, 3, 2, 5, 4, 1]
}

df = pd.DataFrame(data)

print(df)

'''
Output:
     A    B  C  D
0  foo  one  x  1
1  foo  one  y  3
2  foo  two  x  2
3  bar  two  y  5
4  bar  one  x  4
5  bar  one  y  1
'''

The pivot table, creates a multi index dataFrame. The pivot table takes three main arguments, the values, the index and the columns.

print(df.pivot_table(values='D',index=['A', 'B'],columns=['C']))

'''
Output:
      C        x    y
A    B            
bar one  4.0  1.0
      two  NaN  5.0
foo one  1.0  3.0
      two  2.0  NaN
'''





Comments and Discussions

Ad: Are you a blogger? Join our Blogging forum.





Languages: » C » C++ » C++ STL » Java » Data Structure » C#.Net » Android » Kotlin » SQL
Web Technologies: » PHP » Python » JavaScript » CSS » Ajax » Node.js » Web programming/HTML
Solved programs: » C » C++ » DS » Java » C#
Aptitude que. & ans.: » C » C++ » Java » DBMS
Interview que. & ans.: » C » Embedded C » Java » SEO » HR
CS Subjects: » CS Basics » O.S. » Networks » DBMS » Embedded Systems » Cloud Computing
» Machine learning » CS Organizations » Linux » DOS
More: » Articles » Puzzles » News/Updates


© https://www.includehelp.com some rights reserved.