Python for data analysis – Pandas

Python | Data analysis using Pandas: In this tutorial, we are going to learn about data analysis using Pandas, an open-source library built on top of NumPy.
Submitted by Sapna Deraje Radhakrishna, on December 24, 2019


  • Pandas is an open-source library built on top of NumPy
  • It allows fast analysis, data cleaning, and data preparation
  • It excels in performance and productivity
  • It also has built-in visualization features
  • It can work with data from a wide variety of sources

How to install Pandas?

Using pip

(venv) -bash-4.2$ pip install pandas
Requirement already satisfied: pandas in ./venv/lib/python3.6/site-packages (0.25.1)
Requirement already satisfied: python-dateutil>=2.6.1 in ./venv/lib/python3.6/site-packages (from pandas) (2.8.0)
Requirement already satisfied: pytz>=2017.2 in ./venv/lib/python3.6/site-packages (from pandas) (2019.2)
Requirement already satisfied: numpy>=1.13.3 in ./venv/lib/python3.6/site-packages (from pandas) (1.17.2)
Requirement already satisfied: six>=1.5 in ./venv/lib/python3.6/site-packages (from python-dateutil>=2.6.1->pandas) (1.12.0)
(venv) -bash-4.2$
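
To confirm that the installation worked, one option is to import the library and print its version. This is a minimal sketch; the version numbers printed depend on your environment and may differ from the ones in the pip output above.

```python
import numpy as np
import pandas as pd

# Print the installed versions; the exact values depend on your environment.
print("pandas:", pd.__version__)
print("numpy:", np.__version__)
```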

Series

A Series is a one-dimensional ndarray with axis labels, including time series. It is capable of holding data of any type. The axis labels are collectively known as the index. A Series is very similar to a NumPy array and is built on top of the NumPy array object; the difference is that a Series can be indexed by labels.
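
The difference between positional and label-based access can be sketched as follows. The values and the labels 'mon', 'tue' and 'wed' are made up for illustration only.

```python
import numpy as np
import pandas as pd

# A plain NumPy array can only be addressed by integer position.
values = np.array([10, 20, 30])

# The same data wrapped in a Series, with hypothetical day labels as the index.
temps = pd.Series(values, index=['mon', 'tue', 'wed'])

by_position = values[1]   # NumPy: second element, by position
by_label = temps['tue']   # Series: the element labelled 'tue'

print(by_position, by_label)
print(list(temps.index))
```

Both lookups return the same element; the Series simply adds a labelled index on top of the underlying array.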


class pandas.Series(
    data=None, index=None, dtype=None,
    name=None, copy=False, fastpath=False
)

The snippet below shows several ways of creating a Series:

import numpy as np
import pandas as pd

labels = ['a','e','i','o'] #python list
data = [1,2,3,4] #python list
arr = np.array(data) #NumPy array
d = {'a':1,'b':2,'c':3} #python dict

# creating a series object with default index
print(pd.Series(data = data))

# creating a series object with labels as index
print(pd.Series(data = data, index = labels))

# creating a series with NumPy array
print(pd.Series(arr,index = labels))

# creating a series with dictionary, 
# here the key becomes the index
print(pd.Series(d))

# Series can also hold built-in func
print(pd.Series(data = [sum, print, len]))


0    1
1    2
2    3
3    4
dtype: int64
a    1
e    2
i    3
o    4
dtype: int64
a    1
e    2
i    3
o    4
dtype: int64
a    1
b    2
c    3
dtype: int64
0       <built-in function sum>
1       <built-in function print>
2       <built-in function len>
dtype: object

Operations on Series

Create two Series objects:

import pandas as pd

ser1 = pd.Series([1,2,3,4],['Delhi','Bangalore','Mysore', 'Pune'])

ser2 = pd.Series([1,2,5,4],['Delhi','Bangalore','Vizag','Pune'])

print(ser1)
print(ser2)


Delhi        1
Bangalore    2
Mysore       3
Pune         4
dtype: int64
Delhi        1
Bangalore    2
Vizag        5
Pune         4
dtype: int64

Retrieving information from a Series is similar to retrieving it from a Python dictionary: pass the index label, using the same data type as the index. In the above example, the index labels are strings.

print(ser1['Delhi'])
# Output: 1
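
Beyond the dictionary-style lookup, a few related ways of reading values by label can be sketched as follows. This is an illustration rather than an exhaustive list; the variable names are chosen here for clarity and do not come from the original example.

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], ['Delhi', 'Bangalore', 'Mysore', 'Pune'])

delhi = ser1['Delhi']              # dictionary-style lookup by label
pune = ser1.loc['Pune']            # explicit label-based lookup
missing = ser1.get('Vizag', 0)     # like dict.get: a default instead of a KeyError
has_mysore = 'Mysore' in ser1      # membership is tested against the index labels
subset = ser1[['Delhi', 'Pune']]   # a list of labels returns a new Series

print(delhi, pune, missing, has_mysore)
print(subset)
```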

Now let's try adding the two Series:

print(ser1 + ser2)

Bangalore    4.0
Delhi        2.0
Mysore       NaN
Pune         8.0
Vizag        NaN
dtype: float64

Pandas aligns the two Series by index label and adds the values that share a label. Where a label is present in only one Series, the result is NaN (a null value). Because NaN is a floating-point value, the integers in the result are converted to float.
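
If NaN is not wanted for unmatched labels, one possible approach is the add method with a fill_value, which treats a label missing from one Series as that value. The sketch below also adds a Series to itself to suggest that the conversion to float comes from the NaN values rather than from the addition as such. Choosing 0 as the fill value is an assumption made for this illustration.

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], ['Delhi', 'Bangalore', 'Mysore', 'Pune'])
ser2 = pd.Series([1, 2, 5, 4], ['Delhi', 'Bangalore', 'Vizag', 'Pune'])

# Plain addition: NaN wherever a label exists in only one Series.
total = ser1 + ser2
print(total.isna().sum())   # number of unmatched labels

# Treat a label missing from one Series as 0 instead of producing NaN.
filled = ser1.add(ser2, fill_value=0)
print(filled)

# With identical labels there is no NaN, so the result stays an integer type.
same = ser1 + ser1
print(total.dtype, same.dtype)
```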

