Python for data analysis – Pandas

Python | Data analysis using Pandas: In this tutorial, we will learn about data analysis using Pandas, an open-source library built on top of NumPy. By Sapna Deraje Radhakrishna Last updated : December 21, 2023

Pandas Overview

  • Pandas is an open-source library built on top of NumPy
  • It allows for fast analysis, data cleaning, and data preparation
  • It excels in performance and productivity
  • It also has built-in visualization features
  • It can work with data from a wide variety of sources

How to install Pandas?

You can install the pandas library with pip by running the pip install pandas command.

Below is an example of running this command (here pandas was already installed, so pip reports each requirement as already satisfied):

(venv) -bash-4.2$ pip install pandas

Requirement already satisfied: pandas in ./venv/lib/python3.6/site-packages (0.25.1)
Requirement already satisfied: python-dateutil>=2.6.1 in ./venv/lib/python3.6/site-packages (from pandas) (2.8.0)
Requirement already satisfied: pytz>=2017.2 in ./venv/lib/python3.6/site-packages (from pandas) (2019.2)
Requirement already satisfied: numpy>=1.13.3 in ./venv/lib/python3.6/site-packages (from pandas) (1.17.2)
Requirement already satisfied: six>=1.5 in ./venv/lib/python3.6/site-packages (from python-dateutil>=2.6.1->pandas) (1.12.0)
(venv) -bash-4.2$
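To confirm that the installation worked, one option is to import the library and print its version from Python. This is a minimal sketch; the variable name installed_version is only illustrative, and the version you see will depend on your environment.

```python
import pandas as pd

# Reading the version string proves that the import succeeded
installed_version = pd.__version__
print(installed_version)
```

If the import fails with ModuleNotFoundError, pandas was most likely installed into a different environment than the one running the script.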

Pandas Series

A Series is a one-dimensional labeled array, built on the NumPy ndarray, that can also represent time series. It is capable of holding data of any type. The axis labels are collectively known as the index. A Series is very similar to a NumPy array; the difference is that a Series can be indexed by labels as well as by position.
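The difference between positional and label-based access can be sketched as follows. The values and the labels 'x', 'y', 'z' are made up for illustration.

```python
import numpy as np
import pandas as pd

arr = np.array([10, 20, 30])
ser = pd.Series(arr, index=['x', 'y', 'z'])

# A NumPy array is addressed by integer position only
second_from_array = arr[1]

# A Series can be addressed by label, or by position through .iloc
second_by_label = ser['y']
second_by_position = ser.iloc[1]

print(second_from_array, second_by_label, second_by_position)

# The labels are stored together as the index
print(ser.index.tolist())
```

All three lookups return the same element; only the way of addressing it differs.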

Syntax

Below is the syntax of the pandas.Series() class constructor:

class pandas.Series(
    data=None,
    index=None,
    dtype=None,
    name=None,
    copy=False,
    fastpath=False
)
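As a small illustration of the data, index, dtype, and name parameters, the sketch below builds a named Series of floats. The variable temps and its values are hypothetical.

```python
import pandas as pd

# data: the values, index: the labels,
# dtype: the requested element type, name: a label for the Series itself
temps = pd.Series(
    data=[21, 23, 19],
    index=['Mon', 'Tue', 'Wed'],
    dtype='float64',
    name='temperature',
)

print(temps)
print(temps.name)
```

Because dtype is set to 'float64', the integer inputs are stored as floating-point numbers.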

Creating Pandas Series

A pandas Series is created by calling the pandas.Series() constructor.

Example

The snippet below shows several ways of creating a Series:

import numpy as np
import pandas as pd

labels = ['a','e','i','o'] #python list
data = [1,2,3,4] #python list
arr = np.array(data) #NumPy array
d = {'a':1,'b':2,'c':3} #python dict

# creating a series object with default index
print(pd.Series(data = data))

# creating a series object with labels as index
print(pd.Series(data = data, index = labels))

# creating a series with NumPy array
print(pd.Series(arr,index = labels))

# creating a series with dictionary, 
# here the key becomes the index
print(pd.Series(d))

# a Series can also hold built-in functions
print(pd.Series(data = [sum, print, len]))

Output

0    1
1    2
2    3
3    4
dtype: int64
a    1
e    2
i    3
o    4
dtype: int64
a    1
e    2
i    3
o    4
dtype: int64
a    1
b    2
c    3
dtype: int64
0       <built-in function sum>
1       <built-in function print>
2       <built-in function len>
dtype: object
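When a dictionary is passed together with an explicit index, pandas looks up each index label among the dictionary keys. The sketch below reuses the dictionary d from the example; the label 'z' is a made-up key that is not in the dictionary.

```python
import pandas as pd

d = {'a': 1, 'b': 2, 'c': 3}

# Only the requested labels are kept, in the requested order;
# a label with no matching key gets NaN
subset = pd.Series(d, index=['a', 'c', 'z'])
print(subset)
```

The key 'b' is dropped because it was not requested, and 'z' is filled with NaN because the dictionary has no such key.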

Operations on Pandas Series

1. Create two Series objects

import pandas as pd

ser1 = pd.Series([1,2,3,4],['Delhi','Bangalore','Mysore', 'Pune'])
print(ser1)

ser2 = pd.Series([1,2,5,4],['Delhi','Bangalore','Vizag','Pune'])
print(ser2)

Output

Delhi        1
Bangalore    2
Mysore       3
Pune         4
dtype: int64
Delhi        1
Bangalore    2
Vizag        5
Pune         4
dtype: int64

2. Retrieve the information from the series

Retrieving a value from a Series is similar to looking up a key in a Python dictionary: pass the index label of the value you want. In the above example, the index labels are strings.

print(ser1['Delhi'])
# Output: 1
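The dictionary analogy extends a little further. The sketch below, which uses 'Chennai' as a made-up label that is absent from ser1, shows a few related lookups.

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], ['Delhi', 'Bangalore', 'Mysore', 'Pune'])

# As with a dict, a missing label raises KeyError with square brackets
try:
    ser1['Chennai']
    missing_raised = False
except KeyError:
    missing_raised = True

# .get() returns a default value instead of raising
fallback = ser1.get('Chennai', 0)

# A list of labels returns a new Series with just those entries
subset = ser1[['Delhi', 'Pune']]

# Membership tests check the index labels, not the values
has_delhi = 'Delhi' in ser1

print(missing_raised, fallback, has_delhi)
print(subset)
```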

3. Adding Two Pandas Series

Now let's try adding the two Series:

print(ser1+ser2)
'''
Output:
Bangalore    4.0
Delhi        2.0
Mysore       NaN
Pune         8.0
Vizag        NaN
dtype: float64
'''

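When two Series are added, pandas aligns them on their index labels and adds the values that share a label. If a label appears in only one of the Series, the result for that label is NaN (a missing value). Because NaN is a floating-point value, the integers in the result are converted to float; when every label matches, the integer dtype is preserved.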
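If NaN is not the desired result for unmatched labels, one option is the Series.add() method with its fill_value parameter, which substitutes a value for the missing side before adding. The sketch below also checks the resulting dtype when both operands share the same index.

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], ['Delhi', 'Bangalore', 'Mysore', 'Pune'])
ser2 = pd.Series([1, 2, 5, 4], ['Delhi', 'Bangalore', 'Vizag', 'Pune'])

# Treat a label missing from one side as 0 instead of producing NaN
filled = ser1.add(ser2, fill_value=0)
print(filled)

# With identical indexes no NaN is introduced,
# so the result keeps an integer dtype
same_index = ser1 + ser1
print(same_index.dtype)
```

Here Mysore and Vizag keep their original values (3 and 5) because the missing counterpart is treated as 0.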


