# Python for data analysis – Pandas

**Python | Data analysis using Pandas**: In this tutorial, we are going to learn about **data analysis using Pandas**, an open-source library built on top of NumPy.

Submitted by Sapna Deraje Radhakrishna, on December 24, 2019

## Pandas

- Pandas is an open-source library built on top of NumPy
- It allows for fast analysis, data cleaning, and data preparation
- It excels in performance and productivity
- It also has built-in visualization features
- It can work with data from a wide variety of sources

### How to install Pandas?

**Using PIP**

    (venv) -bash-4.2$ pip install pandas
    Requirement already satisfied: pandas in ./venv/lib/python3.6/site-packages (0.25.1)
    Requirement already satisfied: python-dateutil>=2.6.1 in ./venv/lib/python3.6/site-packages (from pandas) (2.8.0)
    Requirement already satisfied: pytz>=2017.2 in ./venv/lib/python3.6/site-packages (from pandas) (2019.2)
    Requirement already satisfied: numpy>=1.13.3 in ./venv/lib/python3.6/site-packages (from pandas) (1.17.2)
    Requirement already satisfied: six>=1.5 in ./venv/lib/python3.6/site-packages (from python-dateutil>=2.6.1->pandas) (1.12.0)
    (venv) -bash-4.2$
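Once the install finishes, a quick way to confirm that the library is importable is to print its version from Python. This is only a minimal check; the version numbers you see depend on your own environment and will likely differ from the ones in the log above.

```python
import numpy as np
import pandas as pd

# Print the installed versions; the exact values depend on your environment.
print("pandas:", pd.__version__)
print("numpy:", np.__version__)
```

If the import fails with `ModuleNotFoundError`, the package was most likely installed into a different interpreter or virtual environment than the one running the script.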

**Series**

**One-dimensional ndarray with axis labels, including time series**. It can hold data of any type. The axis labels are collectively known as the index. A Series is built on the NumPy array object and behaves much like one; the difference is that a Series can also be indexed by labels.
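The difference between positional and label-based indexing can be sketched as follows. The values and the labels `'x'`, `'y'`, `'z'` are made up for illustration; `.loc` and `.iloc` are the explicit label-based and position-based accessors.

```python
import numpy as np
import pandas as pd

arr = np.array([10, 20, 30])
ser = pd.Series(arr, index=['x', 'y', 'z'])  # hypothetical labels

# A NumPy array is addressed by integer position only.
first_from_array = arr[0]

# A Series can be addressed by label (.loc) or by position (.iloc).
by_label = ser.loc['y']
by_position = ser.iloc[1]

print(first_from_array, by_label, by_position)  # 10 20 20
print(list(ser.index))   # the axis labels: ['x', 'y', 'z']
print(ser.to_numpy())    # the underlying NumPy values
```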

**Syntax:**

    class pandas.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)

The snippet below shows several ways of creating a Series:

    import numpy as np
    import pandas as pd

    labels = ['a', 'e', 'i', 'o']     # Python list
    data = [1, 2, 3, 4]               # Python list
    arr = np.array(data)              # NumPy array
    d = {'a': 1, 'b': 2, 'c': 3}      # Python dict

    # creating a Series object with the default index
    print(pd.Series(data=data))

    # creating a Series object with labels as the index
    print(pd.Series(data=data, index=labels))

    # creating a Series from a NumPy array
    print(pd.Series(arr, index=labels))

    # creating a Series from a dictionary;
    # here the keys become the index
    print(pd.Series(d))

    # a Series can also hold built-in functions
    print(pd.Series(data=[sum, print, len]))

**Output**

    0    1
    1    2
    2    3
    3    4
    dtype: int64
    a    1
    e    2
    i    3
    o    4
    dtype: int64
    a    1
    e    2
    i    3
    o    4
    dtype: int64
    a    1
    b    2
    c    3
    dtype: int64
    0      <built-in function sum>
    1    <built-in function print>
    2      <built-in function len>
    dtype: object
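The constructor also accepts `dtype` and `name`, which the examples above do not use. The following is a small sketch of both, plus what happens when a dictionary is combined with an explicit `index`. The name `'scores'` and the label `'z'` are invented for illustration.

```python
import pandas as pd

# Force a floating-point dtype and attach a name to the Series.
ser = pd.Series(data=[1, 2, 3], index=['a', 'b', 'c'],
                dtype='float64', name='scores')
print(ser.dtype)  # float64
print(ser.name)   # scores

# With a dict plus an explicit index, values are looked up by key;
# a label that is not a key in the dict ('z' here) becomes NaN.
d = {'a': 1, 'b': 2, 'c': 3}
picked = pd.Series(d, index=['a', 'c', 'z'])
print(picked)
```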

### Operations on Series

**Create two Series objects**

    import pandas as pd

    ser1 = pd.Series([1, 2, 3, 4], ['Delhi', 'Bangalore', 'Mysore', 'Pune'])
    print(ser1)

    ser2 = pd.Series([1, 2, 5, 4], ['Delhi', 'Bangalore', 'Vizag', 'Pune'])
    print(ser2)

**Output**

    Delhi        1
    Bangalore    2
    Mysore       3
    Pune         4
    dtype: int64
    Delhi        1
    Bangalore    2
    Vizag        5
    Pune         4
    dtype: int64

Retrieving a value from a Series works much like a Python dictionary lookup: pass the index label of the element you want. In the example above, the index labels are strings.

    print(ser1['Delhi'])  # Output: 1
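The dictionary analogy extends a little further. As a rough sketch, membership tests, `get()` with a default, and selection by a list of labels all work on the index labels:

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], ['Delhi', 'Bangalore', 'Mysore', 'Pune'])

# Membership tests check the index labels, as with dict keys.
print('Delhi' in ser1)       # True
print('Vizag' in ser1)       # False

# get() returns a default instead of raising KeyError for a missing label.
print(ser1.get('Vizag', 0))  # 0

# Passing a list of labels returns a new Series with just those entries.
subset = ser1[['Delhi', 'Pune']]
print(subset)
```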

Now let's try adding the two series:

    print(ser1 + ser2)
    '''
    Output:
    Bangalore    4.0
    Delhi        2.0
    Mysore       NaN
    Pune         8.0
    Vizag        NaN
    dtype: float64
    '''

**Pandas** adds the values that share an index label. Where a label appears in only one of the two series, the result for that label is NaN (a missing value). Because NaN is a floating-point value, the integer values in the result are converted to float in that case.
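If NaN for unmatched labels is not what you want, one option is the `Series.add` method with its `fill_value` parameter, which substitutes a value for a label missing from one side before adding. The sketch below reuses the two series from above and also checks the result dtype when every label matches.

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], ['Delhi', 'Bangalore', 'Mysore', 'Pune'])
ser2 = pd.Series([1, 2, 5, 4], ['Delhi', 'Bangalore', 'Vizag', 'Pune'])

# Treat a label missing from one series as 0 instead of producing NaN.
total = ser1.add(ser2, fill_value=0)
print(total)

# When every label matches, no NaN is introduced and the integer dtype is kept.
same_labels = ser1 + ser1
print(same_labels.dtype)
```

Whether 0 is a sensible stand-in for a missing city depends on what the numbers represent, so treat `fill_value=0` as an assumption to verify against your data.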
