Home » Python

Python Pandas – Data Input and Output

In this tutorial, we are going to learn about the Data Input and Output in Python Pandas – Read a CSV file, write to CSV file, reading from HTML file, etc.
Submitted by Sapna Deraje Radhakrishna, on February 03, 2020

Pandas as a library can read and write data to a wide variety of sources. In this article, we would concentrate on the following,

  • CSV
  • Excel
  • HTML
  • SQL

In order to work with HTML files and SQL database, along with pandas, we would need to install the below library as well,

  1. pip install sqlalchemy
  2. pip install lxml
  3. pip install html5lib
  4. pip install beautifulsoup4
  5. pip install openpyxl

Note: In the below example, the reference files are located in the same folder as the python files are.

Read a CSV file

import numpy as np
import pandas as pd

# example is the csv file name
df = pd.read_csv('example')
print(df)

'''
Output:
    a   b   c   d
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
'''

Pandas has built in function to read from numerous type of file format, such as csv , clipboard, html, json etc.

Write to CSV file

In the above example, we have the csv content assigned to a dataFrame variable called df.

# here index = False coz I don't want 
# to save the index as coulmn
df.to_csv('test_csv', index=False) 
pd.read_csv('test_csv')

'''
Output:
    a   b   c   d
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
'''

Reading from Excel file

While the pandas can read the data from the Excel file, it is important to note that 'pandas' can not read or import the formulas or images or macros. Trying to read them can cause pandas to crash.

Also, there can be an error such as requesting us to install the xlrd library in case it is not available. [pip install xlrd].

pd.read_excel('Excel_Sample.xlsx')

'''
Output:
   Unnamed: 	0   	a   b   c   d
0           	0   	0   1   2   3
1           	1   	4   5   6   7
2           	2   	8   9  10  11
3           	3  	12  13  14  15
'''

df.to_excel('Excel_Sample_2.xlsx',sheet_name='Sheet1')

HTML input

Pandas read_html() function will read tables off of a webpage and return a list of DataFrame objects,

df = pd.read_html('http://www.fdic.gov/bank/individual/failed/banklist.html')
print(df[0] )
Bank NameCitySTCERTAcquiring InstitutionClosing DateUpdated DateLoss Share TypeAgreement TerminatedTermination Date
0First CornerStone BankKing of PrussiaPA35312First-Citizens Bank & Trust CompanyMay 6, 2016July 12, 2016noneNaNNaN
1Trust Company BankMemphisTN9956The Bank of Fayette CountyApril 29, 2016August 4, 2016noneNaNNaN
2North Milwaukee State BankMilwaukeeWI20364First-Citizens Bank & Trust CompanyMarch 11, 2016June 16, 2016noneNaNNaN
3Hometown National BankLongviewWA35156Twin City BankOctober 2, 2015April 13, 2016noneNaNNaN
4The Bank of GeorgiaPeachtree CityGA35259Fidelity BankOctober 2, 2015April 13, 2016noneNaNNaN
5Premier BankDenverCO34112United Fidelity Bank, fsbJuly 10, 2015July 12, 2016noneNaNNaN
6Edgebrook BankChicagoIL57772Republic Bank of ChicagoMay 8, 2015July 12, 2016noneNaNNaN
7Doral BankEn EspanolSan JuanPR32102Banco Popular de Puerto RicoFebruary 27, 2015May 13, 2015noneNaNNaN
8Capitol City Bank & Trust CompanyAtlantaGA33938First-Citizens Bank & Trust CompanyFebruary 13, 2015April 21, 2015noneNaNNaN
9Highland Community BankChicagoIL20290United Fidelity Bank, fsbJanuary 23, 2015April 21, 2015noneNaNNaN

SQL

The pandas.io.sql module provides a collection of query wrappers to both facilitate data retrieval and to reduce dependency on DB-specific API. Database abstraction is provided by SQLAlchemy if installed. Also, a driver library is required for the database. Examples of such drivers are psycopg2 for PostgreSQL or pymysql for MySQL. For SQLite, this is included in Python's standard library by default.

If SQLAlchemy is not installed, a fallback is only provided for SQLite (and for MySQL for backward compatibility, but this is deprecated and will be removed in a future version). This mode requires a Python database adapter that respects the Python DB-API.

See also some cookbook examples for some advanced strategies.

The key functions are,

  • read_sql_table(table_name, con[, schema, ...]) : Read SQL database table into a DataFrame.
  • read_sql_query(sql, con[, index_col, ...]) : Read SQL query into a DataFrame.
  • read_sql(sql, con[, index_col, ...]) : Read SQL query or database table into a DataFrame.
  • DataFrame.to_sql(name, con[, flavor, ...]) : Write records stored in a DataFrame to a SQL database.
from sqlalchemy import create_engine
engine = create_engine('sqlite:///:memory:')
df.to_sql('data', engine)
sql_df = pd.read_sql('data',con=engine)
print(sql_df)

'''
Output:
   index         0   a   b   c   d
0      0           0   0   1   2   3
1      1           1   4   5   6   7
2      2           2   8   9  10  11
3      3           3  12  13  14  15
'''


Comments and Discussions!

Load comments ↻





Copyright © 2024 www.includehelp.com. All rights reserved.