Skip to main content
  1. Posts/

Excel Processing using Pandas

··343 words·2 mins·
Python Pandas
Table of Contents

A brief summary of how to read and write excel files using Pandas package.

First we need to install pandas and pre-requisite:

pip install pandas
pip install openpyxl  # in order to read xlsx files
pip install xlrd

Read xlsx files
#

Note that in order to read xlsx files, we need to install openpyxl and use it like this:

import pandas as pd

df = pd.read_excel('test.xlsx', sheet_name=0, engine='openpyxl')

For param sheet_name, either the literal sheet name or the sheet index is okay. sheet_name=0 means to read the 1st sheet.

Convert text in each cell to string type
#

When the sheet contains date and numbers, read_excel() will convert it to Timestamps and numbers by default, if we would like to convert all text to strings, we can use the dtype parameter:

df = pd.read_excel('test.xlsx', sheet_name=0, dtype=str, engine='openpyxl')

Read empty cell as empty string
#

It seems that pandas will by default read empty cell as NAN values. To read empty cell as empty string, use keep_default_na=False when reading excel:

df = pd.read_excel('test.xlsx', sheet_name=0, dtype=str, engine='openpyxl', keep_default_na=False)

Or after reading the sheet, use fillna() to replace the nan values with empty string:

df = pd.read_excel('test.xlsx', sheet_name=0, dtype=str, engine='openpyxl').fillna('')

Convert panda data frame to numpy array
#

Use to_numpy():

df = pd.read_excel('test.xlsx', sheet_name=0, dtype=str, engine='openpyxl').fillna('')
data = df.to_numpy()  # data is now numpy array

Convert list to excel files
#

To convert a list of list (each sub-list has the same number of elements) to excel files, use to_excel():

import pandas as pd

my_list = [[1, 2, 3], [4, 5, 6]]
df = pd.DataFrame(my_list)

df.to_excel('my_list.xlsx')

To remove row index, use index=False:

df.to_excel('my_list.xlsx', index=False)

To add header for each column, we can provide a list of strings for each column using header parameter.

import  pandas  as  pd
my_list  =  [[1,  2,  3],  [4,  5,  6]]
df = pd.DataFrame(my_list)

header = ['c1', 'c2', 'c3']
df.to_excel('my_list.xlsx', index=False, header=header)

References
#

Related

How to Read and Write CSV Files in Python
··607 words·3 mins
Python Pandas
Speed up document indexing in Elasticsearch via bulk indexing
·355 words·2 mins
Python Elasticsearch
Configure Python logging with dictConfig
··503 words·3 mins
Python Logging