Skip to main content
  1. Posts/

How to Read and Write CSV Files in Python

··607 words·3 mins·
Python Pandas
Table of Contents
update log

2022-09-30: change order of read and write section; other small fixes.

In this post, I will share the most convenient way to read and write CSV files (with or w/o headers) in Python.

Read CSV files
#

Use the builtin CSV module
#

Python has a built-in CSV module that deals with CSV files. CSV module provides a CSV reader, which we can use to read CSV files. The CSV reader is an iterable object. We can use the following snippet to read CSV files:

import csv

with open("test.csv", "r", newline='') as f:
    reader = csv.reader(f, delimiter=',')
    for l in reader:
        print(l) # l will be a Python list

Use Pandas
#

The famous data processing package Pandas also provides a method read_csv() to read CSV files.

For example, in order to read the above test.csv file, we can use the following code:

import pandas as pd

# delimiter is used to sinify the delimiter between different columns, change it based on your case
df = pd.read_csv('test.csv', delimiter=',') # df is Pandas dataframe

df in the above code will be Pandas dataframe object. If the csv file you have does not have header, you should use header=None when reading this file:

df = pd.read_csv("test.csv", delimiter=',', header=None)

To show the number of rows in the dataframe, use len(df.index) or df.shape[0]. To show the number of columns, use len(df.columns) or df.shape[1].

To get a certain column, we use the column name as key:

col0 = df['name']  # col0 is Pandas Series object
df[['c1', 'c2']]  # to get multiple columns from dataframe
print(col0.tolist()) # use tolist() method to make a list

The tolist() method in the above code convert Pandas Series to plain Python list. If the csv file does not have a header, you can also use column index to access a certain column. For example, to get column 0, you can use:

  • df[0].values (this is numpy array)
  • df[0].tolist() (plain Python list)

To access a certain row, we can use the loc method with the row number.

row0 = df.loc[0] # row0 is Pandas Series object
df.loc[0].values # a numpy array
print(row0.tolist()) # use tolist() method to make a list

To get the whole dataframe as a numpy ndarray, you can use df.values.

Write CSV files
#

Use the builtin CSV writer
#

In order to write to files in CSV format, we build a CSV writer, then write to a file using this writer. Here is a simple example:

import csv

lines = [['Bob', 'male', '27'],
['Smith', 'male', '26'],
['Alice', 'female', '26']]

header = ['name', 'gender', 'age']

with open("test.csv", "w", newline='') as f:
    writer = csv.writer(f, delimiter=',')
    writer.writerow(header) # write the header
    # write the actual content line by line
    for l in lines:
        writer.writerow(l)
    # or we can write in a whole
    # writer.writerows(lines)

In the above code snippet, the newline parameter inside the open method is important. If you do not use newline='', there will an extra blank line after each line on Windows platform. The parameter delimiter is used to denote the delimiter between different columns.

Use pandas
#

Another way to generate csv files is to use to_csv() from pandas, which is easier to use. Here is an example using pandas:

import pandas as pd

lines = [['Bob', 'male', '27'],
['Smith', 'male', '26'],
['Alice', 'female', '26']]

header = ['name', 'gender', 'age']
new_df = pd.DataFrame(data=lines, columns=header)

new_fname = "test.csv"
new_df.to_csv(new_fname, sep=",", index=False)

The parameter index=False disable adding row index to each row, which is often desired. sep is used to change the separation character between columns.

References
#

Related

Excel Processing using Pandas
··343 words·2 mins
Python Pandas
Speed up document indexing in Elasticsearch via bulk indexing
·355 words·2 mins
Python Elasticsearch
Configure Python logging with dictConfig
··503 words·3 mins
Python Logging