Table of Contents

update log

2022-09-30: change order of read and write section; other small fixes.

In this post, I will share the most convenient way to read and write CSV files (with or w/o headers) in Python.

Read CSV files
#

Use the builtin CSV module
#

Python has a built-in CSV module that deals with CSV files. CSV module provides a CSV reader, which we can use to read CSV files. The CSV reader is an iterable object. We can use the following snippet to read CSV files:

import csv

with open("test.csv", "r", newline='') as f:
    reader = csv.reader(f, delimiter=',')
    for l in reader:
        print(l) # l will be a Python list

Use Pandas
#

The famous data processing package Pandas also provides a method read_csv() to read CSV files.

For example, in order to read the above test.csv file, we can use the following code:

import pandas as pd

# delimiter is used to sinify the delimiter between different columns, change it based on your case
df = pd.read_csv('test.csv', delimiter=',') # df is Pandas dataframe

df in the above code will be Pandas dataframe object. If the csv file you have does not have header, you should use header=None when reading this file:

df = pd.read_csv("test.csv", delimiter=',', header=None)

To show the number of rows in the dataframe, use len(df.index) or df.shape[0]. To show the number of columns, use len(df.columns) or df.shape[1].

To get a certain column, we use the column name as key:

col0 = df['name']  # col0 is Pandas Series object
df[['c1', 'c2']]  # to get multiple columns from dataframe
print(col0.tolist()) # use tolist() method to make a list

The tolist() method in the above code convert Pandas Series to plain Python list. If the csv file does not have a header, you can also use column index to access a certain column. For example, to get column 0, you can use:

df[0].values (this is numpy array)
df[0].tolist() (plain Python list)

To access a certain row, we can use the loc method with the row number.

row0 = df.loc[0] # row0 is Pandas Series object
df.loc[0].values # a numpy array
print(row0.tolist()) # use tolist() method to make a list

To get the whole dataframe as a numpy ndarray, you can use df.values.

Write CSV files
#

Use the builtin CSV writer
#

In order to write to files in CSV format, we build a CSV writer, then write to a file using this writer. Here is a simple example:

import csv

lines = [['Bob', 'male', '27'],
['Smith', 'male', '26'],
['Alice', 'female', '26']]

header = ['name', 'gender', 'age']

with open("test.csv", "w", newline='') as f:
    writer = csv.writer(f, delimiter=',')
    writer.writerow(header) # write the header
    # write the actual content line by line
    for l in lines:
        writer.writerow(l)
    # or we can write in a whole
    # writer.writerows(lines)

In the above code snippet, the newline parameter inside the open method is important. If you do not use newline='', there will an extra blank line after each line on Windows platform. The parameter delimiter is used to denote the delimiter between different columns.

Use pandas
#

Another way to generate csv files is to use to_csv() from pandas, which is easier to use. Here is an example using pandas:

import pandas as pd

lines = [['Bob', 'male', '27'],
['Smith', 'male', '26'],
['Alice', 'female', '26']]

header = ['name', 'gender', 'age']
new_df = pd.DataFrame(data=lines, columns=header)

new_fname = "test.csv"
new_df.to_csv(new_fname, sep=",", index=False)

The parameter index=False disable adding row index to each row, which is often desired. sep is used to change the separation character between columns.