In this post, I will summarize the most convenient ways to read and write CSV files (with a header row) in Python.

Write CSV files

Python's standard library includes the csv module for working with CSV files. To write a file in CSV format, we first create a CSV writer and then use it to write the rows. Here is a simple example:

import csv

lines = [['Bob', 'male', '27'],
         ['Smith', 'male', '26'],
         ['Alice', 'female', '26']]

header = ['name', 'gender', 'age']

with open("test.csv", "w", newline='') as f:
    writer = csv.writer(f, delimiter=',')
    writer.writerow(header)  # write the header
    # write the actual content line by line
    for line in lines:
        writer.writerow(line)
    # or write all the lines at once
    # writer.writerows(lines)

In the above code snippet, the newline parameter of the open function is important. If you omit newline='', there will be an extra blank line after each row on Windows. The delimiter parameter specifies the character that separates the fields in each line of the CSV file.
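To see what the writer actually emits, here is a minimal sketch that writes to an in-memory buffer instead of a file. The buffer, the semicolon delimiter, and the 'Smith; Jr.' row are illustrative assumptions, not part of the example above. It shows that the writer ends each row with \r\n by itself (which is why open needs newline='' so that Windows does not translate the line ending a second time), and that a field containing the delimiter is quoted automatically.

```python
import csv
import io

rows = [['name', 'gender', 'age'],
        ['Bob', 'male', '27'],
        ['Smith; Jr.', 'male', '26']]  # hypothetical field that contains the delimiter

buffer = io.StringIO()  # in-memory stand-in for a file opened with newline=''
writer = csv.writer(buffer, delimiter=';')
writer.writerows(rows)

raw_text = buffer.getvalue()
print(repr(raw_text))
# 'name;gender;age\r\nBob;male;27\r\n"Smith; Jr.";male;26\r\n'
```

Because the row terminator is already \r\n, letting text mode turn the \n into \r\n again on Windows produces \r\r\n, which shows up as a blank line between rows.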

Read CSV files

Use the CSV reader

The csv module also provides a CSV reader, which we can use to read CSV files. The reader is an iterable object that yields one row at a time. We can use the following snippet to read a CSV file:

import csv

with open("test.csv", "r", newline='') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        print(row)  # each row is a Python list of strings (the first one is the header)
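Since the file has a header, it is often handy to separate the header from the data rows. The following is a small sketch of two ways to do that, assuming the same three columns as above. It reads from an in-memory string (csv_text is a stand-in for the contents of test.csv) so that it runs on its own. Note that the reader returns every field as a string, so numbers have to be converted explicitly.

```python
import csv
import io

# stand-in for the contents of test.csv
csv_text = "name,gender,age\r\nBob,male,27\r\nSmith,male,26\r\n"

# Option 1: take the first row as the header, then pair it with each data row
reader = csv.reader(io.StringIO(csv_text, newline=''))
header = next(reader)
records = [dict(zip(header, row)) for row in reader]

# Option 2: let csv.DictReader use the first row as the keys
dict_records = list(csv.DictReader(io.StringIO(csv_text, newline='')))

# all fields are strings, so convert numeric columns by hand
ages = [int(record['age']) for record in records]

print(header)
print(records[0])
print(ages)
```

With a real file, the io.StringIO(...) call would be replaced by the file object from open("test.csv", "r", newline='').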

Use Pandas

The popular data processing library Pandas also provides a function, read_csv, to read CSV files.

For example, in order to read the above test.csv file, we can use the following code:

import pandas as pd

df = pd.read_csv('test.csv', delimiter=',')  # df is a Pandas DataFrame

The df in the above code is a Pandas DataFrame object. By default, read_csv uses the first line of the file as the column names. To get a certain column, we use the column name as the key:

col0 = df['name'] # col0 is Pandas Series object
print(col0.tolist()) # use tolist() method to make a list

The tolist() method in the above code converts a Pandas Series to a plain Python list.

To access a certain row, we can use the loc indexer of the DataFrame with the row label. With the default index created by read_csv, the label is simply the row number.

row0 = df.loc[0] # row0 is Pandas Series object
print(row0.tolist()) # use tolist() method to make a list
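
The difference between labels and positions only becomes visible when the index is not the default 0, 1, 2, and so on. Here is a minimal sketch, assuming the same data as test.csv but read from an in-memory string so that it runs on its own. Using the name column as the index is an illustrative choice, not something the post above does. loc looks rows up by label, while iloc looks them up by integer position.

```python
import io

import pandas as pd

# stand-in for the contents of test.csv
csv_text = "name,gender,age\nBob,male,27\nSmith,male,26\nAlice,female,26\n"
df = pd.read_csv(io.StringIO(csv_text))

# default index: label 0 and position 0 are the same row
first_by_label = df.loc[0]
first_by_position = df.iloc[0]

# use the name column as the index (illustrative choice)
by_name = df.set_index('name')
alice = by_name.loc['Alice']   # lookup by label
first_row = by_name.iloc[0]    # lookup by position, still the first row

print(first_by_label.tolist())
print(alice['age'])
print(first_row['gender'])
```

Unlike the csv reader, read_csv infers column types, so the age column holds integers rather than strings.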

References