In this post, I will summarize the most convenient way to read and write CSV files (with or without headers) in Python.

Write CSV files

Python has a built-in csv module for reading and writing CSV files. To write data in CSV format, we first create a CSV writer with csv.writer() and then use it to write rows to the file. Here is a simple example:

import csv

lines = [['Bob', 'male', '27'],
['Smith', 'male', '26'],
['Alice', 'female', '26']]

header = ['name', 'gender', 'age']

with open("test.csv", "w", newline='') as f:
    writer = csv.writer(f, delimiter=',')
    writer.writerow(header) # write the header
    # write the actual content line by line
    for row in lines:
        writer.writerow(row)
    # or write all rows in a single call:
    # writer.writerows(lines)

In the above code snippet, the newline='' argument to open() is important. The csv writer already ends each row with '\r\n', and without newline='' Windows text mode translates the '\n' once more, so there will be an extra blank line after each row on Windows. The delimiter parameter sets the character that separates the fields within a line (',' is the default).
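To make the row terminator visible, the sketch below writes two of the example rows to an in-memory io.StringIO buffer instead of a real file (an assumption made only so the result can be printed) and uses ';' as the delimiter in place of the default ','.

```python
import csv
import io

# In-memory stand-in for a file, so we can inspect the exact characters written.
buf = io.StringIO()
writer = csv.writer(buf, delimiter=';')  # ';' instead of the default ','
writer.writerow(['Bob', 'male', '27'])
writer.writerow(['Smith', 'male', '26'])

print(repr(buf.getvalue()))  # 'Bob;male;27\r\nSmith;male;26\r\n'
```

Each row ends in '\r\n', the writer's default lineterminator. A file written with delimiter=';' must be read back with the same delimiter=';' passed to csv.reader().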

Read CSV files

Use the CSV reader

The csv module also provides a CSV reader, created with csv.reader(). The reader is an iterable object that yields each row of the file as a list of strings. We can use the following snippet to read CSV files:

import csv

with open("test.csv", "r", newline='') as f:
    reader = csv.reader(f, delimiter=',')
    for l in reader:
        print(l) # l will be a Python list
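Because the reader simply yields rows, a file with a header needs no special option. A minimal sketch is to take the first row with next() and treat the rest as data. To keep the example self-contained, the content of the test.csv written above is supplied through io.StringIO; with a real file you would pass the object returned by open("test.csv", newline='') instead.

```python
import csv
import io

# Stand-in for open("test.csv", newline=''): same content as the file written earlier.
f = io.StringIO("name,gender,age\r\nBob,male,27\r\nSmith,male,26\r\nAlice,female,26\r\n")

reader = csv.reader(f)
header = next(reader)           # the first row is the header
rows = [row for row in reader]  # the remaining rows, each a list of strings

print(header)   # ['name', 'gender', 'age']
print(rows[0])  # ['Bob', 'male', '27']

# The csv module never converts types, so numbers must be converted explicitly.
ages = [int(row[2]) for row in rows]
print(ages)     # [27, 26, 26]
```

For a file without a header, just skip the next() call and every row is data.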

Use Pandas

The popular data analysis library Pandas also provides a read_csv() function for reading CSV files.

For example, in order to read the above test.csv file, we can use the following code:

import pandas as pd

df = pd.read_csv('test.csv', delimiter=',') # df is a Pandas DataFrame

Here df is a Pandas DataFrame object. By default, read_csv() treats the first line of the file as the header. If your CSV file has no header row, pass header=None so that the first line is read as data; the columns are then labeled 0, 1, 2, and so on:

df = pd.read_csv("test.csv", delimiter=',', header=None)
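To see what header=None changes, the sketch below reads a header-less version of the example data both ways. The data is supplied in memory through io.StringIO rather than from a file, purely to keep the example self-contained.

```python
import io

import pandas as pd

# Header-less version of the example data.
data = "Bob,male,27\nSmith,male,26\nAlice,female,26\n"

df_default = pd.read_csv(io.StringIO(data))            # first line is taken as the header
df_none = pd.read_csv(io.StringIO(data), header=None)  # all three lines are data

print(list(df_default.columns))         # ['Bob', 'male', '27']
print(list(df_none.columns))            # [0, 1, 2]
print(df_default.shape, df_none.shape)  # (2, 3) (3, 3)
```

Without header=None, the first data row is silently turned into column names and one row of data is lost.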

To show the number of rows in the dataframe, use len(df.index) or df.shape[0]. To show the number of columns, use len(df.columns) or df.shape[1].
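A quick check on the example data (read from an in-memory string that mirrors test.csv) shows that the header line is not counted as a row:

```python
import io

import pandas as pd

csv_text = "name,gender,age\nBob,male,27\nSmith,male,26\nAlice,female,26\n"
df = pd.read_csv(io.StringIO(csv_text))

n_rows = len(df.index)    # same as df.shape[0]
n_cols = len(df.columns)  # same as df.shape[1]
print(n_rows, n_cols)     # 3 3
print(df.shape)           # (3, 3)
```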

To get a certain column, we use the column name as key:

col0 = df['name'] # col0 is a Pandas Series object
print(col0.tolist()) # use tolist() method to make a list

The tolist() method in the above code converts a Pandas Series to a plain Python list. If the CSV file was read with header=None, the columns are labeled with integers, so you can access a column by its index. For example, to get column 0, you can use:

  • df[0].values (a NumPy array)
  • df[0].tolist() (a plain Python list)
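For instance, reading the header-less data with header=None and selecting columns by their integer labels looks like this (the in-memory string stands in for a file):

```python
import io

import pandas as pd

data = "Bob,male,27\nSmith,male,26\nAlice,female,26\n"
df = pd.read_csv(io.StringIO(data), header=None)

names = df[0].tolist()  # plain Python list
ages = df[2].values     # NumPy array
print(names)            # ['Bob', 'Smith', 'Alice']
print(ages.tolist())    # [27, 26, 26]
```

Unlike the csv module, read_csv() infers column types, so the age column comes back as integers rather than strings.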

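To access a particular row, use df.loc with the row's index label. With the default integer index, the label equals the row number; for purely positional access, use df.iloc instead.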

row0 = df.loc[0] # row0 is a Pandas Series object
df.loc[0].values # a NumPy array
print(row0.tolist()) # use tolist() method to make a list
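df.loc[0] selects the row whose index label is 0, which is the first row only while the DataFrame keeps its default index. The sketch below reverses the rows with df.iloc[::-1] (chosen here just to make labels and positions disagree) and compares loc with iloc:

```python
import io

import pandas as pd

csv_text = "name,gender,age\nBob,male,27\nSmith,male,26\nAlice,female,26\n"
df = pd.read_csv(io.StringIO(csv_text))

rev = df.iloc[::-1]  # rows now carry the labels 2, 1, 0 in that order

print(rev.loc[0, "name"])   # Bob   (the row labeled 0)
print(rev.iloc[0]["name"])  # Alice (the row at position 0)
```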

To get the whole DataFrame as a NumPy ndarray, you can use df.values; current versions of Pandas recommend df.to_numpy(), which does the same job.
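A short sketch on the example data, assuming a Pandas version that provides DataFrame.to_numpy() (0.24 or later):

```python
import io

import pandas as pd

csv_text = "name,gender,age\nBob,male,27\nSmith,male,26\nAlice,female,26\n"
df = pd.read_csv(io.StringIO(csv_text))

arr = df.to_numpy()            # equivalent to df.values
print(arr.shape)               # (3, 3)
print(arr.dtype)               # object, because the columns mix strings and integers

ages = df[["age"]].to_numpy()  # selecting only a numeric column keeps a numeric dtype
print(ages.ravel().tolist())   # [27, 26, 26]
```

Because the full array has dtype object, it is often more useful to select the numeric columns first when the array is meant for numerical work.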
