How to Read and Write CSV Files in Python
Contents
update log
2022-09-30: change order of read and write section; other small fixes.
In this post, I will share the most convenient way to read and write CSV files (with or w/o headers) in Python.
Read CSV files
Use the builtin CSV module
Python has a built-in CSV module that deals with CSV files. CSV module provides a CSV reader, which we can use to read CSV files. The CSV reader is an iterable object. We can use the following snippet to read CSV files:
import csv
with open("test.csv", "r", newline='') as f:
reader = csv.reader(f, delimiter=',')
for l in reader:
print(l) # l will be a Python list
Use Pandas
The famous data processing package Pandas also provides a method read_csv()
to read CSV files.
For example, in order to read the above test.csv
file, we can use the following code:
import pandas as pd
# delimiter is used to sinify the delimiter between different columns, change it based on your case
df = pd.read_csv('test.csv', delimiter=',') # df is Pandas dataframe
df
in the above code will be Pandas dataframe
object.
If the csv file you have does not have header, you should use header=None
when reading this file:
df = pd.read_csv("test.csv", delimiter=',', header=None)
To show the number of rows in the dataframe, use len(df.index)
or df.shape[0]
.
To show the number of columns, use len(df.columns)
or df.shape[1]
.
To get a certain column, we use the column name as key:
col0 = df['name'] # col0 is Pandas Series object
df[['c1', 'c2']] # to get multiple columns from dataframe
print(col0.tolist()) # use tolist() method to make a list
The tolist()
method in the above code convert Pandas Series to plain Python list.
If the csv file does not have a header, you can also use column index to access a certain column.
For example, to get column 0, you can use:
df[0].values
(this is numpy array)df[0].tolist()
(plain Python list)
To access a certain row, we can use the loc
method with the row number.
row0 = df.loc[0] # row0 is Pandas Series object
df.loc[0].values # a numpy array
print(row0.tolist()) # use tolist() method to make a list
To get the whole dataframe as a numpy ndarray, you can use df.values
.
Write CSV files
Use the builtin CSV writer
In order to write to files in CSV format, we build a CSV writer, then write to a file using this writer. Here is a simple example:
import csv
lines = [['Bob', 'male', '27'],
['Smith', 'male', '26'],
['Alice', 'female', '26']]
header = ['name', 'gender', 'age']
with open("test.csv", "w", newline='') as f:
writer = csv.writer(f, delimiter=',')
writer.writerow(header) # write the header
# write the actual content line by line
for l in lines:
writer.writerow(l)
# or we can write in a whole
# writer.writerows(lines)
In the above code snippet, the newline
parameter inside the open
method is important.
If you do not use newline=''
, there will an extra blank line after each line on Windows platform.
The parameter delimiter
is used to denote the delimiter between different columns.
Use pandas
Another way to generate csv files is to use to_csv() from pandas, which is easier to use. Here is an example using pandas:
import pandas as pd
lines = [['Bob', 'male', '27'],
['Smith', 'male', '26'],
['Alice', 'female', '26']]
header = ['name', 'gender', 'age']
new_df = pd.DataFrame(data=lines, columns=header)
new_fname = "test.csv"
new_df.to_csv(new_fname, sep=",", index=False)
The parameter index=False
disable adding row index to each row, which is often desired.
sep
is used to change the separation character between columns.