How to Read and Write CSV Files in Python
2022-09-30: change order of read and write section; other small fixes.
In this post, I will share the most convenient way to read and write CSV files (with or w/o headers) in Python.
Read CSV files
Use the builtin CSV module
Python has a built-in CSV module that deals with CSV files. CSV module provides a CSV reader, which we can use to read CSV files. The CSV reader is an iterable object. We can use the following snippet to read CSV files:
import csv with open("test.csv", "r", newline='') as f: reader = csv.reader(f, delimiter=',') for l in reader: print(l) # l will be a Python list
The famous data processing package Pandas also provides a method
read_csv() to read CSV files.
For example, in order to read the above
test.csv file, we can use the following code:
import pandas as pd # delimiter is used to sinify the delimiter between different columns, change it based on your case df = pd.read_csv('test.csv', delimiter=',') # df is Pandas dataframe
df in the above code will be Pandas
If the csv file you have does not have header, you should use
header=None when reading this file:
df = pd.read_csv("test.csv", delimiter=',', header=None)
To show the number of rows in the dataframe, use
To show the number of columns, use
To get a certain column, we use the column name as key:
col0 = df['name'] # col0 is Pandas Series object df[['c1', 'c2']] # to get multiple columns from dataframe print(col0.tolist()) # use tolist() method to make a list
tolist() method in the above code convert Pandas Series to plain Python list.
If the csv file does not have a header, you can also use column index to access a certain column.
For example, to get column 0, you can use:
df.values(this is numpy array)
df.tolist()(plain Python list)
To access a certain row, we can use the
loc method with the row number.
row0 = df.loc # row0 is Pandas Series object df.loc.values # a numpy array print(row0.tolist()) # use tolist() method to make a list
To get the whole dataframe as a numpy ndarray, you can use
Write CSV files
Use the builtin CSV writer
In order to write to files in CSV format, we build a CSV writer, then write to a file using this writer. Here is a simple example:
import csv lines = [['Bob', 'male', '27'], ['Smith', 'male', '26'], ['Alice', 'female', '26']] header = ['name', 'gender', 'age'] with open("test.csv", "w", newline='') as f: writer = csv.writer(f, delimiter=',') writer.writerow(header) # write the header # write the actual content line by line for l in lines: writer.writerow(l) # or we can write in a whole # writer.writerows(lines)
In the above code snippet, the
newline parameter inside the
open method is important.
If you do not use
newline='', there will an extra blank line after each line on Windows platform.
delimiter is used to denote the delimiter between different columns.
Another way to generate csv files is to use to_csv() from pandas, which is easier to use. Here is an example using pandas:
import pandas as pd lines = [['Bob', 'male', '27'], ['Smith', 'male', '26'], ['Alice', 'female', '26']] header = ['name', 'gender', 'age'] new_df = pd.DataFrame(data=lines, columns=header) new_fname = "test.csv" new_df.to_csv(new_fname, sep=",", index=False)
index=False disable adding row index to each row, which is often desired.
sep is used to change the separation character between columns.
License CC BY-NC-ND 4.0