In this post, I will summarize the most convenient ways to read and write CSV files (with or without headers) in Python.
Write CSV files
Use the builtin CSV writer
Python has a built-in csv module for working with CSV files. To write a file in CSV format, we build a CSV writer and then write to the file through this writer. Here is a simple example:
    import csv

    lines = [['Bob', 'male', '27'],
             ['Smith', 'male', '26'],
             ['Alice', 'female', '26']]
    header = ['name', 'gender', 'age']

    with open("test.csv", "w", newline='') as f:
        writer = csv.writer(f, delimiter=',')
        writer.writerow(header)  # write the header
        # write the actual content line by line
        for line in lines:
            writer.writerow(line)
        # or we can write all the lines at once
        # writer.writerows(lines)
In the above code snippet, the newline parameter of the open function is important. If you do not use newline='', there will be an extra blank line after each row on Windows.
The delimiter parameter sets the character that separates the columns.
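As a small illustration of the delimiter parameter, the sketch below writes the same rows once with a comma and once with a tab, then reads the tab-separated text back. It writes to an in-memory io.StringIO buffer instead of a real file, which is an assumption made only to keep the example self-contained, and the helper name rows_to_text is hypothetical.

```python
import csv
import io


def rows_to_text(rows, delimiter=','):
    """Serialize rows to CSV text with the given delimiter (hypothetical helper)."""
    buffer = io.StringIO(newline='')  # newline='' plays the same role as in open()
    writer = csv.writer(buffer, delimiter=delimiter)
    writer.writerows(rows)
    return buffer.getvalue()


rows = [['name', 'gender', 'age'], ['Bob', 'male', '27']]

comma_text = rows_to_text(rows)                # columns separated by ','
tab_text = rows_to_text(rows, delimiter='\t')  # columns separated by a tab

# Reading back with the matching delimiter recovers the original rows.
recovered = list(csv.reader(io.StringIO(tab_text, newline=''), delimiter='\t'))
print(comma_text.splitlines())
print(recovered)
```

The reader has to be given the same delimiter that the writer used; otherwise each line comes back as a single field.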
Another way to generate CSV files is to use the to_csv method of a pandas DataFrame, which is easier to use. Here is an example using pandas:
    import pandas as pd

    lines = [['Bob', 'male', '27'],
             ['Smith', 'male', '26'],
             ['Alice', 'female', '26']]
    header = ['name', 'gender', 'age']

    new_df = pd.DataFrame(data=lines, columns=header)
    new_fname = "test.csv"
    new_df.to_csv(new_fname, sep=",", index=False)
index=False disables writing the row index as an extra column, which is often what you want.
sep sets the separator character between columns.
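To see what index and sep actually change, the sketch below calls to_csv without a file path, in which case pandas returns the CSV text as a string instead of writing a file. The data is a shortened version of the rows above.

```python
import pandas as pd

df = pd.DataFrame(data=[['Bob', 'male', '27'], ['Alice', 'female', '26']],
                  columns=['name', 'gender', 'age'])

# With no path argument, to_csv returns the CSV content as a string.
with_index = df.to_csv()                          # row index becomes the first column
without_index = df.to_csv(index=False)            # no row index column
tab_separated = df.to_csv(sep='\t', index=False)  # tab instead of comma

# Compare the header line of each variant.
print(repr(with_index.splitlines()[0]))
print(repr(without_index.splitlines()[0]))
print(repr(tab_separated.splitlines()[0]))
```

With the index included, the header line starts with an empty field, which is one reason index=False is usually the more convenient choice when the file will be read by other tools.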
Read CSV files
Use the CSV reader
The csv module also provides a CSV reader, which we can use to read CSV files. The reader is an iterable object that yields one row at a time. We can use the following snippet to read CSV files:
    import csv

    with open("test.csv", "r", newline='') as f:
        reader = csv.reader(f, delimiter=',')
        for row in reader:
            print(row)  # row is a Python list of strings
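When the first row of the file is a header, one common pattern is to take it off the reader with next() before looping over the data rows. The sketch below assumes the content of test.csv written earlier, but reads it from an in-memory string so that it runs on its own.

```python
import csv
import io

# Stand-in for the content of test.csv (assumed here for illustration).
csv_text = "name,gender,age\nBob,male,27\nSmith,male,26\nAlice,female,26\n"

with io.StringIO(csv_text, newline='') as f:
    reader = csv.reader(f, delimiter=',')
    header = next(reader)  # the first row holds the column names
    rows = list(reader)    # the remaining rows are the data

print(header)
print(rows[0])

# Every field comes back as a string, so numeric columns need explicit conversion.
ages = [int(row[2]) for row in rows]
print(ages)
```

For a file without a header, skip the next() call and treat every row as data.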
pandas can also read CSV files with its read_csv function. For example, in order to read the above test.csv file, we can use the following code:
    import pandas as pd

    df = pd.read_csv('test.csv', delimiter=',')
    # df is a pandas DataFrame
df in the above code will be a pandas DataFrame.
If your CSV file does not have a header, you should pass header=None when reading it:
df = pd.read_csv("test.csv", delimiter=',', header=None)
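The effect of header=None is easiest to see by reading the same headerless content both ways. The sketch below uses an in-memory string as a stand-in for a file (an assumption for illustration). The last call also shows the names parameter of read_csv, which is not used elsewhere in this post, as one way to supply column names yourself.

```python
import io

import pandas as pd

# Headerless CSV content, assumed here for illustration.
csv_text = "Bob,male,27\nSmith,male,26\nAlice,female,26\n"

# Default: the first line is consumed as the header, so one data row is lost.
df_default = pd.read_csv(io.StringIO(csv_text))

# header=None: every line is data and the columns are labeled 0, 1, 2.
df_no_header = pd.read_csv(io.StringIO(csv_text), header=None)

# header=None plus names: every line is data and the columns get our own labels.
df_named = pd.read_csv(io.StringIO(csv_text), header=None,
                       names=['name', 'gender', 'age'])

print(list(df_default.columns), len(df_default))
print(list(df_no_header.columns), len(df_no_header))
print(list(df_named.columns), len(df_named))
```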
To show the number of rows in the dataframe, use df.shape[0].
To show the number of columns, use df.shape[1].
To get a certain column, we use the column name as key:
    col0 = df['name']  # col0 is a pandas Series object
    print(col0.tolist())  # use the tolist() method to make a list
The tolist() method in the above code converts a pandas Series to a plain Python list.
If the CSV file does not have a header (and was read with header=None), you can also use the column index to access a certain column.
For example, to get column 0, you can use:
df[0].values (this is a numpy array)
df[0].tolist() (a plain Python list)
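Here is a minimal runnable sketch of column access on a headerless file, again with an in-memory string standing in for the file (an assumption for illustration). It also shows iloc, which selects by position and therefore works whether or not the file has a header.

```python
import io

import pandas as pd

csv_text = "Bob,male,27\nSmith,male,26\nAlice,female,26\n"  # assumed sample data
df = pd.read_csv(io.StringIO(csv_text), header=None)

first_col = df[0]                    # the column labeled 0, a pandas Series
names = first_col.tolist()           # plain Python list of strings
ages = df[2].tolist()                # this column is parsed as integers
by_position = df.iloc[:, 0].tolist() # positional access to the first column

print(names)
print(ages)
print(df.shape)  # (number of rows, number of columns)
```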
To access a certain row, we can use the loc indexer with the row number (with the default index, the row label is the same as the row number).
    row0 = df.loc[0]  # row0 is a pandas Series object
    df.loc[0].values  # a numpy array
    print(row0.tolist())  # use the tolist() method to make a list
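One caveat worth sketching: loc selects by index label, and the labels only match the row numbers while the dataframe still has its default index. The example below, which uses assumed in-memory sample data, shows row access with loc and then how labels and positions drift apart after filtering. In that case iloc selects by position instead.

```python
import io

import pandas as pd

csv_text = "name,gender,age\nBob,male,27\nSmith,male,26\nAlice,female,26\n"
df = pd.read_csv(io.StringIO(csv_text))

row0 = df.loc[0]        # the row with index label 0, a pandas Series
print(row0.tolist())    # plain Python list of the row's values
print(row0['name'])     # fields can also be read by column name

# Filtering keeps the original labels, so the first remaining row has label 1.
age_26 = df[df['age'] == 26]
print(list(age_26.index))
print(age_26.iloc[0]['name'])  # first row by position
print(age_26.loc[1]['name'])   # the same row by label
```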
To get the whole dataframe as a numpy ndarray, you can use df.values.
License CC BY-NC-ND 4.0