How do you iterate over a CSV file in Python?


Given CSV file file.csv:

column1,column2
foo,bar
baz,qux

You can loop through the rows in Python using the csv library or pandas.

Using csv.reader:

import csv

filename = 'file.csv'

with open(filename, 'r') as csvfile:
    datareader = csv.reader(csvfile)
    for row in datareader:
        print(row)

Output:

['column1', 'column2']
['foo', 'bar']
['baz', 'qux']
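One reason to use csv.reader instead of splitting each line on commas yourself is that it understands quoted fields. The sketch below uses a made-up sample line and an in-memory io.StringIO in place of file.csv, so it runs without any file on disk:

```python
import csv
import io

# Hypothetical line whose second field contains a comma inside quotes
line = 'baz,"qux, quux"\n'

# Naive splitting breaks the quoted field apart
naive = line.strip().split(',')
print(naive)   # ['baz', '"qux', ' quux"']

# csv.reader accepts any iterable of strings, such as an in-memory file
parsed = next(csv.reader(io.StringIO(line)))
print(parsed)  # ['baz', 'qux, quux']
```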

pandas

Install pandas:

pip install pandas

Using pandas.read_csv and pandas.DataFrame.iterrows:

import pandas as pd

filename = 'file.csv'
df = pd.read_csv(filename)

for index, row in df.iterrows():
    print(row)

Output:

column1    foo
column2    bar
Name: 0, dtype: object
column1    baz
column2    qux
Name: 1, dtype: object


So I've seen this done in other questions asked here, but I'm still a little confused. I've been learning Python 3 for the last few days and figured I'd start working on a project to really get my hands dirty. I need to loop through a certain number of CSV files and make edits to those files. I'm having trouble with going to a specific column and also with for loops in Python in general. I'm used to the convention (int i = 0; i < expression; i++), but in Python it's a little different. Here's my code so far, and I'll explain where my issue is.

import os
import csv

pathName = os.getcwd()

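numFiles = []
fileNames = os.listdir(pathName)
# Use a separate loop variable so the list fileNames is not overwritten
for fileName in fileNames:
    if fileName.endswith(".csv"):
        numFiles.append(fileName)

for i in numFiles:
    # "rU" mode was removed in Python 3.11; the csv docs recommend newline=''
    file = open(os.path.join(pathName, i), newline='')
    reader = csv.reader(file, delimiter=',')
    for column in reader:
        print(column[4])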

My issue is on these lines:

for column in reader:
    print(column[4])

So the docs say that column is the loop variable and reader is what I'm looping through. But when I write 4, I get this error:

IndexError: list index out of range

What does this mean? If I write 0 instead of 4, it prints all of the values in column 0 of each CSV file. I basically need it to go through the first row of each CSV file, find a specific value, and then go through that entire column. Thanks in advance!
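The IndexError means that at least one row has fewer than five fields, so row[4] does not exist (blank lines, for example, come back from csv.reader as empty lists). Below is a minimal sketch of what the question asks for, assuming the header row of each file contains the column name you are looking for. The function name column_values and the column name 'Session' are placeholders, not part of the original code:

```python
import csv
import glob
import os


def column_values(path, column_name):
    """Yield each value in the column whose header cell equals column_name."""
    with open(path, newline='') as csv_file:
        csv_reader = csv.reader(csv_file)
        header = next(csv_reader, [])  # first row holds the column names
        if column_name not in header:
            return  # this file has no such column
        index = header.index(column_name)
        for row in csv_reader:
            # Skip blank or short rows instead of raising IndexError
            if index < len(row):
                yield row[index]


if __name__ == '__main__':
    # 'Session' is a placeholder: use the header value you are searching for
    for path in sorted(glob.glob(os.path.join(os.getcwd(), '*.csv'))):
        for value in column_values(path, 'Session'):
            print(os.path.basename(path), value)
```

Looking up the index once from the header means the same code works even if the column sits at a different position in each file.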

In this article we will discuss how to read a CSV file line by line, with or without a header, and how to select specific columns while iterating over it.

Suppose we have a CSV file students.csv with the following contents:

Id,Name,Course,City,Session
21,Mark,Python,London,Morning
22,John,Python,Tokyo,Evening
23,Sam,Python,Paris,Morning
32,Shaun,Java,Tokyo,Morning

We want to read all the rows of this CSV file line by line and process one line at a time.

Note that we don’t want to read all the lines into a list of lists and then iterate over it, because that is not efficient for a large CSV file (for example, one that is several GB in size). We are looking for solutions that read and process only one line at a time while iterating through all rows of the CSV, so that memory usage stays minimal.

Let’s see how to do this.

Python has a csv module, which provides two ways to read the contents of a CSV file: the csv.reader() function and the csv.DictReader class. Let’s discuss and use them one by one to read a CSV file line by line.

Read a CSV file line by line using csv.reader

With the reader object returned by the csv module’s reader() function, we can iterate over the lines of a CSV file, getting each line as a list of values where each value is a cell. Let’s understand with an example:

from csv import reader

# open file in read mode
with open('students.csv', 'r') as read_obj:
    # pass the file object to reader() to get the reader object
    csv_reader = reader(read_obj)
    # Iterate over each row in the csv using reader object
    for row in csv_reader:
        # row variable is a list that represents a row in csv
        print(row)

Output:

['Id', 'Name', 'Course', 'City', 'Session']
['21', 'Mark', 'Python', 'London', 'Morning']
['22', 'John', 'Python', 'Tokyo', 'Evening']
['23', 'Sam', 'Python', 'Paris', 'Morning']
['32', 'Shaun', 'Java', 'Tokyo', 'Morning']

It iterates over all the rows of the students.csv file. For each row, it fetches the contents of that row as a list and prints that list.

How did it work?

It performed the following steps:

  1. Open the file ‘students.csv’ in read mode and create a file object.
  2. Create a reader object (an iterator) by passing the file object to the csv.reader() function.
  3. Use this reader object in a for loop to read the individual rows of the CSV as lists of values, where each value in a list represents an individual cell.

This way, only one line is held in memory at a time while iterating through the CSV file, which makes it a memory-efficient solution.
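To see that the reader really pulls one line at a time, you can hand it a generator instead of a file and count how many lines it has consumed. This is a small sketch; the helper name tracked and the sample lines are made up for illustration:

```python
import csv


def tracked(lines, pulled):
    """Hypothetical helper: yield lines one at a time, recording each one handed out."""
    for line in lines:
        pulled.append(line)
        yield line


pulled = []
csv_reader = csv.reader(tracked(['Id,Name\n', '21,Mark\n', '22,John\n'], pulled))

header = next(csv_reader)
print(header, len(pulled))      # ['Id', 'Name'] 1
first_row = next(csv_reader)
print(first_row, len(pulled))   # ['21', 'Mark'] 2
```

The third line has not been read yet when the second row is returned, which is what keeps memory use flat for large files.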

Read a CSV file line by line without the header

In the previous example we iterated through all the rows of the CSV file, including the header. But suppose we want to skip the header and iterate over the remaining rows of the CSV file.
Let’s see how to do that:

from csv import reader

# skip the first line, i.e. read the header first and then iterate over each row of the csv as a list
with open('students.csv', 'r') as read_obj:
    csv_reader = reader(read_obj)
    # next() with a default returns None instead of raising StopIteration on an empty file
    header = next(csv_reader, None)
    if header is not None:
        # Iterate over each row after the header in the csv
        for row in csv_reader:
            # row variable is a list that represents a row in csv
            print(row)

print('Header was: ')
print(header)

Output:

['21', 'Mark', 'Python', 'London', 'Morning']
['22', 'John', 'Python', 'Tokyo', 'Evening']
['23', 'Sam', 'Python', 'Paris', 'Morning']
['32', 'Shaun', 'Java', 'Tokyo', 'Morning']
Header was: 
['Id', 'Name', 'Course', 'City', 'Session']

It skipped the header row of the CSV file and iterated over all the remaining rows of students.csv. For each row, it fetched the contents of that row as a list and printed that list. It initially saved the header row in a separate variable and printed it at the end.

How did it work?

The reader() function returns an iterator, which we can use in a Python for loop to iterate over the rows. In the above example, however, we first called next() on this iterator, which returned the first row of the CSV. After that, we used the same iterator in the for loop to iterate over the remaining rows of the CSV file.
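Note that calling next() without a default on an empty file raises StopIteration instead of returning None. Passing a default as the second argument to next() avoids that. A sketch using in-memory text; split_header is a hypothetical name:

```python
import csv
import io


def split_header(csv_text):
    """Hypothetical helper: return (header, remaining_rows); header is None for empty input."""
    csv_reader = csv.reader(io.StringIO(csv_text))
    # With a default, next() returns None instead of raising StopIteration on empty input
    header = next(csv_reader, None)
    # The iterator has already moved past the header, so only data rows remain
    # (list() is acceptable here only because the sample is tiny)
    return header, list(csv_reader)


print(split_header('Id,Name\n21,Mark\n22,John\n'))
# (['Id', 'Name'], [['21', 'Mark'], ['22', 'John']])
print(split_header(''))
# (None, [])
```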

Read a CSV file line by line using the csv module’s DictReader object

With the csv module’s DictReader class, we can iterate over the lines of a CSV file as dictionaries, i.e. for each row a dictionary is returned that maps each column name to that row’s cell value.
Let’s understand with an example:

from csv import DictReader

# open file in read mode
with open('students.csv', 'r') as read_obj:
    # pass the file object to DictReader() to get the DictReader object
    csv_dict_reader = DictReader(read_obj)
    # iterate over each line as a dictionary
    for row in csv_dict_reader:
        # row variable is a dictionary that represents a row in csv
        print(row)

Output:

{'Id': '21', 'Name': 'Mark', 'Course': 'Python', 'City': 'London', 'Session': 'Morning'}
{'Id': '22', 'Name': 'John', 'Course': 'Python', 'City': 'Tokyo', 'Session': 'Evening'}
{'Id': '23', 'Name': 'Sam', 'Course': 'Python', 'City': 'Paris', 'Session': 'Morning'}
{'Id': '32', 'Name': 'Shaun', 'Course': 'Java', 'City': 'Tokyo', 'Session': 'Morning'}

It iterates over all the rows of the students.csv file. For each row, it fetches the contents of that row as a dictionary and prints that dictionary.

How did it work?

It performed the following steps:

  1. Open the file ‘students.csv’ in read mode and create a file object.
  2. Create a DictReader object (an iterator) by passing the file object to csv.DictReader().
  3. Use this DictReader object in a for loop to read the individual rows of the CSV as dictionaries, where each key-value pair holds a column name and that column’s value for the row.

It is a memory-efficient solution, because only one line is in memory at a time.
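DictReader normally takes its keys from the first row. If a file has no header row, you can supply the keys yourself with the fieldnames parameter, and restval sets the value used when a row has too few fields. A sketch with made-up in-memory data:

```python
import csv
import io

# Assumed sample without a header row; the second row is missing its Course value
data = io.StringIO('21,Mark,Python\n22,John\n')

# Given explicit fieldnames, DictReader treats every line as data;
# restval fills in keys for rows that have too few fields
csv_dict_reader = csv.DictReader(data, fieldnames=['Id', 'Name', 'Course'], restval='N/A')
rows = []
for row in csv_dict_reader:
    print(row)
    rows.append(row)
# {'Id': '21', 'Name': 'Mark', 'Course': 'Python'}
# {'Id': '22', 'Name': 'John', 'Course': 'N/A'}
```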

Get column names from the header of a CSV file

A DictReader object has a fieldnames attribute that holds the column names of the CSV file as a list.
Let’s see how to use it:

from csv import DictReader

# open file in read mode
with open('students.csv', 'r') as read_obj:
    # pass the file object to DictReader() to get the DictReader object
    csv_dict_reader = DictReader(read_obj)
    # get column names from a csv file
    column_names = csv_dict_reader.fieldnames
    print(column_names)

Output:

['Id', 'Name', 'Course', 'City', 'Session']
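Because fieldnames is read from the first row on demand, you can inspect the column names and then keep iterating over the same DictReader without the header showing up as a data row. A sketch with an in-memory stand-in for the first lines of students.csv:

```python
import csv
import io

# In-memory stand-in for the first lines of students.csv
sample = io.StringIO('Id,Name,Course,City,Session\n21,Mark,Python,London,Morning\n')

csv_dict_reader = csv.DictReader(sample)
# Reading fieldnames consumes the header row from the underlying reader
column_names = csv_dict_reader.fieldnames
# Iteration then starts at the first data row, not at the header
first_row = next(csv_dict_reader)
print(column_names)  # ['Id', 'Name', 'Course', 'City', 'Session']
print(first_row)     # {'Id': '21', 'Name': 'Mark', 'Course': 'Python', 'City': 'London', 'Session': 'Morning'}
```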

Read specific columns from a csv file while iterating line by line

Read specific columns (by column name) in a csv file while iterating row by row

Iterate over all the rows of students.csv line by line, but print only two columns for each row:

from csv import DictReader

# iterate over each line as a dictionary and print only a few columns, selected by column name
with open('students.csv', 'r') as read_obj:
    csv_dict_reader = DictReader(read_obj)
    for row in csv_dict_reader:
        print(row['Id'], row['Name'])

Output:

21 Mark
22 John
23 Sam
32 Shaun

DictReader returns a dictionary for each line during iteration. In this dictionary, the keys are the column names and the values are the cell values for those columns. So, to select specific columns in every row, we index the dictionary by column name.
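Selecting columns by name also combines naturally with filtering on a column’s value while still reading one row at a time. A sketch; names_in_city is a hypothetical helper, and the data is an in-memory copy of students.csv:

```python
import csv
import io


def names_in_city(csv_file, city):
    """Hypothetical helper: yield the Name of each row whose City equals city."""
    # DictReader hands over one row at a time, so matching rows are handled as they arrive
    for row in csv.DictReader(csv_file):
        if row['City'] == city:
            yield row['Name']


# In-memory copy of students.csv so the sketch runs on its own
sample = io.StringIO(
    'Id,Name,Course,City,Session\n'
    '21,Mark,Python,London,Morning\n'
    '22,John,Python,Tokyo,Evening\n'
    '23,Sam,Python,Paris,Morning\n'
    '32,Shaun,Java,Tokyo,Morning\n'
)
tokyo_names = list(names_in_city(sample, 'Tokyo'))
print(tokyo_names)  # ['John', 'Shaun']
```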

Read specific columns (by column number) in a csv file while iterating row by row

Iterate over all rows of students.csv and, for each row, print the contents of the 2nd and 3rd columns:

from csv import reader

# iterate over each line as a list and print only a few columns, selected by column number
with open('students.csv', 'r') as read_obj:
    csv_reader = reader(read_obj)
    for row in csv_reader:
        print(row[1], row[2])

Output:

Name Course
Mark Python
John Python
Sam Python
Shaun Java

With csv.reader, each row of the CSV file is fetched as a list of values, where each value is a column value. So, to select the 2nd and 3rd columns of each row, we take the elements at indices 1 and 2 of the list.
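If you pick the same positions from every row, operator.itemgetter can express the selection once. A sketch under that assumption; select_columns is a hypothetical helper and the sample is in-memory:

```python
import csv
import io
from operator import itemgetter


def select_columns(csv_file, *indices, skip_header=True):
    """Hypothetical helper: yield a tuple of the given columns from each row."""
    csv_reader = csv.reader(csv_file)
    if skip_header:
        next(csv_reader, None)
    # With two or more indices, itemgetter returns a tuple (a single index returns a bare value)
    pick = itemgetter(*indices)
    for row in csv_reader:
        yield pick(row)


sample = io.StringIO('Id,Name,Course\n21,Mark,Python\n22,John,Python\n')
for name, course in select_columns(sample, 1, 2):
    print(name, course)
# Mark Python
# John Python
```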

The complete example is as follows:

from csv import reader
from csv import DictReader


def main():
    print('*** Read csv file line by line using csv module reader object ***')

    print('*** Iterate over each row of a csv file as list using reader object ***')

    # open file in read mode
    with open('students.csv', 'r') as read_obj:
        # pass the file object to reader() to get the reader object
        csv_reader = reader(read_obj)
        # Iterate over each row in the csv using reader object
        for row in csv_reader:
            # row variable is a list that represents a row in csv
            print(row)

    print('*** Read csv line by line without header ***')

    # skip the first line, i.e. read the header first and then iterate over each row of the csv as a list
    with open('students.csv', 'r') as read_obj:
        csv_reader = reader(read_obj)
        # next() with a default returns None instead of raising StopIteration on an empty file
        header = next(csv_reader, None)
        if header is not None:
            # Iterate over each row after the header in the csv
            for row in csv_reader:
                # row variable is a list that represents a row in csv
                print(row)

    print('Header was: ')
    print(header)

    print('*** Read csv file line by line using csv module DictReader object ***')

    # open file in read mode
    with open('students.csv', 'r') as read_obj:
        # pass the file object to DictReader() to get the DictReader object
        csv_dict_reader = DictReader(read_obj)
        # iterate over each line as a dictionary
        for row in csv_dict_reader:
            # row variable is a dictionary that represents a row in csv
            print(row)

    print('*** select elements by column name while reading csv file line by line ***')

    # open file in read mode
    with open('students.csv', 'r') as read_obj:
        # pass the file object to DictReader() to get the DictReader object
        csv_dict_reader = DictReader(read_obj)
        # iterate over each line as a dictionary
        for row in csv_dict_reader:
            # row variable is a dictionary that represents a row in csv
            print(row['Name'], ' is from ' , row['City'] , ' and he is studying ', row['Course'])

    print('*** Get column names from header in csv file ***')

    # open file in read mode
    with open('students.csv', 'r') as read_obj:
        # pass the file object to DictReader() to get the DictReader object
        csv_dict_reader = DictReader(read_obj)
        # get column names from a csv file
        column_names = csv_dict_reader.fieldnames
        print(column_names)

    print('*** Read specific columns from a csv file while iterating line by line ***')

    print('*** Read specific columns (by column name) in a csv file while iterating row by row ***')

    # iterate over each line as a dictionary and print only a few columns, selected by column name
    with open('students.csv', 'r') as read_obj:
        csv_dict_reader = DictReader(read_obj)
        for row in csv_dict_reader:
            print(row['Id'], row['Name'])

    print('*** Read specific columns (by column Number) in a csv file while iterating row by row ***')

    # iterate over each line as a list and print only a few columns, selected by column number
    with open('students.csv', 'r') as read_obj:
        csv_reader = reader(read_obj)
        for row in csv_reader:
            print(row[1], row[2])


if __name__ == '__main__':
    main()

Output:

*** Read csv file line by line using csv module reader object ***
*** Iterate over each row of a csv file as list using reader object ***
['Id', 'Name', 'Course', 'City', 'Session']
['21', 'Mark', 'Python', 'London', 'Morning']
['22', 'John', 'Python', 'Tokyo', 'Evening']
['23', 'Sam', 'Python', 'Paris', 'Morning']
['32', 'Shaun', 'Java', 'Tokyo', 'Morning']
*** Read csv line by line without header ***
['21', 'Mark', 'Python', 'London', 'Morning']
['22', 'John', 'Python', 'Tokyo', 'Evening']
['23', 'Sam', 'Python', 'Paris', 'Morning']
['32', 'Shaun', 'Java', 'Tokyo', 'Morning']
Header was: 
['Id', 'Name', 'Course', 'City', 'Session']
*** Read csv file line by line using csv module DictReader object ***
{'Id': '21', 'Name': 'Mark', 'Course': 'Python', 'City': 'London', 'Session': 'Morning'}
{'Id': '22', 'Name': 'John', 'Course': 'Python', 'City': 'Tokyo', 'Session': 'Evening'}
{'Id': '23', 'Name': 'Sam', 'Course': 'Python', 'City': 'Paris', 'Session': 'Morning'}
{'Id': '32', 'Name': 'Shaun', 'Course': 'Java', 'City': 'Tokyo', 'Session': 'Morning'}
*** select elements by column name while reading csv file line by line ***
Mark  is from  London  and he is studying  Python
John  is from  Tokyo  and he is studying  Python
Sam  is from  Paris  and he is studying  Python
Shaun  is from  Tokyo  and he is studying  Java
*** Get column names from header in csv file ***
['Id', 'Name', 'Course', 'City', 'Session']
*** Read specific columns from a csv file while iterating line by line ***
*** Read specific columns (by column name) in a csv file while iterating row by row ***
21 Mark
22 John
23 Sam
32 Shaun
*** Read specific columns (by column Number) in a csv file while iterating row by row ***
Name Course
Mark Python
John Python
Sam Python
Shaun Java

How do I open and iterate a CSV file in Python?

Open the file with open() and iterate over a csv.reader object:

import csv

filename = 'file.csv'
with open(filename, 'r') as csvfile:
    datareader = csv.reader(csvfile)
    for row in datareader:
        print(row)

How do I iterate over a CSV file in pandas?

How to iterate over rows in a DataFrame in pandas:

import pandas as pd
import time

df = pd.read_csv('College.csv')
df.head(1)   # Out[4]: ...
len(df)      # Out[5]: ...

st = time.time()
for index, row in df. ...
print(end-st)   # 0.10507607460021973

st = time.time()
for row in df. ...
print(end-st)   # 0.010402679443359375

How do I traverse a column in a CSV file in Python?

Python uses zero-based indexing, which means it starts counting at 0, so the first column is column[0] and the second is column[1]. Write the loop as for row in reader:, because the reader iterates through the rows, not the columns. To view the contents of each cell, loop through each row and then through each column in that row.
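The row-then-cell loop described above can be sketched as follows, using an in-memory copy of file.csv. The last two lines also show zip(*rows) as one way to get whole columns; note that unpacking the reader into zip() reads the entire file into memory, so it only suits files that fit comfortably in RAM:

```python
import csv
import io

# In-memory sample with the same layout as file.csv
sample = io.StringIO('column1,column2\nfoo,bar\nbaz,qux\n')

cells = []
for row_number, row in enumerate(csv.reader(sample)):
    # Inner loop walks the cells of the current row; indices start at 0
    for column_number, cell in enumerate(row):
        print(row_number, column_number, cell)
        cells.append((row_number, column_number, cell))

# Rewind and transpose rows into columns (loads the whole file)
sample.seek(0)
columns = list(zip(*csv.reader(sample)))
print(columns[1])  # ('column2', 'bar', 'qux')
```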

How do I read a CSV file in a row wise in Python?

Open the CSV file in read mode using the open() function. Then csv.reader() is used to read the file; it returns an iterable reader object. The reader object is then iterated with a for loop to print the contents of each row.