Read csv file python - stack overflow

Here are some minimal complete examples how to read CSV files and how to write CSV files with Python.

Python 3: Reading a CSV file

Pure Python

import csv

# Define data
data = [
    (1, "A towel,", 1.0),
    (42, " it says, ", 2.0),
    (1337, "is about the most ", -1),
    (0, "massively useful thing ", 123),
    (-2, "an interstellar hitchhiker can have.", 3),
]

# Write CSV file
with open("test.csv", "wt") as fp:
    writer = csv.writer(fp, delimiter=",")
    # writer.writerow(["your", "header", "foo"])  # write header
    writer.writerows(data)

# Read CSV file
with open("test.csv") as fp:
    reader = csv.reader(fp, delimiter=",", quotechar='"')
    # next(reader, None)  # skip the headers
    data_read = [row for row in reader]

print(data_read)

After that, the contents of data_read are

[['1', 'A towel,', '1.0'],
 ['42', ' it says, ', '2.0'],
 ['1337', 'is about the most ', '-1'],
 ['0', 'massively useful thing ', '123'],
 ['-2', 'an interstellar hitchhiker can have.', '3']]

Please note that CSV reads only strings. You need to convert to the column types manually.

A Python 2+3 version was here before (link), but Python 2 support is dropped. Removing the Python 2 stuff massively simplified this answer.

Related

  • How do I write data into csv format as string (not file)?
  • How can I use io.StringIO() with the csv module?: This is interesting if you want to serve a CSV on-the-fly with Flask, without actually storing the CSV on the server.

mpu

Have a look at my utility package mpu for a super simple and easy to remember one:

import mpu.io
data = mpu.io.read('example.csv', delimiter=',', quotechar='"', skiprows=None)
mpu.io.write('example.csv', data)

Pandas

import pandas as pd

# Read the CSV into a pandas data frame (df)
#   With a df you can do many things
#   most important: visualize data with Seaborn
df = pd.read_csv('myfile.csv', sep=',')
print(df)

# Or export it in many ways, e.g. a list of tuples
tuples = [tuple(x) for x in df.values]

# or export it as a list of dicts
dicts = df.to_dict().values()

See read_csv docs for more information. Please note that pandas automatically infers if there is a header line, but you can set it manually, too.

If you haven't heard of Seaborn, I recommend having a look at it.

Other

Reading CSV files is supported by a bunch of other libraries, for example:

  • dask.dataframe.read_csv
  • spark.read.csv

Created CSV file

1,"A towel,",1.0
42," it says, ",2.0
1337,is about the most ,-1
0,massively useful thing ,123
-2,an interstellar hitchhiker can have.,3

Common file endings

.csv

Working with the data

After reading the CSV file to a list of tuples / dicts or a Pandas dataframe, it is simply working with this kind of data. Nothing CSV specific.

Alternatives

  • JSON: Nice for writing human-readable data; VERY commonly used (read & write)
  • CSV: Super simple format (read & write)
  • YAML: Nice to read, similar to JSON (read & write)
  • pickle: A Python serialization format (read & write)
  • MessagePack (Python package): More compact representation (read & write)
  • HDF5 (Python package): Nice for matrices (read & write)
  • XML: exists too *sigh* (read & write)

For your application, the following might be important:

  • Support by other programming languages
  • Reading / writing performance
  • Compactness (file size)

See also: Comparison of data serialization formats

In case you are rather looking for a way to make configuration files, you might want to read my short article Configuration files in Python

I am doing coursework where it requires me to make a Quiz programme where an account for a user is created and stored to a CSV along with personal information. It then requires me to read from a CSV file and read the questions and answers for the module to run and then it stores the test data to the CSV where all the account information is stored.

I have successfully written to the CSV however I am having a problem with reading correctly. The CSV is set up with 2 columns and 5 rows in A1, A2, A3, A4 and A5 is where the questions are stored and in B1, B2, B3, B4 and B5 is where the answers are stored for the question on the row. Currently I am attempting to print the Question into an input and then use an if statement to check it against the answer I have stored in the CSV.

Currently I am having problems with printing everything within the cell and I am having to use ranges. A section of my python code is as follows...

import csv

with open("questions.csv", 'r') as f:
   for row in f:
       question1 = row[0:94]
       answer1 = row[95]




print(question1)
print(answer1)

How do I print the whole cell into the input without having to do the ranges in the row?