What is the best way to read a CSV file in Python?
Intro: In this article, I will walk you through the different ways of reading and writing CSV files in Python.
1. What is a CSV?

CSV stands for “Comma Separated Values.” It is the simplest way of storing tabular data as plain text. Knowing how to work with CSV files is important because, as data scientists, we rely on CSV data in our day-to-day work.

Structure of a CSV:

We have a file named “Salary_Data.csv.” The first line of a CSV file is the header and contains the names of the fields/features. After the header, each line of the file is an observation/a record. The values of a record are separated by commas.

2. Reading a CSV

CSV files can be handled in multiple ways in Python.

2.1 Using csv.reader

Here we read a CSV with the csv.reader object from Python’s built-in csv module.

Steps to read a CSV file:

1. Import the csv library

    import csv

2. Open the CSV file

The built-in open() function opens a file and returns a file object.

    file = open('Salary_Data.csv')
    type(file)

The type of file is “_io.TextIOWrapper”, which is the file object returned by open().

3. Use the csv.reader object to read the CSV file

    csvreader = csv.reader(file)

4. Extract the field names

Use the built-in next() function to obtain the header. next() returns the current row and advances the reader to the next row: the first call returns the header, the second call returns the first record, and so on.

    header = next(csvreader)
    header

5. Extract the rows/records

Create an empty list called rows, iterate through the csvreader object, and append each row to the rows list.

    rows = []
    for row in csvreader:
        rows.append(row)
    rows

6. Close the file

The .close() method closes the opened file. Once it is closed, we cannot perform any operations on it.

    file.close()

Complete code:

    import csv

    file = open('Salary_Data.csv')
    csvreader = csv.reader(file)
    header = next(csvreader)
    rows = []
    for row in csvreader:
        rows.append(row)
    file.close()
    print(header)
    print(rows)

It is easy to forget to close an open file. To avoid that, we can use the with statement, which releases the resource automatically. In simple terms, there is no need to call the .close() method if we are using the with statement.
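To see this behaviour concretely, here is a minimal sketch. It assumes a small throwaway CSV written to a temporary directory (the file name sample.csv and its two records are made up for illustration and are not taken from Salary_Data.csv) and checks the file object's closed attribute inside and after the with block.

```python
import csv
import os
import tempfile

# Hypothetical sample data, written to a temporary file so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample.csv")
with open(path, "w", newline="") as f:
    f.write("YearsExperience,Salary\n1.1,39343\n1.3,46205\n")

with open(path, newline="") as file:
    csvreader = csv.reader(file)
    header = next(csvreader)      # first row: the field names
    rows = list(csvreader)        # remaining rows: the records
    closed_inside = file.closed   # the file is still open at this point

closed_after = file.closed        # the with block has closed the file by now

print(header)
print(rows)
print(closed_inside, closed_after)
```

The last line prints False True: the file is open inside the block and closed as soon as the block ends, even though .close() was never called explicitly.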
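Implementing the above code using the with statement:

Syntax: with open(filename, mode) as alias_filename:

Modes: common values are 'r' for reading (the default), 'w' for writing (any existing content is overwritten), 'a' for appending, and 'r+' for both reading and writing.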
    import csv

    rows = []
    with open("Salary_Data.csv", 'r') as file:
        csvreader = csv.reader(file)
        header = next(csvreader)
        for row in csvreader:
            rows.append(row)
    print(header)
    print(rows)

2.2 Using .readlines()

Now the question is: “Is it possible to fetch the header and rows using only open() and the with statement, without the csv library?” Let’s see.

The .readlines() method is the answer. It returns all the lines in a file as a list of strings. Each item of the list is a row of our CSV file. The first item returned by file.readlines() is the header and the rest are the records.

    with open('Salary_Data.csv') as file:
        content = file.readlines()
    header = content[:1]
    rows = content[1:]
    print(header)
    print(rows)

Note: each line in the output still ends with the newline character '\n', which can be removed using the .strip() method. Unlike csv.reader, .readlines() does not split a line into fields, so each row is a single string.

What if we have a huge dataset with hundreds of features and thousands of records? Would plain lists still be practical to work with? This is where the pandas library comes into the picture.

2.3 Using pandas

Steps for reading CSV files using pandas:

1. Import the pandas library

    import pandas as pd

2. Load the CSV file into pandas using read_csv()

Basic syntax: pandas.read_csv(filename, delimiter=',')

    data = pd.read_csv("Salary_Data.csv")
    data

3. Extract the field names

.columns is used to obtain the header/field names.

    data.columns
4. Extract the rows

The values of any column of a data frame can be accessed using its field name.

    data.Salary

3. Writing to a CSV file

We can write to a CSV file in multiple ways.

3.1 Using csv.writer

Let’s assume we are recording the data of 3 students (Name, M1 Score, M2 Score).

    header = ['Name', 'M1 Score', 'M2 Score']
    data = [['Alex', 62, 80], ['Brad', 45, 56], ['Joey', 85, 98]]

Steps for writing to a CSV file:

1. Import the csv library

    import csv

2. Define a filename and open the file using open()
3. Create a csvwriter object using csv.writer()
4. Write the header
5. Write the rest of the data

Code for steps 2-5:

    filename = 'Students_Data.csv'
    with open(filename, 'w', newline="") as file:   # 2. open the file
        csvwriter = csv.writer(file)                # 3. create a csvwriter object
        csvwriter.writerow(header)                  # 4. write the header
        csvwriter.writerows(data)                   # 5. write the rest of the data

Below is how our CSV file looks:

    Name,M1 Score,M2 Score
    Alex,62,80
    Brad,45,56
    Joey,85,98

3.2 Using .writelines()

Iterate through each list, convert its elements to strings, join them with commas, and write the resulting lines to the CSV file.

    header = ['Name', 'M1 Score', 'M2 Score']
    data = [['Alex', 62, 80], ['Brad', 45, 56], ['Joey', 85, 98]]
    filename = 'Student_scores.csv'
    # Build one comma-separated line per row, each ending with a newline.
    lines = [','.join(str(x) for x in row) + '\n' for row in [header] + data]
    with open(filename, 'w') as file:
        file.writelines(lines)

3.3 Using pandas

Steps for writing to a CSV using pandas:

1. Import the pandas library

    import pandas as pd

2. Create a pandas dataframe using pd.DataFrame

Syntax: pd.DataFrame(data, columns)

The data parameter takes the records/observations and the columns parameter takes the column/field names.

    header = ['Name', 'M1 Score', 'M2 Score']
    data = [['Alex', 62, 80], ['Brad', 45, 56], ['Joey', 85, 98]]
    data = pd.DataFrame(data, columns=header)

3. Write to a CSV file using to_csv()

Syntax: DataFrame.to_csv(filename, sep=',', index=False)

Note: the separator is ',' by default. Passing index=False leaves the index numbers out of the file.
    data.to_csv('Stu_data.csv', index=False)

Below is how our CSV looks:

    Name,M1 Score,M2 Score
    Alex,62,80
    Brad,45,56
    Joey,85,98

End Notes:

Thank you for reading to the end. By now, we are familiar with different ways of handling CSV files in Python. I hope this article was informative. Feel free to share it with your study buddies.

References:

Check out the complete code from the GitHub repo.

Other Blog Posts by me

Feel free to check out my other blog posts from my Analytics Vidhya Profile. You can find me on LinkedIn or Twitter if you would like to connect. I would be glad to connect with you.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.

How does Python read a CSV file?

Reading from a CSV file is done using the reader object. The CSV file is opened as a text file with Python's built-in open() function, which returns a file object. This is then passed to the reader, which does the heavy lifting.
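As an illustration of that heavy lifting, the sketch below compares csv.reader with a naive str.split(',') on a record whose field contains a comma inside quotes. The sample text and the names in it are invented for the example, and io.StringIO stands in for a file object returned by open().

```python
import csv
import io

# Invented sample: the first record's name contains a comma, so it is quoted.
text = 'Name,City\n"Smith, Alex",Boston\nBrad,Denver\n'

# Naive approach: split each line on every comma.
naive_rows = [line.split(",") for line in text.splitlines()]

# csv.reader applies the CSV quoting rules and keeps the quoted field intact.
reader = csv.reader(io.StringIO(text))
header = next(reader)
csv_rows = list(reader)

print(naive_rows[1])  # ['"Smith', ' Alex"', 'Boston']  (three pieces, quotes left in)
print(csv_rows[0])    # ['Smith, Alex', 'Boston']       (two fields, as intended)
```

This is one reason to prefer the csv module over manual string splitting whenever the data may contain quoted fields.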
Is CSV reader faster than pandas?

The options to compare include pandas' default CSV reading and the faster, more parallel CSV reader introduced in pandas v1.4.
...
Instead of reading in a CSV, you could read in some other file format that is faster to process.

What is the best way to view a CSV file?

Opening a CSV file is simpler than you may think. In almost any text editor or spreadsheet program, just choose File > Open and select the CSV file. For most people, it is best to use a spreadsheet program. Spreadsheet programs display the data in a way that is easier to read and work with than a text editor.
What is the method to read a CSV file using pandas?

Pandas Read CSV:

1. Load the CSV into a DataFrame: import pandas as pd, then df = pd.read_csv('data.csv') ...
2. Print the DataFrame without the to_string() method: import pandas as pd ...
3. Check the number of maximum returned rows: import pandas as pd ...
4. Increase the maximum number of rows to display the entire DataFrame: import pandas as pd ...
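The steps above can be sketched roughly as follows. This is only an illustration: the sample data is invented, io.StringIO stands in for a file such as 'data.csv', and the value 9999 is an arbitrary large limit chosen for the example.

```python
import io

import pandas as pd

# Invented sample data; io.StringIO stands in for a CSV file on disk.
text = "Name,Score\nAlex,62\nBrad,45\nJoey,85\n"
df = pd.read_csv(io.StringIO(text))

# to_string() renders every row, regardless of the display limit.
print(df.to_string())

# The number of rows pandas prints before it switches to a truncated view.
print(pd.options.display.max_rows)

# Raise the limit so that printing df directly shows the whole DataFrame.
pd.options.display.max_rows = 9999
print(df)
```

For a DataFrame this small the two printouts look the same; the difference only shows once the number of rows exceeds the display limit.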