Read multiple text files into dataframe python

I am trying to import a set of *.txt files. I need to import the files into successive columns of a Pandas DataFrame in Python.

Requirements and Background information:

  1. Each file has one column of numbers
  2. No headers are present in the files
  3. Positive and negative integers are possible
  4. All *.txt files have the same number of rows
  5. The columns of the DataFrame must have the name of file (without extension) as the header
  6. The number of files is not known ahead of time

Here is one sample *.txt file. All the others have the same format.

16
54
-314
1
15
4
153
86
4
64
373
3
434
31
93
53
873
43
11
533
46

Here is my attempt:

import pandas as pd
import os
import glob

# Step 1: get a list of all csv files in target directory
my_dir = "C:\\Python27\Files\\"
filelist = []
filesList = []
os.chdir( my_dir )

# Step 2: Build up list of files:
for files in glob.glob("*.txt"):
    fileName, fileExtension = os.path.splitext(files)
    filelist.append(fileName) #filename without extension
    filesList.append(files) #filename with extension

# Step 3: Build up DataFrame:
df = pd.DataFrame()
for ijk in filelist:
    frame = pd.read_csv(filesList[ijk])
    df = df.append(frame)
print df

Steps 1 and 2 work. I am having problems with step 3. I get the following error message:

Traceback (most recent call last):
  File "C:\Python27\TextFile.py", line 26, in <module>
    frame = pd.read_csv(filesList[ijk])
TypeError: list indices must be integers, not str

Question: Is there a better way to load these *.txt files into a Pandas dataframe? Why does read_csv not accept strings for file names?
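The TypeError comes from `filesList[ijk]`, not from read_csv(): the loop variable `ijk` is a file name (a string) taken from `filelist`, and Python lists can only be indexed by integers. read_csv() accepts file names without trouble. Note also that `df.append()` adds rows rather than columns (and `DataFrame.append` was removed in pandas 2.0). Below is a minimal Python 3 sketch of one way to meet the requirements above, assuming all the files sit in one folder; the function name `load_txt_columns` is hypothetical.

```python
from pathlib import Path

import pandas as pd


def load_txt_columns(folder):
    """Read every *.txt file in `folder` into one column of a DataFrame.

    Each file is assumed to hold a single headerless column of integers;
    the column is named after the file name without its extension.
    """
    # sorted() gives a stable column order; glob order depends on the file system
    paths = sorted(Path(folder).glob("*.txt"))
    columns = [
        # header=None: the first line is data, not a header
        pd.read_csv(path, header=None).iloc[:, 0].rename(path.stem)
        for path in paths
    ]
    if not columns:
        return pd.DataFrame()  # no matching files
    # axis=1 places the Series side by side as columns
    return pd.concat(columns, axis=1)
```

Usage, with a raw string so the Windows backslashes are not treated as escapes: `df = load_txt_columns(r"C:\Python27\Files")`.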

When wrangling data with Pandas, you’ll eventually work with many types of data sources. We already covered how to get Pandas to interact with Excel spreadsheets, SQL databases and so on. In today’s tutorial, we will learn how to use Python 3 to import text (.txt) files into Pandas DataFrames. As expected, the process is relatively simple to follow.

Example: Reading one text file to a DataFrame in Python

Suppose that you have a text file named interviews.txt, which contains tab-delimited data.

We’ll go ahead and load the text file using pd.read_csv():

import pandas as pd

hr = pd.read_csv('interviews.txt', names =['month', 'first', 'second'])

hr.head()

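The result will look distorted because you haven’t specified the tab as your column delimiter: read_csv() splits on commas by default, so each whole line lands in the month column and the first and second columns are filled with NaN.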


Specifying the \t escape sequence as the delimiter fixes the DataFrame:

hr = pd.read_csv('interviews.txt', delimiter='\t', names =['month', 'first', 'second'])

hr.head()

Importing multiple text files to Python Pandas DataFrames

This is a more interesting case, in which you need to import several text files located in one directory into a single Pandas DataFrame. Your text files could contain data extracted from a third-party system, a database and so forth.

Before we go on, we’ll need to import a couple of Python modules:

import os, glob

Now use the following code:

# Define relative path to folder containing the text files
files_folder = "../data/"

# Create a list of DataFrames, one per text file, with a list comprehension
files = [
    pd.read_csv(file, delimiter='\t', names=['month', 'first', 'second'])
    for file in glob.glob(os.path.join(files_folder, "*.txt"))
]

# Concatenate the list of DataFrames into one
files_df = pd.concat(files)

Once your DataFrame is populated, you can further analyze and visualize your data using Pandas.
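By default pd.concat() keeps each file's own index, so the combined DataFrame repeats index values (0, 1, ... for every file), and nothing records which file a row came from. A minimal sketch of one variant, assuming the same tab-delimited layout: `ignore_index=True` renumbers the rows, and a `source` column (a hypothetical name) stores each row's file name. The function name `read_txt_folder` is also hypothetical.

```python
import glob
import os

import pandas as pd


def read_txt_folder(files_folder, names):
    """Stack every tab-delimited *.txt file in files_folder row-wise."""
    frames = [
        pd.read_csv(path, delimiter="\t", names=names)
        # hypothetical 'source' column: the file each row came from
        .assign(source=os.path.basename(path))
        for path in sorted(glob.glob(os.path.join(files_folder, "*.txt")))
    ]
    # ignore_index=True renumbers rows 0..n-1 instead of repeating each file's index
    return pd.concat(frames, ignore_index=True)
```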


In this article, we are going to see how to read multiple data files into pandas. Data files come in multiple types, so here are a few ways to read multiple files using the pandas package in Python.


Method 1: Reading CSV files

If your data files are in CSV format, use the read_csv() method. read_csv() takes a file path as an argument and reads the content of the CSV into a DataFrame. To read multiple CSV files, you can use a simple for loop that iterates over all the files.

Example: Reading Multiple CSV files using Pandas

In this example we make a list of the file paths and iterate over it with a for loop (a for loop iterates over iterables such as lists, tuples and strings). Each file is read with pd.read_csv(), which already returns a DataFrame, and concatenated onto a main DataFrame with pd.concat(..., axis=1), which places the files' columns side by side.


import pandas as pd

file_list = ['a.csv', 'b.csv', 'c.csv']

# Start with the first file, then add each remaining file as extra columns
main_dataframe = pd.read_csv(file_list[0])

for file_path in file_list[1:]:
    df = pd.read_csv(file_path)
    main_dataframe = pd.concat([main_dataframe, df], axis=1)

print(main_dataframe)
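Because pd.concat() stacks rows by default (axis=0), the axis=1 argument is what puts each file's columns next to each other. A small in-memory sketch of the difference; the two DataFrames are made-up stand-ins for files with the same number of rows:

```python
import pandas as pd

# Made-up stand-ins for two CSV files with the same number of rows
a = pd.DataFrame({"x": [1, 2, 3]})
b = pd.DataFrame({"y": [4, 5, 6]})

stacked = pd.concat([a, b])               # default axis=0: rows appended, gaps become NaN
side_by_side = pd.concat([a, b], axis=1)  # axis=1: columns placed next to each other

print(stacked.shape)       # (6, 2)
print(side_by_side.shape)  # (3, 2)
```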


Method 2: Using the glob package

The glob module in Python retrieves files or pathnames matching a specified pattern.

This program is similar to the previous one; the only difference is that, instead of keeping track of file names in a hand-written list, we use the glob package to retrieve the files matching a specified pattern.

Example: Reading multiple CSV files using Pandas and glob.


import glob

import pandas as pd

folder_path = 'Path_of_file/csv_files'

# glob returns paths in no guaranteed order; sort them for a stable column order
file_list = sorted(glob.glob(folder_path + "/*.csv"))

main_dataframe = pd.read_csv(file_list[0])

for file_path in file_list[1:]:
    df = pd.read_csv(file_path)
    main_dataframe = pd.concat([main_dataframe, df], axis=1)

print(main_dataframe)
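The Python documentation notes that whether glob.glob() results are sorted depends on the file system, which matters when each file becomes a column. A small sketch using a temporary folder (the file names are made up) shows that only names matching the pattern are returned and that sorted() fixes their order:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder_path:
    # Create three empty, made-up files
    for name in ["b.csv", "a.csv", "notes.txt"]:
        open(os.path.join(folder_path, name), "w").close()
    # Only names ending in .csv match the pattern; sorted() fixes the order
    file_list = sorted(glob.glob(os.path.join(folder_path, "*.csv")))
    names = [os.path.basename(p) for p in file_list]

print(names)  # ['a.csv', 'b.csv']
```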


Method 3: Reading text files using Pandas

To read tab-delimited text files, you can use pandas’ read_table() method.
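read_table() behaves like read_csv() with a tab as the default separator, so the two calls below should produce the same DataFrame. The sample text is made up and has no header row, which is why header=None is passed; without it, both functions would use the first line as column labels.

```python
import io

import pandas as pd

text = "16\t54\n-314\t1\n"  # made-up tab-delimited sample without a header row

via_table = pd.read_table(io.StringIO(text), header=None)
via_csv = pd.read_csv(io.StringIO(text), sep="\t", header=None)

print(via_table.equals(via_csv))  # True
print(via_table.shape)            # (2, 2)
```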

Example: Reading text file using pandas and glob.

Use the glob package to retrieve the file paths, then iterate through them with a for loop. Read each file with pd.read_table(), which takes the file path as an argument and returns a DataFrame, and concatenate it onto a main DataFrame with pd.concat(). Finally, write the main DataFrame to a CSV file with to_csv(), which takes the name of the new CSV file as an argument.


import glob

import pandas as pd

folder_path = 'Path_/files'

# glob returns paths in no guaranteed order; sort them for a stable column order
file_list = sorted(glob.glob(folder_path + "/*.txt"))

main_dataframe = pd.read_table(file_list[0])

for file_path in file_list[1:]:
    df = pd.read_table(file_path)
    main_dataframe = pd.concat([main_dataframe, df], axis=1)

print(main_dataframe)

# Write the combined DataFrame to a new CSV file
main_dataframe.to_csv('new_csv1.csv')



How do I read multiple text files in Python?

Import the os module. Define the path of the folder where the text files are located. List the files in that folder and iterate over them, checking whether each one has the correct (.txt) extension. Then read each matching file, for example with a function you define yourself.
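A minimal sketch of those steps using only the standard library; the function name `read_text_files` is hypothetical, and it returns a dictionary mapping each .txt file name to its contents:

```python
import os


def read_text_files(folder):
    """Return {file name: file contents} for every .txt file in folder."""
    contents = {}
    for name in sorted(os.listdir(folder)):
        if name.endswith(".txt"):  # keep only files with the right extension
            with open(os.path.join(folder, name)) as f:
                contents[name] = f.read()
    return contents
```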

How do I read a text file into a DataFrame in Python?

You can use read_table(). It reads a general delimited file into a DataFrame and is essentially the same as read_csv(), except that its default delimiter is a tab ('\t') instead of a comma.

How do I read multiple DataFrame files in Python?

import glob

import pandas as pd

all_files = glob.glob("animals/*.csv")
df = pd.concat((pd.read_csv(f) for f in all_files))
print(df)

How do I read multiple CSV files in Python?

Code explanation: the glob module extracts each file's path (directory plus file name with extension). Each CSV is read into a DataFrame, and these are kept in a list named dataFrames, one DataFrame per index. Finally, pd.concat() merges the DataFrames in the list by columns, that is, with axis=1.
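The code this explanation refers to is not included here. A minimal sketch of the steps it describes, assuming the CSV files have header rows; the function name `concat_csv_columns` is hypothetical:

```python
import glob
import os

import pandas as pd


def concat_csv_columns(folder):
    """Read every CSV in folder and join them side by side (axis=1)."""
    # glob returns each file's path (directory + name with extension)
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    dataFrames = [pd.read_csv(path) for path in paths]  # one DataFrame per CSV
    return pd.concat(dataFrames, axis=1)                # merge by columns
```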