I am trying to import a set of *.txt files. I need to import the files into successive columns of a Pandas DataFrame in Python.
Requirements and Background information:
- Each file has one column of numbers
- No headers are present in the files
- Positive and negative integers are possible
- The size of all the *.txt files is the same
- The columns of the DataFrame must have the name of the file (without extension) as the header
- The number of files is not known ahead of time
Here is one sample *.txt file. All the others have the same format.
16
54
-314
1
15
4
153
86
4
64
373
3
434
31
93
53
873
43
11
533
46
Here is my attempt:
import pandas as pd
import os
import glob
# Step 1: get a list of all csv files in target directory
my_dir = "C:\\Python27\Files\\"
filelist = []
filesList = []
os.chdir( my_dir )
# Step 2: Build up list of files:
for files in glob.glob("*.txt"):
    fileName, fileExtension = os.path.splitext(files)
    filelist.append(fileName) #filename without extension
    filesList.append(files) #filename with extension
# Step 3: Build up DataFrame:
df = pd.DataFrame()
for ijk in filelist:
    frame = pd.read_csv(filesList[ijk])
    df = df.append(frame)
print df
Steps 1 and 2 work. I am having problems with step 3. I get the following error message:
Traceback (most recent call last):
  File "C:\Python27\TextFile.py", line 26, in <module>
    frame = pd.read_csv(filesList[ijk])
TypeError: list indices must be integers, not str
Question: Is there a better way to load these *.txt files into a Pandas dataframe? Why does read_csv not accept strings for file names?
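The traceback comes from the loop rather than from read_csv: `for ijk in filelist` yields each file name as a string, and that string is then used as an index into `filesList`. read_csv itself does accept a string path. In addition, `df.append` stacks rows, so it would not produce one column per file. One possible way to get one column per file is sketched below. The helper name `load_columns` is hypothetical, and the sketch assumes every file holds a single headerless column of integers with the same number of rows.

```python
import glob
import os

import pandas as pd


def load_columns(directory):
    """Read every *.txt file in `directory` into one column per file.

    Hypothetical helper: assumes each file holds a single headerless
    column of integers and that all files have the same number of rows.
    """
    columns = {}
    # sorted() makes the column order deterministic
    for path in sorted(glob.glob(os.path.join(directory, "*.txt"))):
        # File name without directory or extension becomes the header
        name = os.path.splitext(os.path.basename(path))[0]
        # header=None: the first line is data, not a column label
        columns[name] = pd.read_csv(path, header=None).iloc[:, 0]
    # Build the DataFrame from a dict mapping header -> Series
    return pd.DataFrame(columns)
```

Calling `load_columns(my_dir)` would then return a DataFrame whose headers are the file names without extension, however many files the directory contains.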
When wrangling data with Pandas, you will eventually work with several types of data sources. We have already covered how to get Pandas to interact with Excel spreadsheets, SQL databases, and so on. In today's tutorial, we will learn how to use Python 3 to import text (.txt) files into Pandas DataFrames. The process is relatively simple to follow.
Suppose that you have a text file named interviews.txt, which contains tab-delimited data. The first example below loads the file using pd.read_csv() with its default settings. The result looks distorted because the tab has not been specified as the column delimiter, so each line is read as a single field. Passing the \t escape sequence as the delimiter, as in the second call, fixes the DataFrame.
A more interesting case is importing several text files located in one directory into a single Pandas DataFrame. Your text files could contain data extracted from a third-party system, a database, and so forth. This requires a couple of extra Python libraries, os and glob, which are used to build a list of DataFrames that is then concatenated, as shown in the second example. Once your DataFrame is populated, you can further analyze and visualize your data using Pandas.
Example: Reading one text file to a DataFrame in Python
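import pandas as pd
# Default delimiter (comma): each tab-delimited line is read as one field
hr = pd.read_csv('interviews.txt', names=['month', 'first', 'second'])
hr.head()
# Tab delimiter: the three fields are split into separate columns
hr = pd.read_csv('interviews.txt', delimiter='\t', names=['month', 'first', 'second'])
hr.head()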
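To see the effect of the delimiter without needing interviews.txt on disk, the comparison can be reproduced on an in-memory string. This is a minimal sketch; the sample rows are made up and only stand in for the real file.

```python
import io

import pandas as pd

# Hypothetical tab-delimited sample standing in for interviews.txt
sample = "January\t10\t12\nFebruary\t8\t15\n"
names = ["month", "first", "second"]

# Default delimiter is a comma, so each whole line lands in the first column
# and the remaining named columns are filled with NaN
no_delim = pd.read_csv(io.StringIO(sample), names=names)

# With delimiter="\t" the three fields are split into their own columns
with_delim = pd.read_csv(io.StringIO(sample), delimiter="\t", names=names)

print(no_delim)
print(with_delim)
```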
Importing multiple text files to Python Pandas DataFrames
import os, glob
import pandas as pd
# Define relative path to folder containing the text files
files_folder = "../data/"
# Create a list of DataFrames by using a list comprehension
files = [pd.read_csv(file, delimiter='\t', names=['month', 'first', 'second']) for file in glob.glob(os.path.join(files_folder, "*.txt"))]
# Concatenate the list of DataFrames into one
files_df = pd.concat(files)
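By default, pd.concat() keeps each file's own row index, so the combined DataFrame ends up with repeated index values. The sketch below shows one way to get a continuous index with ignore_index=True. The temporary folder, the file names q1.txt and q2.txt, and their contents are invented so that the example runs on its own.

```python
import glob
import os
import tempfile

import pandas as pd

names = ["month", "first", "second"]

# Write two small hypothetical tab-delimited files to a temporary folder
files_folder = tempfile.mkdtemp()
samples = {"q1.txt": "January\t10\t12\n", "q2.txt": "April\t7\t9\n"}
for file_name, text in samples.items():
    with open(os.path.join(files_folder, file_name), "w") as handle:
        handle.write(text)

# sorted() gives a repeatable file order
paths = sorted(glob.glob(os.path.join(files_folder, "*.txt")))
frames = [pd.read_csv(p, delimiter="\t", names=names) for p in paths]

# ignore_index=True renumbers the rows 0..n-1 instead of repeating 0, 0
files_df = pd.concat(frames, ignore_index=True)
print(files_df)
```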
This article shows several ways to read multiple data files into pandas. The data files can be of different types, and each method below uses the pandas package in Python.
Method 1: Reading CSV files
If our data files are in CSV format, the read_csv() function can be used. read_csv() takes a file path as an argument and reads the content of the CSV file into a DataFrame. To read multiple CSV files, we can use a simple for loop and iterate over all the files.
Example: Reading Multiple CSV files using Pandas
In this example, we make a list of our data file paths and iterate through it using a for loop (a for loop can iterate over any iterable, such as a list, tuple, or string). Each file is read into a DataFrame and concatenated onto a main DataFrame using pd.concat(); with axis=1, the new columns are placed side by side with the existing ones. The final DataFrame can then be saved as a CSV file using the to_csv() method, which takes the name of the new CSV file as an argument.
import pandas as pd
# List of CSV files to read
file_list = ['a.csv', 'b.csv', 'c.csv']
# Start from the first file
main_dataframe = pd.DataFrame(pd.read_csv(file_list[0]))
# Add the columns of each remaining file side by side (axis=1)
for i in range(1, len(file_list)):
    data = pd.read_csv(file_list[i])
    df = pd.DataFrame(data)
    main_dataframe = pd.concat([main_dataframe, df], axis=1)
print(main_dataframe)
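The axis argument decides whether pd.concat() adds columns or rows. The short sketch below contrasts the two on made-up frames that stand in for the contents of two CSV files; the column names x and y are illustrative only.

```python
import pandas as pd

# Two small hypothetical frames standing in for the contents of two CSV files
left = pd.DataFrame({"x": [1, 2, 3]})
right = pd.DataFrame({"y": [4, 5, 6]})

# axis=1 aligns on the row index and adds columns
side_by_side = pd.concat([left, right], axis=1)

# axis=0 (the default) stacks rows; columns absent from a frame become NaN
stacked = pd.concat([left, right], axis=0, ignore_index=True)

print(side_by_side.shape)  # (3, 2)
print(stacked.shape)       # (6, 2)
```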
Method 2: Using the glob package
The glob module in Python is used to retrieve files or pathnames matching a specified pattern.
This program is similar to the one above. The only difference is that, instead of keeping track of file names in a hand-written list, we use the glob package to retrieve all files matching a specified pattern.
Example: Reading multiple CSV files using Pandas and glob.
import pandas as pd
import glob
# Folder that contains the CSV files
folder_path = 'Path_of_file/csv_files'
# Collect every path that matches the pattern
file_list = glob.glob(folder_path + "/*.csv")
# Start from the first file
main_dataframe = pd.DataFrame(pd.read_csv(file_list[0]))
# Add the columns of each remaining file side by side (axis=1)
for i in range(1, len(file_list)):
    data = pd.read_csv(file_list[i])
    df = pd.DataFrame(data)
    main_dataframe = pd.concat([main_dataframe, df], axis=1)
print(main_dataframe)
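Two details of the glob approach are worth guarding against: glob.glob() returns matches in no guaranteed order, and indexing file_list[0] fails when nothing matches the pattern. A possible variant that handles both is sketched below; the helper name `read_matching` and its parameters are hypothetical.

```python
import glob
import os

import pandas as pd


def read_matching(folder_path, pattern="*.csv"):
    """Hypothetical helper: read files matching `pattern` side by side."""
    # glob.glob returns matches in arbitrary order, so sort for repeatability
    file_list = sorted(glob.glob(os.path.join(folder_path, pattern)))
    if not file_list:
        # Return an empty frame rather than failing on file_list[0]
        return pd.DataFrame()
    frames = [pd.read_csv(path) for path in file_list]
    # A single concat call over the whole list replaces the explicit loop
    return pd.concat(frames, axis=1)
```

Collecting the frames in a list and concatenating once also avoids rebuilding the main DataFrame on every loop iteration.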
Method 3: Reading text files using Pandas:
To read text files, pandas' read_table() function can be used. By default it splits columns on tabs.
Example: Reading text file using pandas and glob.
We use the glob package to retrieve the file paths and iterate through them using a for loop. Each file is read into a DataFrame with pd.read_table(), which takes the file path as an argument, and concatenated onto a main DataFrame using pd.concat(). The final DataFrame is then written to a CSV file using the to_csv() method, which takes the name of the new CSV file as an argument.
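import pandas as pd
import glob
# Folder that contains the text files
folder_path = 'Path_/files'
# Collect every path that matches the pattern
file_list = glob.glob(folder_path + "/*.txt")
# Start from the first file
main_dataframe = pd.DataFrame(pd.read_table(file_list[0]))
# Add the columns of each remaining file side by side (axis=1)
for i in range(1, len(file_list)):
    data = pd.read_table(file_list[i])
    df = pd.DataFrame(data)
    main_dataframe = pd.concat([main_dataframe, df], axis=1)
print(main_dataframe)
# Save the combined DataFrame as a new CSV file
main_dataframe.to_csv('new_csv1.csv')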
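One thing to watch with read_table() is that, like read_csv(), it treats the first line of a file as the header unless told otherwise. For headerless text files, that silently turns the first data value into a column label. The sketch below illustrates the difference on an in-memory string; the sample values and the column name `value` are assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical headerless text data, one number per line
text = "16\n54\n-314\n"

# Default: the first line ("16") is consumed as the column label
with_default = pd.read_table(io.StringIO(text))

# header=None keeps every line as data; names= supplies the label
with_names = pd.read_table(io.StringIO(text), header=None, names=["value"])

print(len(with_default))  # 2 rows: the first value became the header
print(len(with_names))    # 3 rows

# index=False leaves the row index out of the written CSV text
csv_text = with_names.to_csv(index=False)
print(csv_text)
```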