How do I get a list of files in a directory and its subfolders in Python?

A pretty simple solution is to make a couple of subprocess calls that export the file list in CSV format (this relies on GNU find and GNU sed):

import os
import subprocess

# Global variables for the directory being mapped
location = '.'    # Enter the path here.
pattern = '*.py'  # Use this if you only want to return certain file types.
rootDir = os.path.basename(os.path.abspath(location))
outputFile = rootDir + '_directory_contents.csv'

# Find the requested data and export it to CSV, using the pattern to filter names.
# Passing a list (no shell) stops the shell from expanding the pattern itself.
# %T+ is the last-modification time (%A+ would be the last-access time).
subprocess.run(['find', location, '-name', pattern, '-fprintf', outputFile,
                r'%Y%M,%n,%u,%g,%s,%T+,%P\n'], check=True)

That command produces comma-separated values, one line per file, that are easy to analyze in Excel:

f-rwxrwxrwx,1,cathy,cathy,2642,2021-06-01+00:22:00.2970880000,content-audit.py
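To see how the seven fields in that line line up with the directives in the find format string, here is a small sketch that parses the sample line with Python's csv module. The column names are the ones the header command adds; the parse_find_line helper is hypothetical. Note that find's -fprintf does not quote fields, so a path that itself contains a comma will break a plain CSV parse.

```python
import csv
import io

# Column names for the seven comma-separated fields in the find format string
HEADERS = ["Permissions", "Links", "Owner", "Group", "Size", "ModifiedTime", "FilePath"]


def parse_find_line(line):
    """Hypothetical helper: split one line of the find output into a dict."""
    row = next(csv.reader(io.StringIO(line)))
    record = dict(zip(HEADERS, row))
    # %Y prints the file type (f = regular file, d = directory, and so on),
    # so the first character of the Permissions field is the type letter.
    record["Type"] = record["Permissions"][0]
    return record


sample = "f-rwxrwxrwx,1,cathy,cathy,2642,2021-06-01+00:22:00.2970880000,content-audit.py"
print(parse_find_line(sample)["Size"])  # 2642
```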

The resulting CSV file doesn't have a header row, but you can add one with a second command:

# Add a header row to the CSV (GNU sed syntax; the original is kept as outputFile + '.bak').
headers_cmd = ['sed', '-i.bak', '1iPermissions,Links,Owner,Group,Size,ModifiedTime,FilePath', outputFile]
subprocess.run(headers_cmd, check=True)
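The -fprintf action is specific to GNU find, and the one-line `1i` form of sed is a GNU extension, so the commands above may not work as-is on macOS or Windows. As a rough, portable alternative, the sketch below walks the tree with os.walk, filters names with fnmatch, and writes a CSV that already has a header row, using only the standard library. It is a sketch under assumptions: the write_listing function is hypothetical, it leaves out the Owner and Group columns (which need the Unix-only pwd and grp modules), its Permissions string comes from stat.filemode and so lacks the type letter that %Y adds, and it reports modification time in a layout similar to find's %T+.

```python
import csv
import fnmatch
import os
import stat
from datetime import datetime


def write_listing(location, pattern, output_file):
    """Hypothetical helper: write one CSV row per file under `location` whose name matches `pattern`."""
    with open(output_file, "w", newline="") as fh:
        writer = csv.writer(fh)  # quotes any field that contains a comma
        writer.writerow(["Permissions", "Links", "Size", "ModifiedTime", "FilePath"])
        for dirpath, dirnames, filenames in os.walk(location):
            for name in fnmatch.filter(filenames, pattern):
                full_path = os.path.join(dirpath, name)
                st = os.lstat(full_path)  # like find's default, don't follow symlinks
                modified = datetime.fromtimestamp(st.st_mtime)
                writer.writerow([
                    stat.filemode(st.st_mode),                  # e.g. -rw-r--r--
                    st.st_nlink,                                # number of hard links
                    st.st_size,                                 # size in bytes
                    modified.strftime("%Y-%m-%d+%H:%M:%S.%f"),  # date+time, split on "+"
                    os.path.relpath(full_path, location),       # path relative to location
                ])
```

For example, `write_listing('.', '*.py', 'scripts_directory_contents.csv')` produces a file that pd.read_csv can load directly, with no separate header step.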

Depending on how much data you get back, you can process it further with pandas. Here are some steps I found useful, especially if you're dealing with many levels of nested directories.

Add these to your imports (writing the .xlsx workbook at the end also needs an Excel engine such as openpyxl installed):

import numpy as np
import pandas as pd

Then add this to your code:

# Create DataFrame from the csv file created above.
df = pd.read_csv(outputFile)
    
# Format columns
# Get the filename and file extension from the filepath 
# Pass n by keyword; newer pandas versions reject it as a positional argument.
df['FileName'] = df['FilePath'].str.rsplit('/', n=1).str[-1]
df['FileExt'] = df['FileName'].str.rsplit('.', n=1).str[1]

# Get the directory part of each path. A path without a "/" is a file in the root directory.
df['FullPath'] = np.where(df['FilePath'].str.contains('/'),
                          df['FilePath'].str.rsplit('/', n=1).str[0],
                          rootDir)

# Split the path into columns for the parent directory and its children
df['ParentDir'] = df['FullPath'].str.split('/', n=1).str[0]
df['SubDirs'] = df['FullPath'].str.split('/', n=1).str[1]
# Paths with no subdirectory return NaN here; replace it with an empty string.
df['SubDirs'] = df['SubDirs'].fillna('')

# Determine if the item is a directory or file.
df['Type'] = np.where(df['Permissions'].str.startswith('d'), 'Dir', 'File')

# Split the time stamp into date and time columns
df[['ModifiedDate', 'Time']] = df['ModifiedTime'].str.rsplit('+', n=1, expand=True)
df['Time'] = df['Time'].str.split('.').str[0]

# Show only files; the output includes paths, so you don't necessarily need to list the directories themselves.
df = df[df['Type'] == 'File']

# Set columns to show and their order.
df = df[['FileName', 'ParentDir', 'SubDirs', 'FullPath', 'FileExt', 'ModifiedDate', 'Time', 'Size']].copy()

# Convert sizes from bytes to something more human-readable.
# The function has to be defined before it is used.
def convert_bytes(size):
    for x in ['B', 'K', 'M', 'G', 'T']:
        if size < 1024:
            return "%3.1f %s" % (size, x)
        size /= 1024
    return "%3.1f P" % size

df['Size'] = df['Size'].apply(convert_bytes)

# Send the data to an Excel workbook with one sheet per parent directory.
# Excel limits sheet names to 31 characters.
with pd.ExcelWriter("scripts_directory_contents.xlsx") as writer:
    for directory, data in df.groupby('ParentDir'):
        data.to_excel(writer, sheet_name=str(directory)[:31], index=False)
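If you want to check what those pandas transformations do to a single path without building a DataFrame, the sketch below applies the same splitting rules to one FilePath and ModifiedTime value with plain string methods. The derive_columns and human_size names are hypothetical: the field names mirror the DataFrame columns above, root_dir stands in for rootDir, and human_size is a stand-alone variant of convert_bytes. One difference: a name with no dot gets an empty FileExt here, where pandas would give NaN.

```python
def human_size(size):
    """Convert a byte count to a short human-readable string (1024-based units)."""
    for unit in ["B", "K", "M", "G", "T"]:
        if size < 1024:
            return "%3.1f %s" % (size, unit)
        size /= 1024
    return "%3.1f P" % size


def derive_columns(file_path, modified_time, size, root_dir):
    """Hypothetical helper mirroring the DataFrame columns for one row of find output."""
    file_name = file_path.rsplit("/", 1)[-1]
    # Empty extension when the name has no dot
    file_ext = file_name.rsplit(".", 1)[1] if "." in file_name else ""
    # Files at the top level have no "/" in their relative path
    full_path = file_path.rsplit("/", 1)[0] if "/" in file_path else root_dir
    parent_dir, _, sub_dirs = full_path.partition("/")
    date, _, time = modified_time.rpartition("+")
    return {
        "FileName": file_name,
        "FileExt": file_ext,
        "ParentDir": parent_dir,
        "SubDirs": sub_dirs,
        "FullPath": full_path,
        "ModifiedDate": date,
        "Time": time.split(".")[0],  # drop the fractional seconds
        "Size": human_size(size),
    }


print(derive_columns("src/utils/io.py", "2021-06-01+00:22:00.2970880000", 2642, "scripts"))
```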

In this article we will discuss different methods to generate a list of all files in a directory tree.

Creating a list of files in a directory and its subdirectories using os.listdir()

Python’s os module provides a function that returns the list of files and folders in a directory:

os.listdir(path='.')

It returns a list of the names (not full paths) of all the files and subdirectories in the given path. It does not descend into those subdirectories.
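Because os.listdir() returns bare names, each name has to be joined with the directory before you can test what it is. A minimal sketch that keeps only the regular files at the top level of one directory, with no recursion yet; list_top_level_files is a hypothetical name:

```python
import os


def list_top_level_files(path="."):
    """Return full paths of the regular files directly inside `path` (not recursive)."""
    files = []
    for name in os.listdir(path):
        full_path = os.path.join(path, name)  # listdir gives names only
        if os.path.isfile(full_path):         # skip subdirectories
            files.append(full_path)
    return sorted(files)  # listdir order is arbitrary, so sort for stable output
```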

To create a complete list of files in a directory tree, we need to call it recursively for each subdirectory:

import os

def getListOfFiles(dirName):
    '''For the given path, get the list of all files in the directory tree.'''
    # create a list of file and sub directories 
    # names in the given directory 
    listOfFile = os.listdir(dirName)
    allFiles = list()
    # Iterate over all the entries
    for entry in listOfFile:
        # Create full path
        fullPath = os.path.join(dirName, entry)
        # If entry is a directory then get the list of files in this directory 
        if os.path.isdir(fullPath):
            allFiles = allFiles + getListOfFiles(fullPath)
        else:
            allFiles.append(fullPath)
                
    return allFiles

Call the above function to create a list of files in a directory tree:

dirName = '/home/varun/Downloads'

# Get the list of all files in directory tree at given path
listOfFiles = getListOfFiles(dirName)
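Building the result with allFiles + getListOfFiles(fullPath) creates a new list at every level, and os.path.isdir() follows symbolic links, so a link that points back up the tree can send the recursion into a loop. A sketch of the same recursive idea using os.scandir(), which reports each entry's type directly, and a generator instead of list concatenation. The name iter_files is hypothetical, and it deliberately does not descend into symlinked directories:

```python
import os


def iter_files(dir_name):
    """Yield the full path of every non-directory entry in the tree rooted at dir_name."""
    with os.scandir(dir_name) as entries:
        for entry in entries:
            # follow_symlinks=False: don't descend into symlinked directories
            if entry.is_dir(follow_symlinks=False):
                yield from iter_files(entry.path)
            else:
                yield entry.path
```

To get a list like the one above, wrap the generator: `listOfFiles = list(iter_files(dirName))`.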

Creating a list of files in a directory and its subdirectories using os.walk()

Python’s os module also provides a function to iterate over a directory tree:

os.walk(path)

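It walks the directory tree rooted at the given path and, for each directory or subdirectory, yields a tuple (dirpath, dirnames, filenames): the path of the directory, a list of the names of its subdirectories, and a list of the names of its files.

Iterate over the directory tree and build a list of all the files under the given path: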

# Get the list of all files in directory tree at given path
listOfFiles = list()
for (dirpath, dirnames, filenames) in os.walk(dirName):
    listOfFiles += [os.path.join(dirpath, file) for file in filenames]
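The standard library's pathlib module offers another route: Path.rglob() searches the tree recursively and yields Path objects, which you can filter with is_file(). A sketch of that alternative to the os.walk() loop; list_files_pathlib is a hypothetical name, and passing a pattern such as '*.txt' limits the results:

```python
from pathlib import Path


def list_files_pathlib(dir_name, pattern="*"):
    """Return the paths (as strings) of all files under dir_name whose names match pattern."""
    return sorted(str(p) for p in Path(dir_name).rglob(pattern) if p.is_file())
```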

The complete example is as follows:

import os

def getListOfFiles(dirName):
    '''For the given path, get the list of all files in the directory tree.'''
    # create a list of file and sub directories 
    # names in the given directory 
    listOfFile = os.listdir(dirName)
    allFiles = list()
    # Iterate over all the entries
    for entry in listOfFile:
        # Create full path
        fullPath = os.path.join(dirName, entry)
        # If entry is a directory then get the list of files in this directory 
        if os.path.isdir(fullPath):
            allFiles = allFiles + getListOfFiles(fullPath)
        else:
            allFiles.append(fullPath)
                
    return allFiles        


def main():
    dirName = '/home/varun/Downloads'

    # Get the list of all files in directory tree at given path
    listOfFiles = getListOfFiles(dirName)

    # Print the files
    for elem in listOfFiles:
        print(elem)

    print("****************")

    # Get the list of all files in directory tree at given path
    listOfFiles = list()
    for (dirpath, dirnames, filenames) in os.walk(dirName):
        listOfFiles += [os.path.join(dirpath, file) for file in filenames]

    # Print the files
    for elem in listOfFiles:
        print(elem)


if __name__ == '__main__':
    main()

Output:

/home/varun/Downloads/temp1.txt
/home/varun/Downloads/sample/temp2.txt
/home/varun/Downloads/test/message.txt
 


How do I get a list of files in a directory and subdirectories?

In Excel, here are the steps to get a list of all the file names from a folder:
Go to the Data tab.
In the Get & Transform group, click New Query.
Hover the cursor over the 'From File' option and click 'From Folder'.
In the Folder dialog box, enter the folder path, or use the browse button to locate it.
Click OK.

How do you read all files in a directory and subfolders in Python?

The os.listdir() method lists the files and directories in the specified directory only. To include the contents of subfolders, call it recursively or use os.walk(), as shown above.

How do you get a list of all files in a directory in Python?

To get a list of all the files and folders in a particular directory, use os.listdir(), or os.scandir() (available since Python 3.5), which also tells you whether each entry is a file or a directory.
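A short sketch of os.scandir() for a single directory: each DirEntry exposes the entry's name, its full path, and type checks, so files and folders can be separated without extra os.path calls. The split_entries name is hypothetical:

```python
import os


def split_entries(path="."):
    """Return sorted (files, folders) name lists for the direct children of path."""
    files, folders = [], []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir():
                folders.append(entry.name)
            elif entry.is_file():
                files.append(entry.name)
    return sorted(files), sorted(folders)
```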