A pretty simple solution is to run a couple of subprocess calls that export the file listing to CSV format:
import subprocess
# Global variables for the directory being mapped
location = '.' # Enter the path here.
pattern = '*.py' # Use this if you want to return only certain file types
rootDir = location.rpartition('/')[-1]
outputFile = rootDir + '_directory_contents.csv'
# Find the requested data and export to CSV, specifying a pattern if needed.
# -fprintf requires GNU find. The pattern is quoted so the shell doesn't expand it,
# and %T+ is the modification time, matching the ModifiedTime header added later.
find_cmd = 'find ' + location + ' -name "' + pattern + '" -fprintf ' + outputFile + ' "%Y%M,%n,%u,%g,%s,%T+,%P\\n"'
subprocess.call(find_cmd, shell=True)
That command produces comma-separated values that can be easily analyzed in Excel, for example:
f-rwxrwxrwx,1,cathy,cathy,2642,2021-06-01+00:22:00.2970880000,content-audit.py
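Because the command string goes through the shell, a pattern such as *.py can be expanded by the shell before find ever sees it. One way around that, sketched below, is to pass the arguments as a list so no shell is involved. This assumes GNU find (which provides -fprintf); the helper names build_find_args and export_listing are made up for this example.

```python
import subprocess


# Hypothetical helper (not from the original answer): builds the argument
# list for a GNU find invocation, so no shell quoting is needed.
def build_find_args(location, pattern, output_file):
    # Raw string keeps the backslash so find itself interprets "\n".
    # %T+ is the modification time, matching a "ModifiedTime" column.
    fmt = r"%Y%M,%n,%u,%g,%s,%T+,%P\n"
    return ["find", location, "-name", pattern, "-fprintf", output_file, fmt]


def export_listing(location, pattern, output_file):
    # check=True raises CalledProcessError if find exits with an error.
    subprocess.run(build_find_args(location, pattern, output_file), check=True)
```

Calling export_listing('.', '*.py', 'listing.csv') would then write the listing without any shell quoting to worry about.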
The resulting CSV file doesn't have a header row, but you can use a second command to add one.
# Add headers to the CSV
headers_cmd = 'sed -i.bak 1i"Permissions,Links,Owner,Group,Size,ModifiedTime,FilePath" ' + outputFile
subprocess.call(headers_cmd, shell=True)
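The sed one-liner relies on GNU sed syntax. If that isn't available, a rough equivalent in plain Python is to rewrite the file with the header line first. This is only a sketch: add_header is a name invented here, and it reads the whole file into memory, which should be fine for modest listings.

```python
HEADER = "Permissions,Links,Owner,Group,Size,ModifiedTime,FilePath"


# Hypothetical helper: prepend a header row to an existing CSV file.
def add_header(csv_path, header=HEADER):
    # Read the existing rows first.
    with open(csv_path, "r", encoding="utf-8") as f:
        body = f.read()
    # Skip the rewrite if the header is already there, so reruns are safe.
    if body.startswith(header):
        return False
    # Write the header followed by the original rows.
    with open(csv_path, "w", encoding="utf-8") as f:
        f.write(header + "\n" + body)
    return True
```

Unlike sed -i.bak, this doesn't keep a backup copy, so copy the file first if you need one.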
Depending on how much data you get back, you can massage it further using Pandas. Here are some things I found useful, especially if you're dealing with many levels of directories to look through.
Add these to your imports:
import numpy as np
import pandas as pd
Then add this to your code:
# Create DataFrame from the csv file created above.
df = pd.read_csv(outputFile)
# Format columns
# Get the filename and file extension from the filepath
df['FileName'] = df['FilePath'].str.rsplit("/", n=1).str[-1]
df['FileExt'] = df['FileName'].str.rsplit('.', n=1).str[1]
# Get the full path to the files. If the path doesn't include a "/" it's the root directory
df['FullPath'] = df["FilePath"].str.rsplit("/", n=1).str[0]
df['FullPath'] = np.where(df['FilePath'].str.contains("/"), df['FullPath'], rootDir)
# Split the path into columns for the parent directory and its children
df['ParentDir'] = df['FullPath'].str.split("/", n=1).str[0]
df['SubDirs'] = df['FullPath'].str.split("/", n=1).str[1]
# Account for NaN returns, which indicate the path is the root directory
df['SubDirs'] = df['SubDirs'].fillna('')
# Determine if the item is a directory or file.
df['Type'] = np.where(df['Permissions'].str.startswith('d'), 'Dir', 'File')
# Split the time stamp into date and time columns
df[['ModifiedDate', 'Time']] = df.ModifiedTime.str.rsplit('+', n=1, expand=True)
df['Time'] = df['Time'].str.split('.').str[0]
# Show only files. The output includes paths, so you don't necessarily need to display the individual directories.
df = df[df['Type'].str.contains('File')]
# Set columns to show and their order.
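# FileExt is the extension column created above; .copy() avoids chained-assignment warnings later.
df = df[['FileName', 'ParentDir', 'SubDirs', 'FullPath', 'FileExt', 'ModifiedDate', 'Time', 'Size']].copy()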
filesize = [] # Create an empty list to store the converted file sizes.
# Go through the items and convert the file size from bytes to something more readable.
# convert_bytes is defined at the end of this listing; define it before running this loop.
for items in df['Size'].items():
    filesize.append(convert_bytes(items[1]))
df['Size'] = filesize
# Send the data to an Excel workbook with sheets by parent directory
with pd.ExcelWriter("scripts_directory_contents.xlsx") as writer:
    for directory, data in df.groupby('ParentDir'):
        data.to_excel(writer, sheet_name=directory, index=False)
# To convert sizes to be more human readable
def convert_bytes(size):
    for x in ['b', 'K', 'M', 'G', 'T']:
        if size < 1024:
            return "%3.1f %s" % (size, x)
        size /= 1024
    return size
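If you want to sanity-check the path-splitting logic without pandas, here is a small sketch using pathlib from the standard library. It follows the intent of the column comments above (a file sitting directly in the root gets the root directory name), but it is not a drop-in replacement: split_path is a made-up helper, and edge cases such as hidden files or files without an extension may come out differently from the pandas version.

```python
from pathlib import PurePosixPath


# Hypothetical helper: derive the path-related columns for one relative path.
def split_path(file_path, root_dir):
    p = PurePosixPath(file_path)
    # Directory components of the path; empty for a file directly in the root.
    dirs = p.parent.parts
    return {
        "FileName": p.name,
        "FileExt": p.suffix.lstrip("."),
        "FullPath": "/".join(dirs) if dirs else root_dir,
        "ParentDir": dirs[0] if dirs else root_dir,
        "SubDirs": "/".join(dirs[1:]),
    }
```

For example, split_path('docs/api/readme.md', 'scripts') gives a ParentDir of 'docs' and a SubDirs of 'api'.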
In this article we will discuss different methods to generate a list of all files in a directory tree.
Creating a list of files in directory and sub directories using os.listdir()
Python’s os module provides a function to get the list of files or folders in a directory, i.e.
os.listdir(path='.')
It returns a list of all the files and sub directories in the given path.
We need to call this recursively for sub directories to create a complete list of files in the given directory tree, i.e.
'''
For the given path, get the List of all files in the directory tree
'''
def getListOfFiles(dirName):
    # create a list of file and sub directories
    # names in the given directory
    listOfFile = os.listdir(dirName)
    allFiles = list()
    # Iterate over all the entries
    for entry in listOfFile:
        # Create full path
        fullPath = os.path.join(dirName, entry)
        # If entry is a directory then get the list of files in this directory
        if os.path.isdir(fullPath):
            allFiles = allFiles + getListOfFiles(fullPath)
        else:
            allFiles.append(fullPath)
    return allFiles
Call the above function to create a list of files in a directory tree i.e.
dirName = '/home/varun/Downloads'
# Get the list of all files in directory tree at given path
listOfFiles = getListOfFiles(dirName)
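As a variation on the recursive approach, the sketch below uses os.scandir() and a generator, so paths are produced one at a time instead of being collected into intermediate lists. The function name iter_files is made up for this example, and skipping symlinked directories is a choice made here, not something the original function does.

```python
import os


# Hypothetical generator variant of the recursive listing.
def iter_files(dir_name):
    # os.scandir yields DirEntry objects that cache the file type,
    # which avoids a separate os.path.isdir call per entry.
    with os.scandir(dir_name) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                # Recurse into real sub directories only.
                yield from iter_files(entry.path)
            else:
                # Files (and symlinks, which are not followed) are yielded as paths.
                yield entry.path
```

Wrapping the call in list(), as in list(iter_files(dirName)), gives a list like the one returned by the recursive function, though the order of entries may differ.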
Creating a list of files in directory and sub directories using os.walk()
Python’s os module provides a function to iterate over a directory tree i.e.
os.walk(path)
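It iterates over the directory tree at the given path, and for each directory or sub directory it yields a tuple containing
(dirpath, dirnames, filenames)
i.e. the path of the directory, the list of its sub directories and the list of its files.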
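To see what those tuples look like, here is a small sketch that builds a throwaway directory tree and walks it. The helper names walk_tuples and demo, and the file names used, are made up for illustration.

```python
import os
import tempfile


def walk_tuples(root):
    # Collect each (dirpath, dirnames, filenames) tuple; the name lists are
    # sorted only to make the result easy to compare.
    return [(dirpath, sorted(dirnames), sorted(filenames))
            for dirpath, dirnames, filenames in os.walk(root)]


def demo():
    # Build a tiny tree: temp1.txt in the root and temp2.txt in "sample".
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "sample"))
        for rel in ("temp1.txt", os.path.join("sample", "temp2.txt")):
            open(os.path.join(root, rel), "w").close()
        for dirpath, dirnames, filenames in walk_tuples(root):
            # Print the directory relative to the root for readability.
            print(os.path.relpath(dirpath, root), dirnames, filenames)
```

Calling demo() prints one line per directory: first the root (shown as .) with its sub directory and file, then sample with its own file.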
Iterate over the directory tree and generate a list of all the files at given path,
# Get the list of all files in directory tree at given path
listOfFiles = list()
for (dirpath, dirnames, filenames) in os.walk(dirName):
    listOfFiles += [os.path.join(dirpath, file) for file in filenames]
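If only certain file types are needed, the same loop can filter names as it goes. The sketch below uses fnmatch from the standard library for shell-style patterns; list_matching_files is a made-up name, and the default pattern of "*" is an assumption that simply matches everything.

```python
import fnmatch
import os


# Hypothetical helper: os.walk combined with a shell-style name filter.
def list_matching_files(dir_name, pattern="*"):
    matches = []
    for dirpath, dirnames, filenames in os.walk(dir_name):
        # fnmatch.filter keeps only the names matching the pattern.
        for name in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, name))
    return matches
```

For instance, list_matching_files(dirName, '*.txt') would return only the text files in the tree.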
The complete example is as follows:
import os

'''
For the given path, get the List of all files in the directory tree
'''
def getListOfFiles(dirName):
    # create a list of file and sub directories
    # names in the given directory
    listOfFile = os.listdir(dirName)
    allFiles = list()
    # Iterate over all the entries
    for entry in listOfFile:
        # Create full path
        fullPath = os.path.join(dirName, entry)
        # If entry is a directory then get the list of files in this directory
        if os.path.isdir(fullPath):
            allFiles = allFiles + getListOfFiles(fullPath)
        else:
            allFiles.append(fullPath)
    return allFiles

def main():
    dirName = '/home/varun/Downloads'
    # Get the list of all files in directory tree at given path
    listOfFiles = getListOfFiles(dirName)
    # Print the files
    for elem in listOfFiles:
        print(elem)
    print("****************")
    # Get the list of all files in directory tree at given path
    listOfFiles = list()
    for (dirpath, dirnames, filenames) in os.walk(dirName):
        listOfFiles += [os.path.join(dirpath, file) for file in filenames]
    # Print the files
    for elem in listOfFiles:
        print(elem)

if __name__ == '__main__':
    main()
Output:
/home/varun/Downloads/temp1.txt
/home/varun/Downloads/sample/temp2.txt
/home/varun/Downloads/test/message.txt