Utility to create dict representations of file system trees.
Create a dictionary representation of a filesystem hierarchy.
Optionally report some stats on the hierarchy (especially the size of each directory).
Given an <inputDir>, pftree will perform a recursive walk down the directory tree. For each directory that contains files, pftree will create a dictionary key of the directory path, and will store a list of filenames for the key value.
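The walk described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not pftree's own implementation; the function name `tree_to_dict` and its `relative` parameter are hypothetical and chosen only for this example.

```python
import os

def tree_to_dict(input_dir, relative=False):
    """Map each directory that contains files to the list of its filenames.

    Hypothetical helper for illustration; not part of the pftree API.
    """
    tree = {}
    for dirpath, _dirnames, filenames in os.walk(input_dir):
        if not filenames:
            continue  # directories without files do not get a key
        # Optionally strip the input directory from the key
        key = os.path.relpath(dirpath, input_dir) if relative else dirpath
        tree[key] = sorted(filenames)
    return tree
```

Directories that hold only subdirectories are skipped, which matches the "for each directory that contains files" wording above.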
pftree in and of itself does not really do any work. It is a class/module that abstracts the internals for representing file system hierarchies in dictionary form to be used by other modules. As a convenience, however, the --stats and --statsReverse flags do provide a useful analog for sorted directory usage down a file system tree.
Several simple file and directory name filters can be applied, which can facilitate the targeting of very specific elements in a file system tree.
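To make the idea of "sorted directory usage" concrete, here is a rough sketch of how per-directory sizes could be computed from a dictionary that maps directory paths to lists of filenames. It is an assumption-laden illustration, not the code behind --stats; `dir_sizes` is a hypothetical name, and the keys are assumed to be paths that can be opened from the current working directory.

```python
import os

def dir_sizes(tree):
    """Return (path, bytes) pairs sorted by size, largest first.

    `tree` maps a directory path to the list of filenames it contains.
    Hypothetical helper for illustration; not part of the pftree API.
    """
    sizes = {}
    for path, files in tree.items():
        # Sum the on-disk size of every file listed for this directory
        sizes[path] = sum(os.path.getsize(os.path.join(path, f)) for f in files)
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)
```

Sorting with `reverse=False` instead would give the smallest directories first.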
The core of the class is a tree_analysisApply() method that accepts various kwargs. When called, this method will loop over the dictionary, and for each key (i.e. ‘path’) will execute a callback method. This callback is passed the dictionary value at that key (i.e. usually just the list of files) as well as all the kwargs passed to tree_analysisApply().
The following dependencies must be installed on your host system/python3 virtual env (they will also be automatically installed if pftree is pulled from PyPI):
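The callback pattern described above can be illustrated with a small stand-alone sketch. The names `apply_over_tree` and `count_matching` are hypothetical; the real tree_analysisApply() signature and return value may differ, so treat this only as a picture of the control flow.

```python
def apply_over_tree(tree, callback, **kwargs):
    """Loop over the dictionary and run the callback once per key.

    The callback receives the value stored at the key (usually the list
    of files) plus every keyword argument given to this function.
    Hypothetical stand-in; not the pftree method itself.
    """
    return {path: callback(files, **kwargs) for path, files in tree.items()}

def count_matching(files, suffix=""):
    """Example callback: count the files whose names end with `suffix`."""
    return sum(1 for name in files if name.endswith(suffix))

tree = {"/data/a": ["x.dcm", "y.txt"], "/data/b": ["z.dcm"]}
counts = apply_over_tree(tree, count_matching, suffix=".dcm")
```

Collecting results per path is a design choice made for this sketch so that the output can be inspected directory by directory.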
pfmisc (various misc modules and classes for the pf* family of objects)
tqdm (console prettiness for progress bars)
The best method of installing this script and all of its dependencies is by fetching it from PyPI:
pip3 install pftree
--inputDir <inputDir>
    Input directory to examine. The downstream nested structure of this directory is examined and recreated in the <outputDir>.

[--outputDir <outputDir>]
    The directory to contain a tree structure identical to the input tree structure, and which contains all output files from the per-input-dir processing.

[--maxdepth <dirDepth>]
    The maximum depth to descend relative to the <inputDir>. Note that this counts from zero! The default of '-1' means traverse the entire directory tree.

[--relativeDir]
    A flag argument. If passed (i.e. True), then the dictionary keys are taken to be relative to the <inputDir>, i.e. the keys will not contain the <inputDir>; otherwise the keys will contain the <inputDir>.

[--inputFile <inputFile>]
    An optional <inputFile> specified relative to the <inputDir>. If specified, then do not perform a directory walk, but target this specific file.

[--fileFilter <someFilter1,someFilter2,...>]
    An optional comma-delimited string to filter files of interest from the <inputDir> tree. Each token in the expression is applied in turn over the space of files in a directory location according to a logical operation, and only files that contain this token string in their filename are preserved.

[--fileFilterLogic AND|OR]
    The logical operator to apply across the fileFilter operation. Default is OR.

[--dirFilter <someFilter1,someFilter2,...>]
    An additional filter that will further limit any files to process to only those files that exist in leaf directory nodes that have some substring of each of the comma separated <someFilter> in their directory name.

[--dirFilterLogic AND|OR]
    The logical operator to apply across the dirFilter operation. Default is OR.

[--outputLeafDir <outputLeafDirFormat>]
    If specified, will apply the <outputLeafDirFormat> to the output directories containing data. This is useful to blanket describe final output directories with some descriptive text, such as 'anon' or 'preview'.

    This is a formatting spec, so --outputLeafDir 'preview-%%s', where %%s is the original leaf directory node, will prefix each final directory containing output with the text 'preview-', which can be useful in describing some features of the output set.

[--threads <numThreads>]
    If specified, break the innermost analysis loop into <numThreads> threads. Please note the following caveats:

    * Only thread if you have a high CPU analysis loop. Note that the input file read and output file write loops are not threaded -- only the analysis loop is threaded. Thus, if the bulk of execution time is in file IO, threading will not really help.

    * Threading will change the nature of the innermost looping across the problem domain, with the result that *all* of the problem data will be read into memory! That means potentially all the target input file data across the entire input directory tree.

[--json]
    If specified, do a JSON dump of the entire return payload.

[--followLinks]
    If specified, follow symbolic links.

[--overwrite]
    If specified, allow for overwriting of existing files.

[--man]
    Show full help.

[--synopsis]
    Show brief help.

[--verbosity <level>]
    Set the app verbosity level. This ranges from 0...<N> where internal log messages with a level=<M> will only display if M <= N. In this manner, increasing the level here can be used to show more and more debugging info, assuming that debug messages in the code have been tagged with a level.

[--stats | --statsReverse | --du | --duf]
    If specified, return some stats to caller. The amount of information returned depends on the --verbosity.

    For --stats (and --statsReverse):

    * --verbosity 0: return only a final summary of group statistics
    * --verbosity 1: in addition, return a sorted (by size) list of subdirectories in the search tree
    * --verbosity >1: same as above, but provide probing status updates. NOTE: this incurs a significant performance penalty!

    --du and --duf are similar to '--stats' but return directory lists in a fashion similar to the GNU 'du' tool. Both of these set default verbosity values so that

    * --du : only provide a summary
    * --duf: provide the (full) sorted list as well

[--3D]
    A "toy" flag that simply shows the final stats report with an ASCII 3D effect.

[--jsonStats]
    If specified, do a JSON dump of the stats.

[--syslog]
    If specified, prepend output 'log' messages in syslog style.

[--test <analysisDelayLength[:<type>]>]
    If specified, perform a test/dummy run through the read, analyze, and write callbacks. The <analysisDelayLength> denotes time (in seconds) to delay in the analysis loop -- useful for testing threading performance.

    An optional [:<type>] can be specified.

    :0 - write the 'l_file' to each outputdir, i.e. a simple 'ls' analog
    :1 - write only the number of files analyzed to each outputdir, i.e. a summary.

    For large trees, ':0' can take a significantly longer time than ':1'.
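The AND|OR filter logic described for the file filter can be pictured with the following sketch. It assumes a plain substring test on each filename and is not taken from the pftree source; `filter_files` is a hypothetical name.

```python
def filter_files(filenames, filter_spec, logic="OR"):
    """Keep filenames that contain any (OR) or all (AND) of the tokens.

    `filter_spec` is a comma-delimited string such as "scan,dcm".
    Hypothetical helper for illustration; not part of the pftree API.
    """
    tokens = [tok for tok in filter_spec.split(",") if tok]
    if not tokens:
        return list(filenames)  # an empty filter preserves everything
    combine = all if logic.upper() == "AND" else any
    return [name for name in filenames if combine(tok in name for tok in tokens)]

files = ["scan.dcm", "scan.txt", "notes.txt"]
either = filter_files(files, "scan,dcm", logic="OR")
both = filter_files(files, "scan,dcm", logic="AND")
```

With OR, a file is kept when at least one token appears in its name; with AND, every token must appear.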
Run on a target tree and output some detail and stats:
pftree --inputDir /var/www/html \
       --printElapsedTime \
       --stats --verbosity 0
Increasing the verbosity will produce increasing output on the console. Passing --json will return a highly detailed JSON payload with considerable information. Passing --jsonStats will only return a summary of the final stats on the filesystem probed. Note that the --verbosity flag is ignored if --json or --jsonStats is also present.
pftree --duf --inputDir /var/www/html
This is a simpler CLI that displays less progress information but gives a faster response.
Run a test down a target tree:
pftree --inputDir /etc \
       --outputDir /tmp/test \
       --verbosity 1 --relativeDir \
       --outputLeafDir 'preview-%%s' \
       --test 0
This will “copy” the input tree to the output, and save a file-ls.txt in each directory where necessary. Note the --relativeDir flag for ‘relative’ directory specification and the --outputLeafDir spec.