Skip to main content

Flexible recursive directory iterator: scandir meets glob("**", recursive=True)

Project description

Build Status codecov

scantree

Recursive directory iterator supporting:

  • flexible filtering including wildcard path matching
  • in memory representation of file-tree (for repeated access)
  • efficient access to directory entry properties (posix.DirEntry interface) extended with real path and path relative to the recursion root directory
  • detection and handling of cyclic symlinks

Installation

pip install scantree

Usage

See source code for full documentation, some generic examples below.

Get matching file paths:

from scantree import scantree, RecursionFilter

tree = scantree('/path/to/dir', RecursionFilter(match=['*.txt']))
print([path.relative for path in tree.filepaths()])
print([path.real for path in tree.filepaths()])
['d1/d2/file3.txt', 'd1/file2.txt', 'file1.txt']   
['/path/to/other_dir/file3.txt', '/path/to/dir/d1/file2.txt', '/path/to/dir/file1.txt']   

Access metadata of directory entries in file tree:

d2 = tree.directories[0].directories[0]
print(type(d2))
print(d2.path.absolute)
print(d2.path.real)
print(d2.path.is_symlink())
print(d2.files[0].relative)
scantree._node.DirNode
/path/to/dir/d1/d2
/path/to/other_dir
True
d1/d2/file3.txt

Aggregate information by operating on tree:

hello_count = tree.apply(
    file_apply=lambda path: sum([
        w.lower() == 'hello' for w in
        path.as_pathlib().read_text().split()
    ]),
    dir_apply=lambda dir_: sum(dir_.entries),
)
print(hello_count)
3
hello_count_tree =  tree.apply(
    file_apply=lambda path: {
        'name': path.name,
        'count': sum([
            w.lower() == 'hello'
            for w in path.as_pathlib().read_text().split()
        ])
    },
    dir_apply=lambda dir_: {
        'name': dir_.path.name,
        'count': sum(e['count'] for e in dir_.entries),
        'sub_counts': [e for e in dir_.entries]
    },
)
from pprint import pprint
pprint(hello_count_tree)
{'count': 3,
 'name': 'dir',
 'sub_counts': [{'count': 2, 'name': 'file1.txt'},
                {'count': 1,
                 'name': 'd1',
                 'sub_counts': [{'count': 1, 'name': 'file2.txt'},
                                {'count': 0,
                                 'name': 'd2',
                                 'sub_counts': [{'count': 0,
                                                 'name': 'file3.txt'}]}]}]}

Flexible filtering:

without_hidden_files = scantree('.', RecursionFilter(match=['*', '!.*']))

without_palindrome_linked_dirs = scantree(
    '.',
    lambda paths: [
        p for p in paths if not (
            p.is_dir() and 
            p.is_symlink() and 
            p.name == p.name[::-1]
        )
    ]
)

Comparison:

tree = scandir('path/to/dir')
# make some operations on filesystem, make sure file tree is the same:
assert tree == scandir('path/to/dir')

# tree contains absolute/real path info:
import shutil
shutil.copytree('path/to/dir', 'path/to/other_dir')   
new_tree = scandir('path/to/other_dir')
assert tree != new_tree
assert (
    [p.relative for p in tree.leafpaths()] == 
    [p.relative for p in new_tree.leafpaths()]
)

Inspect symlinks:

from scantree import CyclicLinkedDir

file_links = []
dir_links = []
cyclic_links = []

def file_apply(path):
    if path.is_symlink():
        file_links.append(path)

def dir_apply(dir_node):
    if dir_node.path.is_symlink():
        dir_links.append(dir_node.path)
    if isinstance(dir_node, CyclicLinkedDir):
        cyclic_links.append((dir_node.path, dir_node.target_path))

scantree('.', file_apply=file_apply, dir_apply=dir_apply)

Project details


Release history Release notifications

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for scantree, version 0.0.1
Filename, size File type Python version Upload date Hashes
Filename, size scantree-0.0.1.tar.gz (13.4 kB) File type Source Python version None Upload date Hashes View hashes

Supported by

Elastic Elastic Search Pingdom Pingdom Monitoring Google Google BigQuery Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN SignalFx SignalFx Supporter DigiCert DigiCert EV certificate StatusPage StatusPage Status page