Flexible recursive directory iterator: scandir meets glob("**", recursive=True)

scantree

Recursive directory iterator supporting:

  • flexible filtering including wildcard path matching
  • in-memory representation of the file tree (for repeated access)
  • efficient access to directory entry properties (os.DirEntry interface) extended with real path and path relative to the recursion root directory
  • detection and handling of cyclic symlinks

Installation

pip install scantree

Usage

See the source code for full documentation; some generic examples follow.

Get matching file paths:

from scantree import scantree, RecursionFilter

tree = scantree('/path/to/dir', RecursionFilter(match=['*.txt']))
print([path.relative for path in tree.filepaths()])
# ['d1/d2/file3.txt', 'd1/file2.txt', 'file1.txt']
print([path.real for path in tree.filepaths()])
# ['/path/to/other_dir/file3.txt', '/path/to/dir/d1/file2.txt', '/path/to/dir/file1.txt']

Access metadata of directory entries in file tree:

d2 = tree.directories[0].directories[0]
print(type(d2))
# <class 'scantree._node.DirNode'>
print(d2.path.absolute)
# /path/to/dir/d1/d2
print(d2.path.real)
# /path/to/other_dir
print(d2.path.is_symlink())
# True
print(d2.files[0].relative)
# d1/d2/file3.txt

Aggregate information by operating on tree:

hello_count = tree.apply(
    file_apply=lambda path: sum([
        w.lower() == 'hello' for w in
        path.as_pathlib().read_text().split()
    ]),
    dir_apply=lambda dir_: sum(dir_.entries),
)
print(hello_count)
# 3

hello_count_tree = tree.apply(
    file_apply=lambda path: {
        'name': path.name,
        'count': sum([
            w.lower() == 'hello'
            for w in path.as_pathlib().read_text().split()
        ])
    },
    dir_apply=lambda dir_: {
        'name': dir_.path.name,
        'count': sum(e['count'] for e in dir_.entries),
        'sub_counts': [e for e in dir_.entries]
    },
)
from pprint import pprint
pprint(hello_count_tree)
# {'count': 3,
#  'name': 'dir',
#  'sub_counts': [{'count': 2, 'name': 'file1.txt'},
#                 {'count': 1,
#                  'name': 'd1',
#                  'sub_counts': [{'count': 1, 'name': 'file2.txt'},
#                                 {'count': 0,
#                                  'name': 'd2',
#                                  'sub_counts': [{'count': 0,
#                                                  'name': 'file3.txt'}]}]}]}
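
The aggregation above is a bottom-up fold: each file is mapped to a value, and each directory combines the values of its entries. As a rough illustration of that idea using only the standard library (this is not how scantree is implemented, and `fold_tree`, `count_hello` and `demo` are hypothetical names introduced here), a minimal sketch could look like this:

```python
import os
import tempfile
from pathlib import Path


def fold_tree(directory, file_apply, dir_apply):
    """Post-order fold over a directory (hypothetical helper, not scantree API).

    file_apply maps a file path to a value; dir_apply combines the list of
    values produced by a directory's entries.
    """
    results = []
    # Sort by name so the traversal order is deterministic.
    for entry in sorted(os.scandir(directory), key=lambda e: e.name):
        if entry.is_dir():
            # Note: follows symlinks and has no cycle protection.
            results.append(fold_tree(entry.path, file_apply, dir_apply))
        else:
            results.append(file_apply(Path(entry.path)))
    return dir_apply(results)


def count_hello(path):
    """Count case-insensitive occurrences of the word 'hello' in a file."""
    return sum(w.lower() == "hello" for w in path.read_text().split())


def demo():
    """Build a small throwaway tree and count 'hello' across it."""
    with tempfile.TemporaryDirectory() as root:
        Path(root, "file1.txt").write_text("Hello hello world")
        Path(root, "d1").mkdir()
        Path(root, "d1", "file2.txt").write_text("hello there")
        return fold_tree(root, count_hello, sum)
```

Unlike `tree.apply`, this sketch rescans the filesystem on every call instead of reusing an in-memory tree.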

Flexible filtering:

without_hidden_files = scantree('.', RecursionFilter(match=['*', '!.*']))

without_palindrome_linked_dirs = scantree(
    '.',
    lambda paths: [
        p for p in paths if not (
            p.is_dir() and
            p.is_symlink() and
            p.name == p.name[::-1]
        )
    ]
)
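
To make the two filter styles above more concrete, here is a hedged standard-library sketch of include/exclude matching where a leading `!` negates a pattern. It only illustrates the idea on plain names; the matching rules actually used by `RecursionFilter` may differ, and `match_names` is a hypothetical helper, not part of scantree:

```python
from fnmatch import fnmatchcase


def match_names(names, patterns):
    """Keep names matching any include pattern and no '!'-prefixed exclude pattern.

    Hypothetical helper for illustration; not scantree's implementation.
    """
    includes = [p for p in patterns if not p.startswith("!")]
    excludes = [p[1:] for p in patterns if p.startswith("!")]
    return [
        name for name in names
        if any(fnmatchcase(name, p) for p in includes)
        and not any(fnmatchcase(name, p) for p in excludes)
    ]


# Analogous to match=['*', '!.*']: everything except hidden names.
visible = match_names(["a.txt", ".hidden", "b.py"], ["*", "!.*"])
```

A custom filter, as in the palindrome example, is simply a callable that takes a list of paths and returns the subset to keep.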

Comparison:

from scantree import scantree

tree = scantree('path/to/dir')
# perform some operations on the filesystem, then make sure the file tree is unchanged:
assert tree == scantree('path/to/dir')

# the tree contains absolute/real path info, so a copy in another location differs:
import shutil
shutil.copytree('path/to/dir', 'path/to/other_dir')
new_tree = scantree('path/to/other_dir')
assert tree != new_tree
assert (
    [p.relative for p in tree.leafpaths()] ==
    [p.relative for p in new_tree.leafpaths()]
)

Inspect symlinks:

from scantree import scantree, CyclicLinkedDir

file_links = []
dir_links = []
cyclic_links = []

def file_apply(path):
    if path.is_symlink():
        file_links.append(path)

def dir_apply(dir_node):
    if dir_node.path.is_symlink():
        dir_links.append(dir_node.path)
    if isinstance(dir_node, CyclicLinkedDir):
        cyclic_links.append((dir_node.path, dir_node.target_path))

scantree('.', file_apply=file_apply, dir_apply=dir_apply)
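
For readers curious what "cyclic" means here, the following standard-library sketch flags directory symlinks whose resolved target is one of their own ancestors. It is an illustration under simplifying assumptions, not scantree's actual detection logic; `find_cyclic_links` and `demo` are hypothetical names, and creating symlinks may require extra privileges on some platforms:

```python
import os
import tempfile


def find_cyclic_links(directory, _ancestors=()):
    """Return (link_path, target_real_path) pairs for directory symlinks
    that resolve to one of their own ancestors (hypothetical helper)."""
    ancestors = _ancestors + (os.path.realpath(directory),)
    cyclic = []
    for entry in sorted(os.scandir(directory), key=lambda e: e.name):
        if not entry.is_dir():  # is_dir() follows symlinks by default
            continue
        target = os.path.realpath(entry.path)
        if entry.is_symlink() and target in ancestors:
            # Record the link and do not descend, which would loop forever.
            cyclic.append((entry.path, target))
        else:
            cyclic.extend(find_cyclic_links(entry.path, ancestors))
    return cyclic


def demo():
    """Create root/d1/back -> root and report the detected cycle."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "d1"))
        os.symlink(root, os.path.join(root, "d1", "back"),
                   target_is_directory=True)
        return [
            (os.path.relpath(link, root), os.path.samefile(target, root))
            for link, target in find_cyclic_links(root)
        ]
```

Tracking resolved ancestor paths, rather than every visited directory, means a link to a sibling directory is still followed and only links that lead back up the current branch are treated as cycles.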
