Another library for iterating through the contents of a directory

There are many libraries for traversing directories. You can also do this using the standard library. What makes this library different:

  • ⚗️ Filtering by file extensions, text patterns in .gitignore format, and using custom callables.
  • 🐍 Natively works with both Path objects from the standard library and strings.
  • ❌ Support for cancellation tokens.
  • 👯‍♂️ Combining multiple crawling methods in one object.

Installation

You can install dirstree with pip:

pip install dirstree

You can also use instld to quickly try out this package and others without installing them.

Basic usage

The library is easy to use:

  • Create a crawler object, passing the path to the base directory and, if necessary, additional arguments.
  • Iterate through it.

The simplest example would look like this:

from dirstree import Crawler

crawler = Crawler('.')

for file in crawler:
    print(file)

↑ This recursively prints all files in the current directory, including files in nested directories. At each iteration, we get a new Path object.
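For comparison, the same kind of traversal can be approximated with the standard library alone. The sketch below is not how dirstree is implemented; `iter_files` is a hypothetical helper name, and the order of results may differ from what the crawler produces.

```python
import tempfile
from pathlib import Path
from typing import Iterator


def iter_files(base: str) -> Iterator[Path]:
    """Yield every regular file under `base`, descending into subdirectories."""
    for path in Path(base).rglob('*'):
        if path.is_file():  # rglob also yields directories; skip them
            yield path


# Build a tiny throwaway tree so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'nested').mkdir()
    (Path(tmp) / 'a.txt').write_text('a')
    (Path(tmp) / 'nested' / 'b.py').write_text('b')
    found = sorted(path.name for path in iter_files(tmp))

print(found)  # ['a.txt', 'b.py']
```

The standard library gives you the raw walk; filtering, cancellation, and combining several walks are left for you to write by hand.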

Filtering

When iterating through a directory, you often need only files of a certain kind and want to skip everything else. There are three ways to do this:

  • Iterate only over files with the specified extensions, such as .txt, .doc, or .py.
  • Skip files and directories whose paths match a specific text pattern.
  • Use an arbitrary function to determine whether you need each specific path or not.

To select a specific method, you need to pass a specific parameter when creating the crawler object. Of course, all the methods can be combined with each other.

To set the file extensions you are interested in, use the extensions parameter:

crawler = Crawler('.', extensions=['.txt'])  # Iterate only on .txt files.

If you only need Python files, you can use a dedicated class that iterates over them without any extensions being specified:

from dirstree import PythonCrawler

crawler = PythonCrawler('.')  # Iterate only on .py files.

To specify which files and directories you do NOT want to iterate over, use the exclude parameter:

crawler = Crawler('.', exclude=['.git', 'venv'])  # Exclude ".git" and "venv" directories.

↑ Please note that we use the .gitignore format here.

If you need a universal way to filter out unnecessary paths, pass your function as the filter parameter:

crawler = Crawler('.', filter=lambda path: len(str(path)) == 7)  # Iterate only on paths that are 7 characters long.
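A lambda works for one-line rules, but a named function is easier to test and reuse. The sketch below assumes, as the lambda example suggests, that the callable receives a path and returns a truthy value for paths to keep. The name `is_test_module` and its rule are hypothetical, and the function is exercised here on plain Path objects rather than through a crawler.

```python
from pathlib import Path


def is_test_module(path: Path) -> bool:
    """Keep only Python files whose name starts with 'test_' (hypothetical rule)."""
    return path.suffix == '.py' and path.name.startswith('test_')


# Check the predicate on a few sample paths; no file system access is needed.
candidates = [Path('src/app.py'), Path('tests/test_app.py'), Path('tests/data.json')]
kept = [path for path in candidates if is_test_module(path)]

print([str(path) for path in kept])
```

Such a function could then be passed as the filter parameter, for example Crawler('.', filter=is_test_module).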

Working with Cancellation Tokens

Using cancellation tokens from the cantok library, you can set an arbitrary condition under which file traversal stops.

There are two ways to do this ↓

  1. If you use the crawler as a one-time object for a single iteration, set the token when creating it:

from cantok import TimeoutToken

for path in Crawler('.', token=TimeoutToken(0.0001)):  # Limit the iteration time to 0.0001 seconds.
    print(path)

  2. If you plan to use the crawler object several times, use the go() method for iteration and pass a new token to it every time:

crawler = Crawler('.')

for path in crawler.go(token=TimeoutToken(0.0001)):  # Limit the iteration time to 0.0001 seconds.
    print(path)

↑ Follow these rules to avoid accidentally "baking" an expired token inside a crawler object.
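To see why a "baked" token is a problem, here is a standard-library sketch of the idea. It does not use cantok or dirstree: `make_deadline` and `limited` are hypothetical stand-ins, and the sketch assumes that a timeout starts counting when it is created rather than when iteration begins.

```python
import time
from typing import Callable, Iterable, Iterator


def make_deadline(seconds: float) -> Callable[[], bool]:
    """Return a check that becomes True once `seconds` have passed since this call."""
    expires_at = time.monotonic() + seconds
    return lambda: time.monotonic() >= expires_at


def limited(items: Iterable[str], is_cancelled: Callable[[], bool]) -> Iterator[str]:
    """Yield items until the cancellation check fires."""
    for item in items:
        if is_cancelled():
            return
        yield item


names = ['a.txt', 'b.txt', 'c.txt']

baked = make_deadline(0.05)  # created once, like a token stored in the crawler
time.sleep(0.1)              # time passes before the next traversal
stale_pass = list(limited(names, baked))                # deadline already expired
fresh_pass = list(limited(names, make_deadline(10.0)))  # new deadline per traversal

print(stale_pass, fresh_pass)
```

The stale pass yields nothing because its deadline expired before iteration started, which is the situation that passing a fresh token to go() on every traversal avoids.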

Combination

You can combine multiple crawler objects into one using the usual addition operator, like this:

for path in Crawler('../dirstree') + Crawler('../cantok'):
    print(path)

↑ The paths that you will iterate over will be automatically deduplicated.

↑ You can also impose arbitrary restrictions on each of the summed objects; all of them will be taken into account.

You can also pass multiple paths to a single crawler object:

for path in Crawler('../dirstree', '../cantok'):
    print(path)

↑ In this case, there is no deduplication of paths.
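The difference between the two forms matters when the sources overlap. The standard-library sketch below only illustrates what deduplication means for a sequence of paths; the sample paths are made up, and how dirstree itself decides that two paths are the same is not stated here.

```python
from itertools import chain
from pathlib import Path

# Two hypothetical traversal results that share one path.
first = [Path('shared/a.py'), Path('shared/b.py')]
second = [Path('shared/b.py'), Path('other/c.py')]

# Without deduplication the shared path appears twice.
concatenated = list(chain(first, second))

# With deduplication each path is kept once, in first-seen order.
deduplicated = list(dict.fromkeys(chain(first, second)))

print(len(concatenated), len(deduplicated))  # 4 3
```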

