Another library for iterating through the contents of a directory
There are many libraries for traversing directories, and you can also do it with the standard library alone. This library differs in that it offers:
- ⚗️ Filtering by file extensions, text patterns in .gitignore format, and using custom callables.
- 🐍 Natively works with both Path objects from the standard library and strings.
- ❌ Support for cancellation tokens.
- 👯‍♂️ Combining multiple crawling methods in one object.
Installation
You can install dirstree using pip:
pip install dirstree
You can also quickly try out this and other packages without installing them, using instld.
Basic usage
It's very easy to work with the library in your own code:
- Create a crawler object, passing the path to the base directory and, if necessary, additional arguments.
- Iterate through it.
The simplest code example would look like this:
from dirstree import Crawler

crawler = Crawler('.')
for file in crawler:
    print(file)
↑ Here we recursively print all files from the current directory, that is, including the contents of nested directories. Each iteration yields a new Path object.
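For comparison, here is a minimal sketch of roughly the same recursive traversal written with only the standard library. The helper name iter_files is hypothetical and is not part of dirstree; the sketch only illustrates what "recursively yielding Path objects" means and says nothing about how the library is implemented.

```python
import tempfile
from pathlib import Path


def iter_files(base):
    """Yield every regular file under `base`, descending into subdirectories."""
    # iter_files is a hypothetical helper, not a dirstree function.
    for path in sorted(Path(base).rglob('*')):
        if path.is_file():
            yield path


# Build a small throwaway tree so the example can run anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'nested').mkdir()
    (root / 'a.txt').write_text('a')
    (root / 'nested' / 'b.py').write_text('b')

    found = [path.relative_to(root).as_posix() for path in iter_files(root)]

print(found)
```

Directories themselves are skipped; only the files inside them are yielded, including those in nested directories.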
Filtering
When iterating through a directory, you may want only files of a certain type and need to ignore all the others. There are 3 ways to do this:
- Iterate only over files with the specified extensions, such as .txt, .doc, or .py.
- Filter files by whether their paths match a specific text pattern.
- Use an arbitrary function to determine whether you need each specific path or not.
Each method has its own parameter, which you pass when creating the crawler object. All the methods can be combined with each other.
To set the file extensions you are interested in, use the extensions parameter:
crawler = Crawler('.', extensions=['.txt']) # Iterate only on .txt files.
Also, if you only need Python files, you can use a special class that iterates over them only, without specifying extensions:
from dirstree import PythonCrawler
crawler = PythonCrawler('.') # Iterate only on .py files.
To specify which files and directories you do NOT want to iterate over, use the exclude parameter:
crawler = Crawler('.', exclude=['.git', 'venv']) # Exclude ".git" and "venv" directories.
↑ Please note that we use the .gitignore format here.
If you need a universal way to filter out unnecessary paths, pass your function as the filter parameter:
crawler = Crawler('.', filter=lambda path: len(str(path)) == 7)  # Iterate only on paths that are 7 characters long.
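To make the way the three filters combine more concrete, here is a rough standard-library sketch in which a path is yielded only if it passes every filter that was supplied. The function iter_filtered and its parameters exclude_names and predicate are hypothetical names chosen for illustration. The exclusion step is a deliberately simplified name match, not real .gitignore pattern matching, and none of this is dirstree's actual implementation.

```python
import tempfile
from pathlib import Path


def iter_filtered(base, extensions=None, exclude_names=(), predicate=None):
    """Yield files under `base` that pass every filter that was supplied."""
    # Hypothetical helper for illustration only; not a dirstree function.
    base = Path(base)
    for path in sorted(base.rglob('*')):
        if not path.is_file():
            continue
        relative = path.relative_to(base)
        # 1. Extension filter: keep only the listed suffixes.
        if extensions is not None and path.suffix not in extensions:
            continue
        # 2. Exclusion: drop paths with an excluded name among their parts.
        #    (Far simpler than real .gitignore pattern matching.)
        if any(part in exclude_names for part in relative.parts):
            continue
        # 3. Arbitrary callable: the final say on each remaining path.
        if predicate is not None and not predicate(path):
            continue
        yield path


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'venv').mkdir()
    (root / 'src').mkdir()
    (root / 'venv' / 'lib.py').write_text('')
    (root / 'src' / 'main.py').write_text('')
    (root / 'src' / '_private.py').write_text('')
    (root / 'notes.txt').write_text('')

    kept = [
        path.relative_to(root).as_posix()
        for path in iter_filtered(
            root,
            extensions=['.py'],
            exclude_names=['venv'],
            predicate=lambda path: not path.name.startswith('_'),
        )
    ]

print(kept)
```

Here only src/main.py survives: notes.txt fails the extension filter, venv/lib.py is excluded by name, and src/_private.py is rejected by the callable.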
Working with Cancellation Tokens
You can set an arbitrary condition under which file traversal will stop using cancellation tokens from the cantok library.
There are 2 ways to do this ↓
- If you use the crawler as a one-time object for a single iteration, set the token when creating it:
from cantok import TimeoutToken

for path in Crawler('.', token=TimeoutToken(0.0001)):  # Limit the iteration time to 0.0001 seconds.
    print(path)
- If you plan to use the crawler object several times, use the go() method for iteration and pass a new token to it every time:
crawler = Crawler('.')
for path in crawler.go(token=TimeoutToken(0.0001)):  # Limit the iteration time to 0.0001 seconds.
    print(path)
↑ Follow these rules to avoid accidentally "baking" an expired token inside a crawler object.
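The following sketch illustrates why an expired token "baked" into a reusable object is a problem. It does not use cantok or dirstree: Deadline and crawl are hypothetical stand-ins, and the sketch assumes that a timeout starts counting when the token is created rather than when iteration begins.

```python
import time


class Deadline:
    """Hypothetical stand-in for a timeout token: expires `seconds` after creation."""

    def __init__(self, seconds):
        self._expires_at = time.monotonic() + seconds

    def expired(self):
        return time.monotonic() >= self._expires_at


def crawl(items, deadline):
    """Yield items until the deadline expires, checking before each one."""
    for item in items:
        if deadline.expired():
            return
        yield item


paths = ['a.txt', 'b.txt', 'c.txt']

# The clock starts when the deadline is created, not when iteration begins.
baked = Deadline(0.01)
time.sleep(0.05)  # Any delay before iterating eats into the budget.
stale_run = list(crawl(paths, baked))

# Creating a fresh deadline for each run gives that run its full budget.
fresh_run = list(crawl(paths, Deadline(60)))

print(stale_run, fresh_run)
```

The stale run yields nothing because its deadline had already passed before iteration started, which is the situation that passing a new token to each go() call is meant to avoid.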
Combination
You can combine multiple crawler objects into one using the usual addition operator, like this:
for path in Crawler('../dirstree') + Crawler('../cantok'):
    print(path)
↑ The paths that you will iterate on will be automatically deduplicated.
↑ You can also impose arbitrary restrictions on each of the summed objects; all of them will be taken into account.
You can also pass multiple paths to a single crawler object:
for path in Crawler('../dirstree', '../cantok'):
    print(path)
↑ In this case, there is no deduplication of paths.
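The difference between the two behaviours only shows up when the base directories overlap. As a rough standard-library illustration, the sketch below chains two overlapping traversals with and without a "seen" set. The helpers files_under and unique are hypothetical, and tracking resolved paths in a set is just one assumed way to deduplicate, not necessarily how dirstree does it.

```python
import tempfile
from itertools import chain
from pathlib import Path


def files_under(base):
    """Yield every regular file under `base` (hypothetical helper)."""
    return (path for path in sorted(Path(base).rglob('*')) if path.is_file())


def unique(paths):
    """Yield each path once, keeping first-seen order."""
    seen = set()
    for path in paths:
        resolved = path.resolve()
        if resolved not in seen:
            seen.add(resolved)
            yield path


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'pkg').mkdir()
    (root / 'pkg' / 'mod.py').write_text('')
    (root / 'readme.txt').write_text('')

    # The two base directories overlap: 'pkg' lies inside the root.
    overlapping = list(chain(files_under(root), files_under(root / 'pkg')))
    deduplicated = list(unique(overlapping))

print(len(overlapping), len(deduplicated))
```

Without deduplication, pkg/mod.py appears twice, once for each base directory that contains it.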
File details
Details for the file dirstree-0.0.4.tar.gz.
File metadata
- Download URL: dirstree-0.0.4.tar.gz
- Upload date:
- Size: 9.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3814ddba50194b96f439e8fe3e33b72aaf1192eae1896f75718275503677654e |
| MD5 | 41bf34d76d988317f5b8fa4497e4ebef |
| BLAKE2b-256 | 5f7c6c9d30578daeec73b58393b5e3b605fa9771b8badccf4f16579d631b7de0 |
Provenance
The following attestation bundles were made for dirstree-0.0.4.tar.gz:
Publisher: release.yml on mutating/dirstree
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: dirstree-0.0.4.tar.gz
  - Subject digest: 3814ddba50194b96f439e8fe3e33b72aaf1192eae1896f75718275503677654e
- Sigstore transparency entry: 946527990
- Sigstore integration time:
- Permalink: mutating/dirstree@a0e8442d5527267ee7815a37932099dc897995ea
- Branch / Tag: refs/heads/main
- Owner: https://github.com/mutating
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@a0e8442d5527267ee7815a37932099dc897995ea
- Trigger Event: push
File details
Details for the file dirstree-0.0.4-py3-none-any.whl.
File metadata
- Download URL: dirstree-0.0.4-py3-none-any.whl
- Upload date:
- Size: 8.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3fddb86519735d6f841a3f5fb69913c92b945d950e6b29b7c42ed68e0f22bc32 |
| MD5 | 48df443ff11a1c6a34d2c36a2c22d87b |
| BLAKE2b-256 | 7f137f6ea8b411d55db50e2a601a8bd8271b945c3d7fe2a972be9a0bb20926e2 |
Provenance
The following attestation bundles were made for dirstree-0.0.4-py3-none-any.whl:
Publisher: release.yml on mutating/dirstree
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: dirstree-0.0.4-py3-none-any.whl
  - Subject digest: 3fddb86519735d6f841a3f5fb69913c92b945d950e6b29b7c42ed68e0f22bc32
- Sigstore transparency entry: 946528028
- Sigstore integration time:
- Permalink: mutating/dirstree@a0e8442d5527267ee7815a37932099dc897995ea
- Branch / Tag: refs/heads/main
- Owner: https://github.com/mutating
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@a0e8442d5527267ee7815a37932099dc897995ea
- Trigger Event: push