Utilities for efficient glob matching using tries
glob-tries
Description
glob-tries provides two classes, GlobTrie and PathTrie, which use slightly modified trie data structures to efficiently store and query collections of globs and paths. They are useful for indexing and matching file trees when several glob patterns might match the same file. The library also provides consistent precedence rules for resolving such overlaps.
Installation
pip install glob-tries
poetry add glob-tries
Usage
import glob_tries
GlobTrie
GlobTrie can be thought of as a dict where objects are stored under shell-style wildcard paths.
This is helpful in certain scenarios when you must group file paths, or file-path-like strings, into a variety of sets based on a variety of glob patterns. For example, say you have the following rules:
- All files in /foo/bar/baz are of group baz
- All .yaml, .yml, or .json files in foo/*/baz are in group config
- All other files in foo are in group foo
- All .txt files not otherwise covered by another rule should be in group text
You can express this with:
from glob_tries import GlobTrie
trie = GlobTrie()
trie.augment("foo/bar/baz/**", "baz")
trie.augment("foo/*/baz/**/*.json", "config")
trie.augment("foo/*/baz/**/*.yaml", "config")
trie.augment("foo/*/baz/**/*.yml", "config")
trie.augment("foo/**", "foo")
trie.augment("**/*.txt", "text")
A call to trie.get with a path that matches these rules will return the correct group. Precedence is based on how "precise" a matching expression is: matching proceeds left to right, trying more specific checks (single letters) before less specific checks (**). The order of evaluation is:
- Single letters, as well as [abc]-type groups
- [!abc]-type negative groups
- ? single-character wildcards
- * single-folder wildcards
- ** recursive wildcards
GlobTrie supports *, **, ?, [abc], and [!abc]-style shell globbing.
from glob_tries import GlobTrie
trie = GlobTrie()
trie.augment("foo", 1)
trie.augment("foo/*/bar", 2)
trie.augment("ba[rz]", 3)
trie.augment("ba[!m]", 4)
trie.augment("qu?z", 5)
trie.augment("spam/**/obj", 6)
trie.get("foo") # 1
trie.get("foobar") # None
trie.get("foo/baz/bar") # 2
trie.get("foo/egg/bar") # 2
trie.get("foo/egg/spam/bar") # None
trie.get("bar") # 3
trie.get("baz") # 3
trie.get("bam") # None
trie.get("bax") # 4
trie.get("quzz") # 5
trie.get("quaz") # 5
trie.get("quoz") # 5
trie.get("spam/obj") # 6
trie.get("spam/eggs/obj") # 6
trie.get("spam/ham/eggs/obj") # 6
trie.get("spam/ham/eggs/notobj") # None
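The precedence order described above can be illustrated with a small, self-contained sketch. This is not the library's implementation: TinyGlobTrie, _Node, and _tokenize are hypothetical names invented for this example, [abc] and [!abc] groups are left out for brevity, and treating "**/" as "zero or more whole directories" is an assumption made so that the sketch reproduces the spam/**/obj results shown above.

```python
from typing import Any, Dict, List

_MISSING = object()  # sentinel, so that None can be stored as a value


class _Node:
    def __init__(self) -> None:
        self.children: Dict[str, "_Node"] = {}
        self.value: Any = _MISSING


def _tokenize(pattern: str) -> List[str]:
    """Split a pattern into literal characters and wildcard tokens."""
    tokens, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            tokens.append("**/")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            tokens.append("**")  # any remaining text, slashes included
            i += 2
        else:
            tokens.append(pattern[i])  # literal character, "?" or "*"
            i += 1
    return tokens


class TinyGlobTrie:
    """Hypothetical, simplified stand-in for GlobTrie (illustration only)."""

    def __init__(self) -> None:
        self._root = _Node()

    def augment(self, pattern: str, value: Any) -> None:
        node = self._root
        for token in _tokenize(pattern):
            node = node.children.setdefault(token, _Node())
        node.value = value

    def get(self, path: str) -> Any:
        found = self._lookup(self._root, path, 0)
        return None if found is _MISSING else found

    def _lookup(self, node: _Node, path: str, i: int) -> Any:
        if i == len(path) and node.value is not _MISSING:
            return node.value
        kids = node.children
        candidates = []  # (child, next index) pairs, most specific first
        if i < len(path):
            ch = path[i]
            if ch not in "?*" and ch in kids:  # 1. literal character
                candidates.append((kids[ch], i + 1))
            if ch != "/" and "?" in kids:  # 2. single-character wildcard
                candidates.append((kids["?"], i + 1))
        if "*" in kids:  # 3. any run of characters inside one segment
            end = path.find("/", i)
            end = len(path) if end == -1 else end
            candidates += [(kids["*"], j) for j in range(i, end + 1)]
        if "**/" in kids:  # 4a. zero or more complete directories
            candidates.append((kids["**/"], i))
            slashes = [j for j in range(i, len(path)) if path[j] == "/"]
            candidates += [(kids["**/"], j + 1) for j in slashes]
        if "**" in kids:  # 4b. anything at all
            candidates += [(kids["**"], j) for j in range(i, len(path) + 1)]
        for child, j in candidates:
            found = self._lookup(child, path, j)
            if found is not _MISSING:
                return found
        return _MISSING


trie = TinyGlobTrie()
trie.augment("foo/bar/baz/**", "baz")
trie.augment("foo/*/baz/**/*.json", "config")
trie.augment("foo/**", "foo")
trie.augment("**/*.txt", "text")

print(trie.get("foo/bar/baz/settings.json"))  # literal "bar" is tried before "*"
print(trie.get("foo/qux/baz/deep/settings.json"))
print(trie.get("foo/notes.txt"))
print(trie.get("docs/notes.txt"))
```

In this sketch, each node tries its children from most to least specific and returns the first complete match, which is why foo/bar/baz/settings.json lands in baz even though the config pattern also matches it.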
PathTrie
PathTrie is the inverse of GlobTrie. It stores a list of files in a directory, or strings that are arranged like files in a directory, and lets you efficiently list all files that match an arbitrary glob pattern. The in-memory representation is not compact: each node in the trie is a Python object, so storing many paths as a trie can take more memory than storing them as a plain list. Querying the trie is computationally much more efficient, though. PathTrie supports the same set of characters and operators as GlobTrie.
from glob_tries import PathTrie
trie = PathTrie()
trie.augment("foo.py")
trie.augment("bar.py")
trie.augment("baz.py")
trie.augment("folder1/foo.py")
trie.augment("folder1/foo.yaml")
trie.augment("folder1/subfolder/foo.yaml")
trie.augment("folder2/foo.yaml")
trie.get_all_matches("foo.py")
# ["foo.py"]
trie.get_all_matches("ba[rz].py")
# ["bar.py", "baz.py"]
trie.get_all_matches("folder1/*")
# ["folder1/foo.py", "folder1/foo.yaml"]
trie.get_all_matches("folder1/**")
# ["folder1/foo.py", "folder1/foo.yaml", "folder1/subfolder/foo.yaml"]
trie.get_all_matches("folder1/**/*.yaml")
# ["folder1/foo.yaml", "folder1/subfolder/foo.yaml"]
trie.get_all_matches("**/*.yaml")
# ["folder1/foo.yaml", "folder2/foo.yaml", "folder1/subfolder/foo.yaml"]
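The idea of walking a trie of stored paths with a glob pattern can be sketched in a few lines. As before, this is only an illustration under stated assumptions, not the library's code: TinyPathTrie, _Node, and _tokenize are hypothetical names, [abc] groups are omitted, "**/" is assumed to mean "zero or more whole directories", and results are returned sorted, which may differ from the ordering the real PathTrie uses.

```python
from typing import Dict, List, Set


class _Node:
    def __init__(self) -> None:
        self.children: Dict[str, "_Node"] = {}
        self.is_path = False  # True if a stored path ends at this node


def _tokenize(pattern: str) -> List[str]:
    """Split a pattern into literal characters and wildcard tokens."""
    tokens, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            tokens.append("**/")
            i += 3
        elif pattern.startswith("**", i):
            tokens.append("**")
            i += 2
        else:
            tokens.append(pattern[i])
            i += 1
    return tokens


class TinyPathTrie:
    """Hypothetical, simplified stand-in for PathTrie (illustration only)."""

    def __init__(self) -> None:
        self._root = _Node()

    def augment(self, path: str) -> None:
        node = self._root
        for ch in path:
            node = node.children.setdefault(ch, _Node())
        node.is_path = True

    def get_all_matches(self, pattern: str) -> List[str]:
        found: Set[str] = set()
        self._walk(self._root, "", _tokenize(pattern), 0, found)
        return sorted(found)

    def _walk(self, node: _Node, prefix: str, tokens: List[str],
              t: int, found: Set[str]) -> None:
        if t == len(tokens):
            if node.is_path:
                found.add(prefix)
            return
        token = tokens[t]
        if token == "**/":
            # Zero directories: continue with the rest of the pattern here.
            self._walk(node, prefix, tokens, t + 1, found)
            # One or more directories: resume after every "/" below this node.
            stack = [(node, prefix)]
            while stack:
                cur, pre = stack.pop()
                for ch, child in cur.children.items():
                    if ch == "/":
                        self._walk(child, pre + ch, tokens, t + 1, found)
                    stack.append((child, pre + ch))
            return
        if token in ("*", "**"):
            self._walk(node, prefix, tokens, t + 1, found)  # match nothing
        for ch, child in node.children.items():
            if token == "**" or (token == "*" and ch != "/"):
                # Consume one character and stay on the same wildcard.
                self._walk(child, prefix + ch, tokens, t, found)
            elif token == "?" and ch != "/":
                self._walk(child, prefix + ch, tokens, t + 1, found)
            elif token == ch:  # literal character
                self._walk(child, prefix + ch, tokens, t + 1, found)


paths = TinyPathTrie()
for p in ["foo.py", "bar.py", "baz.py", "folder1/foo.py", "folder1/foo.yaml",
          "folder1/subfolder/foo.yaml", "folder2/foo.yaml"]:
    paths.augment(p)

print(paths.get_all_matches("folder1/*"))
print(paths.get_all_matches("folder1/**/*.yaml"))
print(paths.get_all_matches("**/*.yaml"))
```

Because only the branches of the trie that can still satisfy the pattern are visited, a query such as folder1/* never touches the entries under folder2, which is the source of the query-time savings over scanning a flat list.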
Contributing
We welcome contributions from the open-source community. See CONTRIBUTING.md for details.
The project currently has exhaustive test coverage. New additions should include similarly exhaustive coverage. Bugfixes should include a test that catches the bug condition. Unit tests can be run with pytest:
pytest
There are multiple pre-commit hooks that enforce typechecking, code style guidelines, and linter guidelines. Install them before development:
poetry run pre-commit install
License
This library is licensed under the BSD 3-Clause license.