Utilities for efficient glob matching using tries
glob-tries
Description
glob-tries provides two classes, GlobTrie and PathTrie, which use slightly modified trie data structures to efficiently store and query collections of globs and paths. They can be used for efficient indexing and matching of file trees when multiple glob patterns might match the same file. Both classes apply consistent precedence rules.
Installation
pip install glob-tries
poetry add glob-tries
Usage
import glob_tries
GlobTrie
GlobTrie can be thought of as a dict where objects are stored under shell-style wildcard paths.
This is helpful in certain scenarios when you must group file paths, or file-path-like strings, into a variety of sets based on a variety of glob patterns. For example, say you have the following rules:
- All files in foo/bar/baz are in group baz
- All .yaml, .yml, or .json files in foo/*/baz are in group config
- All other files in foo are in group foo
- All .txt files not otherwise covered by another rule are in group text
You can express this with:
from glob_tries import GlobTrie
trie = GlobTrie()
trie.augment("foo/bar/baz/**", "baz")
trie.augment("foo/*/baz/**/*.json", "config")
trie.augment("foo/*/baz/**/*.yaml", "config")
trie.augment("foo/*/baz/**/*.yml", "config")
trie.augment("foo/**", "foo")
trie.augment("**/*.txt", "text")
A call to trie.get with a path that matches these rules will return the correct group. Precedence is based on how "precise" a matching expression is: matching proceeds left to right, trying more specific checks (single letters) before less specific checks (**). The order of evaluation is:
- Single letters, as well as [abc]-type groups
- [!abc]-type negative groups
- ? single-character wildcards
- * single-folder wildcards
- ** recursive wildcards
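The evaluation order above can be approximated without a trie, which may help when reasoning about which rule should win. The following is a minimal sketch, not the library's implementation: it assumes each glob can be turned into a regular expression plus a tuple of per-token "specificity ranks", and that comparing those tuples left to right is a fair stand-in for the trie traversal. The names tokenize, RankedGlobs, add, and lookup are hypothetical, character ranges such as [a-z] are not handled, and corner cases may differ from GlobTrie.

```python
import re

# Lower rank = more specific, mirroring the evaluation order listed above.
LITERAL, NEGATED, ANY_CHAR, STAR, GLOBSTAR = range(5)


def tokenize(pattern):
    """Split a glob into (rank, regex_fragment) pairs. Hypothetical helper."""
    tokens = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if pattern.startswith("**/", i):
            tokens.append((GLOBSTAR, r"(?:[^/]+/)*"))  # zero or more folders
            i += 3
        elif pattern.startswith("**", i):
            tokens.append((GLOBSTAR, r".*"))  # anything, including "/"
            i += 2
        elif ch == "*":
            tokens.append((STAR, r"[^/]*"))  # stays within one folder
            i += 1
        elif ch == "?":
            tokens.append((ANY_CHAR, r"[^/]"))
            i += 1
        elif ch == "[":
            end = pattern.find("]", i + 2)
            body = pattern[i + 1:end] if end != -1 else ""
            negated = body.startswith("!")
            chars = body[1:] if negated else body
            if chars:
                prefix = "[^" if negated else "["
                rank = NEGATED if negated else LITERAL
                tokens.append((rank, prefix + re.escape(chars) + "]"))
                i = end + 1
            else:
                tokens.append((LITERAL, re.escape(ch)))  # unmatched "[" is literal
                i += 1
        else:
            tokens.append((LITERAL, re.escape(ch)))
            i += 1
    return tokens


class RankedGlobs:
    """Linear-scan stand-in for the precedence rules (assumption, not GlobTrie)."""

    def __init__(self):
        self._rules = []  # (rank_tuple, compiled_regex, value)

    def add(self, pattern, value):
        tokens = tokenize(pattern)
        ranks = tuple(rank for rank, _ in tokens)
        regex = re.compile("".join(fragment for _, fragment in tokens))
        self._rules.append((ranks, regex, value))

    def lookup(self, path):
        matches = [(ranks, value) for ranks, regex, value in self._rules
                   if regex.fullmatch(path)]
        if not matches:
            return None
        # The lexicographically smallest rank tuple is the most specific rule.
        return min(matches, key=lambda match: match[0])[1]


rules = RankedGlobs()
rules.add("foo/bar/baz/**", "baz")
rules.add("foo/*/baz/**/*.json", "config")
rules.add("foo/*/baz/**/*.yaml", "config")
rules.add("foo/*/baz/**/*.yml", "config")
rules.add("foo/**", "foo")
rules.add("**/*.txt", "text")

print(rules.lookup("foo/bar/baz/data.json"))      # baz
print(rules.lookup("foo/qux/baz/settings.yaml"))  # config
print(rules.lookup("foo/readme.txt"))             # foo
print(rules.lookup("docs/notes.txt"))             # text
print(rules.lookup("docs/notes.md"))              # None
```

Unlike a trie, this sketch tests every rule on every lookup, so it illustrates only the ordering, not the performance characteristics.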
GlobTrie supports *, **, ?, [abc], and [!abc]-style shell globbing.
from glob_tries import GlobTrie
trie = GlobTrie()
trie.augment("foo", 1)
trie.augment("foo/*/bar", 2)
trie.augment("ba[rz]", 3)
trie.augment("ba[!m]", 4)
trie.augment("qu?z", 5)
trie.augment("spam/**/obj", 6)
trie.get("foo") # 1
trie.get("foobar") # None
trie.get("foo/baz/bar") # 2
trie.get("foo/egg/bar") # 2
trie.get("foo/egg/spam/bar") # None
trie.get("bar") # 3
trie.get("baz") # 3
trie.get("bam") # None
trie.get("bax") # 4
trie.get("quzz") # 5
trie.get("quaz") # 5
trie.get("quoz") # 5
trie.get("spam/obj") # 6
trie.get("spam/eggs/obj") # 6
trie.get("spam/ham/eggs/obj") # 6
trie.get("spam/ham/eggs/notobj") # None
PathTrie
PathTrie is the inverse of GlobTrie. It stores a list of files in a directory, or strings that are arranged like files in a directory, and lets you efficiently list all files that match an arbitrary glob pattern. (The in-memory representation of the files is somewhat inefficient due to unavoidable Python overhead. Each "node" in the trie is a Python object, so in many cases the trie representation of a long list of paths takes more memory than the plain list. It is computationally much more efficient to query, though.) PathTrie supports the same set of characters and operators as GlobTrie.
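To get a rough feel for the per-node overhead described above, one can count the nodes in a toy trie. This is only a sketch under assumptions: CharNode is a hypothetical character-level node class, not the library's own, and sys.getsizeof reports shallow sizes on CPython, so the byte figures are indicative at best.

```python
import sys


class CharNode:
    """Hypothetical trie node holding one character's children."""
    __slots__ = ("children", "terminal")

    def __init__(self):
        self.children = {}
        self.terminal = False


def build(paths):
    """Insert every path character by character, sharing common prefixes."""
    root = CharNode()
    for path in paths:
        node = root
        for ch in path:
            node = node.children.setdefault(ch, CharNode())
        node.terminal = True
    return root


def count_nodes(node):
    return 1 + sum(count_nodes(child) for child in node.children.values())


def shallow_bytes(node):
    """Sum of shallow sizes of each node object and its children dict."""
    own = sys.getsizeof(node) + sys.getsizeof(node.children)
    return own + sum(shallow_bytes(child) for child in node.children.values())


paths = ["folder1/foo.py", "folder1/foo.yaml", "folder2/foo.yaml"]
root = build(paths)

print(count_nodes(root))  # 29 nodes for 3 paths (including the root)
print(shallow_bytes(root), "bytes of trie nodes")
print(sum(sys.getsizeof(p) for p in paths), "bytes of plain strings")
```

Even with shared prefixes, three short paths produce 29 node objects, each with its own dict, which is where the memory cost comes from.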
from glob_tries import PathTrie
trie = PathTrie()
trie.augment("foo.py")
trie.augment("bar.py")
trie.augment("baz.py")
trie.augment("folder1/foo.py")
trie.augment("folder1/foo.yaml")
trie.augment("folder1/subfolder/foo.yaml")
trie.augment("folder2/foo.yaml")
trie.get_all_matches("foo.py")
# ["foo.py"]
trie.get_all_matches("ba[rz].py")
# ["bar.py", "baz.py"]
trie.get_all_matches("folder1/*")
# ["folder1/foo.py", "folder1/foo.yaml"]
trie.get_all_matches("folder1/**")
# ["folder1/foo.py", "folder1/foo.yaml", "folder1/subfolder/foo.yaml"]
trie.get_all_matches("folder1/**/*.yaml")
# ["folder1/foo.yaml", "folder1/subfolder/foo.yaml"]
trie.get_all_matches("**/*.yaml")
# ["folder1/foo.yaml", "folder2/foo.yaml", "folder1/subfolder/foo.yaml"]
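For intuition about why a trie makes these queries cheap, here is a minimal sketch of a segment-level path trie. It is an illustration under assumptions rather than the library's code: SegmentNode, SegmentTrie, add, and find are hypothetical names, nodes are keyed by whole path segments instead of whatever layout PathTrie uses internally, per-segment matching is delegated to fnmatch.fnmatchcase from the standard library, and results are returned sorted, which may not match the order PathTrie produces.

```python
from fnmatch import fnmatchcase


class SegmentNode:
    """One path segment; is_path marks the end of a stored path."""
    __slots__ = ("children", "is_path")

    def __init__(self):
        self.children = {}
        self.is_path = False


class SegmentTrie:
    def __init__(self):
        self.root = SegmentNode()

    def add(self, path):
        node = self.root
        for segment in path.split("/"):
            node = node.children.setdefault(segment, SegmentNode())
        node.is_path = True

    def find(self, pattern):
        found = set()
        self._walk(self.root, pattern.split("/"), (), found)
        return sorted(found)

    def _walk(self, node, segments, prefix, found):
        if not segments:
            if node.is_path:
                found.add("/".join(prefix))
            return
        head, rest = segments[0], segments[1:]
        if head == "**":
            # "**" may span zero folders...
            self._walk(node, rest, prefix, found)
            # ...or consume one more level and stay active.
            for name, child in node.children.items():
                self._walk(child, segments, prefix + (name,), found)
        elif not any(ch in head for ch in "*?["):
            # Literal segment: a single dict lookup prunes all siblings.
            child = node.children.get(head)
            if child is not None:
                self._walk(child, rest, prefix + (head,), found)
        else:
            # Wildcard segment: test only the children of this folder.
            for name, child in node.children.items():
                if fnmatchcase(name, head):
                    self._walk(child, rest, prefix + (name,), found)


files = SegmentTrie()
for path in ["foo.py", "bar.py", "baz.py", "folder1/foo.py",
             "folder1/foo.yaml", "folder1/subfolder/foo.yaml",
             "folder2/foo.yaml"]:
    files.add(path)

print(files.find("ba[rz].py"))          # ['bar.py', 'baz.py']
print(files.find("folder1/*"))          # ['folder1/foo.py', 'folder1/foo.yaml']
print(files.find("folder1/**/*.yaml"))  # ['folder1/foo.yaml', 'folder1/subfolder/foo.yaml']
```

The literal-segment branch is the point of the structure: a pattern such as folder1/* never visits anything outside folder1, whereas a plain list would need every path tested.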
Contributing
We welcome contributions from the open-source community. See CONTRIBUTING.md for details.
The project currently has exhaustive test coverage. New additions should include similarly exhaustive coverage. Bugfixes should include a test that catches the bug condition. Unit tests can be run with pytest:
pytest
There are multiple pre-commit hooks that enforce typechecking, code style guidelines, and linter guidelines. Install them before development:
poetry run pre-commit install
License
This library is licensed under the BSD 3-Clause license.