Filter filesystem paths based on gitignore-like patterns
py-walk
Python library to filter filesystem paths based on gitignore-like patterns.
Example:
from py_walk import walk
ignore = """
**/data/*.bin
# python files
__pycache__/
*.py[cod]
"""
for path in walk("some/directory", ignore=ignore):
    do_something(path)
py-walk can be useful for applications or tools that work with paths and aim to offer a .gitignore-type file to their users. It's also handy for users working in interactive sessions who need to quickly retrieve sets of paths that must meet relatively complex constraints.
py-walk aims for 100% compatibility with Git's gitignore (wildmatch) pattern syntax. It currently includes more than 500 tests, which incorporate all the original tests from the Git codebase. These tests are executed against git check-ignore to ensure as much compatibility as possible. If you find any divergence, please don't hesitate to open an issue or PR.
Installation
To install py-walk, simply use pip:
$ pip install py-walk
Usage
With py-walk, you can pass paths to the library to determine whether they match a set of gitignore-based patterns. Alternatively, you can directly traverse the contents of a directory, keeping only the paths that meet a set of conditions.
walk
To walk through all the contents of a directory, don't provide any constraints:
from py_walk import walk
for path in walk("/some/directory/"):
    print(path)
walk accepts the directory to traverse either as a string or as a Path object from pathlib. It returns Path objects.
walk returns a generator. If you prefer to get the results as a list or tuple, wrap the call with the desired data type constructor (e.g. list(walk("some-dir"))).
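To illustrate what "returns a generator of Path objects" means in practice, here is a minimal stdlib sketch. `simple_walk` is a hypothetical helper built on `os.walk`; it is not py-walk's `walk` and applies no filtering, it only shows the lazy, Path-yielding shape of the API described above.

```python
import os
from pathlib import Path
from typing import Iterator, Union


def simple_walk(directory: Union[str, Path]) -> Iterator[Path]:
    """Lazily yield every directory and file below `directory` as Path objects.

    Hypothetical helper for illustration only; it is not py-walk's walk().
    """
    root = Path(directory)  # accept both a str and a pathlib.Path
    for dirpath, dirnames, filenames in os.walk(root):
        # os.walk yields dirpath as a str; wrap it so callers get Path objects.
        for name in dirnames + filenames:
            yield Path(dirpath) / name
```

Because `simple_walk` is a generator function, calling it performs no I/O until you iterate over the result; `list(simple_walk("some-dir"))` collects everything at once, in the same way you would wrap `walk`.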
To ignore certain paths, you can pass the patterns either as a text blob or as a list of patterns:
ignore = """
# these patterns use gitignore syntax
foo.txt
/bar/**/*.dat
"""
for path in walk("/some/directory", ignore=ignore):
    ...
or
ignore = ["foo.txt", "/bar/**/*.dat"]
for path in walk("/some/directory", ignore=ignore):
    ...
To only retrieve paths that match a set of patterns, use the match parameter (again, passing a text blob or a list of patterns):
for path in walk("/some/directory", ignore=["data/"], match=["*.css", "*.js"]):
    ...
Note that the ignore parameter takes precedence: for performance reasons, once a path is ignored it can't be re-included using the match parameter. This also applies to the children of ignored directories. For example, if you ignore the directory /foo/, /foo/bar/file.txt will be ignored even if match includes the *.txt pattern.
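The precedence rule, and why it is cheap, can be sketched with the standard library. `walk_with_ignore` is a hypothetical helper, not py-walk's implementation: it matches patterns with `fnmatch` against each entry's name only, which is much simpler than gitignore semantics (so the directory is written as `foo` rather than `/foo/`). The point it shows is that once an ignored directory is removed from `dirnames`, `os.walk` never lists its children, so `match` has no chance to re-include them.

```python
import fnmatch
import os
from pathlib import Path
from typing import Iterator, List


def walk_with_ignore(root: str, ignore: List[str], match: List[str]) -> Iterator[Path]:
    """Sketch of "ignore wins over match" using os.walk pruning.

    Hypothetical helper; patterns are tested with fnmatch against the entry
    name only, a deliberate simplification of gitignore matching.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directories in place so os.walk never descends into
        # them: their children are skipped without ever being tested.
        dirnames[:] = [
            d for d in dirnames
            if not any(fnmatch.fnmatch(d, p) for p in ignore)
        ]
        for name in dirnames + filenames:
            if any(fnmatch.fnmatch(name, p) for p in ignore):
                continue  # an ignored path is never re-included by match
            if any(fnmatch.fnmatch(name, p) for p in match):
                yield Path(dirpath) / name
```

With `ignore=["foo"]` and `match=["*.txt"]`, a file at `foo/bar/file.txt` is never yielded, because the walk never enters `foo`.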
In addition, you can retrieve only files or only directories using the mode parameter:
for path in walk("/some/directory", ignore=["static/"], mode="only-files"):
    ...
for path in walk("/some/directory", ignore=["static/"], mode="only-dirs"):
    ...
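Conceptually, the mode filter is a type check applied to each path. The sketch below assumes the two mode strings shown above plus a hypothetical `"all"` default; `filter_by_mode` is illustrative only and is not part of py-walk.

```python
from pathlib import Path
from typing import Iterable, Iterator


def filter_by_mode(paths: Iterable[Path], mode: str = "all") -> Iterator[Path]:
    """Keep only files or only directories (hypothetical helper).

    "only-files" and "only-dirs" mirror the walk() examples; "all" is an
    assumed default for this sketch.
    """
    if mode not in ("all", "only-files", "only-dirs"):
        raise ValueError(f"unknown mode: {mode!r}")
    for path in paths:
        # is_file()/is_dir() query the filesystem, so the paths must exist.
        if mode == "only-files" and not path.is_file():
            continue
        if mode == "only-dirs" and not path.is_dir():
            continue
        yield path
```

Since this is a generator, an invalid mode raises `ValueError` only when iteration starts.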
You can combine ignore, match and mode to get exactly the paths you need. However, always remember that ignore takes precedence over the other two.
Note: you can convert any text containing gitignore-based patterns into a list using the py_walk.pattern_text_to_pattern_list function:
from py_walk import pattern_text_to_pattern_list
pattern_list = pattern_text_to_pattern_list("""
# some patterns
**/foo.txt
dir[A-Z]/
""")
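As a rough illustration of that conversion, here is a simplified stand-in. `text_to_patterns` is hypothetical and is not py-walk's `pattern_text_to_pattern_list`: it drops blank lines and `#` comments and strips trailing whitespace, but it ignores gitignore escapes such as `\#`, and the library's exact handling of such details may differ.

```python
from typing import List


def text_to_patterns(text: str) -> List[str]:
    """Turn a gitignore-style text blob into a list of patterns (hypothetical).

    Simplified rules: blank lines and lines starting with '#' are dropped,
    and trailing whitespace is removed. Escapes such as '\\#' are not handled.
    """
    patterns = []
    for line in text.splitlines():
        line = line.rstrip()  # gitignore ignores unescaped trailing spaces
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        patterns.append(line)
    return patterns
```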
get_parser_from_*
You can also create a parser from a gitignore-type text, a list of patterns or a .gitignore-type file. Using the parser's match method, you can evaluate paths directly.
from py_walk import get_parser_from_file
parser = get_parser_from_file("path/to/gitignore-type-file")
if parser.match("file.txt"):
    print("file.txt matches!")
from py_walk import get_parser_from_text
patterns = """
# some comment
*.txt
**/bar/*.dat
"""
parser = get_parser_from_text(patterns, base_dir="/some/folder")
if parser.match("file.txt"):
    print("file.txt matches!")
from py_walk import get_parser_from_list
patterns = [
"*.txt",
"**/bar/*.dat",
]
parser = get_parser_from_list(patterns, base_dir="/some/folder")
if parser.match("file.txt"):
    ...
The match method accepts either a string or a Path object, which must always be relative to a base_dir. This base_dir is the directory where the files are stored. When using get_parser_from_file, the base_dir is set to the directory containing the gitignore-type file, mirroring how an actual .gitignore file works within a Git repository. However, when using get_parser_from_text or get_parser_from_list, you need to pass the base_dir manually as a parameter.
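If your application holds absolute paths, they need to be made relative to base_dir before being passed to match. A minimal sketch using `pathlib`; `relative_to_base` is a hypothetical helper, not part of py-walk.

```python
from pathlib import Path
from typing import Union


def relative_to_base(path: Union[str, Path], base_dir: Union[str, Path]) -> Path:
    """Return `path` relative to `base_dir` (hypothetical helper).

    Relative inputs are returned unchanged; absolute inputs must live under
    base_dir, otherwise Path.relative_to raises ValueError.
    """
    p = Path(path)
    if not p.is_absolute():
        return p  # already relative; assume it is relative to base_dir
    return p.relative_to(Path(base_dir))
```

For instance, with the parser created by get_parser_from_list above (base_dir="/some/folder"), you could call `parser.match(relative_to_base("/some/folder/file.txt", "/some/folder"))`, which passes `Path("file.txt")`.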
Note: the parser can check paths that don't exist. However, it still needs a base_dir to replicate Git's behavior, since Git checks the actual filesystem to determine some of the matches.
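One case where the filesystem matters is a gitignore pattern ending in `/`, which only matches directories: whether `build` is a directory cannot be told from the string alone. The sketch below is a hypothetical illustration of that single rule, not py-walk's code; it uses `fnmatch` on the last path component as a deliberate simplification and consults the filesystem under base_dir.

```python
import fnmatch
from pathlib import Path


def matches_dir_only_pattern(base_dir: Path, rel_path: str, pattern: str) -> bool:
    """Sketch of why a base_dir is needed (hypothetical helper).

    A gitignore pattern ending in '/' matches directories only, so this
    sketch asks the filesystem whether base_dir / rel_path is a directory.
    Name matching uses fnmatch on the final component, a simplification.
    """
    if not pattern.endswith("/"):
        raise ValueError("this sketch only handles directory-only patterns")
    name = Path(rel_path).name
    if not fnmatch.fnmatch(name, pattern.rstrip("/")):
        return False
    # A path that does not exist is not a directory, so it does not match.
    return (base_dir / rel_path).is_dir()
```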
Download files
Source Distribution
Built Distribution
File details
Details for the file py_walk-0.1.0.tar.gz.
File metadata
- Download URL: py_walk-0.1.0.tar.gz
- Upload date:
- Size: 1.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6a0bd7ec8179c3c0f62fa0a27a8c5e112eb28ca9d06061099a5aef368441a8a1
MD5 | 2b876672e75342313d3515ba6d64aae2
BLAKE2b-256 | 04c1e1b7398f81c29a770407d34a03f434eb6479e1e3a832b1e06a961392c979
File details
Details for the file py_walk-0.1.0-py3-none-any.whl.
File metadata
- Download URL: py_walk-0.1.0-py3-none-any.whl
- Upload date:
- Size: 14.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0f83f1f14445f1dc3da2f04bf4419c7b0c647a835a21e8b309ea4a5512491885
MD5 | c17c115eb218a7a7dd78548a2ec9bcc1
BLAKE2b-256 | 3a73d29b84330074bbec8fa2d222059db59c26b5350690b27bf0a058b241aba1