CLI and library for efficient file path filtering using gitignore rules
Orgecc File Matcher
A Python library and CLI tool for Git-compatible file matching and directory traversal.
A command-line tool, Python library, and toolkit for .gitignore-style file matching, designed to meet four goals:
- Pure Python Matcher: Provide a pure Python implementation that precisely matches Git's behavior.
- File Walker: Traverse directories while respecting .gitignore rules at all levels.
- Unit Testing: Verify the correctness of any .gitignore matching library or command.
- Benchmarking: Compare the performance of different .gitignore matching implementations.
Features
- Git-Compatible Matching: The pure Python implementation passes every case in the test corpus, whose expectations are aligned with native Git.
- Multiple Implementations (see available options in MatcherImplementation):
  - Pure Python: No external dependencies. Aims for 100% Git compatibility.
  - Native Git Integration: Internally calls git check-ignore -v. The unit tests are adjusted according to this implementation.
  - External Libraries: Supports gitignorefile and pathspec.
- Comprehensive Test Suite: Includes a test corpus for validating .gitignore matching behavior.
- Tree-Sitter-Inspired Testing: The corpus files follow the same rigorous testing principles used by Tree-Sitter, for reliable test coverage.
- Efficient Directory Traversal: A file walker that skips ignored files and directories.
- Cross-Platform: Works seamlessly on Windows, macOS, and Linux.
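To make the "Native Git Integration" item more concrete, here is a minimal sketch of calling git check-ignore -v from Python and reading its answer. It is not this package's code: IgnoreVerdict, parse_check_ignore_line and check_ignore are hypothetical names chosen for illustration. The sketch assumes git is on PATH, that the source file name contains no colon, and that the queried path needs no quoting in Git's output.

```python
import subprocess
from typing import NamedTuple, Optional


class IgnoreVerdict(NamedTuple):
    source: str   # file that holds the deciding pattern
    line: int     # 1-based line number inside that file
    pattern: str  # pattern text, with a leading "!" when it is a negation
    path: str     # the path that was queried

    @property
    def ignored(self) -> bool:
        # A deciding pattern that starts with "!" re-includes the path.
        return not self.pattern.startswith("!")


def parse_check_ignore_line(line: str) -> Optional[IgnoreVerdict]:
    """Parse one '<source>:<linenum>:<pattern><TAB><pathname>' line."""
    head, sep, path = line.rstrip("\n").partition("\t")
    if not sep:
        return None
    parts = head.split(":", 2)  # assumption: no colon inside <source>
    if len(parts) != 3 or not parts[1].isdigit():
        return None  # e.g. the '::' placeholder printed for non-matching paths
    return IgnoreVerdict(parts[0], int(parts[1]), parts[2], path)


def check_ignore(path: str, repo_dir: str = ".") -> Optional[IgnoreVerdict]:
    """Ask Git which pattern, if any, decides the fate of `path`."""
    proc = subprocess.run(
        ["git", "check-ignore", "-v", "--", path],
        cwd=repo_dir, capture_output=True, text=True,
    )
    # Exit status 0 and 1 are normal answers; anything else is an error.
    if proc.returncode not in (0, 1):
        raise RuntimeError(proc.stderr.strip())
    return parse_check_ignore_line(proc.stdout) if proc.stdout else None
```

Inside a Git working tree, check_ignore("build/out.o") would return the deciding pattern and where it was defined, or None when no pattern applies.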
Installation
Install via pip:
pip install orgecc-filematcher
Usage
Pure Python Matcher
Use the Git-compatible pure Python matcher (the default):
from orgecc.filematcher import get_factory, MatcherImplementation
from orgecc.filematcher.patterns import new_deny_pattern_source
factory = get_factory(MatcherImplementation.PURE_PYTHON)
patterns = new_deny_pattern_source(["*.pyc", "build/"])
matcher = factory.pattern2matcher(patterns)
result = matcher.match("path/to/file.pyc")
print(result.matches) # True or False, matching Git's behavior
File Walker
Traverse directories while respecting .gitignore rules, either from the command line or from Python.
CLI Tool for macOS, Linux and Windows
Use the provided CLI tool to traverse directories while respecting .gitignore rules:
file-walker --help
Usage: file-walker [OPTIONS] PATH

  List files and directories while respecting gitignore patterns.

Options:
  -t, --type [all|f|d]            Type of entries to show
  -f, --format [absolute|relative|name]
                                  Output format for paths
  -X, --exclude-from FILE         Base gitignore file to apply before others
  -x, --exclude TEXT              Base patterns to ignore (applied before
                                  others)
  -0, --null                      Use null character as separator (useful for
                                  xargs)
  --suppress-errors               Suppress error messages
  -q, --quiet                     Don't show summary, be quiet
  --help                          Show this message and exit.
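The -0 option is aimed at xargs, but the same output can be consumed from a script. The sketch below shows one way to do that, using only flags listed in the help text above. The helper names are hypothetical, and the sketch assumes that -t f restricts output to files and that -q leaves nothing but the path list on standard output.

```python
import os
import subprocess
from typing import List


def split_null_separated(raw: bytes) -> List[str]:
    """Split NUL-delimited output into paths, dropping empty chunks."""
    return [os.fsdecode(chunk) for chunk in raw.split(b"\0") if chunk]


def list_unignored_files(root: str) -> List[str]:
    """Run file-walker and return the paths it prints (assumes it is installed)."""
    proc = subprocess.run(
        ["file-walker", "-t", "f", "-f", "relative", "-q", "-0", root],
        capture_output=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return split_null_separated(proc.stdout)
```

Splitting on the null character rather than on newlines keeps file names that contain spaces or line breaks intact.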
Python Class: DirectoryWalker
from orgecc.filematcher.walker import DirectoryWalker
walker = DirectoryWalker()
for file in walker.walk("path/to/directory"):
    print(file)
print(walker.yielded_count)
print(walker.ignored_count)
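For readers curious how a walker can skip ignored directories without visiting their contents, here is a standard-library sketch of the general technique. It is not DirectoryWalker's implementation: walk_pruned and the is_ignored callback are hypothetical, and the callback stands in for whatever matcher decides the verdict.

```python
import os
from typing import Callable, Iterator


def walk_pruned(root: str, is_ignored: Callable[[str, bool], bool]) -> Iterator[str]:
    """Yield '/'-separated file paths relative to root, skipping ignored entries.

    is_ignored(rel_path, is_dir) is a hypothetical callback supplied by the caller.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        prefix = "" if rel_dir == "." else rel_dir.replace(os.sep, "/") + "/"
        # Editing dirnames in place stops os.walk from descending into
        # ignored directories, so their contents are never listed.
        dirnames[:] = sorted(d for d in dirnames if not is_ignored(prefix + d, True))
        for name in sorted(filenames):
            rel = prefix + name
            if not is_ignored(rel, False):
                yield rel
```

Pruning at the directory level is what makes traversal cheap on trees with large ignored folders, since nothing beneath an ignored directory is ever read.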
Unit Testing
Use the included test corpus to validate your .gitignore matching implementation.
The example below shows a failure reported for the negation.txt corpus file:
Test file: negation.txt [block #7]
<.gitignore>
# ======================
# Advanced Negation & Anchored Patterns
# Demonstrates anchored patterns, directories, and multiple negation layers.
# We test directory handling, anchored patterns, and negation layering:
# ======================
# ignore top-level "build" directory
/build
# unignore a specific file inside that directory
!/build/allow.log
!/dist/allow.log
/dist
# ignore all .tmp files
*.tmp
# unignore a specific top-level file
!/global.tmp
# ignore all .log
*.log
# unignore only *.critical.log
!*.critical.log
</.gitignore>
T: 'build' # is a directory matching /build => ignored
T: 'build/allow.log' unignored, but was first ignored by dir, so still matches
T: 'build/subdir/file.txt' # inside build => ignored
T: 'dist'
T: 'dist/allow.log'
F: 'global.tmp' # unignored by !/global.tmp
T: 'random.tmp' # ignored by '*.tmp'
T: 'some/dir/random.tmp' # also ignored by '*.tmp'
T: 'system.log' # ignored by '*.log'
F: 'kernel.critical.log' # unignored by !*.critical.log
F: 'really.critical.log' # unignored by !*.critical.log
F: 'nested/dir/another.critical.log' # unignored by !*.critical.log
T: 'nested/dir/another.debug.log' # still ignored by '*.log'
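The expectations in this block follow from two Git rules: the last matching pattern decides, and a path cannot be re-included once one of its parent directories is excluded. The sketch below illustrates those two rules only. It is a deliberate simplification, not this library's algorithm: it relies on fnmatch, whose * also matches "/", and it ignores trailing-slash and ** patterns, so it covers just the kinds of patterns shown here.

```python
from fnmatch import fnmatchcase
from typing import List, Optional


def last_match(patterns: List[str], path: str) -> Optional[bool]:
    """True = ignored, False = re-included, None = no pattern matched this path."""
    verdict = None
    for raw in patterns:
        if not raw or raw.startswith("#"):
            continue  # skip blank lines and comments
        negated = raw.startswith("!")
        pat = raw[1:] if negated else raw
        if "/" in pat:
            # A pattern containing a slash is anchored: compare the whole path.
            hit = fnmatchcase(path, pat.lstrip("/"))
        else:
            # Otherwise it matches the final path component at any depth.
            hit = fnmatchcase(path.rsplit("/", 1)[-1], pat)
        if hit:
            verdict = not negated  # later patterns override earlier ones
    return verdict


def is_ignored(patterns: List[str], path: str) -> bool:
    parts = path.split("/")
    # Check every ancestor directory first: once a directory is excluded,
    # nothing beneath it can be re-included by a later negation.
    for i in range(1, len(parts)):
        if last_match(patterns, "/".join(parts[:i])):
            return True
    return bool(last_match(patterns, path))
```

With the patterns from the block above, is_ignored reports build/subdir/file.txt as ignored only because of the ancestor check, since no pattern matches that path directly.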
Test Failure: gitignorefile[negation-#7-block34]
XFAIL tests/filematcher_corpus_test.py::test_corpus_extlib_gitignorefile[negation-#7-block34] - reason:
<.gitignore>
/build
!/build/allow.log
!/dist/allow.log
/dist
*.tmp
!/global.tmp
*.log
!*.critical.log
</.gitignore>
== Failures: 9 (negation-#7) ==
1. T->F 'build' is a directory matching /build => ignored
Rule: ext-lib: gitignorefile
2. T->F 'build/allow.log' unignored, but was first ignored by dir, so still matches
Rule: ext-lib: gitignorefile
3. T->F 'build/subdir/file.txt' inside build => ignored
Rule: ext-lib: gitignorefile
4. T->F 'dist'
Rule: ext-lib: gitignorefile
5. T->F 'dist/allow.log'
Rule: ext-lib: gitignorefile
6. T->F 'random.tmp' ignored by '*.tmp'
Rule: ext-lib: gitignorefile
7. T->F 'some/dir/random.tmp' also ignored by '*.tmp'
Rule: ext-lib: gitignorefile
8. T->F 'system.log' ignored by '*.log'
Rule: ext-lib: gitignorefile
9. T->F 'nested/dir/another.debug.log' still ignored by '*.log'
Rule: ext-lib: gitignorefile
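A report like the one above boils down to comparing each expected verdict with what a matcher returns. As a rough sketch of that comparison, the code below parses T:/F: expectation lines and lists the mismatches. The line format is inferred from the example shown here, and Expectation, parse_expectations and report_failures are hypothetical names rather than part of the test suite.

```python
import re
from typing import Callable, List, NamedTuple


class Expectation(NamedTuple):
    ignored: bool  # True for a "T:" line, False for an "F:" line
    path: str
    note: str      # trailing comment, empty when absent


# Assumed line shape: T: 'some/path' # optional comment
_LINE = re.compile(r"^([TF]):\s*'([^']*)'\s*(?:#\s*)?(.*)$")


def parse_expectations(text: str) -> List[Expectation]:
    """Collect every T:/F: line; other lines are skipped."""
    found = []
    for line in text.splitlines():
        m = _LINE.match(line.strip())
        if m:
            found.append(Expectation(m.group(1) == "T", m.group(2), m.group(3).strip()))
    return found


def report_failures(expectations: List[Expectation],
                    is_ignored: Callable[[str], bool]) -> List[str]:
    """Return one 'expected->actual' line per mismatch."""
    flag = {True: "T", False: "F"}
    failures = []
    for exp in expectations:
        actual = bool(is_ignored(exp.path))
        if actual != exp.ignored:
            failures.append(
                f"{flag[exp.ignored]}->{flag[actual]} '{exp.path}' {exp.note}".rstrip()
            )
    return failures
```

Passing a matcher that never ignores anything reproduces the T->F pattern seen in the failure list above.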
Benchmarking
Compare the performance of different matcher implementations:
from orgecc.filematcher import get_factory, MatcherImplementation
# Test pure Python implementation
factory = get_factory(MatcherImplementation.PURE_PYTHON)
# Test external library implementation
factory = get_factory(MatcherImplementation.EXTLIB_GITIGNOREFILE)
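The snippet above only selects the factories; the timing itself is left to the reader. One possible harness is sketched below. It is not shipped with the package: benchmark is a hypothetical helper that accepts any callable taking a path and returning a bool, so a matcher built as in the Usage section could be wrapped as lambda p: matcher.match(p).matches.

```python
import time
from fnmatch import fnmatchcase
from typing import Callable, Dict, List


def benchmark(matchers: Dict[str, Callable[[str], bool]],
              paths: List[str],
              repeats: int = 5) -> Dict[str, float]:
    """Return the best-of-`repeats` wall-clock seconds for one pass over paths."""
    results = {}
    for name, match in matchers.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            for p in paths:
                match(p)
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results


if __name__ == "__main__":
    # Stand-in matchers for illustration only; swap in real implementations.
    sample_paths = [f"src/pkg{i}/module{i}.pyc" for i in range(1000)]
    timings = benchmark(
        {
            "fnmatch": lambda p: fnmatchcase(p, "*.pyc"),
            "endswith": lambda p: p.endswith(".pyc"),
        },
        sample_paths,
    )
    for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
        print(f"{name}: {seconds * 1e3:.3f} ms")
```

Taking the best of several runs reduces noise from other processes; for a fair comparison, every implementation should be fed the same patterns and the same path list.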
License
This project is licensed under the Apache License 2.0; see the LICENSE file for details.