
CLI and library for efficient file path filtering using gitignore rules


Orgecc File Matcher

A Python library and CLI tool for Git-compatible file matching and directory traversal.


A versatile command-line tool, Python library, and toolkit for .gitignore-style file matching, designed to meet four key goals:

  1. Pure Python Matcher: Provide a pure Python implementation that precisely matches Git's behavior.
  2. File Walker: Traverse directories while respecting .gitignore rules at all levels.
  3. Unit Testing: Verify the correctness of any .gitignore matching library or command.
  4. Benchmarking: Compare the performance of different .gitignore matching implementations.

Features

  • Git-Compatible Matching: The pure Python implementation passes all test cases, aiming for 100% compatibility with Git's behavior.
  • Multiple Implementations (see the available options in MatcherImplementation):
    • Pure Python: No external dependencies. Aims for 100% Git compatibility.
    • Native Git Integration: Internally calls git check-ignore -v. The unit tests use this implementation as the reference.
    • External Libraries: Supports gitignorefile and pathspec.
  • Comprehensive Test Suite: Includes a test corpus for validating .gitignore matching behavior.
  • Tree-Sitter-Inspired Testing: The corpus files follow the same rigorous testing principles used by Tree-Sitter, ensuring high-quality and reliable test coverage.
  • Efficient Directory Traversal: A file walker that skips ignored files and directories.
  • Cross-Platform: Works seamlessly on Windows, macOS, and Linux.

Installation

Install via pip:

pip install orgecc-filematcher

Usage

Pure Python Matcher

Use the Git-compatible pure Python matcher (the default):

from orgecc.filematcher import get_factory, MatcherImplementation
from orgecc.filematcher.patterns import new_deny_pattern_source

factory = get_factory(MatcherImplementation.PURE_PYTHON)
patterns = new_deny_pattern_source(["*.pyc", "build/"])
matcher = factory.pattern2matcher(patterns)
result = matcher.match("path/to/file.pyc")
print(result.matches)  # True or False, matching Git's behavior

File Walker

Traverse directories while respecting .gitignore rules, either from the command line or from Python.

CLI Tool for macOS, Linux and Windows

The provided file-walker command lists files and directories, skipping those matched by .gitignore patterns:

file-walker --help
Usage: file-walker [OPTIONS] PATH

  List files and directories while respecting gitignore patterns.

Options:
  -t, --type [all|f|d]            Type of entries to show
  -f, --format [absolute|relative|name]
                                  Output format for paths
  -X, --exclude-from FILE         Base gitignore file to apply before others
  -x, --exclude TEXT              Base patterns to ignore (applied before
                                  others)
  -0, --null                      Use null character as separator (useful for
                                  xargs)
  --suppress-errors               Suppress error messages
  -q, --quiet                     Don't show summary, be quiet
  --help                          Show this message and exit.
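
To consume the CLI output from a script, the -0 option pairs naturally with null-separated parsing. The following is a minimal sketch, assuming file-walker is installed and on PATH. The helper names split_null_separated and list_files are illustrative and not part of the package; only options listed in the help text above are used.

```python
import subprocess


def split_null_separated(data: bytes) -> list[str]:
    """Split NUL-separated output into paths, dropping empty chunks."""
    return [
        chunk.decode("utf-8", errors="surrogateescape")
        for chunk in data.split(b"\0")
        if chunk
    ]


def list_files(root: str) -> list[str]:
    """Run file-walker on root and return relative file paths.

    Assumption: the file-walker command is available on PATH.
    """
    completed = subprocess.run(
        # -t f: files only, -f relative: relative paths,
        # -0: NUL separator, -q: no summary
        ["file-walker", "-t", "f", "-f", "relative", "-0", "-q", root],
        check=True,
        capture_output=True,
    )
    return split_null_separated(completed.stdout)
```

Null separation keeps paths that contain spaces or newlines intact, which is why the help text mentions xargs.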

Python Class: DirectoryWalker

from orgecc.filematcher.walker import DirectoryWalker

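walker = DirectoryWalker()
# Iterate over the entries that are not ignored by .gitignore rules
for file in walker.walk("path/to/directory"):
    print(file)
# Counters are available once the walk has finished
print(walker.yielded_count)
print(walker.ignored_count)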
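
The idea of skipping ignored directories entirely, rather than filtering their contents afterwards, can be illustrated with the standard library alone. This is a simplified sketch and not how DirectoryWalker is implemented: the function name walk_pruned is hypothetical, and it matches plain globs against entry names only, without negation, anchoring, or nested .gitignore files.

```python
import fnmatch
import os
from typing import Iterator


def walk_pruned(root: str, ignored_names: list[str]) -> Iterator[str]:
    """Yield file paths under root, skipping entries whose name matches a glob.

    Simplification: globs are tested against the bare entry name, which is
    NOT full .gitignore semantics.
    """

    def is_ignored(name: str) -> bool:
        return any(fnmatch.fnmatch(name, pattern) for pattern in ignored_names)

    for dirpath, dirnames, filenames in os.walk(root):
        # Assigning to dirnames[:] stops os.walk from descending into
        # the removed directories, so ignored subtrees are never visited.
        dirnames[:] = sorted(d for d in dirnames if not is_ignored(d))
        for name in sorted(filenames):
            if not is_ignored(name):
                yield os.path.join(dirpath, name)
```

Pruning at the directory level is what makes a walker efficient on trees with large ignored folders such as build output or dependency caches.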

Unit Testing

Use the included test corpus to validate your .gitignore matching implementation.

The example below shows the failures reported for the gitignorefile library on the negation.txt corpus file:

Test file: negation.txt [block #7]
<.gitignore>
# ======================
# Advanced Negation & Anchored Patterns
# Demonstrates anchored patterns, directories, and multiple negation layers.
# We test directory handling, anchored patterns, and negation layering:
# ======================

# ignore top-level "build" directory
/build
# unignore a specific file inside that directory
!/build/allow.log

!/dist/allow.log
/dist

# ignore all .tmp files
*.tmp
# unignore a specific top-level file
!/global.tmp

# ignore all .log
*.log
# unignore only *.critical.log
!*.critical.log
</.gitignore>
T: 'build' # is a directory matching /build => ignored
T: 'build/allow.log' unignored, but was first ignored by dir, so still matches
T: 'build/subdir/file.txt' # inside build => ignored
T: 'dist'
T: 'dist/allow.log'
F: 'global.tmp' # unignored by !/global.tmp
T: 'random.tmp' # ignored by '*.tmp'
T: 'some/dir/random.tmp' # also ignored by '*.tmp'
T: 'system.log' # ignored by '*.log'
F: 'kernel.critical.log' # unignored by !*.critical.log
F: 'really.critical.log' # unignored by !*.critical.log
F: 'nested/dir/another.critical.log' # unignored by !*.critical.log
T: 'nested/dir/another.debug.log' # still ignored by '*.log'
Test Failure: gitignorefile[negation-#7-block34]
XFAIL tests/filematcher_corpus_test.py::test_corpus_extlib_gitignorefile[negation-#7-block34] - reason:
<.gitignore>
/build
!/build/allow.log
!/dist/allow.log
/dist
*.tmp
!/global.tmp
*.log
!*.critical.log
</.gitignore>


== Failures: 9 (negation-#7) ==

1. T->F 'build' is a directory matching /build => ignored
  Rule: ext-lib: gitignorefile
2. T->F 'build/allow.log' unignored, but was first ignored by dir, so still matches
  Rule: ext-lib: gitignorefile
3. T->F 'build/subdir/file.txt' inside build => ignored
  Rule: ext-lib: gitignorefile
4. T->F 'dist'
  Rule: ext-lib: gitignorefile
5. T->F 'dist/allow.log'
  Rule: ext-lib: gitignorefile
6. T->F 'random.tmp' ignored by '*.tmp'
  Rule: ext-lib: gitignorefile
7. T->F 'some/dir/random.tmp' also ignored by '*.tmp'
  Rule: ext-lib: gitignorefile
8. T->F 'system.log' ignored by '*.log'
  Rule: ext-lib: gitignorefile
9. T->F 'nested/dir/another.debug.log' still ignored by '*.log'
  Rule: ext-lib: gitignorefile
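
The block format shown above can be checked against any matcher with a small amount of parsing. The sketch below is an assumption based only on the example: patterns sit between the <.gitignore> tags, and each T:/F: line gives a quoted path with the expected "ignored" verdict. The names CorpusBlock, parse_block, and find_failures are hypothetical and are not the package's own test harness.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CorpusBlock:
    patterns: list[str] = field(default_factory=list)
    # (path, expected_ignored) pairs taken from the T:/F: lines
    expectations: list[tuple[str, bool]] = field(default_factory=list)


_EXPECT_RE = re.compile(r"^([TF]):\s*'([^']*)'")


def parse_block(text: str) -> CorpusBlock:
    """Parse one corpus block into patterns and expectations."""
    block = CorpusBlock()
    in_rules = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "<.gitignore>":
            in_rules = True
        elif stripped == "</.gitignore>":
            in_rules = False
        elif in_rules:
            # Keep real patterns only; drop blank lines and comments.
            if stripped and not line.startswith("#"):
                block.patterns.append(line)
        else:
            found = _EXPECT_RE.match(stripped)
            if found:
                block.expectations.append((found.group(2), found.group(1) == "T"))
    return block


def find_failures(block: CorpusBlock, is_ignored: Callable[[str], bool]) -> list[str]:
    """Return 'expected->actual' lines for every path the matcher gets wrong."""
    failures = []
    for path, expected in block.expectations:
        actual = is_ignored(path)
        if actual != expected:
            failures.append(
                f"{'T' if expected else 'F'}->{'T' if actual else 'F'} '{path}'"
            )
    return failures
```

Passing a callable keeps the check independent of any particular library: a matcher that never reports a path as ignored produces a T->F line for every path expected to be ignored.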

Benchmarking

Compare the performance of different matcher implementations:

from orgecc.filematcher import get_factory, MatcherImplementation

# Factory for the pure Python implementation
pure_python_factory = get_factory(MatcherImplementation.PURE_PYTHON)

# Factory for the external gitignorefile library
gitignorefile_factory = get_factory(MatcherImplementation.EXTLIB_GITIGNOREFILE)
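
Once a matcher is available, timing it takes only the standard library. The harness below is a minimal sketch rather than the project's own benchmark: time_matcher is a hypothetical helper, and an fnmatch-based stand-in replaces the real matchers so that the example runs without the package installed.

```python
import fnmatch
import timeit
from typing import Callable


def time_matcher(
    is_ignored: Callable[[str], bool],
    paths: list[str],
    repeat: int = 5,
    number: int = 100,
) -> float:
    """Return the best observed time per match call, in seconds."""

    def run() -> None:
        for path in paths:
            is_ignored(path)

    # Take the minimum over several runs to reduce noise from other processes.
    best = min(timeit.repeat(run, repeat=repeat, number=number))
    return best / (number * len(paths))


def fnmatch_matcher(path: str) -> bool:
    """Stand-in matcher (assumption): a single glob, not .gitignore semantics."""
    return fnmatch.fnmatch(path, "*.pyc")


paths = ["src/a.py", "src/a.pyc", "build/out.o"]
seconds = time_matcher(fnmatch_matcher, paths)
print(f"fnmatch stand-in: {seconds * 1e6:.2f} microseconds per match")
```

To time a library implementation instead, build a matcher as in the Pure Python Matcher example and pass `lambda p: matcher.match(p).matches` as is_ignored, once per MatcherImplementation value being compared.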

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
