
Utility library for gitignore style pattern matching of file paths.

PathSpec

pathspec is a utility library for pattern matching of file paths. So far this only includes Git’s gitignore pattern matching.

Tutorial

Say you have a “Projects” directory and want to back up only certain files, ignoring others based on a set of conditions:

>>> from pathspec import PathSpec
>>> # The gitignore-style patterns for files to select, but we're including
>>> # instead of ignoring.
>>> spec_text = """
...
... # This is a comment because the line begins with a hash: "#"
...
... # Include several project directories (and all descendants) relative to
... # the current directory. To reference only a directory you must end with a
... # slash: "/"
... /project-a/
... /project-b/
... /project-c/
...
... # Patterns can be negated by prefixing with exclamation mark: "!"
...
... # Ignore temporary files beginning or ending with "~", and those ending
... # with ".swp".
... !~*
... !*~
... !*.swp
...
... # These are python projects so ignore compiled python files from
... # testing.
... !*.pyc
...
... # Ignore the build directories but only directly under the project
... # directories.
... !/*/build/
...
... """

The PathSpec class provides an abstraction around pattern implementations, and we want to compile our patterns as “gitignore” patterns. You could call it a wrapper for a list of compiled patterns:

>>> spec = PathSpec.from_lines('gitignore', spec_text.splitlines())

If we want to compile the patterns manually, we can use the GitIgnoreBasicPattern class directly. This is the class used behind the scenes for “gitignore”; it converts each pattern to a regular expression internally:

>>> from pathspec.patterns.gitignore.basic import GitIgnoreBasicPattern
>>> patterns = map(GitIgnoreBasicPattern, spec_text.splitlines())
>>> spec = PathSpec(patterns)

PathSpec.from_lines() is a class method which simplifies that.

If you want to load the patterns from file, you can pass the file object directly as well:

>>> with open('patterns.list', 'r') as fh:
...     spec = PathSpec.from_lines('gitignore', fh)
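
As a rough, self-contained sketch of loading patterns from a file, the example below writes a small pattern file to a temporary directory first. The file contents and the paths being checked are made up for illustration.

```python
import os
import tempfile

from pathspec import PathSpec

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical pattern file, created here only so the example can run.
    patterns_path = os.path.join(tmp, "patterns.list")
    with open(patterns_path, "w") as fh:
        fh.write("# Select log files, except keep.log\n*.log\n!keep.log\n")

    # Pass the open file object straight to from_lines().
    with open(patterns_path, "r") as fh:
        spec = PathSpec.from_lines("gitignore", fh)

print(spec.match_file("debug.log"))
print(spec.match_file("keep.log"))
```

The spec stays usable after the file is closed because the patterns are compiled while the file is being read.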

You can perform matching on a whole directory tree with:

>>> matches = set(spec.match_tree_files('path/to/directory'))

Or you can perform matching on a specific set of file paths with:

>>> matches = set(spec.match_files(file_paths))

Or check to see if an individual file matches:

>>> is_matched = spec.match_file(file_path)
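
To illustrate how the include and negation patterns from the tutorial interact, the sketch below matches a handful of in-memory paths against a shortened pattern list. The paths and the shortened list are assumptions chosen for illustration; the last pattern that matches a path decides whether it is selected.

```python
from pathspec import PathSpec

# Shortened version of the tutorial patterns.
lines = [
    "/project-a/",   # select everything under project-a
    "!*.pyc",        # except compiled Python files
    "!/*/build/",    # except build directories directly under a project
]
spec = PathSpec.from_lines("gitignore", lines)

# Hypothetical file paths, relative to the "Projects" directory.
paths = [
    "project-a/main.py",
    "project-a/main.pyc",
    "project-a/build/out.txt",
    "project-a/src/build/keep.txt",
    "project-b/main.py",
]

matched = set(spec.match_files(paths))
print(sorted(matched))

# A single path can be checked the same way.
print(spec.match_file("project-a/main.pyc"))
```

Note that "project-a/src/build/keep.txt" is still selected: the "!/*/build/" pattern is anchored, so it only applies to a build directory exactly one level below the root.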

There are actually two implementations of “gitignore”. The basic implementation is used by PathSpec and follows patterns as documented by gitignore. However, Git’s behavior differs from the documented patterns. There are some edge cases; in particular, Git allows including files from excluded directories, which appears to contradict the documentation. GitIgnoreSpec handles these cases to more closely replicate Git’s behavior:

>>> from pathspec import GitIgnoreSpec
>>> spec = GitIgnoreSpec.from_lines(spec_text.splitlines())

You do not specify the style of pattern for GitIgnoreSpec because it should always use GitIgnoreSpecPattern internally.

Performance

Running lots of regular expression matches against thousands of files in Python is slow. Alternate regular expression backends can be used to improve performance. PathSpec and GitIgnoreSpec both accept a backend parameter to control the backend. The default is “best” to automatically choose the best available backend. There are currently 3 backends.

The “simple” backend uses the Python re.Pattern objects that are normally created, and it is what “best” falls back to when no optional backend is installed. It can be the fastest when there are only one or two patterns.

The “hyperscan” backend uses the hyperscan library. Hyperscan tends to be at least 2 times faster than “simple”, and generally slower than “re2”. This can be faster than “re2” under the right conditions with pattern counts of 1-25.

The “re2” backend uses the google-re2 library (not to be confused with the re2 library on PyPI which is unrelated and abandoned). Google’s re2 tends to be significantly faster than “simple”, and 3 times faster than “hyperscan” at high pattern counts.

See benchmarks_backends.md for comparisons between native Python regular expressions and the optional backends.

FAQ

1. How do I ignore files like .gitignore?

GitIgnoreSpec (and PathSpec) positively match files by default. To find the files to keep, and exclude files like .gitignore, you need to set negate=True to flip the results:

>>> from pathspec import GitIgnoreSpec
>>> spec = GitIgnoreSpec.from_lines([...])
>>> keep_files = set(spec.match_tree_files('path/to/directory', negate=True))
>>> ignore_files = set(spec.match_tree_files('path/to/directory'))

License

pathspec is licensed under the Mozilla Public License Version 2.0. See LICENSE or the FAQ for more information.

In summary, you may use pathspec with any closed or open source project without affecting the license of the larger work so long as you:

  • give credit where credit is due,

  • and release any custom changes made to pathspec.

Source

The source code for pathspec is available from the GitHub repo cpburnz/python-pathspec.

Installation

pathspec is available for install through PyPI:

pip install pathspec

pathspec can also be built from source. The following packages will be required:

pathspec can then be built and installed with:

python -m build
pip install dist/pathspec-*-py3-none-any.whl

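The following optional dependencies can be installed:

  • hyperscan, for the “hyperscan” backend: pip install 'pathspec[hyperscan]'

  • google-re2, for the “re2” backend: pip install 'pathspec[re2]'

  • typing-extensions, to improve some type hints.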

Documentation

Documentation for pathspec is available on Read the Docs.

The full change history can be found in CHANGES.rst and Change History.

An upgrade guide is available in UPGRADING.rst and Upgrade Guide.

Other Languages

The related project pathspec-ruby (by highb) provides a similar library as a Ruby gem.

Change History

1.0.2 (2026-01-07)

Bug fixes:

  • Type hint collections.abc.Callable does not properly replace typing.Callable until Python 3.9.2.

1.0.1 (2026-01-06)

Bug fixes:

  • Issue #100: ValueError(f"{patterns=!r} cannot be empty.") when using black.

1.0.0 (2026-01-05)

Major changes:

  • Issue #91: Dropped support of EoL Python 3.8.

  • Added concept of backends to allow for faster regular expression matching. The backend can be controlled using the backend argument to PathSpec(), PathSpec.from_lines(), GitIgnoreSpec(), and GitIgnoreSpec.from_lines().

  • Renamed “gitwildmatch” pattern back to “gitignore”. The “gitignore” pattern behaves slightly differently when used with PathSpec (gitignore as documented) than with GitIgnoreSpec (replicates Git’s edge cases).

API changes:

  • Breaking: protected method pathspec.pathspec.PathSpec._match_file() (with a leading underscore) has been removed and replaced by backends. This does not affect normal usage of PathSpec or GitIgnoreSpec. Only custom subclasses will be affected. If this breaks your usage, let me know by opening an issue.

  • Deprecated: “gitwildmatch” is now an alias for “gitignore”.

  • Deprecated: pathspec.patterns.GitWildMatchPattern is now an alias for pathspec.patterns.gitignore.spec.GitIgnoreSpecPattern.

  • Deprecated: pathspec.patterns.gitwildmatch module has been replaced by the pathspec.patterns.gitignore package.

  • Deprecated: pathspec.patterns.gitwildmatch.GitWildMatchPattern is now an alias for pathspec.patterns.gitignore.spec.GitIgnoreSpecPattern.

  • Deprecated: pathspec.patterns.gitwildmatch.GitWildMatchPatternError is now an alias for pathspec.patterns.gitignore.GitIgnorePatternError.

  • Removed: pathspec.patterns.gitwildmatch.GitIgnorePattern has been deprecated since v0.4 (2016-07-15).

  • Signature of method pathspec.pattern.RegexPattern.match_file() has been changed from def match_file(self, file: str) -> RegexMatchResult | None to def match_file(self, file: AnyStr) -> RegexMatchResult | None to reflect usage.

  • Signature of class method pathspec.pattern.RegexPattern.pattern_to_regex() has been changed from def pattern_to_regex(cls, pattern: str) -> tuple[str, bool] to def pattern_to_regex(cls, pattern: AnyStr) -> tuple[AnyStr | None, bool | None] to reflect usage and documentation.

New features:

  • Added optional “hyperscan” backend using hyperscan library. It will automatically be used when installed. This dependency can be installed with pip install 'pathspec[hyperscan]'.

  • Added optional “re2” backend using the google-re2 library. It will automatically be used when installed. This dependency can be installed with pip install 'pathspec[re2]'.

  • Added optional dependency on typing-extensions library to improve some type hints.

Bug fixes:

  • Issue #93: Do not remove leading spaces.

  • Issue #95: Matching for files inside folder does not seem to behave like .gitignore’s.

  • Issue #98: UnboundLocalError in RegexPattern when initialized with pattern=None.

  • Type hint on return value of pathspec.pattern.RegexPattern.match_file() to match documentation.

Improvements:

  • Mark Python 3.13 and 3.14 as supported.

  • No-op patterns are now filtered out when matching files, slightly improving performance.

  • Fix performance regression in iter_tree_files() from v0.10.
