Fast text/token matching and replacement

Project description

Python matchtext

Python 3 package for fast text matching and replacing.

This library implements two fast approaches for matching keywords/gazetteer entries:

  • TokenMatcher: keywords/gazetteer entries are sequences of tokens, optionally associated with some data; the matcher tries to match any of them in a given sequence of tokens.
  • StringMatcher: keywords/gazetteer entries are strings, optionally associated with some data; the matcher tries to match any of them in a given string, optionally only where the match is bounded by non-word characters.
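
To make the string-matching idea concrete, here is a minimal sketch of gazetteer matching over a character trie. It is not the matchtext API: `TrieNode`, `add_entry`, `find_matches`, and the `boundaries` flag are hypothetical names chosen for illustration, and "word character" is assumed to mean alphanumeric or underscore.

```python
class TrieNode:
    """One node of a character trie; `data` is the optional payload of an entry."""
    __slots__ = ("children", "is_entry", "data")

    def __init__(self):
        self.children = {}
        self.is_entry = False
        self.data = None


def add_entry(root, entry, data=None):
    """Insert a gazetteer string into the trie, one character per level."""
    node = root
    for ch in entry:
        node = node.children.setdefault(ch, TrieNode())
    node.is_entry = True
    node.data = data


def is_word_char(ch):
    # Assumption: a word character is alphanumeric or an underscore.
    return ch.isalnum() or ch == "_"


def find_matches(root, text, boundaries=False):
    """Return (start, end, matched_text, data) for every entry found in text.

    With boundaries=True a match is kept only if the characters directly
    before and after it are not word characters (or the text ends there).
    """
    matches = []
    for start in range(len(text)):
        if boundaries and start > 0 and is_word_char(text[start - 1]):
            continue
        node = root
        for end in range(start, len(text)):
            node = node.children.get(text[end])
            if node is None:
                break
            if node.is_entry:
                after = end + 1
                if boundaries and after < len(text) and is_word_char(text[after]):
                    continue
                matches.append((start, after, text[start:after], node.data))
    return matches


root = TrieNode()
add_entry(root, "new york", {"type": "city"})
add_entry(root, "york", {"type": "city"})

text = "new yorker in new york"
all_matches = find_matches(root, text)
bounded_matches = find_matches(root, text, boundaries=True)
```

Without the boundary check, the entries also match inside "new yorker"; with it, only the occurrences in the final "new york" are reported.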

The matchers are designed for speed: TokenMatcher is implemented as a hash tree, while StringMatcher uses a character trie underneath. Both matchers also provide features that are often required in NLP:

  • return the offsets in the original iterable where a match occurs
  • mapfunc: tokens/characters can be mapped to some canonical form that is used for matching
  • ignorefunc: some tokens/characters can be entirely ignored for matching
  • match all/longest: return all matching entries versus only the longest one
  • skip/noskip: once a match is found, continue matching after the end of the longest match versus at the next position


Download files

Source Distribution

matchtext-0.2.3.tar.gz (23.8 kB view details)

Uploaded Source

Built Distribution

matchtext-0.2.3-py2.py3-none-any.whl (19.0 kB view details)

Uploaded Python 2, Python 3

File details

Details for the file matchtext-0.2.3.tar.gz.

File metadata

  • Download URL: matchtext-0.2.3.tar.gz
  • Upload date:
  • Size: 23.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3.post20200330 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.6.10

File hashes

Hashes for matchtext-0.2.3.tar.gz
Algorithm Hash digest
SHA256 ad9ee6797240d9ba4c24721d332a88f54084caf341285e520058e35249ae3f12
MD5 7170443b1ad6d07b02c0fa9f53a0c422
BLAKE2b-256 d92d02e6b6b574176522ce4623aa133114d857198ebc7d752a090a496722c733

File details

Details for the file matchtext-0.2.3-py2.py3-none-any.whl.

File metadata

  • Download URL: matchtext-0.2.3-py2.py3-none-any.whl
  • Upload date:
  • Size: 19.0 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3.post20200330 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.6.10

File hashes

Hashes for matchtext-0.2.3-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 349f57276427503d9480287108e79c656a12816ddff3f986d0693b85473b414a
MD5 ecf412d9af1e386e8e83181a692370ec
BLAKE2b-256 ce41a4935eece551978ce25fdf773d215434f1db41e46ea2cae9e6d48e6223ae
