
A convenient implementation of the Aho-Corasick algorithm to efficiently find multiple search patterns and process the matches


Introduction

Multimatcher is an implementation of the Aho-Corasick search algorithm (Aho & Corasick, 1975). It finds multiple keywords in an input string in a single pass, instead of looping over the input once per keyword.
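To make the single-pass idea concrete, here is a minimal, self-contained sketch of the textbook Aho-Corasick automaton. It is not Multimatcher's actual code; the names build_automaton and find_all are hypothetical and exist only for this illustration.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links and outputs (illustrative only)."""
    goto = [{}]   # goto[node][char] -> child node
    fail = [0]    # fail[node] -> node for the longest proper suffix in the trie
    out = [[]]    # out[node] -> patterns that end at this node

    # Phase 1: insert every pattern into the trie.
    for pat in patterns:
        node = 0
        for ch in pat:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(pat)

    # Phase 2: breadth-first traversal to compute failure links.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # A match at the failure target is also a match here.
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_index, pattern) pairs found in one pass over text."""
    goto, fail, out = build_automaton(patterns)
    node = 0
    matches = []
    for i, ch in enumerate(text):
        # Follow failure links until a transition on ch exists (or we hit the root).
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for pat in out[node]:
            matches.append((i - len(pat) + 1, pat))
    return matches


print(find_all("ushers", ["he", "she", "his", "hers"]))
# [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The input is read once, character by character; the failure links let the automaton resume from the longest suffix that is still a prefix of some pattern rather than restarting the scan.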

The rationale behind Multimatcher is that we usually want to do something with the matches once they are found. Its flexible "replace" method supports use cases such as:

  • find and delete
  • find and replace
  • tag with a global label (i.e. all matches get the same label)
  • tag with custom label (i.e. each match gets its own label)
  • count matches
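The first four use cases above differ only in what is substituted for each match. The sketch below illustrates that idea with a plain token lookup. It is an assumption-laden stand-in, not Multimatcher's implementation: replace_tokens is a hypothetical helper, and it only handles whole tokens split on a separator.

```python
def replace_tokens(text, patterns, replacement, separator=" "):
    """Replace whole-token matches; `replacement` may be a str, dict or callable."""
    wanted = set(patterns)
    result = []
    for token in text.split(separator):
        if token not in wanted:
            result.append(token)          # not a match: keep unchanged
            continue
        if callable(replacement):
            new = replacement(token)      # transform the match
        elif isinstance(replacement, dict):
            new = replacement.get(token, token)  # per-match label
        else:
            new = replacement             # one global label (or "" to delete)
        if new:
            result.append(new)            # an empty replacement drops the token
    return separator.join(result)


text = "x a y b z c"
pats = ["a", "b", "c"]
print(replace_tokens(text, pats, ""))                            # x y z
print(replace_tokens(text, pats, "0"))                           # x 0 y 0 z 0
print(replace_tokens(text, pats, {"a": "1", "b": "2", "c": "3"}))  # x 1 y 2 z 3
print(replace_tokens(text, pats, str.upper))                     # x A y B z C
```

Accepting a string, a mapping, or a callable in one place is a design choice made for this sketch; Multimatcher itself exposes separate setter methods, as the examples further down show.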

When possible, it's recommended to set whole_words_only to True, which makes matching significantly faster.
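As a rough illustration of how whole-word matching differs from substring matching, the following sketch counts matches both ways. Both functions are hypothetical and do not reflect how Multimatcher implements whole_words_only; the sketch assumes words are delimited by a single separator character.

```python
def count_whole_words(text, patterns, separator=" "):
    """Count matches only when a pattern equals an entire token."""
    wanted = set(patterns)
    counts = {}
    for token in text.split(separator):
        if token in wanted:               # one hashed lookup per token
            counts[token] = counts.get(token, 0) + 1
    return counts


def count_substrings(text, patterns):
    """Count non-overlapping occurrences anywhere in the text."""
    return {p: text.count(p) for p in patterns if p in text}


text = "a aa b"
print(count_whole_words(text, ["a", "b"]))  # {'a': 1, 'b': 1}
print(count_substrings(text, ["a", "b"]))   # {'a': 3, 'b': 1}
```

In this sketch, whole-word matching only needs to consider positions at token boundaries, which is one plausible reason such a mode can be cheaper than matching at every character.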

Examples

Find and delete matches

from multimatcher import Multimatcher
mm = Multimatcher(separator=' ')
mm.set_replacement_text("") # matches will be deleted
mm.set_search_patterns(['a', 'b', 'c'])
mm.replace("x a y b z c") # produces "x y z"

Find and transform matches

from multimatcher import Multimatcher
mm = Multimatcher(separator=' ')
mm.set_replacement_method(lambda x: x.capitalize()) # matches will be capitalized
mm.set_search_patterns(['a', 'b', 'c'])
mm.replace("x a y b z c") # produces "x A y B z C"

Find and replace matches with the same label

from multimatcher import Multimatcher
mm = Multimatcher(separator=' ')
mm.set_replacement_text("0") # all matches will be replaced with 0
mm.set_search_patterns(['a', 'b', 'c'])
mm.replace("x a y b z c") # produces "x 0 y 0 z 0"

Find and replace matches with custom labels

from multimatcher import Multimatcher
mm = Multimatcher(separator=' ')
mm.set_replacement_map({"a": "1", "b": "2", "c": "3"}) # replaces a > 1, b > 2, c > 3
mm.set_search_patterns(['a', 'b', 'c'])
mm.replace("x a y b z c") # produces "x 1 y 2 z 3"

Count matches

from multimatcher import Multimatcher
mm = Multimatcher(separator='')
mm.set_search_patterns(['a', 'b', 'c'])
mm.count("aa xx bb yy cc zz") # produces {'a': 2, 'b': 2, 'c': 2}

References

Aho, A. V., & Corasick, M. J. (1975). Efficient string matching: an aid to bibliographic search. Communications of the ACM, 18(6), 333-340.

