A convenient implementation of the Aho-Corasick algorithm to efficiently find multiple search patterns and process the matches

Introduction

Multimatcher is an implementation of the Aho-Corasick search algorithm (Aho & Corasick, 1975). It finds multiple keywords in an input string in a single pass, without looping over the input string once per keyword.
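To make the single-pass idea concrete, here is a minimal sketch of an Aho-Corasick automaton in plain Python. It is an illustration of the general algorithm, not Multimatcher's own code; the names build_automaton and find_all are hypothetical and are not part of the package.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links and output lists (illustrative only)."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state for the longest proper suffix that is in the trie
    out = [[]]    # out[state] -> patterns that end when this state is reached

    # Phase 1: insert every non-empty pattern into a trie.
    for pattern in patterns:
        if not pattern:
            continue
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pattern)

    # Phase 2: breadth-first pass to compute failure links.
    # Depth-1 states keep their initial failure link to the root.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit the patterns that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence, reading text once."""
    goto, fail, out = build_automaton(patterns)
    matches = []
    state = 0
    for i, ch in enumerate(text):
        # Follow failure links until a transition on ch exists or we hit the root.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            matches.append((i - len(pattern) + 1, pattern))
    return matches


print(find_all("ushers", ["he", "she", "his", "hers"]))
```

The text is read once from left to right; the failure links let the automaton fall back to the longest suffix that is still a prefix of some pattern instead of restarting the scan.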

Most often we want to do something with the matches we find, so Multimatcher provides a flexible "replace" method that supports different use cases such as:

  • find and delete
  • find and replace
  • tag with a global label (i.e. all matches get the same label)
  • tag with a custom label (i.e. each match gets its own label)
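
The use cases above differ only in what each match is replaced with: a fixed string, the result of a function, or a per-pattern lookup. The sketch below shows that dispatch in isolation. It is not Multimatcher's implementation: the function name replace_matches is hypothetical, and the standard-library re module stands in for the Aho-Corasick matcher to keep the example short.

```python
import re


def replace_matches(text, patterns, replacement):
    """Replace each pattern occurrence using a string, a callable, or a dict."""
    # Longest patterns first, so a longer alternative wins over its own prefix.
    ordered = sorted(patterns, key=len, reverse=True)
    regex = re.compile("|".join(re.escape(p) for p in ordered))

    def substitute(match):
        found = match.group(0)
        if callable(replacement):
            return replacement(found)              # find and transform
        if isinstance(replacement, dict):
            return replacement.get(found, found)   # custom label per pattern
        return replacement                         # same text for every match

    return regex.sub(substitute, text)


text = "x a y b z c"
keys = ["a", "b", "c"]
print(replace_matches(text, keys, "0"))
print(replace_matches(text, keys, str.upper))
print(replace_matches(text, keys, {"a": "1", "b": "2", "c": "3"}))
```

Passing an empty string deletes the matches but, in this sketch, leaves the surrounding separators in place, so the result contains doubled spaces.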

When possible, it's recommended to set whole_words_only to True, which makes matching significantly faster.
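
To illustrate what whole-word matching means, the sketch below contrasts it with substring matching. It is an assumption-laden illustration, not how Multimatcher implements whole_words_only: the helper names find_whole_words and find_substrings are hypothetical, and the whole-word helper only handles patterns that are a single token.

```python
def find_whole_words(text, patterns, separator=" "):
    """Return (token_index, token) for tokens that exactly equal a pattern."""
    wanted = set(patterns)
    tokens = text.split(separator)
    return [(i, tok) for i, tok in enumerate(tokens) if tok in wanted]


def find_substrings(text, patterns):
    """Return (char_index, pattern) for every occurrence, including inside words."""
    hits = []
    for pattern in patterns:
        start = text.find(pattern)
        while start != -1:
            hits.append((start, pattern))
            start = text.find(pattern, start + 1)
    return sorted(hits)


text = "concatenate the cat"
print(find_whole_words(text, ["cat"]))   # only the standalone token
print(find_substrings(text, ["cat"]))    # also the "cat" inside "concatenate"
```

Whole-word matching can only begin at a separator boundary, so there are fewer candidate positions to consider than in substring matching, where a match may begin at any character.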

Examples

Find and delete matches

from multimatcher import Multimatcher
mm = Multimatcher(separator=' ')
mm.set_replacement_text("") # matches will be deleted
mm.set_search_patterns(['a', 'b', 'c'])
mm.replace("x a y b z c") # produces "x y z"

Find and transform matches

from multimatcher import Multimatcher
mm = Multimatcher(separator=' ')
mm.set_replacement_method(lambda x: x.capitalize()) # matches will be capitalized
mm.set_search_patterns(['a', 'b', 'c'])
mm.replace("x a y b z c") # produces "x A y B z C"

Find and replace matches with the same label

from multimatcher import Multimatcher
mm = Multimatcher(separator=' ')
mm.set_replacement_text("0") # all matches will be replaced with 0
mm.set_search_patterns(['a', 'b', 'c'])
mm.replace("x a y b z c") # produces "x 0 y 0 z 0"

Find and replace matches with custom labels

from multimatcher import Multimatcher
mm = Multimatcher(separator=' ')
mm.set_replacement_map({"a": "1", "b": "2", "c": "3"}) # replaces a > 1, b > 2, c > 3
mm.set_search_patterns(['a', 'b', 'c'])
mm.replace("x a y b z c") # produces "x 1 y 2 z 3"

References

Aho, A. V., & Corasick, M. J. (1975). Efficient string matching: an aid to bibliographic search. Communications of the ACM, 18(6), 333-340.
