Skip to main content

Fast lookup for string patterns using Trie, Radix & Aho-Corasick algorithms

Project description

TrieMatch

A multiple pattern matching library. Trying to implement some algorithms for matching many string patterns in the pattern.

This project is a WIP and I would like to receive your idea on improving it.

Trie (Prefix Tree)

This class implements a prefix tree for matching the patterns.

It uses a normal Trir data structure, and Aho-Corasick is used for pattern search (Trie.search). Just make sure to use Trie.link_nodes().

import this
from triematch import Trie

zen_of_klingon = this.s
print(zen_of_klingon)

zen_of_klingon = zen_of_klingon.lower()

words = {
    "gur": "the",
    "mra": "zen",
    "gur mra": "the zen",
    "guna": "than",
    "chevgl": "purity",
    "Pbzcyrk": "complex",
}


wordset = Trie(words) ## or Trie(**words) like a dict initalization

## Similar behavior with dict object
"error" in wordset # Output: True
wordset.get('error') # Output: False
wordset.setdefault("error", "reebef") # Outpit: reebef

wordset
# Output: {'error': 'reebef', 'complex': 'Pbzcyrk', 'purity': 'chevgl', 'mra': 'zen', 'gur': 'the', 'gur mra': 'the zen'}

## Get list of all patterns which zen_of_klingon.strtswith(pattern)
list(wordset.match(zen_of_klingon))
# Output: [(3, 'the'), (7, 'the zen')]
## where wordset[zen_of_klingon[:3]] == 'the'

wordset.link_nodes() ## do this to speed up the search process
list(wordset.search(zen_of_klingon))
# Output: [(0, 3, 'the'), (0, 7, 'the zen'), (4, 7, 'zen'), (54, 58, 'than'), ...]
## where wordset[zen_of_klingon[4:7]] == 'zen'

## Compressed regex of Trie
wordset.to_regex()
'Pbzcyrk|chevgl|gu(?:na|r)|mra'

Tuples as Trie keys

TupleTrie treats keys as tuples (instead of strings), so you can pass keys like tuple of numbers as keys.

from triematch import TupleTrie

trie = TupleTrie()
trie[127,0,0,1] = "home"
trie[(8,8,8,8)] = "Google Public DNS"
trie["hello", "python"] = object()

list(trie.match((127,0,0,1,2,3)))
## Output; [(4, 'home')]

Radix

A compressed prefix tree. This is a memory efficient data structure compared to Trie with same features (but current version is slower that Trie).

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

triematch-0.0.1rc0.tar.gz (12.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

triematch-0.0.1rc0-py3-none-any.whl (11.5 kB view details)

Uploaded Python 3

File details

Details for the file triematch-0.0.1rc0.tar.gz.

File metadata

  • Download URL: triematch-0.0.1rc0.tar.gz
  • Upload date:
  • Size: 12.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.5.15

File hashes

Hashes for triematch-0.0.1rc0.tar.gz
Algorithm Hash digest
SHA256 b2ee3affafd84aa839c24620da51a850acf97a5a4f0af5a861b24975f3d3c68c
MD5 f4a2ec7bd68131b01008380f4d221e23
BLAKE2b-256 86fdc630886634ab421fa0832b5c675d0d69de78904714183ae01ce706a51d7b

See more details on using hashes here.

File details

Details for the file triematch-0.0.1rc0-py3-none-any.whl.

File metadata

File hashes

Hashes for triematch-0.0.1rc0-py3-none-any.whl
Algorithm Hash digest
SHA256 704e06e712f47e573fa2efc4a956efd2de33b860bb00113548236316b724044b
MD5 1201a2ff5e2d04b6687a2a6ca838a5e3
BLAKE2b-256 74e9ee8ced7e0851cb6e60b479c92a039ea6cb431abc75b50d2d088b7ab56fe4

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page