Fast lookup for string patterns using Trie, Radix & Aho-Corasick algorithms
Project description
TrieMatch
A multiple pattern matching library. Trying to implement some algorithms for matching many string patterns in the pattern.
This project is a WIP and I would like to receive your idea on improving it.
Trie (Prefix Tree)
This class implements a prefix tree for matching the patterns.
It uses a normal Trir data structure, and Aho-Corasick is used for pattern search (Trie.search). Just make sure to use Trie.link_nodes().
import this
from triematch import Trie
zen_of_klingon = this.s
print(zen_of_klingon)
zen_of_klingon = zen_of_klingon.lower()
words = {
"gur": "the",
"mra": "zen",
"gur mra": "the zen",
"guna": "than",
"chevgl": "purity",
"Pbzcyrk": "complex",
}
wordset = Trie(words) ## or Trie(**words) like a dict initalization
## Similar behavior with dict object
"error" in wordset # Output: True
wordset.get('error') # Output: False
wordset.setdefault("error", "reebef") # Outpit: reebef
wordset
# Output: {'error': 'reebef', 'complex': 'Pbzcyrk', 'purity': 'chevgl', 'mra': 'zen', 'gur': 'the', 'gur mra': 'the zen'}
## Get list of all patterns which zen_of_klingon.strtswith(pattern)
list(wordset.match(zen_of_klingon))
# Output: [(3, 'the'), (7, 'the zen')]
## where wordset[zen_of_klingon[:3]] == 'the'
wordset.link_nodes() ## do this to speed up the search process
list(wordset.search(zen_of_klingon))
# Output: [(0, 3, 'the'), (0, 7, 'the zen'), (4, 7, 'zen'), (54, 58, 'than'), ...]
## where wordset[zen_of_klingon[4:7]] == 'zen'
## Compressed regex of Trie
wordset.to_regex()
'Pbzcyrk|chevgl|gu(?:na|r)|mra'
Tuples as Trie keys
TupleTrie treats keys as tuples (instead of strings), so you can pass keys like tuple of numbers as keys.
from triematch import TupleTrie
trie = TupleTrie()
trie[127,0,0,1] = "home"
trie[(8,8,8,8)] = "Google Public DNS"
trie["hello", "python"] = object()
list(trie.match((127,0,0,1,2,3)))
## Output; [(4, 'home')]
Radix
A compressed prefix tree. This is a memory efficient data structure compared to Trie with same features (but current version is slower that Trie).
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file triematch-0.0.1rc0.tar.gz.
File metadata
- Download URL: triematch-0.0.1rc0.tar.gz
- Upload date:
- Size: 12.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.5.15
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b2ee3affafd84aa839c24620da51a850acf97a5a4f0af5a861b24975f3d3c68c
|
|
| MD5 |
f4a2ec7bd68131b01008380f4d221e23
|
|
| BLAKE2b-256 |
86fdc630886634ab421fa0832b5c675d0d69de78904714183ae01ce706a51d7b
|
File details
Details for the file triematch-0.0.1rc0-py3-none-any.whl.
File metadata
- Download URL: triematch-0.0.1rc0-py3-none-any.whl
- Upload date:
- Size: 11.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.5.15
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
704e06e712f47e573fa2efc4a956efd2de33b860bb00113548236316b724044b
|
|
| MD5 |
1201a2ff5e2d04b6687a2a6ca838a5e3
|
|
| BLAKE2b-256 |
74e9ee8ced7e0851cb6e60b479c92a039ea6cb431abc75b50d2d088b7ab56fe4
|