Transform trie to regular expression
Efficient keyword extraction with regex
This package provides a function that efficiently represents a set of keywords as a single regular expression. The resulting regex can be used to replace keywords in sentences or to extract keywords from them.
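To illustrate the idea behind the package name, here is a minimal sketch of how a set of keywords can be folded into a trie and then flattened into one regex. This is not trrex's actual implementation; the helper names `build_trie`, `trie_to_regex`, and `keywords_to_pattern` are hypothetical and use only the standard library.

```python
import re


def build_trie(words):
    """Build a nested-dict trie; the empty-string key marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = True
    return root


def trie_to_regex(node):
    """Recursively turn a trie node into a regex fragment (illustrative only)."""
    branches = [
        re.escape(ch) + trie_to_regex(child)
        for ch, child in sorted(node.items())
        if ch != ""
    ]
    if not branches:
        return ""
    body = "|".join(branches)
    # A word may end at this node while longer words continue past it.
    optional = "" in node
    if len(branches) > 1 or optional:
        body = "(?:" + body + ")"
    return body + ("?" if optional else "")


def keywords_to_pattern(words):
    """Wrap the trie regex in word boundaries so only whole words match."""
    return r"\b(?:" + trie_to_regex(build_trie(words)) + r")\b"


pattern = keywords_to_pattern(["baby", "bat", "bad"])
print(pattern)  # \b(?:ba(?:by|d|t))\b
print(re.findall(pattern, "The baby was scared by the bad bat."))
# ['baby', 'bad', 'bat']
```

Because the shared prefix `ba` appears only once, the regex engine does not have to retry every keyword from scratch at each position, which is the intuition behind preferring a trie-shaped pattern over a plain `baby|bat|bad` union.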
Why use trrex?
- Pure Python, no other dependencies
- Fast: about 300 times faster than a regex union and about 2.5 times faster than FlashText
- Plays well with others, can be integrated easily with pandas
Install trrex
Use pip:

    pip install trrex
Usage
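    import trrex as tx

    # Build one compiled pattern from the keyword list
    pattern = tx.compile(['baby', 'bat', 'bad'])

    # findall returns matches in the order they appear in the sentence
    hits = pattern.findall('The baby was scared by the bad bat.')
    # hits = ['baby', 'bad', 'bat']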
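Replacing keywords works the same way as extracting them. The sketch below uses only the standard library and a hand-written pattern as a stand-in for the kind of regex a trie produces, so it runs without trrex installed. The `replacements` mapping and the `swap` helper are hypothetical names chosen for illustration.

```python
import re

# Hand-written stand-in for a trie-shaped pattern covering baby, bad, bat
pattern = re.compile(r"\b(?:ba(?:by|d|t))\b")

# Hypothetical mapping from each keyword to its replacement
replacements = {"baby": "infant", "bad": "mean", "bat": "club"}


def swap(match):
    """Look up the replacement for the keyword that was matched."""
    return replacements[match.group(0)]


result = pattern.sub(swap, "The baby was scared by the bad bat.")
print(result)  # The infant was scared by the mean club.
```

Passing a function to `sub` lets a single pass over the text handle every keyword, instead of running one replacement per keyword.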
pandas
    import trrex as tx
    import pandas as pd

    frame = pd.DataFrame({
        "txt": ["The baby", "The bat"]
    })

    # str.extract needs a capturing group, so specify it via prefix and suffix
    pattern = tx.make(['baby', 'bat', 'bad'], prefix=r"\b(", suffix=r")\b")
    frame["match"] = frame["txt"].str.extract(pattern)
    hits = frame["match"].tolist()
    print(hits)
    # hits = ['baby', 'bat']
Why the name?
Naming is difficult, but as we had to call it something:
- trex: trie to regex
- trex: Tyrannosaurus rex, a large dinosaur species with small arms (rex meaning "king" in Latin)
Acknowledgments
This project is based on the following resources: