Transform trie to regular expression
Efficient keyword extraction with regex
This package contains a function that represents a set of keywords as a single, compact regular expression built from a trie. The resulting regex can be used to replace keywords in sentences or to extract keywords from them.
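To illustrate the idea in the title, here is a minimal sketch of turning a trie into a regular expression using only the standard library. It is not trrex's actual implementation; the helper names `build_trie`, `trie_to_regex` and `keywords_to_pattern` are hypothetical, and the pattern trrex generates may look different.

```python
import re


def build_trie(words):
    """Build a nested-dict trie; the empty-string key marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}  # end-of-word marker
    return root


def trie_to_regex(node):
    """Recursively turn a trie node into a regex fragment."""
    is_end = "" in node
    branches = [
        re.escape(ch) + trie_to_regex(node[ch])
        for ch in sorted(node)
        if ch != ""
    ]
    if not branches:
        return ""  # leaf: nothing left to match
    if len(branches) == 1 and not is_end:
        return branches[0]  # single path, no grouping needed
    body = "(?:" + "|".join(branches) + ")"
    # If a word also ends here, the remaining suffix is optional.
    return body + "?" if is_end else body


def keywords_to_pattern(words):
    """Wrap the trie regex in word boundaries (hypothetical helper)."""
    return r"\b(?:" + trie_to_regex(build_trie(words)) + r")\b"


pattern = keywords_to_pattern(["baby", "bat", "bad"])
hits = re.findall(pattern, "The baby was scared by the bad bat.")
```

Because the shared prefix `ba` appears only once in the pattern, the regex engine does not re-scan it for every keyword.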
Why use trrex?
- Pure Python, no other dependencies
- trrex is fast, about 300 times faster than a regex union, and about 2.5 times faster than FlashText
- Plays well with others, can be integrated easily with pandas
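For context, the "regex union" in the comparison above is presumably a plain alternation of all keywords. That reading is an assumption, since the benchmark setup is not described here. A minimal sketch of such a baseline with the standard library:

```python
import re

keywords = ["baby", "bat", "bad"]

# Join the escaped keywords into one alternation: \b(?:baby|bat|bad)\b
union = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"

union_hits = re.findall(union, "The baby was scared by the bad bat.")
```

With this form the pattern grows by one full alternative per keyword, and a backtracking engine such as Python's `re` may have to try many of them at each position, which is the cost a trie-shaped pattern is meant to reduce.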
Install trrex
Use pip:
pip install trrex
Usage
import trrex as tx
pattern = tx.compile(['baby', 'bat', 'bad'])
hits = pattern.findall('The baby was scared by the bad bat.')
# hits = ['baby', 'bad', 'bat']
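The description also mentions replacing keywords, not only extracting them. A minimal sketch with the standard library's `re.sub` follows. The pattern is a hand-written stand-in for a trie-derived regex, so the exact string trrex generates may differ, and the `<KW>` placeholder is just an illustrative choice.

```python
import re

# Hand-written stand-in for a trie-derived pattern (assumption: trrex's
# generated string may differ, but it should match the same keywords).
pattern = re.compile(r"\b(?:ba(?:by|d|t))\b")

sentence = "The baby was scared by the bad bat."

# Replace every keyword with a fixed placeholder.
masked = pattern.sub("<KW>", sentence)

# Or compute the replacement from each match, here by upper-casing it.
shouted = pattern.sub(lambda m: m.group(0).upper(), sentence)
```

Passing a function as the replacement is handy when each keyword maps to its own substitute, for example through a dictionary lookup.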
pandas
import trrex as tx
import pandas as pd
frame = pd.DataFrame({
"txt": ["The baby", "The bat"]
})
pattern = tx.make(['baby', 'bat', 'bad'], left=r"\b(", right=r")\b") # need to specify capturing groups
frame["match"] = frame["txt"].str.extract(pattern)
hits = frame["match"].tolist()
print(hits)
# hits = ['baby', 'bat']
Why the name?
Naming is difficult, but as we had to call it something:
- trex: trie to regex
- trex: Tyrannosaurus rex, a large dinosaur species with small arms (rex meaning "king" in Latin)
Acknowledgments
This project is based on the following resources: