
Transform a set of words into an efficient regular expression

Project description


Efficient keyword mining with regular expressions

This package provides a pure Python function that represents a set of keywords (strings) as an efficient regular expression. You can then use that regular expression for operations such as extracting and replacing keywords. The package name comes from the trie used internally to build the regular expression (trie to regex).
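To make the "trie to regex" idea concrete, here is a minimal, self-contained sketch of the technique using only the standard library. It is not trrex's actual implementation: `build_trie`, `trie_to_regex` and `make_pattern` are hypothetical helpers, and the sketch skips optimizations a real builder may apply (for example, collapsing single-character alternatives into a character class). The keywords are inserted into a trie, and the trie is then serialized so that each shared prefix is written once, followed by a non-capturing group of alternatives.

```python
import re
from typing import Dict, Iterable

# Hypothetical helpers illustrating the trie-to-regex technique;
# this is not trrex's own code.
END = ""  # key marking "a keyword ends at this node"; never a real character


def build_trie(words: Iterable[str]) -> Dict[str, dict]:
    """Insert every non-empty word into a trie made of nested dicts."""
    root: Dict[str, dict] = {}
    for word in words:
        if not word:
            continue
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = {}  # mark the end of a complete keyword
    return root


def trie_to_regex(node: Dict[str, dict]) -> str:
    """Serialize a trie node into a regex fragment with non-capturing groups."""
    branches = [
        re.escape(ch) + trie_to_regex(child)
        for ch, child in sorted(node.items())
        if ch != END
    ]
    if not branches:
        return ""  # leaf node: a keyword ends here and nothing follows
    if len(branches) == 1:
        body = branches[0]
    else:
        body = "(?:" + "|".join(branches) + ")"
    if END in node:
        # A shorter keyword also ends here, so the continuation is optional.
        if len(branches) == 1:
            body = "(?:" + body + ")"
        return body + "?"
    return body


def make_pattern(words: Iterable[str]) -> str:
    """Wrap the trie fragment in word boundaries."""
    return r"\b(?:" + trie_to_regex(build_trie(words)) + r")\b"


pattern = make_pattern(["baby", "bat", "bad"])
print(pattern)  # \b(?:ba(?:by|d|t))\b
print(re.findall(pattern, "The baby was scared by the bad bat."))  # ['baby', 'bad', 'bat']
```

In the resulting pattern the shared prefix `ba` appears only once, so a backtracking engine does not have to re-match it for every keyword as it would with a plain union such as `baby|bat|bad`; that is roughly the intuition behind the speed-up.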

Install trrex

Use pip:

pip install trrex

Usage

import trrex as tx
import re

pattern = tx.make(['baby', 'bat', 'bad'])
hits = re.findall(pattern, 'The baby was scared by the bad bat.')
# hits = ['baby', 'bad', 'bat']
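Beyond findall, the same pattern can drive the other operations mentioned above: extracting keywords together with their positions, and replacing them. The sketch below uses the standard `re` module. To keep it runnable without trrex installed, it uses a plain regex union as a hypothetical stand-in for `tx.make(['baby', 'bat', 'bad'])`; assuming, as the examples here suggest, that `tx.make` returns an ordinary pattern string, you would pass its result to `re.compile` instead.

```python
import re

# Hypothetical stand-in for tx.make(['baby', 'bat', 'bad']): a plain regex
# union, used only so this example runs without trrex installed.
pattern = re.compile(r"\b(?:baby|bat|bad)\b")
text = "The baby was scared by the bad bat."

# Extract keywords together with their character offsets.
spans = [(m.group(), m.start(), m.end()) for m in pattern.finditer(text)]

# Replace every keyword, here by masking it with asterisks of the same length.
masked = pattern.sub(lambda m: "*" * len(m.group()), text)

print(spans)   # [('baby', 4, 8), ('bad', 27, 30), ('bat', 31, 34)]
print(masked)  # The **** was scared by the *** ***.
```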

pandas

import trrex as tx
import pandas as pd

frame = pd.DataFrame({
    "txt": ["The baby", "The bat"]
})
pattern = tx.make(['baby', 'bat', 'bad'], prefix=r"\b(", suffix=r")\b")  # str.extract needs a capturing group
frame["match"] = frame["txt"].str.extract(pattern, expand=False)  # expand=False returns a Series
hits = frame["match"].tolist()
print(hits)
# hits = ['baby', 'bat']

Why use trrex?

  • trrex builds a better regex pattern than a simple regex union. As a result, searching (and replacing) keywords is about 300 times faster than with a regex union pattern and about 2.5 times faster than with the FlashText algorithm. See below for a performance comparison:

Performance comparison

  • Plays well with others: it integrates easily with pandas, spaCy, and any other regex engine. See the documentation for examples.
  • Pure Python, no other dependencies

Issues

If you have any issues with this repository, please don't hesitate to raise them. It is actively maintained, and we will do our best to help you.

Acknowledgments

This project is based on the following resources:

Liked the work?

If you've found this repository helpful, why not give it a star? It's an easy way to show your appreciation and support for the project. Plus, it helps others discover it too!


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

trrex-0.0.5.tar.gz (7.4 kB)

Uploaded Source

Built Distribution

trrex-0.0.5-py3-none-any.whl (7.2 kB)

Uploaded Python 3

File details

Details for the file trrex-0.0.5.tar.gz.

File metadata

  • Download URL: trrex-0.0.5.tar.gz
  • Upload date:
  • Size: 7.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for trrex-0.0.5.tar.gz
Algorithm Hash digest
SHA256 f863246bb907d6c0ed212173311520d2f556be6061b6304fff82d8ab400abe41
MD5 e292265350e064dd2db64f6fa263b18d
BLAKE2b-256 aca5fb1250a8dd2799e3b7bea19e5176409de1227f00b78d32f2288206a4cc7c

See more details on using hashes here.

File details

Details for the file trrex-0.0.5-py3-none-any.whl.

File metadata

  • Download URL: trrex-0.0.5-py3-none-any.whl
  • Upload date:
  • Size: 7.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for trrex-0.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 5154e972f1383dc58210dba4440224095a75731acbf301da812a1f2328d4f303
MD5 80da545bdf57fef0ddf618155e399cc5
BLAKE2b-256 da1c3acff3c654151f0f6f522e1e6b1c2b5dd62a5c067764aa6ef35618756b76

