Transform set of words to efficient regular expression

Project description

Efficient string matching with regular expressions

This package provides a pure Python function that represents a set of strings as a single regular expression. With this regular expression you can replace, extract and match keywords. The package name comes from the internal trie used to build the regular expression (TRie to REgeX).
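
To make the trie idea concrete, here is a minimal sketch of how a set of words can be folded into one pattern by sharing common prefixes. It is an illustration only, not trrex's actual implementation: the helper names `build_trie`, `trie_to_regex` and `make_pattern` are hypothetical, and the pattern trrex emits may differ in form.

```python
import re


def build_trie(words):
    """Nest dicts by character; the empty-string key marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for char in word:
            node = node.setdefault(char, {})
        node[""] = {}
    return root


def trie_to_regex(node):
    """Return a regex fragment matching every suffix stored below this node."""
    branches = [
        re.escape(char) + trie_to_regex(child)
        for char, child in sorted(node.items())
        if char != ""
    ]
    if not branches:
        return ""  # leaf: nothing left to match
    ends_here = "" in node
    if len(branches) == 1 and not ends_here:
        return branches[0]  # a single continuation needs no group
    group = "(?:" + "|".join(branches) + ")"
    # If a word also ends at this node, the continuation is optional.
    return group + "?" if ends_here else group


def make_pattern(words):
    """Hypothetical helper (not the trrex API): wrap the trie in word boundaries."""
    return r"\b(?:" + trie_to_regex(build_trie(words)) + r")\b"


pattern = make_pattern(["baby", "bat", "bad"])
print(pattern)
# \b(?:ba(?:by|d|t))\b
print(re.findall(pattern, "The baby was scared by the bad bat."))
# ['baby', 'bad', 'bat']
```

The shared prefix "ba" appears once in the pattern, so the regex engine does not have to retry it for every alternative as it would with `baby|bat|bad`.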

Install trrex

Use pip:

pip install trrex

Usage

import trrex as tx
import re

pattern = tx.make(['baby', 'bat', 'bad'])
hits = re.findall(pattern, 'The baby was scared by the bad bat.')
# hits = ['baby', 'bad', 'bat']

pandas

import trrex as tx
import pandas as pd

frame = pd.DataFrame({
    "txt": ["The baby", "The bat"]
})
# str.extract needs a capturing group, so pass one via prefix and suffix
pattern = tx.make(['baby', 'bat', 'bad'], prefix=r"\b(", suffix=r")\b")
frame["match"] = frame["txt"].str.extract(pattern)
hits = frame["match"].tolist()
print(hits)
# hits = ['baby', 'bat']
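
The capturing-group requirement in the pandas example can be checked with the standard library alone. The sketch below uses a plain union pattern built by hand (an assumption made to keep the example dependency-free, not trrex output) and shows the difference between a non-capturing `(?:...)` group and a capturing `(...)` group. pandas' `str.extract` needs at least one capturing group and raises a `ValueError` otherwise.

```python
import re

words = ["baby", "bat", "bad"]
alternation = "|".join(re.escape(word) for word in words)

# Same alternatives, wrapped in a non-capturing and a capturing group.
non_capturing = re.compile(rf"\b(?:{alternation})\b")
capturing = re.compile(rf"\b({alternation})\b")

print(non_capturing.groups)  # 0
print(capturing.groups)  # 1

# Only the capturing pattern exposes the keyword as group 1.
print(capturing.search("The bat").group(1))  # bat
```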

Why use trrex?

  • trrex builds a trie-based pattern instead of a simple regex union, so searching (and replacing) strings is about 300 times faster than with a regex union pattern, and about 2.5 times faster than the FlashText algorithm. See below for a performance comparison:

Performance comparison

  • Plays well with others: it integrates easily with pandas, spaCy and any regex engine. See the documentation for examples.
  • Pure Python, no other dependencies
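
To check the speed claim on your own data, a timing harness along the following lines may help. It is a rough sketch under stated assumptions: the factored pattern is written by hand to mimic a trie-style pattern rather than generated by trrex, the word list and text are toy data, and it does not attempt to reproduce the figures quoted above. Timings depend on the word list, the text and the Python version, and with only three words the gap is expected to be small.

```python
import re
import timeit

words = ["baby", "bat", "bad"]
text = "The baby was scared by the bad bat. " * 1000

# Plain union: every alternative is spelled out in full.
union = re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b")
# Hand-factored, trie-style equivalent: the shared prefix "ba" appears once.
factored = re.compile(r"\b(?:ba(?:by|d|t))\b")

# Both patterns must agree before their timings are worth comparing.
assert union.findall(text) == factored.findall(text)

for name, pattern in [("union", union), ("factored", factored)]:
    seconds = timeit.timeit(lambda: pattern.findall(text), number=20)
    print(f"{name}: {seconds:.4f}s")
```

Swapping in a larger word list with many shared prefixes, and a pattern produced by `tx.make`, gives a more representative comparison.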

Issues

If you have any issues with this repository, please don't hesitate to raise them. It is actively maintained, and we will do our best to help you.

Acknowledgments

This project is based on the following resources:

Liked the work?

If you've found this repository helpful, why not give it a star? It's an easy way to show your appreciation and support for the project. Plus, it helps others discover it too!
