A Python library that provides a convenient interface for compiling and matching regular expression patterns using the re2 library.

re2shield

re2shield is a Python library that wraps the re2 module, providing a shield around its regular expressions.

It lets you compile patterns and search text with the re2 regular expression engine, while working with pattern identifiers instead of the expressions themselves.

This project utilizes the google/re2 library, which is licensed under the BSD 3-Clause License. Please refer to the LICENSE file of google/re2 for more information.

Installation

Before installing re2shield, make sure that re2 is installed on your system. You can install re2 by following the instructions on the google/re2 GitHub repository.

Alternatively, you can use the re2-installer.sh script located in the installed directory of the re2shield repository. This script automates the installation of re2 and its dependencies. Run it with the following commands:

git clone https://github.com/Npc-coder/re2shield.git
cd re2shield/installed
sh re2-installer.sh

You can install re2shield using pip:

pip install re2shield

or install it from source:

git clone https://github.com/Npc-coder/re2shield.git
cd re2shield
pip install .

Updates

Version 0.1.5

  • Reverted the change that automatically assigned IDs to patterns. Now the compile method requires both a list of regular expressions and a list of corresponding IDs, allowing users to specify the ID for each pattern.
  • Improved the compile method to check for ID duplication among both new and existing patterns in the database. If a duplicate ID is found and overwrite is False, a ValueError is raised.

Please refer to the Usage section for examples of how to use these new features.
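
The ID checks described for version 0.1.5 can be pictured with a small stand-alone sketch. It is not re2shield's actual implementation: the function name merge_patterns and the plain dictionary used as the store are hypothetical, and the behavior when overwrite is True (the newest expression replaces the old one) is an assumption.

```python
from typing import Dict, Sequence


def merge_patterns(
    existing: Dict[int, str],
    expressions: Sequence[str],
    ids: Sequence[int],
    overwrite: bool = False,
) -> Dict[int, str]:
    """Return a new id -> expression mapping (hypothetical helper, not re2shield API)."""
    if len(expressions) != len(ids):
        raise ValueError("expressions and ids must have the same length")

    merged = dict(existing)
    seen = set()
    for expression, pattern_id in zip(expressions, ids):
        # A duplicate is an ID repeated in this batch or already stored.
        duplicate = pattern_id in seen or pattern_id in existing
        if duplicate and not overwrite:
            raise ValueError(f"duplicate pattern ID: {pattern_id}")
        seen.add(pattern_id)
        merged[pattern_id] = expression  # assumed: the newest expression wins
    return merged


store = merge_patterns({}, [r"\d+", r"[a-z]+"], [1, 2])
try:
    merge_patterns(store, [r"\w+"], [2])  # ID 2 is already taken
except ValueError as error:
    print(error)
```

Returning a new mapping instead of mutating the existing one means a rejected batch leaves the store untouched, which is one reasonable way to keep a failed compile from leaving the database half updated.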

Usage

Importing the Library

Here is a simple example demonstrating how to use re2shield:

import re2shield

if __name__ == "__main__":
    db = re2shield.Database()

    # Load patterns from file
    try:
        db = re2shield.load('patterns.pkl')
        print(db)  # Prints the number of patterns in the database
    except FileNotFoundError:
        # If pattern file doesn't exist, compile the patterns
        patterns = [
            (r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', 1),
            (r'\b\d{3}[-.\s]??\d{3}[-.\s]??\d{4}\b', 2),
            (r'\d+', 3)
        ]

        expressions, ids = zip(*patterns)
        db.compile(expressions=expressions, ids=ids, overwrite=False)
        print(db)  # Prints the number of patterns in the database
        db.dump('patterns.pkl')

    # Find patterns in text
    def match_handler(id, from_, to, flags, context):
        print(f"Match found for pattern {id} from {from_} to {to}: {context}")

    db.scan('test@ex12ample12.com', match_handler)

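In this example, we first try to load a previously compiled database from patterns.pkl. If that file does not exist, we compile a list of patterns with their corresponding identifiers into a re2shield.Database object and save it with dump. We then scan the provided text for those patterns.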

The match_handler function is called for each match found, allowing you to process the matches as desired.
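
To make the callback flow concrete, here is a minimal stand-in that runs without re2shield installed. It uses the standard library re module rather than re2, and scan_with_callback is a hypothetical helper, not part of the re2shield API. The handler keeps the five-parameter signature from the example above; passing 0 for flags and a list as context are assumptions made for illustration.

```python
import re
from typing import Any, Callable, Dict, List, Tuple

Handler = Callable[[int, int, int, int, Any], None]


def scan_with_callback(patterns: Dict[int, str], text: str,
                       handler: Handler, context: Any = None) -> None:
    """Call handler once per match (stand-in built on the stdlib re module)."""
    for pattern_id, expression in patterns.items():
        for match in re.finditer(expression, text):
            # flags is passed as 0 here; what re2shield passes is not documented above.
            handler(pattern_id, match.start(), match.end(), 0, context)


def collect(id, from_, to, flags, context):
    # context is used as a caller-supplied list that accumulates matches.
    context.append((id, from_, to))


patterns = {
    1: r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b",
    3: r"\d+",
}
found: List[Tuple[int, int, int]] = []
scan_with_callback(patterns, "test@ex12ample12.com", collect, context=found)
print(found)
```

Collecting matches into a list instead of printing them keeps the handler small and lets the caller decide what to do with the results afterwards.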

Use Case

One key advantage of re2shield is that it can hide the actual regular expression patterns from users during distribution. By compiling the patterns with re2 and referring to them by identifier, you can distribute your code without exposing the underlying regular expression logic. This adds a layer of abstraction and helps keep the patterns themselves private.
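
On the consuming side, application code then only needs to know what each identifier means. The sketch below shows one possible way to do that: the LABELS table and the make_handler helper are hypothetical, the matches are simulated by calling the handler directly, and from_ and to are assumed to be character offsets with to exclusive.

```python
from collections import defaultdict

# Hypothetical labels shipped with the application in place of the expressions.
LABELS = {1: "email", 2: "phone", 3: "number"}


def make_handler(text, results):
    """Build a match handler that groups matched substrings by label."""
    def handler(id, from_, to, flags, context):
        label = LABELS.get(id, f"pattern-{id}")
        results[label].append(text[from_:to])
    return handler


text = "test@ex12ample12.com"
results = defaultdict(list)
handler = make_handler(text, results)

# Simulated matches; with re2shield the database's scan method would drive the handler.
handler(1, 0, 20, 0, None)
handler(3, 7, 9, 0, None)
print(dict(results))
```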

Features

  • Hide the complexity of regular expressions
  • Work with pattern identifiers instead of exposing the actual regular expression patterns
  • Compile patterns for efficient matching
  • Find all occurrences of patterns in text
  • Customize the handling of matches using callback functions

License

This project is licensed under the BSD 3-Clause License. See the LICENSE file for details.
