
Utilities for Regular Expressions in Python


Regex Utils

A set of utils to work with Regular Expressions in Python

WARNING: most functions are still under development. Please report any issues you find, or feel free to contribute!

Main Features

  • Generate strings that match a given regex
  • Intersect two regular expressions
  • Negate a regular expression (experimental)
  • Convert an NFA back into a regex

Under the hood, the package uses Python's native sre_parse module to parse a regex, then converts it into a Nondeterministic Finite Automaton (NFA).

Once you have this NFA, you can use all the features in this package. The regex module (regex_utils.regex) offers an abstraction over these NFA-based operations.
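To make the NFA idea concrete, here is a minimal sketch of an NFA with epsilon transitions and a simulation that tracks the set of states the automaton could be in. The NFA class, the EPS constant and the hand-built automaton for ab+c* are hypothetical illustrations, not the package's internal classes.

```python
from __future__ import annotations

from dataclasses import dataclass, field

EPS = None  # label used for epsilon (empty-string) transitions


@dataclass
class NFA:
    """Hypothetical NFA representation, for illustration only."""
    start: int
    accepting: set[int]
    # transitions[state] -> list of (label, next_state); label is one character or EPS
    transitions: dict[int, list[tuple[str | None, int]]] = field(default_factory=dict)

    def epsilon_closure(self, states: set[int]) -> set[int]:
        """Return every state reachable from `states` using only epsilon moves."""
        closure, stack = set(states), list(states)
        while stack:
            for label, nxt in self.transitions.get(stack.pop(), []):
                if label is EPS and nxt not in closure:
                    closure.add(nxt)
                    stack.append(nxt)
        return closure

    def accepts(self, text: str) -> bool:
        """Simulate the NFA by tracking the set of states it could be in."""
        current = self.epsilon_closure({self.start})
        for ch in text:
            moved = {nxt for s in current
                     for label, nxt in self.transitions.get(s, [])
                     if label == ch}
            current = self.epsilon_closure(moved)
            if not current:  # no live state left: the input can no longer match
                return False
        return bool(current & self.accepting)


# Hand-built NFA for ab+c*: 0 -a-> 1 -b-> 2, 2 -b-> 2, 2 -eps-> 3, 3 -c-> 3
ab_plus_c_star = NFA(
    start=0,
    accepting={3},
    transitions={
        0: [("a", 1)],
        1: [("b", 2)],
        2: [("b", 2), (EPS, 3)],
        3: [("c", 3)],
    },
)

print(ab_plus_c_star.accepts("abbcc"))  # True
print(ab_plus_c_star.accepts("acb"))    # False
```

Tracking a set of states (rather than backtracking) keeps the simulation linear in the input length for a fixed automaton.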

Getting Started

First of all, install the package:

pip install regex-utils

Then, you can use all functions in the regex module by importing it:

from regex_utils import regex


print(regex.from_string("abc").generate_sample())
# Output: abc

String Generation

Generate strings that match the given regex by performing a random walk over the generated NFA.

regex.from_string("[a-z]{5}").generate_sample()
# Output (random), e.g.: "xfkdy"
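A random walk of this kind can be sketched as follows, assuming an epsilon-free NFA stored as a dict from each state to a list of (allowed characters, next state) pairs. The names random_walk and stop_prob are hypothetical, and the package's generate_sample may choose transitions and stopping points differently.

```python
import random
import re
import string


def random_walk(transitions, start, accepting, stop_prob=0.3, rng=None, max_steps=10_000):
    """Generate one string by walking randomly over an epsilon-free NFA.

    transitions[state] is a list of (allowed_chars, next_state) pairs.
    """
    rng = rng or random.Random()
    state, out = start, []
    for _ in range(max_steps):
        edges = transitions.get(state, [])
        # In an accepting state we may stop; we must stop if there is nowhere to go.
        if state in accepting and (not edges or rng.random() < stop_prob):
            return "".join(out)
        if not edges:
            raise ValueError(f"dead end at non-accepting state {state!r}")
        chars, state = rng.choice(edges)  # pick a transition at random
        out.append(rng.choice(chars))     # then a character allowed by it
    raise RuntimeError("walk did not reach an accepting state in time")


# Hand-built NFAs (hypothetical encodings of the README examples)
five_lower = {i: [(string.ascii_lowercase, i + 1)] for i in range(5)}  # [a-z]{5}
ab_plus_c_star = {0: [("a", 1)], 1: [("b", 2)], 2: [("b", 2), ("c", 3)], 3: [("c", 3)]}

rng = random.Random(42)
sample = random_walk(five_lower, 0, {5}, rng=rng)
print(sample, bool(re.fullmatch("[a-z]{5}", sample)))
print(random_walk(ab_plus_c_star, 0, {2, 3}, rng=rng))
```

Every generated string follows a path from the start state to an accepting state, so it always matches the regex; stop_prob only controls how long the strings tend to be when the automaton has loops.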

Intersection

Intersect two regular expressions.

A string belongs to the intersection only if it matches both regexes, which is equivalent to checking it against one regex and then the other. The advantage is that you can then use the to_string function to generate a single regular expression for the result.
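Intersection of regular languages is classically computed with the product construction: run both automata in lockstep on pairs of states and accept only when both accept. The sketch below applies it to two hand-built DFAs in an assumed dict-based encoding; the package works on NFAs with its own classes, so this illustrates the idea rather than its implementation.

```python
from collections import deque
from itertools import product


def intersect(dfa1, dfa2, alphabet):
    """Product construction: run both DFAs in lockstep on pairs of states.

    Each DFA is (start, accepting_states, delta) where delta maps
    (state, char) -> state; a missing entry means the input is rejected.
    """
    (s1, acc1, d1), (s2, acc2, d2) = dfa1, dfa2
    start = (s1, s2)
    delta, accepting = {}, set()
    seen, queue = {start}, deque([start])
    while queue:
        pair = queue.popleft()
        p, q = pair
        if p in acc1 and q in acc2:  # accept only if both automata accept
            accepting.add(pair)
        for ch in alphabet:
            if (p, ch) in d1 and (q, ch) in d2:
                nxt = (d1[p, ch], d2[q, ch])
                delta[pair, ch] = nxt
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return start, accepting, delta


def accepts(dfa, text):
    state, accepting, delta = dfa
    for ch in text:
        if (state, ch) not in delta:
            return False
        state = delta[state, ch]
    return state in accepting


# a*b* over {a, b}: state 0 reads a's, state 1 reads b's
a_star_b_star = (0, {0, 1}, {(0, "a"): 0, (0, "b"): 1, (1, "b"): 1})
# Exactly five characters over {a, b} (a small stand-in for \w{5})
five_chars = (0, {5}, {(i, ch): i + 1 for i in range(5) for ch in "ab"})

both = intersect(a_star_b_star, five_chars, "ab")
candidates = ("".join(t) for n in range(8) for t in product("ab", repeat=n))
members = [s for s in candidates if accepts(both, s)]
print(members)  # ['aaaaa', 'aaaab', 'aaabb', 'aabbb', 'abbbb', 'bbbbb']
```

Only pairs reachable from the start pair are built, so the product is usually much smaller than the full cross product of states.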

This is also useful if you want to “compile” lookarounds into the regex itself.

Finally, you can use intersection to constrain the length of generated strings. For example, if you have the regex ab+c* and want strings with a minimum length of 3 characters and a maximum length of 10 characters, you can intersect the regex like this:

r = regex.intersect("ab+c*", ".{3,10}")
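What the intersection above means can be checked without the package: a string belongs to it exactly when it fully matches both patterns. The sketch below does this by brute force over the alphabet {a, b, c} and compares the result with a single pattern that uses a lookahead, which Python's re supports; the names BASE, LENGTH and COMBINED are illustrative.

```python
import re
from itertools import product

BASE = re.compile(r"ab+c*")
LENGTH = re.compile(r".{3,10}")
# Python's re supports lookahead, so the same constraint fits in one pattern;
# engines without lookaround need the intersected regex instead.
COMBINED = re.compile(r"(?=.{3,10}\Z)(?:ab+c*)")


def in_intersection(s):
    """A string is in the intersection iff it fully matches both patterns."""
    return bool(BASE.fullmatch(s)) and bool(LENGTH.fullmatch(s))


members, mismatches = [], []
for n in range(12):  # lengths 0..11 cover both sides of the {3,10} bounds
    for chars in product("abc", repeat=n):
        s = "".join(chars)
        inside = in_intersection(s)
        if inside != bool(COMBINED.fullmatch(s)):
            mismatches.append(s)
        if inside:
            members.append(s)

print(len(members), min(map(len, members)), max(map(len, members)))  # 44 3 10
```

The count follows from the shape of ab+c*: a string of length L has L - 1 ways to split its tail into b's and c's, and summing L - 1 for L = 3..10 gives 44.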

You could then generate strings using the resulting regex:

r.generate_sample()
# Returns e.g. abc (random: the result is not deterministic)

To get the resulting regex as a string, use the to_string function:

regex.intersect("a*b*", r"\w{5}").to_string()
# Returns: '(?:(?:(?:(?:[ab]b|aa)b|aaa)b|aaaa)b|aaaaa)'
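One way to sanity-check a to_string result without relying on the automaton code is to compare it, by brute force over a small alphabet, with an equivalent lookahead-based pattern that Python's re accepts. This sketch checks the output shown above; the helper strings() and the WITH_LOOKAHEAD pattern are written for this example only.

```python
import re
from itertools import product

# Output shown above for regex.intersect("a*b*", r"\w{5}").to_string()
INTERSECTED = r"(?:(?:(?:(?:[ab]b|aa)b|aaa)b|aaaa)b|aaaaa)"
# The same constraint expressed with a lookahead, which Python's re supports
WITH_LOOKAHEAD = r"(?=\w{5}\Z)a*b*"


def strings(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


matched = sorted(s for s in strings("abc", 7) if re.fullmatch(INTERSECTED, s))
agree = all(
    bool(re.fullmatch(INTERSECTED, s)) == bool(re.fullmatch(WITH_LOOKAHEAD, s))
    for s in strings("abc", 7)
)
print(matched)  # ['aaaaa', 'aaaab', 'aaabb', 'aabbb', 'abbbb', 'bbbbb']
print(agree)    # True
```

Including c in the alphabet makes sure the check also covers word characters that a*b* must reject.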

Note that an intersection can be empty. For example, if you intersect two regexes with nothing in common, you receive an empty regex as a result. Keep in mind that '' used as a pattern matches the empty string, so check for this case before reusing the result elsewhere.

regex.intersect("[a-z]", "[^a-z]").to_string()
# Returns: ''
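Deciding whether an intersection is empty reduces to reachability: build the product automaton and search for an accepting pair of states. This sketch assumes a hypothetical encoding where each edge is labeled with an explicit set of characters drawn from printable ASCII, so that a negated class such as [^a-z] is a finite set; real regex character classes range over all of Unicode.

```python
import string
from collections import deque

# Assumption for this sketch: characters come from printable ASCII, so a
# negated class such as [^a-z] can be stored as an explicit, finite set.
UNIVERSE = frozenset(string.printable)
LOWER = frozenset(string.ascii_lowercase)

# Hypothetical automaton encoding: (start, accepting, edges), where
# edges[state] is a list of (set_of_allowed_chars, next_state).
single_lower = (0, {1}, {0: [(LOWER, 1)]})                 # [a-z]
single_not_lower = (0, {1}, {0: [(UNIVERSE - LOWER, 1)]})  # [^a-z]


def intersection_is_empty(a1, a2):
    """BFS over the product automaton: empty iff no accepting pair is reachable."""
    (s1, acc1, e1), (s2, acc2, e2) = a1, a2
    start = (s1, s2)
    seen, queue = {start}, deque([start])
    while queue:
        p, q = queue.popleft()
        if p in acc1 and q in acc2:
            return False
        for chars1, n1 in e1.get(p, []):
            for chars2, n2 in e2.get(q, []):
                # A product edge exists only if some character satisfies both labels.
                if chars1 & chars2 and (n1, n2) not in seen:
                    seen.add((n1, n2))
                    queue.append((n1, n2))
    return True


print(intersection_is_empty(single_lower, single_not_lower))  # True
print(intersection_is_empty(single_lower, single_lower))      # False
```

For [a-z] and [^a-z] the two edge labels share no character, so the only accepting pair is never reached.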

Negation (experimental)

Complement the original NFA by turning accepting states into non-accepting states (and vice versa) and adding all missing transitions (this is used to generate random strings). The logic is similar to a negative lookahead in PCRE.

WARNING: this feature is experimental and has known bugs in specific scenarios.

Please open an issue if you find any bugs related to this feature 🙏
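For comparison, the textbook complement works on a complete DFA: route every missing transition to a dead "sink" state, then swap accepting and non-accepting states. Swapping states on an NFA without determinizing it first does not in general produce the complement. The sketch below shows the last two steps on a hand-built DFA; the encoding and the SINK name are assumptions, not the package's API. Note that the full-match complement of abc also accepts strings such as abca, which (?!abc).* rejects.

```python
import re
from itertools import product


def complement(states, alphabet, start, accepting, delta):
    """Complement a DFA: complete it with a sink state, then swap accepting states.

    delta maps (state, char) -> state; missing entries are sent to the sink.
    """
    sink = "SINK"  # hypothetical name for the added dead state
    all_states = set(states) | {sink}
    full_delta = {
        (s, ch): delta.get((s, ch), sink)  # the sink also loops to itself on every char
        for s in all_states
        for ch in alphabet
    }
    return start, all_states - set(accepting), full_delta


def accepts(dfa, text):
    state, accepting, delta = dfa
    for ch in text:
        state = delta[state, ch]  # complete DFA: every (state, char) is defined
    return state in accepting


# DFA for exactly "abc" over the alphabet {a, b, c}
abc_dfa = ({0, 1, 2, 3}, "abc", 0, {3}, {(0, "a"): 1, (1, "b"): 2, (2, "c"): 3})
not_abc = complement(*abc_dfa)

# Full-match complement vs. the negative-lookahead pattern; they differ on "abca"
for s in ["ab", "abc", "abca", "cab"]:
    print(repr(s), accepts(not_abc, s), bool(re.fullmatch(r"(?!abc).*", s)))
```

Completing the DFA first matters: without the sink, strings that "fall off" the automaton (such as abca) would be rejected by both the original and the swapped version.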

To String

Once you have the resulting NFA, you can get the regex back in plain text, so that you can use it in other tools.

This is useful, for example, if you compute the intersection of two regexes or the negation of one and want to use the result in your application.
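A standard way to turn an automaton back into a regex is state elimination: add a fresh start and final state, then remove the original states one by one, rerouting every path p -> k -> q as a direct edge labeled with a regex. This is a minimal sketch on a hand-built DFA for a*b*; it is not necessarily how to_string works, and its output is left unsimplified, so it is more verbose than the result shown above.

```python
import re
from itertools import product


def automaton_to_regex(states, start, accepting, delta):
    """Convert a finite automaton to a regex string by state elimination.

    delta maps (state, char) -> state. Returns None if no string is accepted.
    """
    S, F = "_START", "_FINAL"  # fresh start/final states (assumed not to clash)
    edges = {}  # (p, q) -> regex labelling the edge p -> q

    def add(p, q, r):
        # Parallel edges are merged with alternation.
        edges[p, q] = f"{edges[p, q]}|{r}" if (p, q) in edges else r

    add(S, start, "")
    for s in accepting:
        add(s, F, "")
    for (p, ch), q in delta.items():
        add(p, q, re.escape(ch))

    for k in states:  # eliminate the original states one at a time
        loop = edges.pop((k, k), None)
        star = f"(?:{loop})*" if loop is not None else ""
        incoming = [(p, r) for (p, q), r in edges.items() if q == k]
        outgoing = [(q, r) for (p, q), r in edges.items() if p == k]
        for p, _ in incoming:
            del edges[p, k]
        for q, _ in outgoing:
            del edges[k, q]
        # Reroute every path p -> k -> q as a direct edge p -> q.
        for p, r_in in incoming:
            for q, r_out in outgoing:
                add(p, q, f"(?:{r_in}){star}(?:{r_out})")
    return edges.get((S, F))


# a*b* over {a, b}: state 0 reads a's, state 1 reads b's (hand-built for this sketch)
pattern = automaton_to_regex([0, 1], 0, {0, 1}, {(0, "a"): 0, (0, "b"): 1, (1, "b"): 1})
print(pattern)

agree = all(
    bool(re.fullmatch(pattern, s)) == bool(re.fullmatch("a*b*", s))
    for n in range(7)
    for s in map("".join, product("ab", repeat=n))
)
print(agree)  # True
```

The order in which states are eliminated changes the shape (and length) of the resulting regex, but not the language it matches.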

regex.intersect("a*b*", r"\w{5}").to_string()
# Returns: '(?:(?:(?:(?:[ab]b|aa)b|aaa)b|aaaa)b|aaaaa)'

An interesting use case is converting lookaheads into a regex that can be used by engines without lookaround support, such as the native ones in Go and Rust. This feature is still a work in progress in this package. For example, if you want strings that match the (?!abc).* regex, you could write:

regex.negate("abc").to_string()
# Returns: '(?:(?:[^a]|a(?:[^b]|b[^c])))(?:\\.)*|(?:ab?)?'
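To check that a lookahead-free pattern really matches the same strings as (?!abc).*, you can compare both on every short string over a small alphabet. The NO_LOOKAHEAD pattern below is written by hand for this sketch, not produced by the package; it uses .* where the output shown above has (?:\\.)*, which, read as a Python string literal, is the pattern (?:\.)* and matches only literal dots.

```python
import re
from itertools import product

LOOKAHEAD = r"(?!abc).*"
# Hand-written lookahead-free equivalent (not produced by the package): the
# string either diverges from "abc" at some position or is a proper prefix of it.
NO_LOOKAHEAD = r"(?:[^a]|a[^b]|ab[^c]).*|(?:ab?)?"


def find_counterexample(p1, p2, alphabet, max_len):
    """Return a string on which the patterns disagree, or None if none is found."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            # DOTALL so that '.' also matches '\n', just like [^a] does
            if bool(re.fullmatch(p1, s, re.DOTALL)) != bool(re.fullmatch(p2, s, re.DOTALL)):
                return s
    return None


print(find_counterexample(LOOKAHEAD, NO_LOOKAHEAD, "abcx\n", 6))  # None
```

A brute-force check like this is not a proof, but with an alphabet that exercises every branch (a, b, c, another character, and a newline) it catches most mistakes in a hand-written translation.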


Download files


Source Distribution

regex_utils-0.1.1.tar.gz (2.7 MB)

Uploaded Source

Built Distribution


regex_utils-0.1.1-py3-none-any.whl (13.5 kB)

Uploaded Python 3

File details

Details for the file regex_utils-0.1.1.tar.gz.

File metadata

  • Download URL: regex_utils-0.1.1.tar.gz
  • Upload date:
  • Size: 2.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.11.0

File hashes

Hashes for regex_utils-0.1.1.tar.gz
Algorithm Hash digest
SHA256 8f9b1c389047c2a0bff2b17e0c17d807f9dca866041a35dad8ea83ec30fb8a31
MD5 7402e61e1bd62cde07aa231c43f3fdb2
BLAKE2b-256 6658e229ce093b1a326fc9922cf411467d78529dcfc94f72d7daa54806270efd


File details

Details for the file regex_utils-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: regex_utils-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 13.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.11.0

File hashes

Hashes for regex_utils-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 e0f771e9ded7dca0e1914afc9055689ebed6c55867e81574fb4b766bc6faa879
MD5 b2158bbae885a98bf73712f5ebda0eb1
BLAKE2b-256 316326790684b0f1a888dea66ac4d5843084d8b6e147aafce6f613b1ac4846f2

