PRegEx - Programmable Regular Expressions

Python Version MIT License

What is PRegEx?

Let's face it: although RegEx is without a doubt an extremely useful tool, its syntax is notoriously hard to read and to memorize. This is mainly due to RegEx's declarative nature, which many programmers are not familiar with, as well as its extensive use of symbols that bear no inherent relation to their function within a pattern, which makes them easy to forget. To make matters worse, RegEx patterns are more often than not tightly packed with information, which our brains struggle to break down and analyze effectively. For these reasons, building even a simple RegEx pattern for matching URLs can be quite a painful task for many people.

This is where PRegEx comes in! PRegEx, which stands for Programmable Regular Expressions, is a Python package for constructing Regular Expression patterns in a more human-friendly way. With PRegEx, one is able to fully utilize the powerful tool that is RegEx without having to deal with any of the nuisances that seem to drive people crazy! PRegEx achieves that by offering the following:

  1. An easy-to-remember syntax that resembles the good ol' imperative way of programming!
  2. No longer having to group patterns or escape meta characters, as both are handled internally by PRegEx!
  3. Modularity in building RegEx patterns, as one can easily break down a complex pattern into multiple simpler ones, which can then be combined.
  4. A higher-level API on top of Python's built-in "re" module, providing access to its core functionality and more, while saving you the trouble of having to deal with "re.Match" instances.

And remember, no matter how complex the abstraction, what sits underneath is always just a pure RegEx pattern, which you can fetch and use any way you like!
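To make the second and third points above more concrete, here is a minimal sketch of the kind of bookkeeping that has to be done by hand when composing patterns with nothing but the built-in "re" module. The helper functions "literal", "optional" and "either" are hypothetical names made up for this illustration. They are not part of PRegEx's API, and this is not how PRegEx is implemented internally.

```python
import re

# Hypothetical helpers, written only for this illustration.

def literal(text: str) -> str:
    """Escape any meta characters so that the text is matched verbatim."""
    return re.escape(text)

def optional(pattern: str) -> str:
    """Wrap the pattern in a non-capturing group and make it optional."""
    return f"(?:{pattern})?"

def either(*patterns: str) -> str:
    """Wrap the alternatives in a non-capturing group."""
    return "(?:" + "|".join(patterns) + ")"

# Small sub-patterns are built first and then combined into a larger one.
http_protocol = optional(literal("http") + optional(literal("s")) + literal("://"))
tld = literal(".") + either(literal("com"), literal("org"))
url = http_protocol + literal("example") + tld

print(url)
print(re.fullmatch(url, "https://example.com") is not None)
```

Forgetting a single call to "literal" or a single group in the sketch above would silently change the meaning of the pattern, which is the kind of mistake that a dedicated abstraction is meant to rule out.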

Installation

You can start using PRegEx by installing it via pip. Note that "pregex" requires Python >= 3.9.

pip install pregex

Usage example

In PRegEx, everything is a Programmable Regular Expression, or "Pregex" for short. This makes it easy for simple Pregex instances to be combined into more complex ones! Within the code snippet below, we construct a Pregex instance that will match any URL that ends with either ".com" or ".org" as well as any IP address for which a 4-digit port number is specified. Furthermore, in the case of a URL, we would like for its domain name to be separately captured as well.

from pregex.core.classes import AnyLetter, AnyDigit, AnyFrom
from pregex.core.quantifiers import Optional, AtLeastAtMost
from pregex.core.operators import Either
from pregex.core.groups import Capture
from pregex.core.pre import Pregex

# Define main sub-patterns.
http_protocol = Optional('http' + Optional('s') + '://')

www = Optional('www.')

alphanum = AnyLetter() | AnyDigit()

domain_name = \
    alphanum + \
    AtLeastAtMost(alphanum | AnyFrom('-', '.'), n=1, m=61) + \
    alphanum

tld = '.' + Either('com', 'org')

ip_octet = AnyDigit().at_least_at_most(n=1, m=3)

port_number = 4 * AnyDigit()

# Combine sub-patterns together.
pre: Pregex = \
    http_protocol + \
    Either(
        www + Capture(domain_name) + tld,
        3 * (ip_octet + '.') + ip_octet + ':' + port_number
    )

We can then easily fetch the resulting Pregex instance's underlying RegEx pattern.

regex = pre.get_pattern()

This is the pattern that we just built. Yikes!

(?:https?:\/\/)?(?:(?:www\.)?([A-Za-z\d][A-Za-z\d\-.]{1,61}[A-Za-z\d])\.(?:com|org)|(?:\d{1,3}\.){3}\d{1,3}:\d{4})
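To see how the pieces of this pattern map back to the sub-patterns defined earlier, the same pattern can be written out with the "re.VERBOSE" flag of Python's built-in "re" module, which ignores whitespace and allows comments. This is only an annotated restatement of the pattern above for reading purposes, not something PRegEx produces. The variable name "annotated" is chosen just for this sketch.

```python
import re

annotated = re.compile(r"""
    (?:https?:\/\/)?                 # optional "http://" or "https://"
    (?:
        (?:www\.)?                   # optional "www." prefix
        ([A-Za-z\d]                  # captured domain name: starts alphanumeric,
         [A-Za-z\d\-.]{1,61}         #   then 1 to 61 alphanumerics, hyphens or dots,
         [A-Za-z\d])                 #   and ends alphanumeric
        \.(?:com|org)                # ".com" or ".org"
    |
        (?:\d{1,3}\.){3}\d{1,3}      # four dot-separated groups of 1 to 3 digits
        :\d{4}                       # a colon followed by a 4-digit port number
    )
""", re.VERBOSE)

print(annotated.fullmatch("https://youtube.com").group(1))
print(annotated.fullmatch("192.168.1.1:8000") is not None)
```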

Besides having access to its underlying pattern, we can use a Pregex instance to find matches within a piece of text. Consider for example the following string:

text = "text--192.168.1.1:8000--text--http://www.wikipedia.org--text--https://youtube.com--text"

By invoking the instance's "get_matches" method, we are able to scan the above string for any possible matches:

matches = pre.get_matches(text)

Looks like there were three matches:

['192.168.1.1:8000', 'http://www.wikipedia.org', 'https://youtube.com']

Likewise, we can invoke the instance's "get_captures" method to get any captured groups.

groups = pre.get_captures(text)

As expected, only two of the three matches produced a captured group: the first match is not a URL, so it contains no domain name to capture, and its group is reported as None.

[(None,), ('wikipedia',), ('youtube',)]
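For comparison, the same results can be reproduced with nothing but Python's built-in "re" module by feeding it the underlying pattern shown earlier. This is a sketch of the "re.Match" handling that one would otherwise write by hand. It does not show how "get_matches" and "get_captures" are implemented, and the variable names are chosen only for this illustration.

```python
import re

# The underlying pattern fetched from the Pregex instance, copied verbatim.
pattern = (
    r"(?:https?:\/\/)?(?:(?:www\.)?([A-Za-z\d][A-Za-z\d\-.]{1,61}[A-Za-z\d])"
    r"\.(?:com|org)|(?:\d{1,3}\.){3}\d{1,3}:\d{4})"
)

text = "text--192.168.1.1:8000--text--http://www.wikipedia.org--text--https://youtube.com--text"

# Collect the re.Match instances once, then extract what we need from them.
found = list(re.finditer(pattern, text))

matches = [m.group(0) for m in found]   # the full text of every match
groups = [m.groups() for m in found]    # the captured groups of every match

print(matches)  # ['192.168.1.1:8000', 'http://www.wikipedia.org', 'https://youtube.com']
print(groups)   # [(None,), ('wikipedia',), ('youtube',)]
```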

Finally, you might have noticed that we built our pattern by utilizing various classes that were imported from modules under pregex.core. These modules contain classes through which the RegEx syntax is essentially replaced. However, PRegEx also includes another set of modules, namely those under subpackage pregex.meta, whose classes build upon those in pregex.core so as to provide numerous pre-built patterns that you can just import and use right away!

from pregex.core.pre import Pregex
from pregex.core.classes import AnyDigit
from pregex.core.operators import Either
from pregex.meta.essentials import HttpUrl, IPv4

port_number = 4 * AnyDigit()

pre: Pregex = Either(
    HttpUrl(capture_domain=True),
    IPv4() + ":" + port_number
)

By using classes found within the pregex.meta subpackage, we were able to construct more or less the same pattern as before, only much more easily!

You can learn more about PRegEx by visiting the PRegEx Documentation Page.
