PRegEx - Programmable Regular Expressions

Project description

MIT License

What is PRegEx?

Let's face it: although RegEx is without a doubt an extremely useful tool, its syntax is notoriously hard for people to read and memorize. This is mainly due to RegEx's declarative nature, which many programmers are not familiar with, as well as its extensive use of symbols that bear no obvious relation to their function within a RegEx pattern, which makes them hard to remember. To make matters worse, RegEx patterns are often densely packed with information, which our brains struggle to break down and analyze effectively.

For all those reasons, building even a simple RegEx pattern for matching URLs can be quite a painful task for many people. This is where PRegEx comes in! PRegEx, which stands for Programmable Regular Expressions, is a Python package for constructing Regular Expression patterns in a more human-friendly way. With PRegEx, one can fully utilize the powerful tool that is RegEx without having to deal with any of the nuisances that seem to drive people crazy! PRegEx achieves this by offering the following:

  1. An easy-to-remember syntax that resembles the good ol' imperative way of programming!
  2. Modularity when building RegEx patterns, as a complex pattern can easily be broken down into simpler sub-patterns which can then be combined.
  3. No more escaping metacharacters such as "." and "*", as this is handled internally by PRegEx!
  4. A higher-level API on top of Python's built-in "re" module, providing access to its core functionality while saving you the trouble of dealing with "re.Match" instances.
  5. No matter how complex the abstraction, there is always a pure RegEx pattern underneath, which you can fetch and use any way you like!
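To illustrate point 3, here is what escaping looks like when working with Python's built-in "re" module directly. This is a small standard-library sketch, not PRegEx code: an unescaped "." matches any character and "*" acts as a quantifier, so literal text must first be passed through re.escape().

```python
import re

# Goal: match the literal text "a.b*" (a real dot and a real asterisk).
literal = "a.b*"

# Used as-is, "." means "any character" and "b*" means "zero or more 'b'",
# so the pattern also accepts strings we did not intend, such as "axb".
unescaped = re.compile(literal)

# re.escape() backslash-escapes the metacharacters, so the pattern
# only accepts the exact literal text.
escaped = re.compile(re.escape(literal))

print(bool(unescaped.fullmatch("axb")))  # True: "." matched "x", "b*" matched "b"
print(bool(escaped.fullmatch("axb")))    # False
print(bool(escaped.fullmatch("a.b*")))   # True
```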

Installation

You can start using PRegEx by installing it via pip. Note that "pregex" requires Python >= 3.9.

pip install pregex

Usage example

In PRegEx, everything is a Programmable Regular Expression, or "Pregex" for short. This makes it easy to combine simple Pregex instances into more complex ones! In the code snippet below, we construct a Pregex instance that matches any URL ending in either ".com" or ".org", as well as any IP address followed by a 4-digit port number. Furthermore, in the case of a URL, we also want its domain name to be captured separately.

from pregex.quantifiers import Optional, AtLeastOnce, AtLeastAtMost
from pregex.classes import AnyButWhitespace, AnyButFrom, AnyDigit
from pregex.groups import CapturingGroup
from pregex.tokens import Backslash
from pregex.operators import Either
from pregex.pre import Pregex

pre: Pregex = \
        Optional("http" + Optional('s') + "://") + \
        Either(
            Optional("www.") +
            CapturingGroup(
                AtLeastOnce(AnyButWhitespace() | AnyButFrom(":", Backslash()))
            ) +
            Either(".com", ".org"),

            3 * (AtLeastAtMost(AnyDigit(), min=1, max=3) + ".") +
            1 * (AtLeastAtMost(AnyDigit(), min=1, max=3) + ":") +
            4 * AnyDigit() 
        )
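For comparison, here is a rough sketch of the same modular construction done by hand with raw pattern strings and only the standard library. The sub-pattern names (scheme, domain_url, octet, ip_port) are hypothetical and chosen purely for illustration; the comments indicate which Pregex piece each one mirrors, and the assembled string is intended to be identical to the pattern PRegEx reports for this example.

```python
import re

# Hypothetical hand-written sub-patterns mirroring the Pregex pieces above.
# Special characters such as "." must be escaped by hand here; the ":" and "/"
# escapes are not strictly required and simply mirror PRegEx's output.
scheme = r"(?:https?\:\/\/)?"       # Optional("http" + Optional('s') + "://")
domain_url = (
    r"(?:www\.)?"                   # Optional("www.")
    r"([^\\:\s]+)"                  # CapturingGroup(AtLeastOnce(...)): no whitespace, ":" or "\"
    r"(?:\.com|\.org)"              # Either(".com", ".org")
)
octet = r"\d{1,3}"                  # AtLeastAtMost(AnyDigit(), min=1, max=3)
ip_port = r"(?:" + octet + r"\.){3}" + octet + r"\:\d{4}"  # three "octet." + octet + ":" + 4 digits

# The outer Either(...) becomes a non-capturing alternation.
pattern = scheme + "(?:" + domain_url + "|" + ip_port + ")"
compiled = re.compile(pattern)
print(pattern)
```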

We can then easily fetch the resulting Pregex instance's underlying RegEx pattern.

regex = pre.get_pattern()

This is the pattern that we just built. Yikes!

(?:https?\:\/\/)?(?:(?:www\.)?([^\\:\s]+)(?:\.com|\.org)|(?:\d{1,3}\.){3}\d{1,3}\:\d{4})
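Because this is an ordinary pattern string, it can be handed straight to Python's "re" module (point 5 above). The following is a minimal standard-library sketch; the sample inputs are made up for illustration.

```python
import re

# The pattern reported by pre.get_pattern(), copied verbatim into a raw string.
URL_OR_IP = re.compile(
    r"(?:https?\:\/\/)?(?:(?:www\.)?([^\\:\s]+)(?:\.com|\.org)|(?:\d{1,3}\.){3}\d{1,3}\:\d{4})"
)

# Made-up sample inputs, each checked against the whole pattern with fullmatch().
for sample in ["https://youtube.com", "192.168.1.1:8000", "192.168.1.1:80", "example.net"]:
    m = URL_OR_IP.fullmatch(sample)
    if m is None:
        print(f"{sample}: no match")
    else:
        # group(1) is the captured domain name; it is None for an IP address.
        print(f"{sample}: match, domain={m.group(1)}")
```

Only the first two samples match: the IP address with a 2-digit port and the ".net" domain are both rejected.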

Besides giving access to its underlying pattern, a Pregex instance can be used to find matches within a string. Consider, for example, the following piece of text:

text = "text--192.168.1.1:8000--text--http://www.wikipedia.orghttps://youtube.com--text"

We can scan the above string for any possible matches by invoking the instance's "get_matches" method:

matches = pre.get_matches(text)

Looks like there were three matches:

['192.168.1.1:8000', 'http://www.wikipedia.org', 'https://youtube.com']
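The same result can be reproduced with the standard library alone. This sketch assumes that get_matches() amounts to collecting the full text of each match found by re.finditer(); that is an assumption made for illustration, not PRegEx's confirmed implementation.

```python
import re

pattern = r"(?:https?\:\/\/)?(?:(?:www\.)?([^\\:\s]+)(?:\.com|\.org)|(?:\d{1,3}\.){3}\d{1,3}\:\d{4})"
text = "text--192.168.1.1:8000--text--http://www.wikipedia.orghttps://youtube.com--text"

# Assumed equivalent of pre.get_matches(text): the full text of every
# non-overlapping match, scanning the string from left to right.
matches = [m.group(0) for m in re.finditer(pattern, text)]
print(matches)
# ['192.168.1.1:8000', 'http://www.wikipedia.org', 'https://youtube.com']
```

Note that re.findall() is not a drop-in replacement here: because the pattern contains a capturing group, findall() returns the captured group for each match rather than the whole match, which is exactly the kind of "re" detail PRegEx aims to hide.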

Likewise, we can invoke the instance's "get_groups" method to get any captured groups.

groups = pre.get_groups(text)

As expected, only two of the three matches contain a captured group, since the first match is an IP address rather than a URL and therefore has no domain name to capture.

[(None,), ('wikipedia',), ('youtube',)]
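Similarly, here is a standard-library sketch of the get_groups() result, assuming it collects Match.groups() for each match (again an assumption for illustration, not PRegEx's confirmed implementation). A group that did not take part in a match is reported as None, which is why the IP-address match yields (None,).

```python
import re

pattern = r"(?:https?\:\/\/)?(?:(?:www\.)?([^\\:\s]+)(?:\.com|\.org)|(?:\d{1,3}\.){3}\d{1,3}\:\d{4})"
text = "text--192.168.1.1:8000--text--http://www.wikipedia.orghttps://youtube.com--text"

# Assumed equivalent of pre.get_groups(text): one tuple of captured groups
# per match; groups that did not participate in a match come back as None.
groups = [m.groups() for m in re.finditer(pattern, text)]
print(groups)
# [(None,), ('wikipedia',), ('youtube',)]
```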

You can learn more about how PRegEx works by visiting the PRegEx Documentation Page.

What to expect next?

The pregex package's core modules are still under development. In the near future, more modules will follow that build upon these core modules to provide abstractions for even more complex RegEx patterns!

Download files

Source Distribution

pregex-1.2.0.tar.gz (20.7 kB)

Uploaded Source

Built Distribution

pregex-1.2.0-py3-none-any.whl (20.6 kB)

Uploaded Python 3

File details

Details for the file pregex-1.2.0.tar.gz.

File metadata

  • Download URL: pregex-1.2.0.tar.gz
  • Upload date:
  • Size: 20.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.5

File hashes

Hashes for pregex-1.2.0.tar.gz
Algorithm Hash digest
SHA256 424a4ed6bf9e5d671eaf06cad0ea46d187eca7d406fcdbbd1984a540491e03dc
MD5 61ac02cba693254465d5323aa058c54d
BLAKE2b-256 430cb44d3b1e83b12f5246717d1a9b003cd0c2338f66fd1d59f9e42b05809819

File details

Details for the file pregex-1.2.0-py3-none-any.whl.

File metadata

  • Download URL: pregex-1.2.0-py3-none-any.whl
  • Upload date:
  • Size: 20.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.5

File hashes

Hashes for pregex-1.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a34c0a0a9868aa2dbd4849fa998c8ab26e55192349bd71ca45e22522946eee67
MD5 6ea8df93b32e18af43140bf4e95990a5
BLAKE2b-256 da83426e8bc14ac143a209dcdf8109f030f7944934134d5ff803baed80484845