PRegEx - Programmable Regular Expressions

What is PRegEx?

Let's face it: although RegEx is without a doubt an extremely useful tool, its syntax has repeatedly proven hard for people to read and memorize. This is mainly due to RegEx's declarative nature, which many programmers are not familiar with, as well as its extensive use of symbols that bear no inherent relation to their function within a pattern, which makes them hard to remember. To make matters worse, RegEx patterns are more often than not tightly packed with information, which our brains struggle to break down and analyze effectively.

For all these reasons, building even a simple RegEx pattern for matching URLs can be a painful task for many people. This is where PRegEx comes in! PRegEx, which stands for Programmable Regular Expressions, is a Python package for constructing Regular Expression patterns in a more human-friendly way. With PRegEx, one can fully utilize the powerful tool that is RegEx without having to deal with the nuisances that seem to drive people crazy! PRegEx achieves that by offering the following:

  1. An easy-to-remember syntax that resembles the good ol' imperative way of programming!
  2. Modularity in building RegEx patterns, as one can easily break down a complex pattern into simpler sub-patterns which can then be combined.
  3. No more escaping of meta characters such as "." and "*", as this is handled internally by PRegEx!
  4. A higher-level API on top of Python's built-in "re" module, providing access to its core functionality while saving you the trouble of dealing with "re.Match" instances.
  5. A pure RegEx pattern underneath every abstraction, no matter how complex, which you can fetch and use any way you like!
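To make point 3 concrete, here is a minimal sketch of what escaping by hand looks like with plain "re". It uses only the standard library and does not show how PRegEx escapes characters internally; the "3.14" literal is just an illustrative value.

```python
import re

# With plain `re`, "." is a metacharacter that matches any character,
# so a literal such as "3.14" has to be escaped before it is compiled.
unescaped = re.compile("3.14")
escaped = re.compile(re.escape("3.14"))

# The unescaped pattern accepts "3x14" because "." matched "x".
print(unescaped.fullmatch("3x14") is not None)

# The escaped pattern only accepts the literal text.
print(escaped.fullmatch("3x14") is not None)
print(escaped.fullmatch("3.14") is not None)
```

Forgetting the escape is easy to miss because the unescaped pattern still matches the intended text; it simply matches too much.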

Installation

You can start using PRegEx by installing it via pip. Note that "pregex" requires Python >= 3.9.

pip install pregex

Usage example

In PRegEx, everything is a Programmable Regular Expression, or "Pregex" for short. This makes it easy to combine simple Pregex instances into more complex ones! In the code snippet below, we construct a Pregex instance that matches any URL ending with either ".com" or ".org", as well as any IP address for which a 4-digit port number is specified. Furthermore, in the case of a URL, we would like its domain name to be captured separately.

from pregex.quantifiers import Optional, AtLeastOnce, AtLeastAtMost
from pregex.classes import AnyButWhitespace, AnyButFrom, AnyDigit
from pregex.groups import CapturingGroup
from pregex.tokens import Backslash
from pregex.operators import Either
from pregex.pre import Pregex

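pre: Pregex = (
    # Optional scheme: "http://" or "https://".
    Optional("http" + Optional('s') + "://") +
    Either(
        # First alternative: a URL ending in ".com" or ".org",
        # with its domain name captured as a group.
        Optional("www.") +
        CapturingGroup(
            AtLeastOnce(AnyButWhitespace() | AnyButFrom(":", Backslash()))
        ) +
        Either(".com", ".org"),

        # Second alternative: an IP address followed by a 4-digit port.
        3 * (AtLeastAtMost(AnyDigit(), min=1, max=3) + ".") +
        1 * AtLeastAtMost(AnyDigit(), min=1, max=3) +
        ":" + 4 * AnyDigit()
    )
)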

We can then easily fetch the resulting Pregex instance's underlying RegEx pattern.

regex = pre.get_pattern()

This is the pattern that we just built. Yikes!

(?:https?\:\/\/)?(?:(?:www\.)?([^\\:\s]+)(?:\.com|\.org)|(?:\d{1,3}\.){3}\d{1,3}\:\d{4})
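To see how the pieces of this pattern map back to the Pregex building blocks, the sketch below rewrites it with the standard library's "re.VERBOSE" flag, one commented part per line. The pattern itself is copied from the output above; the "full_matches" helper and the sample string are hypothetical and only there for illustration.

```python
import re

# The compact pattern, copied verbatim from the output above.
COMPACT = r"(?:https?\:\/\/)?(?:(?:www\.)?([^\\:\s]+)(?:\.com|\.org)|(?:\d{1,3}\.){3}\d{1,3}\:\d{4})"

# The same pattern laid out with re.VERBOSE, which ignores whitespace
# and allows comments inside the pattern.
ANNOTATED = re.compile(r"""
    (?:https?\:\/\/)?        # optional "http://" or "https://" prefix
    (?:
        (?:www\.)?           # optional "www."
        ([^\\:\s]+)          # group 1: no backslash, colon or whitespace
        (?:\.com|\.org)      # ".com" or ".org" suffix
      |
        (?:\d{1,3}\.){3}     # three 1-3 digit numbers, each followed by "."
        \d{1,3}              # final 1-3 digit number
        \:\d{4}              # ":" followed by a 4-digit port
    )
""", re.VERBOSE)


def full_matches(pattern, text: str) -> list[str]:
    """Return the full text of every non-overlapping match (hypothetical helper)."""
    return [m.group(0) for m in re.finditer(pattern, text)]


# Hypothetical sample text, not taken from the README.
sample = "see https://example.org or 10.0.0.1:8080"
print(full_matches(COMPACT, sample))
print(full_matches(ANNOTATED, sample))
```

Both calls should report the same matches, since the annotated version only adds layout and comments.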

Besides giving access to its underlying pattern, a Pregex instance can be used to find matches within a string. Consider, for example, the following piece of text:

text = "text--192.168.1.1:8000--text--http://www.wikipedia.orghttps://youtube.com--text"

We can scan the above string for any possible matches by invoking the instance's "get_matches" method:

matches = pre.get_matches(text)

Looks like there were three matches:

['192.168.1.1:8000', 'http://www.wikipedia.org', 'https://youtube.com']

Likewise, we can invoke the instance's "get_groups" method to get any captured groups.

groups = pre.get_groups(text)

As expected, only two of the three matches captured a domain name. The first match is an IP address rather than a URL, so its capturing group did not take part in the match and is reported as None.

[(None,), ('wikipedia',), ('youtube',)]
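Since the underlying pattern is plain RegEx, the same results can be cross-checked with Python's built-in "re" module. The sketch below is not how PRegEx computes them internally; it simply feeds the pattern and text from this example to "re.finditer" and reads each "re.Match" by hand.

```python
import re

# Pattern and text copied from the example above.
PATTERN = r"(?:https?\:\/\/)?(?:(?:www\.)?([^\\:\s]+)(?:\.com|\.org)|(?:\d{1,3}\.){3}\d{1,3}\:\d{4})"
text = "text--192.168.1.1:8000--text--http://www.wikipedia.orghttps://youtube.com--text"

# group(0) is the full match; groups() holds one entry per capturing group,
# with None for a group that did not take part in the match.
matches = [m.group(0) for m in re.finditer(PATTERN, text)]
groups = [m.groups() for m in re.finditer(PATTERN, text)]

print(matches)
print(groups)
```

The IP address is matched by the second alternative of the pattern, which contains no capturing group, hence the None in the first tuple.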

You can learn more about how PRegEx works by visiting the PRegEx Documentation Page.

What to expect next?

Currently, the pregex package's core modules are still being built. In the near future, more modules will follow that rely on these core modules to provide abstractions for even more complex RegEx patterns!
