
Utility functions for writing readable regular expressions in a hierarchical way

Project description

The goal of reagex (from “readable regular expression”) is to offer a readable way of writing complex regular expressions with many capturing groups.

At the moment, it contains just one very simple function (called reagex) and a utility function, but any function that could be useful for writing readable patterns is welcome.

Note: Publishing this ridiculously small project is an excuse to get familiar with Python packaging, DevOps tools and the entire workflow behind the publication of an open-source project. The project template was generated using https://github.com/ionelmc/cookiecutter-pylibrary/ which is obviously overkill for a “one-function-project”.

  • Free software: BSD 2-Clause License

Usage

The core function reagex is just a wrapper around str.format and works the same way. See the example:

import re
from reagex import reagex

# A sloppy pattern for an Italian address (just to show how it works)
pattern = reagex(
    '{_address}, {postcode} {city} {province}',
    # groups starting with "_" are non-capturing
    _address = reagex(
        '{street} {number}',
        street = '(via|contrada|c/da|c[.]da|piazza|p[.]za|p[.]zza) [a-zA-Z]+',
        number = 'snc|[0-9]+'
    ),
    postcode = '[0-9]{5}',
    city = '[A-Za-z]+',
    province = '[A-Z]{2}'
)

matcher = re.compile(pattern)
match = matcher.fullmatch('via Roma 123, 12345 Napoli NA')
print(match.groupdict())

# prints (formatted here for readability; key order may differ):
#   {'city': 'Napoli',
#    'number': '123',
#    'postcode': '12345',
#    'province': 'NA',
#    'street': 'via Roma'}

Groups whose names start with '_' are non-capturing. All the others are named capturing groups.
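To make the naming rule concrete, here is a minimal sketch of how a str.format-based wrapper could turn keyword arguments into groups. The function reagex_sketch is hypothetical and written only for illustration; it is not the library's source, and the real reagex may differ in details.

```python
import re


def reagex_sketch(pattern, **group_patterns):
    """Hypothetical stand-in for reagex, for illustration only."""
    wrapped = {}
    for name, sub_pattern in group_patterns.items():
        if name.startswith('_'):
            # Leading underscore: non-capturing group
            wrapped[name] = '(?:{})'.format(sub_pattern)
        else:
            # Anything else: named capturing group
            wrapped[name] = '(?P<{}>{})'.format(name, sub_pattern)
    # Substituted values are not re-parsed by str.format, so quantifiers
    # such as {2} inside a group's pattern are left untouched.
    return pattern.format(**wrapped)


pattern = reagex_sketch('{_sign}{digits}', _sign='[+-]?', digits='[0-9]{2}')
print(pattern)
# prints: (?:[+-]?)(?P<digits>[0-9]{2})
print(re.fullmatch(pattern, '-42').groupdict())
# prints: {'digits': '42'}
```

One consequence of building on str.format, at least in this sketch: literal braces in the top-level template have to be doubled ('{{' and '}}'), while braces inside the keyword-argument patterns can be written normally.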

Why not…

Why not just use re.VERBOSE?

I think reagex is easier to write and to read:

  • with reagex, you first describe the structure of the pattern in terms of groups, then you provide a pattern for each group; with re.VERBOSE you have to define each group at the exact position where it must match, so to see the high-level structure of the pattern you may need to scan many lines at the same indentation level

  • with re.VERBOSE you just write a big string; with reagex you get syntax highlighting which helps readability

  • with reagex, whitespace doesn’t need any special treatment; with re.VERBOSE, literal spaces must be escaped or placed in a character class

  • “{group_name}” is nicer than “(?P<group_name>)”
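For comparison, the address pattern from the Usage section could be written with re.VERBOSE roughly as follows. This is one possible translation using only the standard library, not something taken from the project; the inner alternation is made non-capturing here, which does not change the named groups.

```python
import re

# The same address pattern, written as a single verbose string.
# In verbose mode a literal space must be written as [ ] (or escaped).
verbose_pattern = re.compile(r'''
    (?:
        (?P<street>
            (?:via|contrada|c/da|c[.]da|piazza|p[.]za|p[.]zza)
            [ ] [a-zA-Z]+
        )
        [ ]
        (?P<number> snc | [0-9]+ )
    )
    , [ ]
    (?P<postcode> [0-9]{5} )
    [ ]
    (?P<city> [A-Za-z]+ )
    [ ]
    (?P<province> [A-Z]{2} )
''', re.VERBOSE)

match = verbose_pattern.fullmatch('via Roma 123, 12345 Napoli NA')
print(match.groupdict())
```

Here the overall shape "address, postcode city province" has to be reconstructed by reading the whole string, whereas the reagex version states it on its first line.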

Installation

pip install reagex

Documentation

https://python-reagex.readthedocs.io/

Development

Possible improvements:

  1. make some meaningful use of the format_spec in {group_name:format_spec}

  2. add utility functions like repeated to help writing common patterns in a readable way
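As an illustration of the second item, a repetition helper could look something like the sketch below. The name repeated_sketch, its parameters and its behavior are assumptions made for this example; any helper actually shipped with reagex may have a different signature.

```python
import re


def repeated_sketch(pattern, min_count, max_count=None):
    """Hypothetical helper: repeat `pattern` between min_count and max_count times.

    With max_count=None the repetition has no upper bound.
    """
    upper = '' if max_count is None else str(max_count)
    # Doubled braces produce the literal { and } of the regex quantifier.
    return '(?:{}){{{},{}}}'.format(pattern, min_count, upper)


print(repeated_sketch('[0-9]', 5, 5))
# prints: (?:[0-9]){5,5}
print(bool(re.fullmatch(repeated_sketch('ab', 2), 'ababab')))
# prints: True
```

Wrapping the sub-pattern in a non-capturing group keeps the quantifier applied to the whole pattern rather than only to its last token.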

Testing

To run all the tests:

tox

Note: to combine the coverage data from all the tox environments, run:

Windows

set PYTEST_ADDOPTS=--cov-append
tox

Other

PYTEST_ADDOPTS=--cov-append tox

Changelog

0.1.2 (2018-12-16)

  • Fix a small mistake in the example (which is shown on PyPI, so a release was necessary to update the PyPI page).

0.1.1 (2018-12-12)

  • Minor fixes and modifications to documentation

0.1.0 (2018-12-08)

  • First release on PyPI.
