Compose multiple regular expressions into one using a special syntax

What it is

A regular expression builder that manages cross-pattern references and makes patterns easy to modify at runtime by default.

Features

Simple syntax

The syntax for cross-referencing regular expressions is simple. It is defined by a small, customizable regular expression, which by default matches text like (?~name).

Automatic reference management

When using a PatternComposer, patterns are tracked automatically, so that when one changes, all the patterns that depend on it are recompiled as well.
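The tracking described above can be illustrated with a minimal sketch built only on the standard `re` module. `TinyComposer`, its `recompiles` log, and the `REF` expression are hypothetical names made up for this illustration; this is not the source of `regex_compose.PatternComposer` and may differ from how the library actually works. The sketch assumes that patterns are added after the patterns they reference.

```python
import re

# Assumed default reference syntax: (?~name), where a name may contain dots
REF = re.compile(r'\(\?~([\w.]+)\)')

class TinyComposer:
    """Hypothetical stand-in for illustration, not regex_compose.PatternComposer."""

    def __init__(self):
        self.patterns = {}    # name -> source text with references intact
        self.compiled = {}    # name -> compiled re.Pattern
        self.recompiles = []  # names recompiled so far (for illustration only)

    def _expand(self, name):
        # Replace each (?~other) in the source with the expansion of `other`
        return REF.sub(lambda m: self._expand(m.group(1)), self.patterns[name])

    def _dependents(self, name):
        # Collect every pattern that refers to `name`, directly or indirectly
        found, stack = [], [name]
        while stack:
            current = stack.pop()
            for other, source in self.patterns.items():
                if current in REF.findall(source) and other not in found:
                    found.append(other)
                    stack.append(other)
        return found

    def add(self, name, source):
        # Store the new source, then recompile it and everything built on it
        self.patterns[name] = source
        for target in [name, *self._dependents(name)]:
            self.compiled[target] = re.compile(self._expand(target))
            self.recompiles.append(target)

c = TinyComposer()
c.add('word', r'\w')
c.add('number', r'\d')
c.add('wordnum', r'(?~word)|(?~number)')
c.add('key', r'(?:(?~wordnum)|\.)+')

c.recompiles.clear()
c.add('number', r'[0-9]')   # changing one part...
print(c.recompiles)         # ...recompiles it and its dependents
print(c.compiled['key'].pattern)
```

Changing `number` touches `wordnum` and `key` but leaves `word` alone, which is the behaviour the feature description implies.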

Why & priorities

When working on my programming language, Caustic, I was displeased with most of the parsers I found, and decided to make my own. Because Caustic's syntax is variable, its regular expressions needed to support modification at runtime. As such, a pattern can be changed arbitrarily by modifying either the pattern itself or any pattern it depends on.

Expression syntax

Sub-parts are written as (?~name) by default, but the reference syntax can be changed to any regular expression. As such, the following syntax and examples are only valid with the default part-pattern.

Name      Pattern
part_a    \d
part_b    [a-zA-Z]
patt      (?~part_a):(?~part_b)

Given the table of patterns above, the pattern patt compiles to:

\d:[a-zA-Z]

If part_a were changed to -?\d, then patt would recompile to:

-?\d:[a-zA-Z]
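The substitution shown above can be reproduced with a short sketch that uses only the standard `re` module. The `expand` helper and the `REF` expression are assumptions made for this illustration and are not part of regex_compose; the library's own expansion logic may differ.

```python
import re

# Assumed default reference syntax: (?~name)
REF = re.compile(r'\(\?~([\w.]+)\)')

def expand(name, patterns, _seen=()):
    """Recursively substitute every (?~name) reference in patterns[name]."""
    if name in _seen:
        raise ValueError('circular reference: ' + ' -> '.join(_seen + (name,)))
    # A function replacement is inserted literally, so backslashes survive
    return REF.sub(
        lambda m: expand(m.group(1), patterns, _seen + (name,)),
        patterns[name],
    )

patterns = {
    'part_a': r'\d',
    'part_b': r'[a-zA-Z]',
    'patt': r'(?~part_a):(?~part_b)',
}

first = expand('patt', patterns)
print(first)

patterns['part_a'] = r'-?\d'   # change one part
second = expand('patt', patterns)
print(second)
```

Passing a function to `REF.sub` rather than a replacement string matters here: a string replacement would have its backslash escapes processed, which would mangle parts such as `\d`.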

Examples

Note that the examples use the default part-pattern, so references are written as (?~name).

/examples/pragma-parser-2.py

The following script runs a REPL that matches each line of input against a "pragma" statement, which can change the patterns on the fly:

#> Imports
import re
import regex_compose
#</Imports

#> Header
base_patterns = {
    'word': r'\w',
    'number': r'\d',
    'wordnum': r'(?~word)|(?~number)',
}

pragma_patterns = {
    'pragma.start': r'^\$',
    'pragma.stop': r'$',
    'pragma.key': r'(?P<key>(?:(?~wordnum)|\.)+)',
    'pragma.delim': r':',
    'pragma.val': r'(?P<val>.*?)',
    'pragma.flags': r'(?m)',
    'pragma': r'(?~pragma.flags)(?~pragma.start)(?~pragma.key)(?~pragma.delim)(?~pragma.val)(?~pragma.stop)',
}
#</Header

#> Main >/
# Create parser and add `base_patterns` and `pragma_patterns`
p = regex_compose.PatternComposer()
p.multiadd(base_patterns)
p.multiadd(pragma_patterns)

print(f'Pragma pattern: {p.patterns["pragma"]}\n -> {p.compiled["pragma"]}')
while True:
    inp = input('Enter text to parse > ')
    if m := re.match(p['pragma'], inp):
        # A pragma line replaces the named pattern; dependents are recompiled
        p.add(m.group('key'), m.group('val'), replace=True)
        print(p.compiled['pragma'])
    else: print(inp) # echo the line back when there's no pragma

Input               Output
N/A                 Pragma pattern: (?~pragma.flags)(?~pragma.start)(?~pragma.key)(?~pragma.delim)(?~pragma.val)(?~pragma.stop)
N/A                 -> (?m)^\$(?P<key>(?:\w|\d|\.)+):(?P<val>.*?)$
test                test
$pragma.start:%     (?m)%(?P<key>(?:\w|\d|\.)+):(?P<val>.*?)$
%pragma.stop:;      (?m)%(?P<key>(?:\w|\d|\.)+):(?P<val>.*?);
%pragma.delim:=;    (?m)%(?P<key>(?:\w|\d|\.)+)=(?P<val>.*?);

Further examples

For further examples, see https://codeberg.org/Shae/RegExCompose/src/branch/main/examples

Changelog

v0.0.3

  • Fixed an issue in PatternComposer.multiadd() due to incorrect name access
  • Fixed refpatt_patt not being passed to get_refpatt_parts in PatternComposer.multiadd(), causing an error when in bytes_mode

v0.0.2

  • Added support for bytes
