Compose multiple regular expressions into one using a special syntax
What it is
A regular expression builder that manages references between patterns and, by default, makes it easy to change patterns at runtime.
Features
Simple syntax
The syntax for cross-referencing regular expressions is simple and is defined by a small (customizable) regular expression, which by default matches text like `(?~name)`.
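As a rough illustration of how a "reference pattern" can drive this, the sketch below finds the names a pattern refers to using Python's standard `re` module. Both `DEFAULT_REF` and the alternative `{{name}}` syntax in `BRACES_REF` are hypothetical regexes written for this example; they are assumptions, not the library's actual default or configuration.

```python
import re

# Hypothetical regex for the default "(?~name)" syntax; group 1 captures the
# referenced name. This is an assumption, not copied from regex_compose.
DEFAULT_REF = re.compile(r"\(\?~([^)]+)\)")
# A hypothetical custom syntax, "{{name}}", showing the marker is swappable.
BRACES_REF = re.compile(r"\{\{([^}]+)\}\}")


def referenced_names(pattern: str, ref: re.Pattern = DEFAULT_REF) -> list[str]:
    """Return the names `pattern` references, in order of appearance."""
    return ref.findall(pattern)


print(referenced_names(r"(?~part_a):(?~part_b)"))              # ['part_a', 'part_b']
print(referenced_names(r"{{part_a}}:{{part_b}}", BRACES_REF))  # ['part_a', 'part_b']
```

Because the reference syntax is itself just a regex with one capturing group, swapping it only changes which text is treated as a reference; the rest of the pattern is left untouched.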
Automatic reference management
When using a `PatternComposer`, patterns are automatically tracked, so that when one changes, all the patterns that depend on it are recompiled as well.
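The recompilation behaviour can be pictured with the minimal sketch below. `MiniComposer`, its `set_pattern` method and the reference regex are hypothetical names invented for illustration, not `regex_compose`'s API; the sketch assumes patterns are added in dependency order and that no references are cyclic.

```python
import re

# Hypothetical regex for the default "(?~name)" syntax; not taken from the library.
REF_PATTERN = re.compile(r"\(\?~([^)]+)\)")


class MiniComposer:
    """Hypothetical, minimal illustration of dependency tracking.

    NOT regex_compose's API. Assumes patterns are added in dependency
    order and that there are no cyclic references.
    """

    def __init__(self) -> None:
        self.patterns: dict[str, str] = {}  # raw patterns, references intact
        self.compiled: dict[str, str] = {}  # patterns with references expanded

    def _expand(self, name: str) -> str:
        # Replace each "(?~ref)" with the recursive expansion of `ref`.
        # A function replacement is inserted literally, so backslashes survive.
        return REF_PATTERN.sub(lambda m: self._expand(m.group(1)), self.patterns[name])

    def _dependents(self, name: str) -> set[str]:
        # Collect every pattern that references `name`, directly or transitively.
        found: set[str] = set()
        for other, raw in self.patterns.items():
            if name in REF_PATTERN.findall(raw):
                found |= {other} | self._dependents(other)
        return found

    def set_pattern(self, name: str, pattern: str) -> None:
        # Store the new pattern, then recompile it and everything depending on it.
        self.patterns[name] = pattern
        for affected in {name} | self._dependents(name):
            self.compiled[affected] = self._expand(affected)


c = MiniComposer()
c.set_pattern("part_a", r"\d")
c.set_pattern("part_b", r"[a-zA-Z]")
c.set_pattern("patt", r"(?~part_a):(?~part_b)")
print(c.compiled["patt"])  # \d:[a-zA-Z]
c.set_pattern("part_a", r"-?\d")  # patt is recompiled automatically
print(c.compiled["patt"])  # -?\d:[a-zA-Z]
```

Finding dependents by scanning every stored pattern keeps the sketch short; a larger implementation might instead maintain a reverse-dependency map.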
Why & priorities
When working on my programming language, Caustic, I was displeased with most of the parsers I found and decided to make my own. For the regular expressions used in parsing, I needed support for runtime modification because of Caustic's variable syntax. As such, patterns can be modified arbitrarily at runtime, either by changing the pattern itself or by changing a pattern it depends on.
Expression syntax
Sub-parts are referenced as `(?~name)` by default, but the reference syntax can be changed to any regular expression. The following syntax and examples are therefore only valid with the default part pattern.
Name | Pattern |
---|---|
part_a | `\d` |
part_b | `[a-zA-Z]` |
patt | `(?~part_a):(?~part_b)` |
Given the table of patterns above, the pattern `patt` compiles to `\d:[a-zA-Z]`.

If `part_a` were changed to `-?\d`, then `patt` would recompile to `-?\d:[a-zA-Z]`.
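The same expansion can be checked by hand with Python's standard `re` module. The `expand` helper below is hypothetical (it is not part of `regex_compose`) and assumes the default `(?~name)` syntax with no cyclic references; it also shows that the recompiled `patt` accepts input the original rejected.

```python
import re

# Hypothetical helper, not regex_compose's API: expand "(?~name)" references.
REF = re.compile(r"\(\?~([^)]+)\)")


def expand(name: str, patterns: dict[str, str]) -> str:
    # Recursively substitute each reference with its target's expansion.
    return REF.sub(lambda m: expand(m.group(1), patterns), patterns[name])


patterns = {"part_a": r"\d", "part_b": r"[a-zA-Z]", "patt": r"(?~part_a):(?~part_b)"}
before = expand("patt", patterns)
print(before, bool(re.fullmatch(before, "-7:x")))  # \d:[a-zA-Z] False

patterns["part_a"] = r"-?\d"  # the change described above
after = expand("patt", patterns)
print(after, bool(re.fullmatch(after, "-7:x")))  # -?\d:[a-zA-Z] True
```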
Examples
Note that the examples use the default matching pattern, so references are written as `(?~name)`.
/examples/pragma-parser-2.py
The following script runs a REPL that matches each line of input against a "pragma" statement, which can change the patterns on the fly.
```python
#> Imports
import re
import regex_compose
#</Imports

#> Header
base_patterns = {
    'word': r'\w',
    'number': r'\d',
    'wordnum': r'(?~word)|(?~number)',
}
pragma_patterns = {
    'pragma.start': r'^\$',
    'pragma.stop': r'$',
    'pragma.key': r'(?P<key>(?:(?~wordnum)|\.)+)',
    'pragma.delim': r':',
    'pragma.val': r'(?P<val>.*?)',
    'pragma.flags': r'(?m)',
    'pragma': r'(?~pragma.flags)(?~pragma.start)(?~pragma.key)(?~pragma.delim)(?~pragma.val)(?~pragma.stop)',
}
#</Header

#> Main >/
# Create parser and add `base_patterns` and `pragma_patterns`
p = regex_compose.PatternComposer()
p.multiadd(base_patterns)
p.multiadd(pragma_patterns)
print(f'Pragma pattern: {p.patterns["pragma"]}\n -> {p.compiled["pragma"]}')

while True:
    inp = input('Enter text to parse > ')
    if m := re.match(p['pragma'], inp):
        # Replace the pattern named by the key; its dependents are recompiled
        p.add(m.group('key'), m.group('val'), replace=True)
        print(p.compiled['pragma'])
    else:
        print(inp)  # echo the line back when there's no pragma
```
Input | Output |
---|---|
N/A | `Pragma pattern: (?~pragma.flags)(?~pragma.start)(?~pragma.key)(?~pragma.delim)(?~pragma.val)(?~pragma.stop)` |
N/A | `-> (?m)^\$(?P<key>(?:\w\|\d\|\.)+):(?P<val>.*?)$` |
`test` | `test` |
`$pragma.start:%` | `(?m)%(?P<key>(?:\w\|\d\|\.)+):(?P<val>.*?)$` |
`%pragma.stop:;` | `(?m)%(?P<key>(?:\w\|\d\|\.)+):(?P<val>.*?);` |
`%pragma.delim:=;` | `(?m)%(?P<key>(?:\w\|\d\|\.)+)=(?P<val>.*?);` |
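To see what the REPL extracts from each line, the compiled patterns copied from the table above can be applied directly with Python's standard `re` module. This sketch does not use `regex_compose` at all; it only shows which `key` and `val` groups the script would pass back into the composer.

```python
import re

# The initial compiled "pragma" pattern, copied from the table above.
PRAGMA = r"(?m)^\$(?P<key>(?:\w|\d|\.)+):(?P<val>.*?)$"

m = re.match(PRAGMA, "$pragma.start:%")
print(m.group("key"), m.group("val"))  # pragma.start %

# After "pragma.start" is redefined as "%", the compiled pattern begins with "%".
PRAGMA_AFTER = r"(?m)%(?P<key>(?:\w|\d|\.)+):(?P<val>.*?)$"

m = re.match(PRAGMA_AFTER, "%pragma.stop:;")
print(m.group("key"), m.group("val"))  # pragma.stop ;
```

The lazy `(?P<val>.*?)` stretches only as far as the `pragma.stop` pattern requires, which is why redefining `pragma.stop` as `;` later makes the value end at the first semicolon.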
Further examples
For further examples, see https://codeberg.org/Shae/RegExCompose/src/branch/main/examples
Changelog
v0.0.3
- Fixed an issue in `PatternComposer.multiadd()` due to incorrect name access
- Fixed `refpatt_patt` not being passed to `get_refpatt_parts` in `PatternComposer.multiadd()`, causing an error when in `bytes_mode`
v0.0.2
- Added support for bytes
Links
- Source code: https://codeberg.org/Shae/RegExCompose
- License: http://www.apache.org/licenses/LICENSE-2.0