A wrapper for Python's re library for advanced regex pattern management
Basic usage
The Engine loads regular expression pattern templates from the *.json files in the provided directory, then builds and compiles them as shown below.
Example template, models/dates.json:
{
  "day": ["3[01]", "[12][0-9]", "0?[1-9]"],
  "month": ["0?[1-9]", "1[012]"],
  "year": ["\\d{4}"],
  "date": ["{{day}}/{{month}}/{{year}}", "{{year}}-{{month}}-{{day}}"],
  "patterns": ["{{date}}"]
}
This template results in the following regex:
(?P<date_0>(?P<day_0>[12][0-9]|0?[1-9]|3[01])/(?P<month_0>0?[1-9]|1[012])/(?P<year_0>\d{4})|(?P<year_1>\d{4})-(?P<month_1>0?[1-9]|1[012])-(?P<day_1>[12][0-9]|0?[1-9]|3[01]))
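The expansion can be pictured with a short sketch. The code below is not the library's implementation: build, PLACEHOLDER and the counter scheme are hypothetical names, and alternatives are joined in template order, so their order may differ from the regex shown above. It only assumes that each {{key}} occurrence becomes a named group with an incrementing numeric suffix.

```python
import re
from collections import defaultdict

# Same content as models/dates.json, written as a Python dict.
TEMPLATE = {
    "day": ["3[01]", "[12][0-9]", "0?[1-9]"],
    "month": ["0?[1-9]", "1[012]"],
    "year": ["\\d{4}"],
    "date": ["{{day}}/{{month}}/{{year}}", "{{year}}-{{month}}-{{day}}"],
    "patterns": ["{{date}}"],
}

# Matches {{key}} placeholders; a quantifier such as {4} is left alone.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def build(pattern, template, counters):
    """Hypothetical helper: expand {{key}} placeholders into named groups."""

    def expand(placeholder):
        key = placeholder.group(1)
        index = counters[key]  # reserve this suffix before recursing
        counters[key] += 1
        alternatives = "|".join(
            build(alt, template, counters) for alt in template[key]
        )
        return "(?P<{}_{}>{})".format(key, index, alternatives)

    return PLACEHOLDER.sub(expand, pattern)


counters = defaultdict(int)
regex = build(TEMPLATE["patterns"][0], TEMPLATE, counters)
m = re.compile(regex).search("Look at this date: 2012-12-10")
print(m.group("year_1"), m.group("month_1"), m.group("day_1"))
```

Only the second date alternative fits the sample sentence, so year_1, month_1 and day_1 are filled while day_0, month_0 and year_0 stay empty.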
The engine can then be queried as follows:
from replus import Engine  # assumed import path; the original snippet omits it

engine = Engine('models')

for match in engine.parse("Look at this date: 2012-12-10"):
    print(match)
    # <[Match date] span(19, 29): 2012-12-10>

    date = match.group('date')
    print(date)
    # <[Group date_0] span(19, 29): 2012-12-10>

    day = date.group('day')
    print(day)
    # <[Group day_1] span(27, 29): 10>

    month = date.group('month')
    print(month)
    # <[Group month_1] span(24, 26): 12>

    year = date.group('year')
    print(year)
    # <[Group year_1] span(19, 23): 2012>
Match objects have the following attributes:
type: the type of match (e.g. “dates”);
match: the underlying re match object;
re: the regex pattern;
all_group_names: the names of all child groups;
Both Match and Group objects have the following attributes:
value: the string value of the match/group
start: the start of the match/group relative to the input string
end: the end of the match/group relative to the input string
offset: the start and end positions, (start, end)
length: the length of the match/group (end - start)
Group objects have the following attributes:
name: the actual group name (e.g. date_1);
key: the group key (e.g. date);
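To make the relationship between these attributes concrete, here is a minimal sketch built on the standard re module. GroupInfo and group_info are hypothetical stand-ins, not the library's classes, and the sketch assumes that key is simply name without its numeric suffix and that offset is a (start, end) tuple.

```python
import re
from typing import NamedTuple, Tuple


class GroupInfo(NamedTuple):
    """Hypothetical stand-in for a Group object; not the library's class."""

    name: str   # actual group name, e.g. "day_1"
    key: str    # template key, e.g. "day"
    value: str  # matched text
    start: int  # start position in the input string
    end: int    # end position in the input string

    @property
    def offset(self) -> Tuple[int, int]:
        return (self.start, self.end)

    @property
    def length(self) -> int:
        return self.end - self.start


def group_info(match, name):
    """Collect the attributes of one named group from a re match."""
    key = name.rsplit("_", 1)[0]  # assumption: key is name minus its suffix
    return GroupInfo(name, key, match.group(name), match.start(name), match.end(name))


m = re.search(
    r"(?P<year_1>\d{4})-(?P<month_1>0?[1-9]|1[012])-(?P<day_1>[12][0-9]|0?[1-9]|3[01])",
    "Look at this date: 2012-12-10",
)
day = group_info(m, "day_1")
print(day.key, day.value, day.offset, day.length)
```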
Both Match and Group objects can be serialized to a dict with the serialize() method and to a JSON string with the json attribute.
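The exact layout of the serialized output is not shown here, so the following is only a sketch of the idea using the standard re and json modules. serialize_match and the dict keys it produces are assumptions made for illustration, not the library's format.

```python
import json
import re


def serialize_match(match):
    """Hypothetical helper: turn a re match into a plain, JSON-friendly dict."""
    return {
        "value": match.group(0),
        "start": match.start(),
        "end": match.end(),
        "groups": {
            name: {
                "value": value,
                "start": match.start(name),
                "end": match.end(name),
            }
            for name, value in match.groupdict().items()
            if value is not None  # skip groups from alternatives that did not match
        },
    }


m = re.search(
    r"(?P<year_1>\d{4})-(?P<month_1>\d{2})-(?P<day_1>\d{2})",
    "Look at this date: 2012-12-10",
)
as_dict = serialize_match(m)
as_json = json.dumps(as_dict, sort_keys=True)
print(as_json)
```

Keeping the dict limited to strings, integers and nested dicts means it can be passed to json.dumps without a custom encoder.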
Secondary features
There are two useful secondary features:
non-capturing groups: these are specified by using the “!” prefix in the group name
dynamic backreferences: use the “#” prefix to reference a previous group with the same key, and the @<n> suffix to specify how many groups back (e.g. {{#abg@2}} for the second last)
Template:
{
  "!number": ["\\d"],
  "abg": ["alpha", "beta", "gamma"],
  "patterns": [
    "This is an unnamed number group: {{number}}.",
    "I can match {{abg}} and {{abg}}, and then re-match the last {{#abg}} or the second last {{#abg@2}}"
  ]
}
It will generate the following regexes:
This is an unnamed number group: (?:\d).
I can match (?P<abg_0>alpha|gamma|beta) and (?P<abg_1>alpha|gamma|beta), and then re-match the last (?P=abg_1) or the second last (?P=abg_0)
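The behaviour of the two generated regexes can be checked with the standard re module alone. The sample sentences below are made up for illustration.

```python
import re

# Non-capturing group: the digit is matched but no group is recorded.
unnamed = re.compile(r"This is an unnamed number group: (?:\d).")
number_match = unnamed.search("This is an unnamed number group: 7.")

# Backreferences: (?P=abg_1) must repeat whatever abg_1 matched, and so on.
backrefs = re.compile(
    r"I can match (?P<abg_0>alpha|gamma|beta) and (?P<abg_1>alpha|gamma|beta), "
    r"and then re-match the last (?P=abg_1) or the second last (?P=abg_0)"
)

ok = backrefs.search(
    "I can match alpha and beta, and then re-match the last beta or the second last alpha"
)
# Here the last two words are swapped, so the backreferences no longer agree.
bad = backrefs.search(
    "I can match alpha and beta, and then re-match the last alpha or the second last beta"
)

print(number_match.groups(), ok.groupdict(), bad)
```

The first sentence matches because "beta" and "alpha" are repeated in the order the backreferences require; the second does not match at all.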
N.B.: to obtain an escape sequence such as \d in the pattern’s model, the backslash must be doubled in the JSON file: \\d
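The reason for the double escape is JSON itself, which can be seen with the standard json module. This sketch does not involve the library; the variable names are illustrative only.

```python
import json
import re

# File content as it would appear on disk: two backslashes before "d".
raw_json = r'{"year": ["\\d{4}"]}'
template = json.loads(raw_json)
year_pattern = template["year"][0]  # decoded to a single backslash: \d{4}

print(year_pattern)
print(re.fullmatch(year_pattern, "2012") is not None)

# A single backslash before "d" is not a valid JSON escape and fails to parse.
try:
    json.loads(r'{"year": ["\d{4}"]}')
    single_escape_is_valid = True
except json.JSONDecodeError:
    single_escape_is_valid = False

print(single_escape_is_valid)
```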
Current limitations
None known