A wrapper for the regex library for advanced pattern management
Installation
pip install replus
or clone this repo:
git clone git@github.com:raptored01/replus.git
and then run:
python setup.py install
Basic usage
The Engine loads regular expression pattern templates from the *.json files in the provided directory, then builds and compiles them as follows.
Example template, models/dates.json:
{
    "day": ["3[01]", "[12][0-9]", "0?[1-9]"],
    "month": ["0?[1-9]", "1[012]"],
    "year": ["\\d{4}"],
    "date": ["{{day}}/{{month}}/{{year}}", "{{year}}-{{month}}-{{day}}"],
    "patterns": ["{{date}}"]
}
This template results in the following regex:
(?P<date_0>(?P<day_0>[12][0-9]|0?[1-9]|3[01])/(?P<month_0>0?[1-9]|1[012])/(?P<year_0>\d{4})|(?P<year_1>\d{4})-(?P<month_1>0?[1-9]|1[012])-(?P<day_1>[12][0-9]|0?[1-9]|3[01]))
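The expansion above can be approximated in a few lines of standard-library Python. This is a minimal sketch, not replus's actual implementation: the helper name `build`, the fresh counters for each pattern, and the longest-first ordering of alternatives (which the generated regex appears to follow) are all assumptions. Prefixed keys such as `?:` and `#` are not handled here.

```python
import json
import re

# Matches a plain {{key}} placeholder (no ?:, ?> or # prefixes in this sketch).
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def build(template, text, counters):
    """Hypothetical expander: replace each {{key}} with a numbered named group."""

    def substitute(m):
        key = m.group(1)
        index = counters.get(key, 0)  # key_0, key_1, ... in order of expansion
        counters[key] = index + 1
        # Assumption: alternatives are tried longest-first (sorted() is stable).
        alternatives = sorted(template[key], key=len, reverse=True)
        body = "|".join(build(template, alt, counters) for alt in alternatives)
        return f"(?P<{key}_{index}>{body})"

    # A function replacement is inserted literally, so backslashes such as \d survive.
    return PLACEHOLDER.sub(substitute, text)


dates = json.loads(r"""{
    "day": ["3[01]", "[12][0-9]", "0?[1-9]"],
    "month": ["0?[1-9]", "1[012]"],
    "year": ["\\d{4}"],
    "date": ["{{day}}/{{month}}/{{year}}", "{{year}}-{{month}}-{{day}}"],
    "patterns": ["{{date}}"]
}""")

# Fresh counters per pattern, assuming each pattern is compiled on its own.
regexes = [build(dates, p, {}) for p in dates["patterns"]]
compiled = [re.compile(r) for r in regexes]
print(regexes[0])
```

Numbering each key by expansion order is what keeps the group names unique inside one compiled pattern, which Python's `re` requires.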
The "patterns" key can hold more than one pattern: it is a list, and every entry in it is looped over.
Querying
It is possible to query as follows:
from replus import Engine

engine = Engine('models')

for match in engine.parse("Look at this date: 2012-12-10"):
    print(match)                 # <[Match date] span(19, 29): 2012-12-10>
    date = match.group('date')
    print(date)                  # <[Group date_0] span(19, 29): 2012-12-10>
    day = date.group('day')
    print(day)                   # <[Group day_1] span(27, 29): 10>
    month = date.group('month')
    print(month)                 # <[Group month_1] span(24, 26): 12>
    year = date.group('year')
    print(year)                  # <[Group year_1] span(19, 23): 2012>
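The output above shows `date.group('day')` resolving to `day_1`, the copy of the group that actually took part in the match. Assuming the key-to-name lookup works roughly that way, the idea can be sketched with the standard `re` module and the regex generated from models/dates.json; `resolve` is a hypothetical helper, not part of replus.

```python
import re

# The regex generated from models/dates.json, split over two lines.
DATE_RE = re.compile(
    r"(?P<date_0>(?P<day_0>[12][0-9]|0?[1-9]|3[01])/(?P<month_0>0?[1-9]|1[012])/(?P<year_0>\d{4})"
    r"|(?P<year_1>\d{4})-(?P<month_1>0?[1-9]|1[012])-(?P<day_1>[12][0-9]|0?[1-9]|3[01]))"
)


def resolve(match, key):
    """Hypothetical: return (name, value, span) of the first key_N group that matched."""
    # groupindex maps names to group numbers; sort so key_0 is checked before key_1.
    for name, number in sorted(match.re.groupindex.items(), key=lambda item: item[1]):
        # "day_1".rsplit("_", 1)[0] == "day": drop the numeric suffix.
        if name.rsplit("_", 1)[0] == key and match.group(number) is not None:
            return name, match.group(number), match.span(number)
    return None


m = DATE_RE.search("Look at this date: 2012-12-10")
print(resolve(m, "date"))   # ('date_0', '2012-12-10', (19, 29))
print(resolve(m, "day"))    # ('day_1', '10', (27, 29))
print(resolve(m, "month"))  # ('month_1', '12', (24, 26))
```

Groups in the alternative that did not match (`day_0`, `month_0`, `year_0`) are `None`, which is why the lookup skips them.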
Filtering
It is possible to filter the regexes by type, where the type is given by the JSON file's name (e.g. "dates" for dates.json):
filters = ["dates", "cities"]
for match in engine.parse(my_string, *filters):
    print(match)  # do stuff
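Filtering by type can be pictured as a lookup keyed on each JSON file's stem. The sketch below is hypothetical: `REGISTRY` and `parse` only mimic the described behaviour with the standard `re` module, and the two patterns are made up for illustration.

```python
import re
from pathlib import Path

# Hypothetical registry: type (the JSON file name without .json) -> compiled patterns.
REGISTRY = {
    Path("models/dates.json").stem: [re.compile(r"\d{4}-\d{2}-\d{2}")],
    Path("models/cities.json").stem: [re.compile(r"\b(?:Rome|Paris|Berlin)\b")],
}


def parse(text, *filters):
    """Yield (type, value, span) for every match, restricted to `filters` if given."""
    types = filters or REGISTRY.keys()  # no filters: use every type
    for type_ in types:
        for pattern in REGISTRY.get(type_, []):
            for m in pattern.finditer(text):
                yield type_, m.group(), m.span()


for found in parse("Paris, 2012-12-10", "dates"):
    print(found)  # ('dates', '2012-12-10', (7, 17))
```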
Match and Group objects
Match objects have the following attributes:
type: the type of match (e.g. “dates”);
match: the underlying regex match object;
re: the regex pattern;
all_group_names: the names of all the child groups;
Both Match and Group objects have the following attributes:
value: the string value of the match/group
start: the beginning of the match/group relative to the input string
end: the end of the match/group relative to the input string
span: (start, end) the span of the match/group object relative to the input string
offset: {"start": start, "end": end} similar to span
length: end-start
first(): get the first matching group
last(): get the last matching group
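Most of the positional attributes listed above can be derived from a standard `re.Match`. The `Fragment` class below is a hypothetical stand-in for replus's Match/Group objects, shown only to make the relationship between `span`, `offset` and `length` concrete.

```python
import re
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical stand-in for a Match/Group: a substring plus its position."""
    value: str
    start: int
    end: int

    @classmethod
    def from_re(cls, match, group=0):
        # group=0 is the whole match; a name or number selects a single group.
        return cls(match.group(group), match.start(group), match.end(group))

    @property
    def span(self):
        return (self.start, self.end)

    @property
    def offset(self):
        return {"start": self.start, "end": self.end}

    @property
    def length(self):
        return self.end - self.start


m = re.search(r"(?P<year>\d{4})-\d{2}-\d{2}", "Look at this date: 2012-12-10")
year = Fragment.from_re(m, "year")
print(year.span, year.offset, year.length)  # (19, 23) {'start': 19, 'end': 23} 4
```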
Group objects have the following attributes:
name: the actual group name (e.g. date_1);
key: the group key (e.g. date);
spans: [(start, end), ...] the spans of the repeated matches relative to the input string
starts: the beginnings of the repeated matches relative to the input string
ends: the ends of the repeated matches relative to the input string
offsets: [{"start": start, "end": end}, ...]
parent: the parent group object
Both Match and Group objects can be serialized to a dict with the serialize() method, and to a JSON string through the json attribute.
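As a rough illustration of the dict/JSON round trip, the sketch below serializes one group of a standard `re.Match`. The `serialize` function and its exact set of keys are assumptions modelled on the attribute list above, not replus's actual output format.

```python
import json
import re


def serialize(match, group=0):
    """Hypothetical: turn one group of an re.Match into a plain dict."""
    start, end = match.span(group)
    return {
        "value": match.group(group),
        "start": start,
        "end": end,
        "span": [start, end],  # a list, since JSON has no tuple type
        "offset": {"start": start, "end": end},
        "length": end - start,
    }


m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", "Look at this date: 2012-12-10")
as_dict = serialize(m, "month")
as_json = json.dumps(as_dict)
print(as_json)
```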
Secondary features
There are three useful secondary features:
non-capturing groups: these are specified by using the “?:” prefix in the group name or key
atomic groups: these are specified by using the “?>” prefix in the group name or key
dynamic backreferences: use # to reference a previous group and @<n> to specify how many groups behind
Example template:
{
    "?:number": ["\\d"],
    "abg": ["alpha", "beta", "gamma"],
    "spam": ["spam"],
    "eggs": ["eggs"],
    "patterns": [
        "This is an unnamed number group: {{number}}.",
        "I can match {{abg}} and {{abg}}, and then re-match the last {{#abg}} or the second last {{#abg@2}}",
        "Here is some {{?:spam}} and some {{?>eggs}}"
    ]
}
It will generate the following regexes:
This is an unnamed number group: (?:\d).
I can match (?P<abg_0>alpha|beta|gamma) and (?P<abg_1>alpha|beta|gamma), and then re-match the last (?P=abg_1) or the second last (?P=abg_0)
Here is some (?:spam) and some (?>eggs)
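The numbering behind {{#abg}} and {{#abg@2}} can be checked by hand: once abg_0 and abg_1 have been emitted, one step back is abg_1 and two steps back is abg_0. The `backreference` helper below is a hypothetical sketch of that arithmetic, and the second half checks with the standard `re` module that the generated pattern behaves as described. Atomic groups such as (?>eggs) are left out because the standard `re` module only accepts them from Python 3.11.

```python
import re


def backreference(key, counters, steps_back=1):
    """Hypothetical: map {{#key@n}} to the n-th most recently emitted key_N group."""
    index = counters[key] - steps_back
    if index < 0:
        raise ValueError(f"no {key} group {steps_back} step(s) back")
    return f"(?P={key}_{index})"


counters = {"abg": 2}  # abg_0 and abg_1 have already been emitted
print(backreference("abg", counters))     # (?P=abg_1)
print(backreference("abg", counters, 2))  # (?P=abg_0)

generated = re.compile(
    r"I can match (?P<abg_0>alpha|beta|gamma) and (?P<abg_1>alpha|beta|gamma), "
    r"and then re-match the last (?P=abg_1) or the second last (?P=abg_0)"
)
m = generated.fullmatch(
    "I can match alpha and beta, and then re-match the last beta or the second last alpha"
)
print(m.group("abg_0"), m.group("abg_1"))  # alpha beta
```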
N.B.: to obtain an escape sequence such as \d in a pattern template, it must be double-escaped as \\d, because the templates are JSON strings.
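The double escaping comes from JSON itself, where a literal backslash inside a string must be written as \\. A quick check with Python's standard `json` and `re` modules, independent of replus:

```python
import json
import re

# In the JSON source the backslash is doubled; after decoding it is a single \d.
template = json.loads(r'{"year": ["\\d{4}"]}')
pattern = template["year"][0]
print(pattern)                               # \d{4}
print(bool(re.fullmatch(pattern, "2012")))   # True

# A single backslash before "d" is not a valid JSON escape, so decoding fails.
try:
    json.loads(r'{"year": ["\d{4}"]}')
    rejected = False
except json.JSONDecodeError as err:
    rejected = True
    print("rejected:", err.msg)
```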
Current limitations
None known
Hashes for replus-0.1.6-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 37d33b3856013288b8f15d295476fd97f7b2e05daa83780f6ddf7211bd280ef7
MD5 | a703fbd6b78cd81970ef289804bf7bcc
BLAKE2b-256 | f6232ee8bdf0e8fb48addc534cca92b209d853a55f8347d682495b61930fcb1c