Skip to main content

Humane grammar library

Project description

Build status Code Health https://coveralls.io/repos/github/yanivmo/ruler/badge.svg?branch=master

Ruler is a lightweight regular expressions wrapper aiming to make regex definitions more modular, intuitive, readable and the mismatch reporting more informative.

Quick start

Let’s implement the following grammar, given in EBNF:

grammar = who, ' likes to drink ', what;
who = 'John' | 'Peter' | 'Ann' | 'Paul' | 'Rachel';
what = tea | juice;
juice = 'juice';
tea = 'tea', [milk];
milk = ' with milk';

Using ruler it looks almost identical to EBNF:

>>> class Morning(Grammar):
...     who = OneOf('John', 'Peter', 'Ann', 'Paul', 'Rachel')
...     juice = Rule('juice')
...     milk = Optional(' with milk')
...     tea = Rule('tea', milk)
...     what = OneOf(juice, tea)
...     grammar = Rule(who, ' likes to drink ', what, '\.')
...
... morning = Morning.create()

A member named grammar must be always present - it acts as the start rule. Let’s begin rather with a mismatch:

>>> morning.match('John likes to drink coffee')
False

match() returns True if the match was successful and False otherwise. One of the major advantages of ruler, as opposed to working directly with regular expressions, is the ability to know exactly what went wrong:

>>> print(morning.error.long_description)
Mismatch at 20:
  John likes to drink coffee
                      ^
"coffee" does not match "juice"
"coffee" does not match "tea"

Let’s fix our text:

>>> morning.match('John likes to drink tea.')
True

Any rule that is declared as a member variable of your grammar class acts as a named capture group arranged hierarchically. Use matched attribute to retrieve the text matched by a specific rule:

>>> morning.matched
'John likes to drink tea.'
>>> morning.who.matched
'John'
>>> morning.what.matched
'tea'

Branches of OneOf rules that didn’t match and optional rules that didn’t match have None as their values making it easy to ask whether they matched:

>>> morning.what.juice.matched is None
True
>>> morning.what.tea.matched is None
False
>>> morning.what.tea.milk.matched is None
True

Rules can be reused multiple times. If the same rule appears multiple times under the same parent, these rules are collected into a list:

>>> class Morning(Grammar):
...     person = OneOf('John', 'Peter', 'Ann', 'Paul', 'Rachel')
...     who = Rule(person, Optional(', ', person), Optional(' and ', person))
...     juice = Rule('juice')
...     milk = Optional(' with milk')
...     tea = Rule('tea', milk)
...     what = OneOf(juice, tea)
...     grammar = Rule(who, ' like', Optional('s'), ' to drink ', what, '\.')
...
... morning = Morning.create()
... morning.match('Peter, Rachel and Ann like to drink juice.')
True
>>> morning.who.matched
'Peter, Rachel and Ann'
>>> morning.who.person[0].matched
'Peter'
>>> morning.who.person[1].matched
'Rachel'
>>> morning.who.person[2].matched
'Ann'

Notice that, in the grammar above, person rule is never a direct child of who but still is accessed as such. That is because when a rule hierarchy is built, a rule is placed under its closest named ancestor.

Rules’ string arguments may actually be any valid regular expression. So we could rewrite our grammar like this:

>>> class Morning(Grammar):
...     who = OneOf('\w+')
...     juice = Rule('juice')
...     milk = Optional(' with milk')
...     tea = Rule('tea', milk)
...     what = OneOf(juice, tea)
...     grammar = Rule(who, ' likes to drink ', what, '\.')
...
... morning = Morning()
... morning.match('R2D2 likes to drink juice. And nothing else matters.')
True
>>> morning.matched
'R2D2 likes to drink juice.'
>>> morning.who.matched
'R2D2'

Performance

The library is well optimized for fast matching. Nevertheless it is important to remember that this is a Python wrapper of the regex library and as such can never outperform matching directly using the regex library. Currently ruler measures approximately ten times slower than re.

Development

  • To run the tests:

    pytest tests
  • To compare the performance to the re library:

    python performance/re_compare.py
  • To run performance profiling of a specific method, Rule.match for example:

    python performance/profile.py Rule.match

    More than one method can be specified in the same command.

Tox

Tox takes care of everything without installing anything manually. There are two groups of tox environments: py*-test and py*-profile. The test environments run the unit tests while the profile environments run the performance profiling scripts. If tox is not enough then a development environment can be generated by creating a new virtualenv and then running pip install -r requirements_develop.txt.

Dependency management

For the development needs, there are three requirements files in the project’s root directory:

  • requirements_test.txt contains all the dependencies needed to run the unit tests,

  • requirements_profile.txt contains all the dependencies needed to run the performance profiling,

  • requirements_develop.txt contains the testing dependencies, the profiling dependencies and some additional dependencies used in development.

The requirements files mentioned above are not intended for manual editing. Instead they are managed using pip-tools. The process of updating the requirements is as follows:

  1. Add, remove or update a dependency in one of the reqs_*.dep files:

    • Update reqs_install.dep if the dependency is needed for the regular installation by the end user,

    • Update reqs_test.dep if the dependency is needed to run the unit tests but is not necessary for the regular installation,

    • Update reqs_profile.dep if the dependency is needed to run the performance profiling but is not necessary for the regular installation,

    • Update reqs_develop.dep if the dependency is not in one of the previous categories.

  2. Generate the requirements file running pip-compile. The exact command is documented in the beginning of each requirements file.

  3. Consider running pip-sync requirements_develop.txt.

Notice that there is no need to edit setup.py - it will pull the dependencies by itself from reqs_install.dep.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ruler-2.0.0.tar.gz (8.0 kB view details)

Uploaded Source

Built Distribution

ruler-2.0.0-py2.py3-none-any.whl (11.1 kB view details)

Uploaded Python 2 Python 3

File details

Details for the file ruler-2.0.0.tar.gz.

File metadata

  • Download URL: ruler-2.0.0.tar.gz
  • Upload date:
  • Size: 8.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for ruler-2.0.0.tar.gz
Algorithm Hash digest
SHA256 dcf4dc19054fa04a73c24b62e93dfecf17487406e409d2c66225de7738ee6e12
MD5 2a40c1d5f460c1c635af5652146b1f3c
BLAKE2b-256 ede5026162295cec03bd2ff0c47404748f582eb43907ab4dac89435f273af965

See more details on using hashes here.

File details

Details for the file ruler-2.0.0-py2.py3-none-any.whl.

File metadata

File hashes

Hashes for ruler-2.0.0-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 f541871592c6c8c468ecfe4135a09fa32df60c834ea8a5306d13cc853ff22b08
MD5 49765b1c4af3c09bcf4761f00fabf745
BLAKE2b-256 373fa29995476026ee662962a6a10c486f4c4f0d29ef91e8b791ad495933f413

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page