A minimalistic PEG parser

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: GNU General Public License v3 (GPLv3)
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Parsik

A simplistic PEG (parsing expression grammar) parser.

You provide a grammar for your language, and this module gives you a Parser that you can use to recognize if documents (strings) match the grammar. You can attach callback functions for when various parts of the grammar match, and to help construct some "output" of the parse (e.g. a parse tree).

The library is simple and unoptimized in time and space. And a grammar with left-recursion could put the parser into an infinite-loop. But the implementation is small and straightforward, and there is extensive logging to help analyze the parser's behaviour.

If you're looking for something even slightly more complex or elegant, I recommend parsy (https://github.com/python-parsy/parsy).

Example

from parsik import Parser, ParseFailure,\
                   Any, Char, EOF, Fail, OneOrMore, Optional, R,\
                   Regex, Sequence, Times, ZeroOrMore, silent

# A grammar for 7- or 10-digit phone numbers.
grammar = {
    'PHONENUM': Sequence(
                    Optional('AREACODE'),
                    'THREE', silent(Char('-')), 'FOUR'
                ),
    'AREACODE': Sequence(
                    silent(Char('(')), 'THREE', silent(Regex(r'\) ')),
                    on_match=lambda x: x[0]
                ),
    'THREE':    Regex(r'\d{3}'),    # three digits
    'FOUR':     Regex(r'\d{4}'),    # four digits
}
parser = Parser(grammar, "Phone numbers")

output = parser.parse("PHONENUM", "123-4567")
assert output == [ '123', '4567']

output = parser.parse("PHONENUM", "(555) 123-4567")
assert output == [ '555', '123', '4567']

try:
    parser.parse("PHONENUM", "123-456")
except ParseFailure:
    pass
else:
    assert False

Matchers

Parsik comes with several "Matchers" that you can use to express your grammar: components like Sequence, Optional, ZeroOrMore, Regex, etc. These Matchers can be composed and can refer to other rules in the grammar; even recursively, so long as they aren't "left-recursive" which could cause an infinite-loop.

The included Matchers are:

Char: matches a single character.
Regex: matches a regular expression.
EOF: matches the end of the input string.
R: matches a rule that is defined elsewhere in the grammar.
- but usually you can just use a string to refer to a rule by name.
Optional: optionally matches an item.
Any: matches any one of a number of possible items.
Sequence: matches a sequence of items in a specific order.
Times: matches an item a specific number of times, or range of times.
ZeroOrMore: matches an item zero or more times.
OneOrMore: matches an item one or more times.
Fail: if an item matches, then abort the parse.

You can also make your own types of Matchers.

There isn't any special syntax or operator-overloading for expressing the grammar. It's just a dictionary where the keys are rule-names, and the values are Matchers. Some Matchers, like Sequence, are composed of other matchers or rules. They just take the list of child matchers in their constructor. You can also refer to other grammar rules by name, even if it forms a cycle.

Logging

Parsik uses python's standard logging. If you enable it at DEBUG level:

logging.basicConfig(level=logging.DEBUG, format='%(message)s')

then you'll get an elaborate analysis of any parse attempt. For example:

grammar = {
    'MAIN': Sequence('FIRST', 'SECOND', 'THIRD'),
    'FIRST': ZeroOrMore(Regex(r'ab')),
    'SECOND': Any(Char('x'), Char('y')),
    'THIRD': 'FIRST',
}
p = Parser(grammar, "Quick example")
p.parse("MAIN", "ababy")

This will log:

Parsing Quick example {MAIN} against input string 'ababy'.

pos     input      output              ✓|✕ rule            rule details
------- ---------- ------------------- --- --------------- -------------------------
0       'a'                                MAIN            [[{FIRST},{SECOND},{THIRD}]]
0       'a'                                                  {FIRST}
0       'a'                                FIRST               *(r'ab')
0       'a'                                                      r'ab'
0-2     'ab'       'ab'                 ✓                        r'ab'
2       'a'                                                      r'ab'
2-4     'ab'       'ab'                 ✓                        r'ab'
4       'y'                                                      r'ab'
4       ''                              ✕                        r'ab'
0-4     'abab'     '[ab, ab]'           ✓  FIRST               *(r'ab')
0-4     'abab'     '[ab, ab]'           ✓                    {FIRST}
4       'y'                                                  {SECOND}
4       'y'                                SECOND              any(['x', 'y'])
4       'y'                                                      'x'
4       ''                              ✕                        'x'
4       'y'                                                      'y'
4-$     'y'        'y'                  ✓                        'y'
4-$     'y'        'y'                  ✓  SECOND              any(['x', 'y'])
4-$     'y'        'y'                  ✓                    {SECOND}
5       '$'                                                  {THIRD}
5       '$'                                THIRD               {FIRST}
5       '$'                                FIRST                 *(r'ab')
5       '$'                                                        r'ab'
5-$     '$'                             ✕                          r'ab'
5-$     '$'        '[]'                 ✓  FIRST                 *(r'ab')
5-$     '$'        '[]'                 ✓  THIRD               {FIRST}
5-$     '$'        '[]'                 ✓                    {THIRD}
0-$     'a'        '[[ab, ab], y, []]'  ✓  MAIN            [[{FIRST},{SECOND},{THIRD}]]
Parse success: '[[ab, ab], y, []]'.

Each row shows a step in the recursive-descent parse attempt. A Matcher will log a line immediately before and immediately after it attempts to match. So, one line where the ✓|✕ column is blank, and another where it has the result.

The columns are:

pos: The position(s) (character #s) in the input document under view.
- $ means it has reached the end.
input: The text of the input document at those positions.
output: If a matcher was successful, this is its output.
✓|✕:
- blank means that the matcher is going to attempt a match.
- ✓ means that the match was successful.
- ✕ means that the match failed.
rule: The name of the rule, if the matcher is a named rule in the grammar.
rule details: A concise representation of what the matcher will attempt to recognize. They display as:
- Char: 'x'
- Regex: r'pattern'
- EOF: EOF
- R: {rule name}
- Optional: ?(...)
- Any: any(...)
- Sequence: [...]
- Times: X{min,max}(...)
- ZeroOrMore: *(...)
- OneOrMore: +(...)
- Fail: Fail(...)

License

This package is licensed under the GNU GPL v3.

Project details

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: GNU General Public License v3 (GPLv3)
Operating System
- OS Independent
Programming Language
- Python :: 3

Release history Release notifications | RSS feed

This version

0.9.3

Apr 29, 2019

0.9.2

Apr 29, 2019

0.9.1

Apr 28, 2019

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

parsik-0.9.3.tar.gz (14.6 kB view details)

Uploaded Apr 29, 2019 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

parsik-0.9.3-py3-none-any.whl (30.9 kB view details)

Uploaded Apr 29, 2019 Python 3

File details

Details for the file parsik-0.9.3.tar.gz.

File metadata

Download URL: parsik-0.9.3.tar.gz
Upload date: Apr 29, 2019
Size: 14.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.5.3

File hashes

Hashes for parsik-0.9.3.tar.gz
Algorithm	Hash digest
SHA256	`c43829bd01bacd1f74d57dca3e01cc8ffe7f05e2e5712b24cc9f1afbead562b7`
MD5	`8b1a211098000be218361e1c3dabf70b`
BLAKE2b-256	`2d1695324b36249cd946dd33e23130a6e6ba4c95e495e22e38baa2ee7c635b0f`

See more details on using hashes here.

File details

Details for the file parsik-0.9.3-py3-none-any.whl.

File metadata

Download URL: parsik-0.9.3-py3-none-any.whl
Upload date: Apr 29, 2019
Size: 30.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.5.3

File hashes

Hashes for parsik-0.9.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a93ab56fd2b7944bb53326ac5451d6672ed5084f9600f0053d61d68f5dfd3755`
MD5	`a604ec0bf958b56fc0065fa36207a227`
BLAKE2b-256	`c889eb4a15777cfcced325297a1999beff8def4f87987ad01bff6b774a70de97`

See more details on using hashes here.

parsik 0.9.3

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Parsik

Example

Matchers

Logging

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes