Skip to main content

A pure Python implementation of parser combinators with asymptotically best performance and support for context-sensitive or ambiguous grammars.

Project description

Tests

PyPi

Read the Docs

Introduction

Compynator is a tiny (~400 SLOCs), pure Python implementation of parser combinators. With this library, one can build up a complex parser from primitive parsers such as “get one token” (the One parser), or compose parsers together such as “if this parser fails, try that parser” (the Or combinator), etc. The mental model is a binary tree of execution nodes through which a sequence of input tokens flows.

The implementation in this package supports optional memoization and curtailment as described in [Frost2007]. This allows a memoized parser to achieve asymptotically best performance, and support ambiguous grammars.

Compynator is not an official Google product.

Examples

An example that solves https://leetcode.com/problems/parsing-a-boolean-expression:

t = Terminal('t').value(True)
f = Terminal('f').value(False)

e = Forward()
n = Terminal('!(').then(e).value(lambda x: not x).skip(')')

a_empty = Succeed(True)
a_e_tail = Forward()
a_e_tail.is_(
    Terminal(',').then(e).then(
    a_e_tail, lambda x, y: x and y) | a_empty)
a = Terminal('&(').then(e).then(
    a_e_tail, lambda x, y: x and y).skip(')')

o_empty = Succeed(False)
o_e_tail = Forward()
o_e_tail.is_(
    Terminal(',').then(e).then(
    o_e_tail, lambda x, y: x or y) | o_empty)
o = Terminal('|(').then(e).then(
    o_e_tail, lambda x, y: x or y).skip(')')

e.is_(t | f | n | a | o)

self.assertEqual(e('!(f)'), {Result(True, '')})
self.assertEqual(e('|(f,t)'), {Result(True, '')})
self.assertEqual(e('&(t,f)'), {Result(False, '')})
self.assertEqual(e('|(&(t,f,t),!(t))'), {Result(False, '')})

See test_benchmark.py for a literal translation of the ABNF rules for URI parsing as given in [RFC3986] into a structured tree:

Uri
    Scheme http
    HierPart
        Authority
            UserInfo None
            Host www.ics.uci.edu
            Port None
        Path /pub/ietf/uri/
    Query query
    Fragment fragment

There are more examples in the tests directory.

Translating Augmented Backus-Naur Form (ABNF)

Care should be taken to ensure that repeat takes all possible results instead of greedily parsing for the most. For example, the ABNF *3"a" "aa" cannot be simply translated as Terminal('a').repeat(0, 3) + 'aa'. This translation will fail to parse aaaa because it greedily matches the first 3 a’s, then fails to find the remaining 2 a’s. The correct translation should be Terminal('a').repeat(0, 3, take_all=True) + 'aa'.

ABNF

Compynator

Terminal

Terminal

Rule

Parser

Concatenation

p1 + p2

Alternative

p1 | p2

Sequence group

Normal use of parentheses

Variable repetition

p.repeat(lower, upper, take_all)

Specific repetition

p.repeat(x, x)

Optional

p.repeat(0, 1, take_all)

Translating Parsing Expression Grammar (PEG)

Unlike ABNF, PEG operators are always greedy. When translating PEG, we do not need to worry about backtracking the repetitions with take_all.

PEG

Compynator

Terminal

Terminal

Nonterminal

Parser

Epsilon

Empty

Sequence

p1 + p2

Ordered choice

p1 | p2

Zero or more

p.repeat()

One or more

p.repeat(1)

Optional

p.repeat(0, 1)

And predicate

Lookahead(p)

Not predicate

Lookahead(p, take_if=False)

Combinator vs Generator

Advantages

Advantages of parser combinators versus parser generators are:

  1. Readability. A grammar can be expressed in a very similar form as its BNF. The code can be considered an executable specification of the grammar.

  2. Simple setup. The code is the grammar. There is no need to run a generator to regenerate code when the grammar changes.

  3. Understandability. Each parser is generally short and simple that its correctness can be easily verified. There is no need to look into generated code, or the code of the parser generator.

  4. Parser combinators support context-sensitive grammars. For example, to parse an XML body, assuming start parses a start tag, body parses the body, and end parses a specified end tag:

    xml_tag = start.then(lambda tag_name: body.skip(end(tag_name)))
  5. Combination of lexing and parsing. Most parser generators perform their lexing and parsing phases separately. Parser combinators combine these phases together. Hence they are not limited to string inputs. The example (in test_core.py) below takes a tokenized sequence.

    NUM, OP, TERMINAL = 0, 1, 2
    tokens = [(NUM, 2), (OP, operator.add), (NUM, 10),
              (OP, operator.mul), (NUM, 4)]
    num = One.where(lambda c: c[0] == NUM)
    op = One.where(lambda c: c[0] == OP).value(lambda c: c[1])
    mult_div = op.where(lambda c: c in (operator.mul, operator.truediv))
    add_sub = op.where(lambda c: c in (operator.add, operator.sub))
    left_paren = One.where(lambda c: c[0] == TERMINAL and c[1] == '(')
    right_paren = One.where(lambda c: c[0] == TERMINAL and c[1] == ')')
    expr = Forward()
    factor = (
        num.value(lambda t: t[1]) |
        left_paren.then(expr).skip(right_paren)
    )
    def do_op(left, op, right):
        return op(left, right)
    term = Forward()
    term.is_((
        Collect(term, mult_div, factor).value(lambda v: do_op(*v)) ^
        factor
    ).memoize())
    expr.is_((
        Collect(expr, add_sub, term).value(lambda v: do_op(*v)) ^
        term
    ).memoize())
    calc = expr.filter(lambda r: not r.remain)
    self.assertEqual(
            set(expr(tokens)),
            {
                Result(value=42, remain=[]),
                Result(value=12, remain=tokens[3:]),
                Result(value=2, remain=tokens[1:]),
            })
    self.assertEqual(calc(tokens), Succeed(42)([]))

Disadvantages

Disadvantages of parser combinators are:

  1. Familiarity. Most textbooks write about parser generators and traditional parsing techniques such as LL, LR, etc. Parser combinators are more common in functional and logic programming communities, as popularized by [Wadler1985] and [Hutton1992].

  2. Coupling of code and grammar. The downside of simple setup is a tight coupling of code and grammar, which might make it difficult to understand.

  3. As it is implemented here, performance might be impacted due to composition overhead. See test_benchmark.py for details. On the same machine, the result for URI parsing could be ~70 times slower:

    t.test_parse_uri() 903.5961110000001 usec per run
    t.test_urlparse() 13.704007000000074 usec per run
  4. All the advantages and disadvantages of scannerless parsing apply too.

Limitations

Currently, this library does not implement:

  1. Source context such as line and column number of the token.

  2. “Greedy” matching in the same sense as in regular expression (i.e. longest match). The greedy operation in this library is in the “greedy algorithm” sense, i.e. the first rule that matches will be taken.

  3. Space treatments. Spaces have to be explicitly taken care of in grammars.

References

[Wadler1985]

Wadler, Philip. (1985). “How to replace failure by a list of successes”. Proc. conference on functional programming and computer architecture. Springer–Verlag.

[Hutton1992]

Hutton, Graham. (1992). “Higher-order functions for parsing”. Journal of functional programming, 2(3), 323–343.

[Frost2007]

Frost R.A., Hafiz R., Callaghan P. (2007) “Parser Combinators for Ambiguous Left-Recursive Grammars”. In: Hudak P., Warren D.S. (eds) “Practical Aspects of Declarative Languages”. PADL 2008. Lecture Notes in Computer Science, vol 4902. Springer, Berlin, Heidelberg

[RFC3986]

Berners-Lee, T., Fielding, R., and L. Masinter, “Uniform Resource Identifier (URI): Generic Syntax”, STD 66, RFC 3986, January 2005.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

compynator-0.1.2.tar.gz (20.7 kB view details)

Uploaded Source

Built Distribution

compynator-0.1.2-py3-none-any.whl (18.9 kB view details)

Uploaded Python 3

File details

Details for the file compynator-0.1.2.tar.gz.

File metadata

  • Download URL: compynator-0.1.2.tar.gz
  • Upload date:
  • Size: 20.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.0.5 CPython/3.7.9 Linux/5.4.0-1031-azure

File hashes

Hashes for compynator-0.1.2.tar.gz
Algorithm Hash digest
SHA256 e8a3958b891b983e581b0f3d7ad48c72ac75f4113b75840e670221712618ee42
MD5 7936f6f0fe02abb92d7ec80579b7148c
BLAKE2b-256 5ba03aa924b19f4672c16feea78a337c5e1f9626f6357b5b2445daa713a440e5

See more details on using hashes here.

File details

Details for the file compynator-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: compynator-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 18.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.0.5 CPython/3.7.9 Linux/5.4.0-1031-azure

File hashes

Hashes for compynator-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 45d463d928cddc0d7451df6c0eee1595d371ab3d4ff27592cb34b2bf43a59985
MD5 21458afee184e9cdccabe42c43746153
BLAKE2b-256 213b363704988c6d43c8112e0c5924875627584cc133552822ff5fed7e0fa31d

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page