Skip to main content

A python LL(k) parser with a twist where tokens can be any arbitrary objects.

Project description

build coverage PyPI version

Pyllk

A python LL(k) parser with a twist where tokens can be any arbitrary objects. The current implementation uses the backtracking algorithm. I'll eventually implement predictive parsing but the API is designed to remain backward compatible.

Installation

pip install pyllk

Example. A simple calculator

This is an example of the typical calculator. In this case, we use characters as tokens. The idea is to parse things like ((10+2)*15)/(7-2) and be able to pull the result.

For now, the grammar cannot be expressed in BNF notation so the production rules have to be created in code. It'll change in the future but using non-character tokens complicate things with BNF when representing terminals.

There are 3 steps involved in the example:

  1. Create actions that get executed when production rules are matched. Actions are optional but often necessary if you want to do anything useful with what you've parsed. The function must accept a single parameter of type ExecutionContext which will give you access to:

    • The parser. That's useful if you want to access global configurations like the log level.
    • The production rule. That's the rule that was just matched.
    • The list of tokens that where matched to the production rule.
    • The context given to the parser at the beginning. This is a dictionary containing anything you want to keep around. The example below put a stack inside it to keep track of parentheses in the expression. You can put anything you need in there.
  2. Create the production rules. A grammar is made of production rules that derives terminal and non-terminal tokens. If you don't know what I'm talking about here, stop right now and go read this first: https://en.wikipedia.org/wiki/Context-free_grammar.

  3. Execute.

    • Take the production rules and construct a grammar and a parser.
    • Push your input into the parser.
    • Collect the results in the context object.

Create actions

def action_make_number(ec):
    s = ""
    for token in ec.tokens:
        s = s + token.representation
    ec.context['stack'].append(float(s))


def add(e):
    b = e.context['stack'].pop()
    a = e.context['stack'].pop()
    e.parser.log("Ex: {} + {}".format(a, b))
    e.context['stack'].append(a + b)


def sub(e):
    b = e.context['stack'].pop()
    a = e.context['stack'].pop()
    e.parser.log("Ex: {} - {}".format(a, b))
    e.context['stack'].append(a - b)


def mul(e):
    b = e.context['stack'].pop()
    a = e.context['stack'].pop()
    e.parser.log("Ex: {} x {}".format(a, b))
    e.context['stack'].append(a * b)


def div(e):
    b = e.context['stack'].pop()
    a = e.context['stack'].pop()
    e.parser.log("Ex: {} / {}".format(a, b))
    e.context['stack'].append(a / b)

Production rules

calculator_rules_with_context = []
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("INIT"), [NonTerminalToken("expr"), TerminalToken("+"), NonTerminalToken("expr")], add))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("INIT"), [NonTerminalToken("expr"), TerminalToken("-"), NonTerminalToken("expr")], sub))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("INIT"), [NonTerminalToken("expr"), TerminalToken("/"), NonTerminalToken("expr")], div))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("INIT"), [NonTerminalToken("expr"), TerminalToken("*"), NonTerminalToken("expr")], mul))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("expr"), [TerminalToken("("), NonTerminalToken("INIT"), TerminalToken(")")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("expr"), [NonTerminalToken("number")], action_make_number))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("number"), [TerminalToken("0"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("number"), [TerminalToken("1"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("number"), [TerminalToken("2"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("number"), [TerminalToken("3"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("number"), [TerminalToken("4"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("number"), [TerminalToken("5"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("number"), [TerminalToken("6"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("number"), [TerminalToken("7"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("number"), [TerminalToken("8"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("number"), [TerminalToken("9"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("digit_or_empty"), [TerminalToken("0"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("digit_or_empty"), [TerminalToken("2"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("digit_or_empty"), [TerminalToken("3"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("digit_or_empty"), [TerminalToken("4"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("digit_or_empty"), [TerminalToken("5"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("digit_or_empty"), [TerminalToken("6"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("digit_or_empty"), [TerminalToken("7"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("digit_or_empty"), [TerminalToken("8"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("digit_or_empty"), [TerminalToken("9"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("digit_or_empty"), [TerminalToken("")]))

Create a grammar and a parser

from pyllk.grammar import Grammar
from pyllk.parser import Parser

g = Grammar(calculator_rules_with_context)
parser = Parser(g)

context = {
    'stack': []
}
parser.parse_string("((10+2)*15)/(7-2)", context)
result = context['stack'].pop()

result will contain 36.0

Non-character input

Although character-tokens are the default, nothing prevents you from creating non-character tokens. Here is an example of terminal tokens that are meant to be fed by spaCy, a python NLP library.

import json

class SpacyToken(TerminalToken):
    def __init__(self):
        super().__init__(representation = {})

    def text(self, text):
        self.representation['TEXT'] = text
        return self

    def lemma(self, text):
        self.representation['LEMMA'] = text
        return self

    def dependency(self, text):
        self.representation['DEP'] = text
        return self

    def part_of_speech(self, text):
        self.representation['POS'] = text
        return self

    def tag(self, text):
        self.representation['TAG'] = text
        return self

    def entity(self, text):
        self.representation['ENTITY'] = text
        return self


    def matches(self, obj):
        match = True

        if 'TEXT' in self.representation:
            match &= (self.representation['TEXT'] == obj.text)

        if 'LEMMA' in self.representation:
            match &= (self.representation['LEMMA'] == obj.lemma_)

        if 'DEP' in self.representation:
            match &= (self.representation['DEP'] == obj.dep_)

        if 'POS' in self.representation:
            match &= (self.representation['POS'] == obj.pos_)

        if 'TAG' in self.representation:
            match &= (self.representation['TAG'] == obj.tag_)

        if 'ENTITY' in self.representation:
            match &= (self.representation['ENTITY'] == obj.ent_type_)

        return match

    def __str__(self):
        return json.dumps(self.representation)

When you write your production rules, you can use the SpacyToken class where before you'd use the default TerminalToken:

rules.append(ProductionRule(NonTerminalToken('SOME_PRODUCTION_RULE'), [SpacyToken().dependency("pobj")], record))

Dev setup

Run make init

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyllk-0.0.6.tar.gz (8.5 kB view details)

Uploaded Source

Built Distribution

pyllk-0.0.6-py3-none-any.whl (8.2 kB view details)

Uploaded Python 3

File details

Details for the file pyllk-0.0.6.tar.gz.

File metadata

  • Download URL: pyllk-0.0.6.tar.gz
  • Upload date:
  • Size: 8.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/47.1.1 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.8.1

File hashes

Hashes for pyllk-0.0.6.tar.gz
Algorithm Hash digest
SHA256 71da8ecca1b0e92c2ea514dd08f86aa1f3c76ec03ceb5b4e0a9e7fb5f6eaff16
MD5 cba1a16d734327763bd1fbf051348e90
BLAKE2b-256 ede50891acc9b067da0c8376814bcff2bc72b9a8627c242a9c40397c09a903f6

See more details on using hashes here.

File details

Details for the file pyllk-0.0.6-py3-none-any.whl.

File metadata

  • Download URL: pyllk-0.0.6-py3-none-any.whl
  • Upload date:
  • Size: 8.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/47.1.1 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.8.1

File hashes

Hashes for pyllk-0.0.6-py3-none-any.whl
Algorithm Hash digest
SHA256 b09eefa5b6d82b09f87ed3fd4d6b38b5ee8081870d1e781d34d6b6a6212ace85
MD5 d8dc2581c24c5e1614d70188df756dc8
BLAKE2b-256 1f67b9c19fea93bc42e314bbc8e5bf4577eb12b12905530554b34a5b2968cdc4

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page