Skip to main content

A python LL(k) parser with a twist where tokens can be any arbitrary objects.

Project description

build coverage PyPI version

Pyllk

A python LL(k) parser with a twist where tokens can be any arbitrary objects. The current implementation uses the backtracking algorithm. I'll eventually implement predictive parsing but the API is designed to remain backward compatible.

Important: It is not a tokenizer. It comes with a very basic string tokenizer that assumes a single character is a token. The whole point is to be able to use a third party tokenizer so that you can match them to terminal tokens. Use TokenStream to push the tokenized list into the parser.

Installation

pip install pyllk

Example. A simple calculator

This is an example of the typical calculator. In this case, we use characters as tokens. The idea is to parse things like ((10+2)*15)/(7-2) and be able to pull the result.

Here are the steps involved in the example:

  1. Create the function that converts terminal token definitions into instances of instances of pyllk.TerminalToken
  2. Create the fnuction that converts action definitions into functions to execute
  3. Create the EBNF grammar for the calculator
  4. Execute

Convert terminal token definitions

In this simple text-based example, terminal tokens are simply a single character. The matches function simply needs to execute the regular expression to determine if a give character matches this terminal. The token definition is set as JSON in the EBNF grammar. The attributes are completely arbitrary and you just need to use them here. In this example, we chose to use a single attribute called regex that describes the regular expression.

import os
import re

import pyllk
from pyllk.parser import Context, Parser
from pyllk.token import TerminalToken


class CharacterTerminalToken(TerminalToken):
    def __init__(self, config):
        self.regex = f"^{config['regex']}$"
        super(CharacterTerminalToken, self).__init__(self.regex)

    def matches(self, obj):
        return True if re.search(self.regex, f"{obj}") else False


def character_tokenizer(definition: dict) -> CharacterTerminalToken:
    return CharacterTerminalToken(definition)

Convert action definition (i.e. dict) into appropriate functions

In this case, we use 5 functions. One that pushes numbers into the stack and four for the various operators we want to support. The action definition will be specified as JSON in the EBNF grammar. It's completely arbitrary, in this case, we chose to use the fn attribute to indicate which function we're trying to execute. It then returns the corresponding function as an action to execute when the production rule matches. In order to do this, it uses the context object (an instance of Context, aka a dict) to carry the state of the FSM.

def make_action(definition: dict) -> None:
    def push_number(ec):
        s = ""
        for token in ec.tokens:
            s = s + token.representation
        ec.context['stack'].append(float(s))

    def add(e):
        b = e.context['stack'].pop()
        a = e.context['stack'].pop()
        e.parser.log("Ex: {} + {}".format(a, b))
        e.context['stack'].append(a + b)

    def sub(e):
        b = e.context['stack'].pop()
        a = e.context['stack'].pop()
        e.parser.log("Ex: {} - {}".format(a, b))
        e.context['stack'].append(a - b)

    def mul(e):
        b = e.context['stack'].pop()
        a = e.context['stack'].pop()
        e.parser.log("Ex: {} x {}".format(a, b))
        e.context['stack'].append(a * b)

    def div(e):
        b = e.context['stack'].pop()
        a = e.context['stack'].pop()
        e.parser.log("Ex: {} / {}".format(a, b))
        e.context['stack'].append(a / b)

    if definition["fn"] == 'push_number':
        return push_number
    elif definition["fn"] == 'add':
        return add
    elif definition["fn"] == 'sub':
        return sub
    elif definition["fn"] == 'mul':
        return mul
    elif definition["fn"] == 'div':
        return div
    else:
        return None

The grammar itself

This defines a bunch of terminal tokens. As stated above, it uses JSON with a single attribute called regex. Similarly, actions are describes as JSON with a single attribute called fn. See comments in the above sections. Actions are executed when the production rule they annotate resolves.

DIGIT = { regex: '\d' }
LPAREN = { regex: '\(' }
RPAREN = { regex: '\)' }

PLUS = { regex: '\+' }
MINUS = { regex: '\-' }
MULT = { regex: '\*' }
DIV = { regex: '\/' }

parser : calc

number : DIGIT number
       | DIGIT

expr : LPAREN calc RPAREN
     | number                      {'fn': 'push_number'}

calc : expr PLUS expr              {'fn': 'add'}
     | expr MINUS expr             {'fn': 'sub'}
     | expr MULT expr              {'fn': 'mul'}
     | expr DIV expr               {'fn': 'div'}

Execute

grammar = pyllk.load(path_to_grammar_file, token_fn=character_tokenizer, action_fn=make_action)

context = Context()
context.stack = []
parser = Parser(grammar)

parser.parse_string("((10+2)*15)/(7-2)", context)
result = context.stack.pop()

print(result)

Dev setup

Run make init

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyllk-0.0.13.tar.gz (13.3 kB view details)

Uploaded Source

Built Distribution

pyllk-0.0.13-py3-none-any.whl (13.9 kB view details)

Uploaded Python 3

File details

Details for the file pyllk-0.0.13.tar.gz.

File metadata

  • Download URL: pyllk-0.0.13.tar.gz
  • Upload date:
  • Size: 13.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/47.1.1 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.8.1

File hashes

Hashes for pyllk-0.0.13.tar.gz
Algorithm Hash digest
SHA256 ef294a41f617dd8fa6fb66dce3a563b18d0189eafd45dcc8814f39553ca3edaa
MD5 89614cc44bb2781430cf554a2ae96b26
BLAKE2b-256 faa15c5c971558a1150b9497269fdfb20b1d5bb484c08f1a4945b54eb2e04c18

See more details on using hashes here.

File details

Details for the file pyllk-0.0.13-py3-none-any.whl.

File metadata

  • Download URL: pyllk-0.0.13-py3-none-any.whl
  • Upload date:
  • Size: 13.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/47.1.1 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.8.1

File hashes

Hashes for pyllk-0.0.13-py3-none-any.whl
Algorithm Hash digest
SHA256 fbe8c76b116f73f5e9ebd4a7db2044830be858eb5890aafbac03b096d1a61264
MD5 77186de021b6a13f3c19a1c8bec5d5d9
BLAKE2b-256 4c37c471f1940df2e5b97d18f42c18d5db878c9c6f80fcbd7660ccfe6ed82b75

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page