Pyllk
A Python LL(k) parser with a twist: tokens can be arbitrary objects. The current implementation uses a backtracking algorithm. Predictive parsing will eventually be implemented, but the API is designed to remain backward compatible.
Important: Pyllk is not a tokenizer. It ships with a very basic string tokenizer that treats each single character as a token. The whole point is to be able to use a third-party tokenizer and match its output against your terminal tokens. Use TokenStream to push the tokenized list into the parser.
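Feeding a third-party tokenizer's output into the parser might look like this (a minimal sketch, reusing the parser and context built in the Execute section below; my_tokenizer is hypothetical, and the TokenStream import path and parser.parse signature are assumptions, not confirmed API):

from pyllk import TokenStream  # import path assumed

# Any third-party tokenizer works; pyllk only needs each token to be
# recognizable by one of your TerminalToken subclasses via matches().
tokens = my_tokenizer("((10+2)*15)/(7-2)")  # hypothetical tokenizer
stream = TokenStream(tokens)   # assumed: wraps the token list
parser.parse(stream, context)  # assumed entry point taking a stream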
Installation
pip install pyllk
Example: a simple calculator
This is the typical calculator example. In this case, we use characters as tokens. The idea is to parse expressions like ((10+2)*15)/(7-2) and pull out the result.
Here are the steps involved in the example:
- Create the function that converts terminal token definitions into instances of pyllk.TerminalToken
- Create the function that converts action definitions into functions to execute
- Create the EBNF grammar for the calculator
- Execute
Convert terminal token definitions
In this simple text-based example, a terminal token is simply a single character. The matches function just needs to run the regular expression to determine whether a given character matches the terminal. The token definition is written as JSON in the EBNF grammar. Its attributes are completely arbitrary; you interpret them here. In this example, we chose a single attribute called regex that describes the regular expression.
import re

import pyllk
from pyllk.parser import Context, Parser
from pyllk.token import TerminalToken


class CharacterTerminalToken(TerminalToken):
    def __init__(self, config):
        # Anchor the pattern so it must match the whole single-character token.
        self.regex = f"^{config['regex']}$"
        super(CharacterTerminalToken, self).__init__(self.regex)

    def matches(self, obj):
        return bool(re.search(self.regex, f"{obj}"))


def character_tokenizer(definition: dict) -> CharacterTerminalToken:
    return CharacterTerminalToken(definition)
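As a quick sanity check, the anchored pattern matches exactly one character:

token = character_tokenizer({'regex': r'\d'})
assert token.matches('5')        # a single digit matches
assert not token.matches('x')    # a non-digit does not
assert not token.matches('55')   # anchoring rejects multi-character input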
Convert action definitions (i.e. dicts) into appropriate functions
In this case we use five functions: one that pushes numbers onto the stack and four for the operators we want to support. The action definition is specified as JSON in the EBNF grammar. It is completely arbitrary; here we chose a single attribute called fn that names the function to execute. make_action returns the corresponding function as the action to run when the production rule matches. To carry the state of the FSM, the actions use the context object (an instance of Context, essentially a dict).
def make_action(definition: dict):
    def push_number(ec):
        # Concatenate the matched digit tokens and push the number.
        s = ""
        for token in ec.tokens:
            s = s + token.representation
        ec.context['stack'].append(float(s))

    def add(e):
        b = e.context['stack'].pop()
        a = e.context['stack'].pop()
        e.parser.log("Ex: {} + {}".format(a, b))
        e.context['stack'].append(a + b)

    def sub(e):
        b = e.context['stack'].pop()
        a = e.context['stack'].pop()
        e.parser.log("Ex: {} - {}".format(a, b))
        e.context['stack'].append(a - b)

    def mul(e):
        b = e.context['stack'].pop()
        a = e.context['stack'].pop()
        e.parser.log("Ex: {} x {}".format(a, b))
        e.context['stack'].append(a * b)

    def div(e):
        b = e.context['stack'].pop()
        a = e.context['stack'].pop()
        e.parser.log("Ex: {} / {}".format(a, b))
        e.context['stack'].append(a / b)

    # Map the 'fn' attribute to the action; unknown names yield no action.
    actions = {'push_number': push_number, 'add': add,
               'sub': sub, 'mul': mul, 'div': div}
    return actions.get(definition["fn"])
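The returned callables can be exercised outside the parser with a minimal stand-in for the execution context (the SimpleNamespace stub below is purely illustrative; the real object is supplied by pyllk):

from types import SimpleNamespace

add = make_action({'fn': 'add'})
ec = SimpleNamespace(
    context={'stack': [3.0, 4.0]},
    parser=SimpleNamespace(log=print),  # stand-in for the parser's logger
)
add(ec)                              # logs "Ex: 3.0 + 4.0"
assert ec.context['stack'] == [7.0]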
The grammar itself
This defines a handful of terminal tokens. As stated above, each is JSON with a single attribute called regex. Similarly, actions are described as JSON with a single attribute called fn. See the comments in the sections above. An action is executed when the production rule it annotates resolves.
DIGIT  = { regex: '\d' }
LPAREN = { regex: '\(' }
RPAREN = { regex: '\)' }
PLUS   = { regex: '\+' }
MINUS  = { regex: '\-' }
MULT   = { regex: '\*' }
DIV    = { regex: '\/' }

parser : calc

number : DIGIT number
       | DIGIT

expr   : LPAREN calc RPAREN
       | number              {'fn': 'push_number'}

calc   : expr PLUS expr      {'fn': 'add'}
       | expr MINUS expr     {'fn': 'sub'}
       | expr MULT expr      {'fn': 'mul'}
       | expr DIV expr       {'fn': 'div'}
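Note that the number rule assembles multi-digit numbers through right recursion. For example, the input 12 derives as:

number
→ DIGIT number    # recursive alternative consumes '1'
→ DIGIT DIGIT     # base alternative consumes '2'

Once the enclosing production resolves, push_number sees both digit tokens and pushes 12.0 onto the stack.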
Execute
grammar = pyllk.load(path_to_grammar_file, token_fn=character_tokenizer, action_fn=make_action)

context = Context()
context['stack'] = []  # the actions above expect a 'stack' entry

parser = Parser(grammar)
parser.parse_string("((10+2)*15)/(7-2)", context)

result = context['stack'].pop()
print(result)
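Running this prints 36.0: (10+2)*15 evaluates to 180, 7-2 to 5, and 180/5 to 36.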
Dev setup
Run make init
File details
Details for the file pyllk-0.0.12.tar.gz.
File metadata
- Download URL: pyllk-0.0.12.tar.gz
- Upload date:
- Size: 13.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/47.1.1 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.8.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0d1b2fd29fafcb48fcd792831946dd9eaecda6a5f56a5a6190fde27a405efd87
MD5 | df48101ac603d17a2cb60c6183169b8e
BLAKE2b-256 | 599163498266388313694c10add4c78a5ef667a9d49d90810265831d5eb0385e
File details
Details for the file pyllk-0.0.12-py3-none-any.whl.
File metadata
- Download URL: pyllk-0.0.12-py3-none-any.whl
- Upload date:
- Size: 13.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/47.1.1 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.8.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9735d8f7d708cdf450695f6ac229597c584ad9419de797b28c17ed5d2de298d9
MD5 | eb19fd997602fbd9c69e9887b22a4a25
BLAKE2b-256 | 7e8593e4c9f875196d8bfa11e544c344717a5192f29a4403635f8c1474f89ee4