A python LL(k) parser with a twist where tokens can be any arbitrary objects.
Project description
Pyllk
A python LL(k) parser with a twist where tokens can be any arbitrary objects. The current implementation uses the backtracking algorithm. I'll eventually implement predictive parsing but the API is designed to remain backward compatible.
Installation
pip install pyllk
Example. A simple calculator
This is an example of the typical calculator. In this case, we use characters as tokens. The idea is to parse things like ((10+2)*15)/(7-2)
and be able to pull the result.
For now, the grammar cannot be expressed in BNF notation so the production rules have to be created in code. It'll change in the future but using non-character tokens complicate things with BNF when representing terminals.
There are 3 steps involved in the example:
-
Create actions that get executed when production rules are matched. Actions are optional but often necessary if you want to do anything useful with what you've parsed. The function must accept a single parameter of type
ExecutionContext
which will give you access to:- The parser. That's useful if you want to access global configurations like the log level.
- The production rule. That's the rule that was just matched.
- The list of tokens that where matched to the production rule.
- The context given to the parser at the beginning. This is a dictionary containing anything you want to keep around. The example below put a stack inside it to keep track of parentheses in the expression. You can put anything you need in there.
-
Create the production rules. A grammar is made of production rules that derives terminal and non-terminal tokens. If you don't know what I'm talking about here, stop right now and go read this first: https://en.wikipedia.org/wiki/Context-free_grammar.
-
Execute.
- Take the production rules and construct a grammar and a parser.
- Push your input into the parser.
- Collect the results in the
context
object.
Create actions
def action_make_number(ec):
s = ""
for token in ec.tokens:
s = s + token.representation
ec.context['stack'].append(float(s))
def add(e):
b = e.context['stack'].pop()
a = e.context['stack'].pop()
e.parser.log("Ex: {} + {}".format(a, b))
e.context['stack'].append(a + b)
def sub(e):
b = e.context['stack'].pop()
a = e.context['stack'].pop()
e.parser.log("Ex: {} - {}".format(a, b))
e.context['stack'].append(a - b)
def mul(e):
b = e.context['stack'].pop()
a = e.context['stack'].pop()
e.parser.log("Ex: {} x {}".format(a, b))
e.context['stack'].append(a * b)
def div(e):
b = e.context['stack'].pop()
a = e.context['stack'].pop()
e.parser.log("Ex: {} / {}".format(a, b))
e.context['stack'].append(a / b)
Production rules
calculator_rules_with_context = []
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("INIT"), [NonTerminalToken("expr"), TerminalToken("+"), NonTerminalToken("expr")], add))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("INIT"), [NonTerminalToken("expr"), TerminalToken("-"), NonTerminalToken("expr")], sub))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("INIT"), [NonTerminalToken("expr"), TerminalToken("/"), NonTerminalToken("expr")], div))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("INIT"), [NonTerminalToken("expr"), TerminalToken("*"), NonTerminalToken("expr")], mul))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("expr"), [TerminalToken("("), NonTerminalToken("INIT"), TerminalToken(")")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("expr"), [NonTerminalToken("number")], action_make_number))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("number"), [TerminalToken("0"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("number"), [TerminalToken("1"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("number"), [TerminalToken("2"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("number"), [TerminalToken("3"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("number"), [TerminalToken("4"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("number"), [TerminalToken("5"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("number"), [TerminalToken("6"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("number"), [TerminalToken("7"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("number"), [TerminalToken("8"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("number"), [TerminalToken("9"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("digit_or_empty"), [TerminalToken("0"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("digit_or_empty"), [TerminalToken("2"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("digit_or_empty"), [TerminalToken("3"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("digit_or_empty"), [TerminalToken("4"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("digit_or_empty"), [TerminalToken("5"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("digit_or_empty"), [TerminalToken("6"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("digit_or_empty"), [TerminalToken("7"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("digit_or_empty"), [TerminalToken("8"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("digit_or_empty"), [TerminalToken("9"), NonTerminalToken("digit_or_empty")]))
calculator_rules_with_context.append(ProductionRule(NonTerminalToken("digit_or_empty"), [TerminalToken("")]))
Create a grammar and a parser
from pyllk.grammar import Grammar
from pyllk.parser import Parser
g = Grammar(calculator_rules_with_context)
parser = Parser(g)
context = {
'stack': []
}
parser.parse_string("((10+2)*15)/(7-2)", context)
result = context['stack'].pop()
result will contain 36.0
Non-character input
Although character-tokens are the default, nothing prevents you from creating non-character tokens. Here is an example of terminal tokens that are meant to be fed by spaCy, a python NLP library.
import json
class SpacyToken(TerminalToken):
def __init__(self):
super().__init__(representation = {})
def text(self, text):
self.representation['TEXT'] = text
return self
def lemma(self, text):
self.representation['LEMMA'] = text
return self
def dependency(self, text):
self.representation['DEP'] = text
return self
def part_of_speech(self, text):
self.representation['POS'] = text
return self
def tag(self, text):
self.representation['TAG'] = text
return self
def entity(self, text):
self.representation['ENTITY'] = text
return self
def matches(self, obj):
match = True
if 'TEXT' in self.representation:
match &= (self.representation['TEXT'] == obj.text)
if 'LEMMA' in self.representation:
match &= (self.representation['LEMMA'] == obj.lemma_)
if 'DEP' in self.representation:
match &= (self.representation['DEP'] == obj.dep_)
if 'POS' in self.representation:
match &= (self.representation['POS'] == obj.pos_)
if 'TAG' in self.representation:
match &= (self.representation['TAG'] == obj.tag_)
if 'ENTITY' in self.representation:
match &= (self.representation['ENTITY'] == obj.ent_type_)
return match
def __str__(self):
return json.dumps(self.representation)
When you write your production rules, you can use the SpacyToken
class where before you'd use the default TerminalToken
:
rules.append(ProductionRule(NonTerminalToken('SOME_PRODUCTION_RULE'), [SpacyToken().dependency("pobj")], record))
Dev setup
Run make init
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file pyllk-0.0.7.tar.gz
.
File metadata
- Download URL: pyllk-0.0.7.tar.gz
- Upload date:
- Size: 8.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/47.1.1 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.8.1
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | e5d04963f3ba7c38c2e4341a10e54a5de14ee7fc7eb611492e6a51e181d0afab |
|
MD5 | 68752eb6e717fb3fefdba4948a207aae |
|
BLAKE2b-256 | 1a09647d3db9d056de96fd2861346ead496242dc4fe895d0474dbebf3721e3e0 |
File details
Details for the file pyllk-0.0.7-py3-none-any.whl
.
File metadata
- Download URL: pyllk-0.0.7-py3-none-any.whl
- Upload date:
- Size: 8.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/47.1.1 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.8.1
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 57d4fa15f51b598a0af3119db3c3d6cccb7a180d5b2610ec4caf9929a0b067c7 |
|
MD5 | e35c754b4ec132a8c471edb1cd5f8eba |
|
BLAKE2b-256 | 73d0d636523ffd60ecb08adbd61e2db67476de5894a616b1e04d50a80a04d7fc |