Python library for building parsers and lexers easily


parsergen

A simple library for creating parsers and lexers.

Quickstart

pip install parsergen

Defining a Lexer

Each token is defined by one or more regular expressions, and a token can also have a modifier function. For example, the INT token accepts decimal or 0x-prefixed hexadecimal literals, and its modifier turns the matched text into an int.

from parsergen import *
class CalcLexer(Lexer):
    
    @token(r"0x[0-9a-fA-F]+", r"[0-9]+")
    def INT(self, t):
        if t.value.startswith("0x"):
            t.value = int(t.value[2:], base=16)
        else:
            t.value = int(t.value)
        return t

    ADD    =  r"\+"
    SUB    =  r"\-"
    POW    =  r"\*\*" # must be defined before MUL: '**' is longer and would otherwise lex as two MUL tokens
    MUL    =  r"\*"
    DIV    =  r"\/"
    SET    =  r"set"
    TO     =  r"to"
    ID     =  r"[A-Za-z_]+"
    LPAREN =  r"\("
    RPAREN =  r"\)"
    
    ignore = " \t"
    ignore_comment = r"\#.*"
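The comment on POW implies that patterns are tried in definition order and the first match wins. The sketch below shows that idea with the standard library's re module only; it is not parsergen's implementation, and the names Token, TOKEN_SPECS, MODIFIERS and lex are hypothetical. Python's re tries alternatives left to right, so joining the patterns into one master regex reproduces the ordering rule.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    type: str
    value: object
    pos: int  # offset of the match in the source string


# Priority order: when several patterns could match at the same position,
# the earliest entry wins, so POW precedes MUL and SET/TO precede ID.
TOKEN_SPECS = [
    ("INT", r"0x[0-9a-fA-F]+|[0-9]+"),
    ("ADD", r"\+"),
    ("SUB", r"\-"),
    ("POW", r"\*\*"),
    ("MUL", r"\*"),
    ("DIV", r"\/"),
    ("SET", r"set"),
    ("TO", r"to"),
    ("ID", r"[A-Za-z_]+"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
]
IGNORE = " \t"
IGNORE_COMMENT = re.compile(r"\#.*")

# One master pattern of named groups; re tries the alternatives left to right.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPECS))


def to_int(text):
    # Modifier for INT, mirroring CalcLexer.INT: hexadecimal if prefixed with 0x.
    return int(text[2:], base=16) if text.startswith("0x") else int(text)


MODIFIERS = {"INT": to_int}


def lex(source):
    tokens, pos = [], 0
    while pos < len(source):
        if source[pos] in IGNORE:  # skip ignored characters one at a time
            pos += 1
            continue
        comment = IGNORE_COMMENT.match(source, pos)
        if comment:  # skip a comment up to the end of the line
            pos = comment.end()
            continue
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, value = m.lastgroup, m.group()
        if kind in MODIFIERS:
            value = MODIFIERS[kind](value)
        tokens.append(Token(kind, value, pos))
        pos = m.end()
    return tokens


print([(t.type, t.value) for t in lex("set x to 0x1F ** 2  # comment")])
# [('SET', 'set'), ('ID', 'x'), ('TO', 'to'), ('INT', 31), ('POW', '**'), ('INT', 2)]
```

With pure first-match ordering, keywords also win over identifiers that merely start with them: in this sketch, settle lexes as SET followed by ID('tle'). Anchoring keyword patterns with \b, or preferring the longest match, would avoid that.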

Creating a Parser

Grammar Expressions

Grammar expressions describe the syntax that can be parsed. Our basic example is a calculator: you get a terminal where you type math expressions, and it prints the result:

> 2 + 3 * 4
14
> (2 + 3) * 4
20
> 2 ** 2 ** 3
256
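Python's own operators use the same precedence and associativity conventions, so they make a handy reference for the results the calculator should produce:

```python
# Python's built-in operators follow the same conventions as the calculator.
print(2 + 3 * 4)      # 14: MUL binds tighter than ADD
print((2 + 3) * 4)    # 20: parentheses override precedence
print(10 - 4 - 3)     # 3: SUB is left associative, (10 - 4) - 3
print(2 ** 2 ** 3)    # 256: ** is right associative, 2 ** (2 ** 3)
print((2 ** 2) ** 3)  # 64: what a left-associative ** would give instead
```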

The arithmetic operators must have the correct precedence, so we have to account for this when designing the grammar rules. Here is the grammar:

statement       :  assign | expr
assign          :  SET ID TO expr
expr            :  prec3
prec3           :  prec2 ((ADD | SUB) prec2)*
prec2           :  prec1 ((MUL | DIV) prec1)*
prec1           :  factor (POW prec1)?
factor          :  INT | ID
factor          :  LPAREN expr RPAREN
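To see how these rules produce the precedence above, the grammar can also be evaluated by hand with plain recursive descent: one method per rule, a while loop for each starred group, and self-recursion for the optional POW branch. This is a library-independent sketch, not how parsergen works internally; tokenize and Evaluator are hypothetical names, and each starred group is read as an operator followed by an operand.

```python
import re

# Ordered alternatives, so "**" is tried before "*" (as with POW and MUL).
TOKEN_RE = re.compile(r"0x[0-9a-fA-F]+|[0-9]+|\*\*|[-+*/()]|[A-Za-z_]+")
IDENT_RE = re.compile(r"[A-Za-z_]+")


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos] in " \t":
            pos += 1
            continue
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r}")
        tokens.append(m.group())
        pos = m.end()
    return tokens


class Evaluator:
    def __init__(self):
        self.names = {}

    def run(self, text):
        self.toks, self.i = tokenize(text), 0
        result = self.statement()
        if self.i != len(self.toks):
            raise SyntaxError(f"unexpected token {self.toks[self.i]!r}")
        return result

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {tok!r}")
        self.i += 1
        return tok

    # statement : assign | expr
    def statement(self):
        return self.assign() if self.peek() == "set" else self.expr()

    # assign : SET ID TO expr  (produces no value)
    def assign(self):
        self.take("set")
        name = self.take()
        if not IDENT_RE.fullmatch(name):
            raise SyntaxError(f"expected a name, got {name!r}")
        self.take("to")
        self.names[name] = self.expr()

    # expr : prec3
    def expr(self):
        return self.prec3()

    # prec3 : prec2 ((ADD | SUB) prec2)*; the loop folds left to right
    def prec3(self):
        r = self.prec2()
        while self.peek() in ("+", "-"):
            op, num = self.take(), self.prec2()
            r = r + num if op == "+" else r - num
        return r

    # prec2 : prec1 ((MUL | DIV) prec1)*; the loop folds left to right
    def prec2(self):
        r = self.prec1()
        while self.peek() in ("*", "/"):
            op, num = self.take(), self.prec1()
            r = r * num if op == "*" else r / num
        return r

    # prec1 : factor (POW prec1)?; recursion on the right makes ** right associative
    def prec1(self):
        base = self.factor()
        if self.peek() == "**":
            self.take()
            return base ** self.prec1()
        return base

    # factor : INT | ID | LPAREN expr RPAREN
    def factor(self):
        tok = self.take()
        if tok == "(":
            value = self.expr()
            self.take(")")
            return value
        if tok.startswith("0x"):
            return int(tok[2:], base=16)
        if tok.isdigit():
            return int(tok)
        if not IDENT_RE.fullmatch(tok):
            raise SyntaxError(f"unexpected token {tok!r}")
        if tok not in self.names:
            raise NameError(f"variable '{tok}' is not defined.")
        return self.names[tok]
```

For example, Evaluator().run("2 ** 2 ** 3") returns 256 and Evaluator().run("10 - 4 - 3") returns 3: the loops in prec3 and prec2 fold operands left to right, while prec1 hands everything after ** to a recursive call.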

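The rules prec3 and prec2 are left associative (their repeated operands are folded from left to right), whereas prec1, which implements the pow operator, is right associative: it refers to itself on the right of POW, so 2 ** 2 ** 3 is evaluated as 2 ** (2 ** 3). We can then define our parser.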

class CalcParser(Parser):

    tokens = CalcLexer.tokens
    starting_point = "statement"

    def __init__(self):
        self.names = {}

    @grammar("assign | expr")
    def statement(self, p):
        if p[0] is not None:  # assign returns nothing, so there is no value to print
            print(p[0])
    
    @grammar("SET ID TO expr")
    def assign(self, p):
        self.names[p[1]] = p[3]
    
    @grammar("prec3")
    def expr(self, p):
        return p[0]
    
    @grammar("prec2 ((ADD | SUB) prec2)*") # left associative
    def prec3(self, p):
        r = p[0]
        for op, num in p[1]:
            if op == "+":
                r += num
            else:
                r -= num
        return r
    
    @grammar("prec1 ((MUL | DIV) prec1)*") # left associative
    def prec2(self, p):
        r = p[0]
        for op, num in p[1]:
            if op == "*":
                r *= num
            else:
                r /= num
        return r
    
    @grammar("factor (POW prec1)?") # right associative
    def prec1(self, p):
        if p[1]:
            return p[0] ** p[1][1]
        return p[0]
    
    @grammar("INT")
    def factor(self, p):
        return p[0]
    
    @grammar("ID")
    def factor(self, p):
        try:
            return self.names[p[0]]
        except KeyError:
            raise NameError(f"variable '{p[0]}' is not defined.") from None

    @grammar("LPAREN expr RPAREN")
    def factor(self, p):
        return p[1]

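# We can then create a simple runtime loop
l = CalcLexer()
p = CalcParser()

while True:
    try:
        s = input("> ")
    except EOFError:  # Ctrl-D ends the session
        break
    try:
        p.parse(l.lex_string(s))  # lex once, then parse the tokens
    except Exception as e:  # report the error and keep the loop running
        print(f"error: {e}")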

Handling Newlines

By default, the lexer knows nothing about line numbers; you have to tell it what to do, for example by advancing lineno and resetting column whenever a NEWLINE token is matched:

class MyLexer(Lexer):
    @token(r"\n+")
    def NEWLINE(self, t):
        self.lineno += len(t.value)
        self.column = 0
        return t
    ...
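The same bookkeeping can be sketched without the library, using only re: each run of newlines advances the line counter by the number of newline characters matched (as self.lineno += len(t.value) does), and columns are measured from the start of the current line, starting at 0 as in the handler above. The function name lex_with_positions and its small token set are hypothetical, not parsergen's API.

```python
import re

# Hypothetical token set; NEWLINE mirrors the \n+ pattern used above.
TOKEN_RE = re.compile(
    r"(?P<NEWLINE>\n+)|(?P<INT>[0-9]+)|(?P<ID>[A-Za-z_]+)|(?P<WS>[ \t]+)"
)


def lex_with_positions(source):
    """Return (type, text, lineno, column) tuples; lines start at 1, columns at 0."""
    lineno, line_start, pos = 1, 0, 0
    out = []
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} on line {lineno}")
        kind, text = m.lastgroup, m.group()
        column = m.start() - line_start  # offset from the start of the current line
        if kind != "WS":  # whitespace is skipped, NEWLINE is still emitted
            out.append((kind, text, lineno, column))
        if kind == "NEWLINE":
            lineno += len(text)   # one line per newline character matched
            line_start = m.end()  # the next line begins right after the match
        pos = m.end()
    return out


print(lex_with_positions("a 1\n\nbc"))
# [('ID', 'a', 1, 0), ('INT', '1', 1, 2), ('NEWLINE', '\n\n', 1, 3), ('ID', 'bc', 3, 0)]
```

Storing the offset where the current line starts, rather than incrementing a column counter per token, keeps columns correct even for multi-character tokens and skipped whitespace.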

See example_calc.py and example.py for more examples, or look at the source code.

Release history

2.0.0b10 (pre-release): Oct 19, 2023
2.0.0b9 (pre-release): Oct 18, 2023
2.0.0b8 (pre-release): Jul 27, 2021
2.0.0b7 (pre-release): Jun 12, 2021
2.0.0b6 (pre-release): Jun 12, 2021
2.0.0b5 (pre-release): Jun 10, 2021
2.0.0b4 (pre-release): Jun 9, 2021
2.0.0b3 (pre-release): Jun 7, 2021
2.0.0b2 (pre-release): Jun 7, 2021
2.0.0b1 (pre-release): Jun 5, 2021
1.0.3: May 30, 2021
1.0.2: May 30, 2021
1.0.1: May 30, 2021
1.0.0 (this version): May 29, 2021
1.0.0b1 (pre-release): May 28, 2021

Download files


Source Distribution

parsergen-1.0.0.tar.gz (8.7 kB), uploaded May 29, 2021

Built Distribution

parsergen-1.0.0-py3-none-any.whl (8.2 kB), uploaded May 29, 2021, for Python 3

Hashes for parsergen-1.0.0.tar.gz

Algorithm	Hash digest
SHA256	`ad0e83ac09331dede4f493ccc7e158fdc49cd4eb7e09b6b69740d2c1d9d5f2bc`
MD5	`deebe8a090957c892905df974b0ab98d`
BLAKE2b-256	`a76acce8142f4ee7d891da82dd4708952313b038f74af2df3c3b1163915c6819`

Hashes for parsergen-1.0.0-py3-none-any.whl

Algorithm	Hash digest
SHA256	`4a8dc4a572aaab14bd8002f9aad410b7b9339b8e0eff67da13c35e070c8e74c1`
MD5	`b849194742892b0a2fecc9f95dd19ec8`
BLAKE2b-256	`9dcab520574d5b21deea97ff0aa74cf3f6104ec5fe82b36c4cc949d3d530bf86`