Skip to main content

Simple LR parser generator and parser for easy integration

Project description

Parsion - A simple LR(1) parser

Parsion is intended to be a simple drop-in parser generator, lexer and parser for grammars in python.

It's main intention is to make it possible to load and create a grammar for simple expression parsers without using code generation.

Installation

parsion is available on pypi

To install:

pip install parsion

Usage

To define a language for parsing, three components are required:

  • Lexer rules – Specify how to tokenize the input by splitting it into meaningful units (tokens).
  • Grammar – Defined as Python objects, similar in concept to Backus-Naur Form (BNF).
  • Reducers/Handlers – Functions that handle reductions, with one handler per grammar rule.

A language is defined as a Python class containing these rules.

The lexer rules are implemented as a series of regular expressions that consume the input string from the beginning. The first regex that matches the start of the string determines the next token, which is then extracted from the input before the process repeats. Each lexer rule also contains a method to postprocess the input. For example convert an integer to int, or unpack a quoted string.

The parser rules is defined as a list of rules, where each rule has three parts:

  • A handler to be called when the rule is reduced
  • The symbol that will be the result of the rule
  • A sequence of symbols that is the input to the rule

The sequence of symbols is a space seperated string, for readability. If a symbol starts with _, the _ is dropped, and the value of the corresponding token will not be passed to the handler.

A rule specifying the handler None must have exactly one symbol without _, and the input value will be passed directly as result. It is useful for defining rules to set operator precedence.

Parsing begins from the grammar rule that produces the entry token.

To parse an input string, call the parse(input) method. It will apply the defined handlers and return the result of the entry node.

If an error occurs during parsing, an exception is raised.

Example

For a bigger example, see: example.py

from parsion import Parsion

class ExprLang(Parsion):
    LEXER_RULES = [
        (None,       r'(\s+)',                   lambda x: None),
        ('INT',      r'([0-9]+|0x[0-9a-fA-F]+)', lambda x: int(x, base=0)),

        ('+',        r'(\+)',                    lambda x: None),
        ('-',        r'(-)',                     lambda x: None),
        ('*',        r'(\*)',                    lambda x: None),
        ('/',        r'(\/)',                    lambda x: None),

        ('(',        r'([\(])',                  lambda x: None),
        (')',        r'([\)])',                  lambda x: None)
    ]
    GRAMMAR_RULES = [
        ('entry',       'entry',        'expr'),
        (None,          'expr',         'expr1'),
        ('expr_add',    'expr1',        'expr1 _+ expr2'),
        ('expr_sub',    'expr1',        'expr1 _- expr2'),
        (None,          'expr1',        'expr2'),
        ('expr_mult',   'expr2',        'expr2 _* expr3'),
        ('expr_div',    'expr2',        'expr2 _/ expr3'),
        (None,          'expr2',        'expr3'),
        ('expr_neg',    'expr3',        '_- expr4'),
        (None,          'expr3',        'expr4'),
        ('expr_int',    'expr4',        'INT'),
        (None,          'expr4',        '_( expr _)'),
    ]
    def expr_add(self, lhs, rhs):
        return lhs + rhs

    def expr_sub(self, lhs, rhs):
        return lhs - rhs

    def expr_mult(self, lhs, rhs):
        return lhs * rhs

    def expr_div(self, lhs, rhs):
        return lhs // rhs

    def expr_neg(self, v):
        return -v

    def expr_int(self, v):
        return v

if __name__ == '__main__':
    # Generate the parser, and parsing tables
    expr_lang = ExprLang()

    # Run the parser
    print( expr_lang.parse('12 * (3 + 7)' )) # prints "120"

    # Run the parser again. Same table generation is reused.
    print( expr_lang.parse('2 + -4 * -10' )) # prints "42"

Error recovery

To isolate error handling, an error rule can be created.

Given the grammar:

from parsion import Parsion

class ExprErrorLang(Parsion):
    LEXER_RULES = [
        (None,       r'(\s+)',                   lambda x: None),
        ('INT',      r'([0-9]+|0x[0-9a-fA-F]+)', lambda x: int(x, base=0)),

        ('+',        r'(\+)',                    lambda x: None),
        ('-',        r'(-)',                     lambda x: None),
        ('*',        r'(\*)',                    lambda x: None),
        ('/',        r'(\/)',                    lambda x: None),

        ('(',        r'([\(])',                  lambda x: None),
        (')',        r'([\)])',                  lambda x: None),
        (';',        r'(;)',                     lambda x: None)
    ]
    GRAMMAR_RULES = [
        ('entry',       'entry',        'stmts'),
        ('stmts_list',  'stmts',        'stmt _; stmts'),
        ('stmts_tail',  'stmts',        'stmt'),

        # proxy statement, to be able to isolate errors to top level
        (None,          'stmt',         'expr'),
        ('error_stmt',  'stmt',         '$ERROR'),

        (None,          'expr',         'expr1'),
        ('expr_add',    'expr1',        'expr1 _+ expr2'),
        ('expr_sub',    'expr1',        'expr1 _- expr2'),
        (None,          'expr1',        'expr2'),
        ('expr_mult',   'expr2',        'expr2 _* expr3'),
        ('expr_div',    'expr2',        'expr2 _/ expr3'),
        (None,          'expr2',        'expr3'),
        ('expr_neg',    'expr3',        '_- expr4'),
        (None,          'expr3',        'expr4'),
        ('expr_int',    'expr4',        'INT'),
        (None,          'expr4',        '_( expr _)'),
    ]

    def stmts_list(self, expr, list):
        return [expr] + list

    def stmts_tail(self, expr):
        return [expr]

    def expr_add(self, lhs, rhs):
        return lhs + rhs

    def expr_sub(self, lhs, rhs):
        return lhs - rhs

    def expr_mult(self, lhs, rhs):
        return lhs * rhs

    def expr_div(self, lhs, rhs):
        return lhs // rhs

    def expr_neg(self, v):
        return -v

    def expr_int(self, v):
        return v
        
    # NOTE: Interface to error handler will change
    def error_stmt(self, error_stack, error_tokens):
        return None

A valid input would then be

1 + 3 + 5 + 7; 12*13*14; 313-13

Resulting in a list:

[16, 2184, 300]

The language contains three separate statements. To isolate errors within a single statement, the special rule generated by $ERROR will be used.

Upon a parse error, all tokens will be accumulated up until a possible token following the closest $ERROR rule. Then the error handler will be called, to return a placeholder for the generation, in this case stmt, and parsing will continue.

If no error handler are defined, or error happens outside of defined error handlers, an exception will be raised instead, and parsing will fail.

Given the above grammar and error handler, the following input:

An input with an error in the middle stmt:

1 + 3 + 5 + 7; 12+*13*14; 313-13

Will then result in:

[16, None, 300]

None is there the result of the error handler.

Precalculated tables

For bigger languages, it may be motivated to actually precalculate the parse tables upon packaging. For that purpose, there is a class ParsionStatic which doesn't invoke the parser generation, but takes the raw parse tables as input.

The interface for that method is to be defined and documented. Open an issue if interested in that feature.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

parsion-0.0.5.tar.gz (13.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

parsion-0.0.5-py3-none-any.whl (12.5 kB view details)

Uploaded Python 3

File details

Details for the file parsion-0.0.5.tar.gz.

File metadata

  • Download URL: parsion-0.0.5.tar.gz
  • Upload date:
  • Size: 13.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for parsion-0.0.5.tar.gz
Algorithm Hash digest
SHA256 46e913fb2feee389ac1cfec1dcc91f3fd57da521f3f5827605c8cd7a12390655
MD5 bc277a77d6516e61b39386e33d4bd467
BLAKE2b-256 2867eb6ffc326d7762912cb6f2950259a5ae541ebb4b4b6ad253136c10700eda

See more details on using hashes here.

Provenance

The following attestation bundles were made for parsion-0.0.5.tar.gz:

Publisher: python-publish.yml on pengi/parsion

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file parsion-0.0.5-py3-none-any.whl.

File metadata

  • Download URL: parsion-0.0.5-py3-none-any.whl
  • Upload date:
  • Size: 12.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for parsion-0.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 3ee358541bee5bd38d70b0343f3288d395655174263b122500cde3314258d9f1
MD5 5ed361fdc9f070e8d24f969943b98bd4
BLAKE2b-256 f01c4e9cadd362070ee1347ace07db7e3c108dd72262944bd4ab36dc629acc4e

See more details on using hashes here.

Provenance

The following attestation bundles were made for parsion-0.0.5-py3-none-any.whl:

Publisher: python-publish.yml on pengi/parsion

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page