Skip to main content

A self-hosting parser generator with a modal lexer and LR(1) parser

Project description

Boho

PyPI version Python

A self-hosting parser generator for Python. Define your grammar in a concise EBNF-based metalanguage and Boho will generate a modal lexer (DFA-based) and an LR(1) parser that produce a clean syntax tree.

Installation

pip install boho

Quick start

from boho import Boho
from boho.interpreter import Interpreter

grammar = '''
start: sum

sum: sum "+" prod
   | prod

prod: prod "*" INT
    | INT

INT: @INT

%ignore " "
'''

b = Boho(grammar)
tree = b("2 + 3 * 4")
print(tree.pretty())

Output:

start:
  sum:
    sum:
      prod:
        'INT' '2'
    prod:
      prod:
        'INT' '3'
      'INT' '4'

Writing an interpreter

Subclass Interpreter and define methods matching your nonterminal names:

class Calc(Interpreter):
    def start(self, tree):
        return self(tree[0])

    def sum(self, tree):
        return sum(self(c) for c in tree)

    def prod(self, tree):
        result = int(self(tree[0]))
        for i in range(1, len(tree)):
            result *= int(self(tree[i]))
        return result


calc = Calc()
print(calc(tree))  # 14

The Boho metalanguage

Terminal definitions

Terminals are named in UPPER_CASE and can be described three ways:

PLUS: "+"               // string literal
NUMBER: /\d+(\.\d+)?/   // regular expression
STRING: @STR             // built-in description (@INT, @FLOAT, @STR)

Terminal descriptions can also be used directly (unnamed) in grammar rules -- they will be pruned from the syntax tree.

Prefixing a terminal name with _ (e.g. _WHITESPACE) prunes it from the tree despite being named.

Grammar rules

Nonterminals use lower_case names. Alternatives are separated with |:

value: NAME | NUMBER
assignment: NAME "=" value

EBNF extensions:

items: item+              // one or more
list: (item ",")*  item   // zero or more (with grouping)
optional: modifier?       // optional

Inline aliases with ->:

expr: term "+" term -> addition
    | term "-" term -> subtraction

Fake terminals

A name like COMMENT_ (uppercase ending with _) defines a fake terminal -- described like a nonterminal but collapsed into a single token. Useful for structures that regular expressions cannot describe (e.g. nested block comments).

Lexer modes

Following ANTLR's approach, a modal lexer is supported. Terminals before the first #mode belong to all modes.

LBRACE: "{" -> +inner    // push mode
RBRACE: "}" -> -         // pop mode

#inner
CONTENT: /[^{}]+/
Syntax Effect
-> +mode push mode onto stack
-> - pop one mode
-> -N pop N modes
-> -- clear the stack
-> mode replace top of stack

Ignoring tokens

%ignore " "
%ignore /\/\/[^\n]*/    // ignore line comments

Project structure

boho/
  __init__.py            # exports the Boho class
  boho.py                # main orchestrator
  lexer.py               # modal finite-automaton lexer
  lexer_generator.py     # terminal descriptions -> lexer DFA
  parser.py              # LR(1) shift-reduce parser
  parser_generator.py    # grammar -> LR(1) parse tables
  grammar_interpreter.py # interprets the Boho metalanguage
  interpreter.py         # base Interpreter class (visitor pattern)
  objects.py             # Token, Tree, LR1Item dataclasses
  regex.py               # regex-to-DFA via greenery
  grammars.py            # pre-compiled Boho grammar tables
docs/                    # English documentation
slo-dokumentacija/       # Slovenian documentation
examples/                # usage examples
tests/                   # test suite

How it works

  1. Your grammar string is parsed by Boho's own (bootstrapped) parser.
  2. Terminal descriptions are compiled into merged DFAs for a modal lexer.
  3. Grammar rules are compiled into LR(1) parse tables.
  4. At runtime, input text is tokenized by the lexer and then parsed into a Tree of Token leaves.

Boho is self-hosting -- its own metalanguage is specified in Boho (see examples/boho_in_boho.py).

API

Boho(grammar, log=False)

Create a parser from a grammar string. Set log=True to print the generated lexer and parser tables.

boho(text, log=False) -> Tree

Parse input text. Returns a Tree with Token leaves. Set log=True for step-by-step tracing.

Tree

  • tree.name -- nonterminal name
  • tree.children -- list of Tree / Token children
  • tree.value -- concatenated text of all descendant tokens
  • tree.pretty() -- indented string representation
  • Supports iteration and indexing (tree[0], for child in tree)

Token

  • token.name -- terminal name
  • token.value -- matched text
  • token.line, token.col -- source location

Interpreter

Base class for tree walkers. Subclass it and define methods named after your nonterminals. The default behavior for unhandled nodes: tokens return their value, trees return a list of children's results.

Dependencies

  • greenery -- regex-to-FSM conversion
  • Python 3.10+ (uses match statements and X | Y type unions)

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

boho-0.1.2.tar.gz (39.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

boho-0.1.2-py3-none-any.whl (20.9 kB view details)

Uploaded Python 3

File details

Details for the file boho-0.1.2.tar.gz.

File metadata

  • Download URL: boho-0.1.2.tar.gz
  • Upload date:
  • Size: 39.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.7 {"installer":{"name":"uv","version":"0.11.7","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":null,"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for boho-0.1.2.tar.gz
Algorithm Hash digest
SHA256 d4e384d125a96ea84baf4f22133552743e69da3d2aebc6cd94d00495071017f8
MD5 d70a9ec9a1b1e282668e3e27ac9be5f9
BLAKE2b-256 f6225b412476deaa677f5ea2e61a17a367da6ba1394f6bfe62ff625ef13d72a7

See more details on using hashes here.

File details

Details for the file boho-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: boho-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 20.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.7 {"installer":{"name":"uv","version":"0.11.7","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":null,"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for boho-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 c4d352e759d61e8c245031ff2085c519c6ab3b6647a1d9265bffaf4400848494
MD5 0b62bbd35c6261a83df33ff9dd29b1e1
BLAKE2b-256 54e3638bb47a0a48790cea41bb3d13d2dbbfe15902be90c0dfb36e098937c764

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page