AST shaping for antlr parsers
This package allows you to use ANTLR grammars and use the parser output to generate an abstract syntax tree (AST).
pip install antlr-ast
Note: this package is not python2 compatible.
    # may need:
    # pip install pytest
    py.test
Using antlr-ast involves four steps:
- Using ANTLR to define a grammar and to generate the necessary Python files to parse this grammar
- Using parse to get the ANTLR runtime output based on the generated grammar files
- Using process_tree on the output of the previous step
  - A BaseAstVisitor (customisable by providing a subclass) transforms the ANTLR output to a serializable tree of BaseNodes, dynamically created based on the rules in the ANTLR grammar
  - A BaseNodeTransformer subclass can be used to transform each kind of node
  - The simplify option can be used to shorten paths in the tree by skipping nodes that only have a single descendant
- Using the resulting tree
The next sections go into more detail about these steps.
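As a rough illustration of what skipping nodes with a single descendant means, here is a minimal sketch. It does not use antlr-ast itself: SketchNode and simplify are hypothetical names invented for this example, and the real simplify option may differ in its details.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class SketchNode:
    """Hypothetical stand-in for a tree node; not the antlr-ast BaseNode."""
    name: str
    children: List[Union["SketchNode", str]] = field(default_factory=list)


def simplify(node):
    """Collapse nodes that only have a single descendant (assumed behaviour)."""
    if isinstance(node, str):
        # a terminal (token text) is kept as-is
        return node
    children = [simplify(child) for child in node.children]
    if len(children) == 1:
        # skip this node: its only child takes its place
        return children[0]
    return SketchNode(node.name, children)


# Expr -> Term -> Factor -> "1" is a chain of single-child nodes
tree = SketchNode("Add", [
    SketchNode("Expr", [SketchNode("Term", [SketchNode("Factor", ["1"])])]),
    "+",
    SketchNode("Expr", ["2"]),
])
simplified = simplify(tree)
```

In this sketch the nested Expr, Term and Factor wrappers disappear, which leaves an Add node whose children are the three terminals.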
To visualize the process of creating and transforming these parse trees, you can use this ast-viewer.
This page explains how to write ANTLR parser rules.
The rule definition below is an example with descriptive names for important ANTLR parser grammar elements:
    rule_name: rule_element? rule_element_label='literal'    #RuleAlternativeLabel
             | TOKEN+                                        #RuleAlternativeLabel
             ;
Rule element and alternative labels are optional.
The operators +, *, ?, | and () have the same meaning as in RegEx.
Below, we'll use a simple grammar to explain how antlr-ast works.
This grammar can be found in /tests/Expr.g4.
    grammar Expr;

    // parser

    expr: left=expr op=('+'|'-') right=expr    #BinaryExpr
        | NOT expr                             #NotExpr
        | INT                                  #Integer
        | '(' expr ')'                         #SubExpr
        ;

    // lexer

    INT : [0-9]+ ;          // match integers
    NOT : 'not' ;

    WS  : [ \t]+ -> skip ;  // toss out whitespace
ANTLR can use the grammar above to generate a parser in a number of languages. To generate a Python parser, you can use the following command.
antlr4 -Dlanguage=Python3 -visitor /tests/Expr.g4
This will generate a number of files in the /tests/ directory, including a lexer (ExprLexer.py), a parser (ExprParser.py), and a visitor (ExprVisitor.py).
You can use and import these directly in Python. For example, from the root of this repo:
from tests import ExprVisitor
To easily use the generated files, they are put in a package whose __init__.py file exports the generated files under an alias that doesn't include the name of the grammar.
A BaseNode subclass has fields for all rule elements and labels for all rule element labels in its corresponding grammar rule.
Both fields and labels are available as properties on BaseNode instances.
Labels take precedence over fields if the names would collide.
The name of a BaseNode is the name of the corresponding ANTLR grammar rule, but starting with an uppercase character.
If rule alternative labels are specified for an ANTLR rule, these are used instead of the rule name.
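The naming rule can be sketched as a small function. node_name is a hypothetical helper written for this example, not part of antlr-ast, and it assumes that an alternative label is capitalised in the same way as a rule name.

```python
from typing import Optional


def node_name(rule_name: str, alternative_label: Optional[str] = None) -> str:
    """Derive a node name from a grammar rule (hypothetical helper)."""
    # an alternative label, when present, replaces the rule name
    base = alternative_label or rule_name
    # only the first character is uppercased, the rest is left untouched
    return base[:1].upper() + base[1:]


plain = node_name("expr")
labelled = node_name("expr", "BinaryExpr")
underscored = node_name("my_antlr_rule")
```

For the Expr grammar this gives BinaryExpr, NotExpr, Integer and SubExpr as node names, since every alternative of the expr rule is labelled.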
Typically, there is no 1-to-1 mapping between ANTLR rules and the concepts of a language: the rule hierarchy is more nested. Transformations can be used to make the initial tree of BaseNodes based on ANTLR rules more similar to an AST.
A BaseNodeTransformer will walk over the tree from the root node to the leaf nodes.
When visiting a node, it is possible to transform it.
The tree is updated with the transformed node before continuing the walk over the tree.
To define a node transform, add a static method to the BaseNodeTransformer subclass passed to process_tree.
- The name of the method you should define follows this pattern: visit_<BaseNode>. <BaseNode> should be replaced by the name of the BaseNode subclass to transform.
- The method should return the transformed node.
This is a simple example:
    class Transformer(BaseNodeTransformer):
        @staticmethod
        def visit_My_antlr_rule(node):
            return node.name_of_part
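To make the walk order concrete, the following is a self-contained sketch of a root-to-leaf walk that dispatches on visit_<BaseNode> method names. SketchNode and SketchTransformer are hypothetical classes written for this example; they only imitate the behaviour described above and are not the antlr-ast implementation.

```python
class SketchNode:
    """Hypothetical minimal node: a name plus named child fields."""

    def __init__(self, name, **fields):
        self.name = name
        self.fields = fields

    def __getattr__(self, item):
        # only called when normal lookup fails, so fields act as properties
        try:
            return self.__dict__["fields"][item]
        except KeyError:
            raise AttributeError(item)


class SketchTransformer:
    """Walks from the root to the leaves, replacing nodes before descending."""

    def visit(self, node):
        if not isinstance(node, SketchNode):
            return node  # leaf value such as token text
        method = getattr(self, "visit_" + node.name, None)
        if method is not None:
            # the tree is updated with the transformed node first
            node = method(node)
            if not isinstance(node, SketchNode):
                return node
        # then the walk continues into the (possibly new) node
        for key, child in node.fields.items():
            node.fields[key] = self.visit(child)
        return node


class Transformer(SketchTransformer):
    @staticmethod
    def visit_SubExpr(node):
        # replace a parenthesised expression by the expression inside it
        return node.expr


# not (1): NotExpr -> SubExpr -> Integer
tree = SketchNode("NotExpr", expr=SketchNode("SubExpr", expr=SketchNode("Integer", INT="1")))
result = Transformer().visit(tree)
```

After the walk, the SubExpr wrapper is gone and the expr field of the NotExpr node points directly at the Integer node.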
A custom node can represent a part of the parsed language, a type of node present in an AST.
To make it easy to return a custom node, you can define AliasNode subclasses.
Normally, fields of AliasNodes are like symlinks to navigate the tree of BaseNodes.
Instances of custom nodes are created from a BaseNode.
Fields and labels of the source BaseNode are also available on the AliasNode.
If an AliasNode field name collides with these, it takes precedence when accessing that property.
This is what a custom node looks like:
    class NotExpr(AliasNode):
        _fields_spec = ["expr", "op=NOT"]
This code defines a custom node, NotExpr, with an expr and an op field.
The _fields_spec class property is a list that defines the fields the custom node should have.
This is how a field spec in this list is used when creating a custom node from a BaseNode (the source node):
- If a field spec does not exist on the source node, it is set to None
- If multiple field specs define the same field, the first one that isn't None is used
- If a field spec is just a name, it is copied from the source node
- If a field spec is an assignment, the left side is the name of the field on the AliasNode and the right side is the path that should be taken starting in the source node to get the node that should be the value for the field on the custom node. Parts of this path are separated using a dot (.).
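The rules above can be sketched in plain Python. SourceNode, parse_field_spec and resolve_fields are hypothetical names used only for this illustration, and the sketch assumes that missing values are None and that path parts are separated by dots; it is not how antlr-ast is actually implemented.

```python
class SourceNode:
    """Hypothetical stand-in for the source BaseNode."""

    def __init__(self, **fields):
        self.__dict__.update(fields)


def parse_field_spec(spec):
    """Split "name=path.to.value" into ("name", ["path", "to", "value"])."""
    name, _, path = spec.partition("=")
    # a spec that is just a name is copied from the field with the same name
    return name, (path or name).split(".")


def resolve_fields(source, fields_spec):
    """Compute the field values of a custom node from its field specs."""
    result = {}
    for spec in fields_spec:
        name, path = parse_field_spec(spec)
        value = source
        for part in path:
            # a missing step anywhere on the path yields None
            value = getattr(value, part, None)
            if value is None:
                break
        # the first spec that produces a non-None value wins
        if result.get(name) is None:
            result[name] = value
    return result


source = SourceNode(expr=SourceNode(value="1"), NOT="not")
simple = resolve_fields(source, ["expr", "op=NOT"])
mixed = resolve_fields(source, ["op=MISSING", "op=NOT", "val=expr.value", "gone"])
```

In the second call, op falls back to the NOT token because the first spec for it resolves to None, val follows a two-part path, and gone stays None because the source node has no such field.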
Connecting to the transformer
To use this custom node, add a method to the transformer:
    class Transformer(BaseNodeTransformer):
        # ...

        # here the BaseNode name is the same as the custom node name
        # but that isn't required
        @staticmethod
        def visit_NotExpr(node):
            return NotExpr.from_spec(node)
Instead of defining methods on the transformer class to use custom nodes, it's possible to do this automatically by passing a list of AliasNode classes to bind_alias_nodes on the transformer class.
To make this work, the AliasNode classes in the list should have a _rules class property with a list of the BaseNode names they should transform.
This is the result:
    class NotExpr(AliasNode):
        _fields_spec = ["expr", "op=NOT"]
        _rules = ["NotExpr"]

    class Transformer(BaseNodeTransformer):
        pass

    alias_nodes = [NotExpr]
    Transformer.bind_alias_nodes(alias_nodes)
An item in _rules can also be a tuple.
In that case, the first item in the tuple is a BaseNode name and the second item is the name of a class method of the custom node.
A tuple is not needed in the example above, but the following is equivalent to it:
    class NotExpr(AliasNode):
        _fields_spec = ["expr", "op=NOT"]
        _rules = [("NotExpr", "from_not")]

        @classmethod
        def from_not(cls, node):
            return cls.from_spec(node)

    class Transformer(BaseNodeTransformer):
        pass

    alias_nodes = [NotExpr]
    Transformer.bind_alias_nodes(alias_nodes)
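One way to picture what such a binding step could do is to register a visit_<BaseNode> method for every entry in _rules. The sketch below is an assumption about the mechanism, not the antlr-ast source: SketchAliasNode and SketchTransformer are hypothetical stand-ins, and from_spec is reduced to copying plain field names.

```python
from types import SimpleNamespace


class SketchAliasNode:
    """Hypothetical stand-in for AliasNode, reduced to what binding needs."""
    _fields_spec = []
    _rules = []

    def __init__(self, **fields):
        self.__dict__.update(fields)

    @classmethod
    def from_spec(cls, node):
        # simplified: only plain names are copied, "a=b.c" specs are skipped
        names = [spec for spec in cls._fields_spec if "=" not in spec]
        return cls(**{name: getattr(node, name, None) for name in names})


class SketchTransformer:
    @classmethod
    def bind_alias_nodes(cls, alias_classes):
        for alias_cls in alias_classes:
            for rule in alias_cls._rules:
                if isinstance(rule, tuple):
                    # (BaseNode name, name of a class method on the custom node)
                    rule_name, method_name = rule
                else:
                    rule_name, method_name = rule, "from_spec"
                factory = getattr(alias_cls, method_name)
                # register visit_<BaseNode> so a walk can find it by name
                setattr(cls, "visit_" + rule_name, staticmethod(factory))


class NotExpr(SketchAliasNode):
    _fields_spec = ["expr"]
    _rules = [("NotExpr", "from_not")]

    @classmethod
    def from_not(cls, node):
        return cls.from_spec(node)


class Transformer(SketchTransformer):
    pass


Transformer.bind_alias_nodes([NotExpr])
result = Transformer.visit_NotExpr(SimpleNamespace(expr="1"))
```

The methods are attached to the subclass that bind_alias_nodes is called on, so other transformer classes are left unchanged in this sketch.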
Using the final tree
It's easy to use a tree that has a mix of AliasNodes and dynamic BaseNodes: the whole tree is just a nested Python object.
When searching nodes in a tree, the priority of nodes can be taken into account.
BaseNodes have priority 3 and AliasNodes have priority 2.
When writing code to work with trees, it can be affected by changes in the grammar, the transforms and the custom nodes. The grammar is the most likely to change.
To make grammar updates have no impact on your code, don't rely on BaseNodes.
You can still check whether the AliasNode parent node of a BaseNode has the correct fields set and search for nested AliasNodes in a subtree.
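Because the tree is a nested Python object, searching a subtree for custom nodes can be done with a short recursive generator. The sketch below uses hypothetical classes (SketchNode, BaseSketch, AliasSketch) instead of the real node types, and it assumes the tree has no cycles such as parent references.

```python
class SketchNode:
    """Hypothetical node that stores its fields as plain attributes."""

    def __init__(self, **fields):
        self.__dict__.update(fields)


class BaseSketch(SketchNode):
    """Stands in for a grammar-derived node."""


class AliasSketch(SketchNode):
    """Stands in for a custom node."""


def find_instances(node, node_type):
    """Yield every instance of node_type in a nested tree, parents first."""
    if isinstance(node, node_type):
        yield node
    if isinstance(node, (list, tuple)):
        children = node
    elif hasattr(node, "__dict__"):
        children = vars(node).values()
    else:
        children = ()  # plain values such as token text have no children
    for child in children:
        yield from find_instances(child, node_type)


tree = BaseSketch(
    left=AliasSketch(value="1"),
    right=[BaseSketch(inner=AliasSketch(value="2")), "+"],
)
found = [node.value for node in find_instances(tree, AliasSketch)]
```

Code written this way only names the custom node type, so it keeps working when the grammar-derived nodes between two custom nodes change.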
If you do rely on BaseNodes, code could break by the addition of AliasNodes that replace some of these BaseNodes if a field name collides with a field name on a used BaseNode.