AST shaping for antlr parsers
Project description
antlr-ast
This package allows you to take an Antlr4 parser output, and generate an abstract syntax tree (AST).
Antlr4 parser outputs are often processed using the visitor pattern.
By using antlr-ast
's AstNode
, you can define how an antlr visitor should generate an AST.
Install
pip install antlr-ast
Note: this package is not python2 compatible.
Running Tests
# may need:
# pip install pytest
py.test
Usage
Note: This tutorial assumes you know how to parse code, and create a python visitor. See the Antlr4 getting started guide if you have never installed Antlr. The Antlr Mega Tutorial has useful python examples.
Using antlr-ast
involves three steps:
- Using Antlr to generate the necessary python files for a grammar.
- Shaping parsed output to an AST by subclassing
antlr-ast
's AstNode. - Connecting everything to a Visitor class.
Example Grammar
Below, we'll use a simple grammar to explain how antlr-ast
works.
This grammar can be found in /tests/Expr.g4
.
grammar Expr;
expr: left=expr op=('+'|'-') right=expr #BinaryExpr
| NOT expr #NotExpr
| INT #Integer
| '(' expr ')' #SubExpr
;
INT : [0-9]+ ; // match integers
NOT : 'not' ;
WS : [ \t]+ -> skip ; // toss out whitespace
Antlr4 can use the grammar above to generate a parser in a number of languages. To generate a python parser, you can use the following command.
antlr4 -Dlanguage=Python3 -visitor /tests/Expr.g4
This will generate a number of files in the /tests/
directory, including a Lexer (ExprLexer.py
),
a parser (ExprParser.py
), and a visitor (ExprVisitor.py
).
You can use and import these directly in python. For example, from the root of this repo...
from tests import ExprVisitor
Creating AST Classes
By subclassing AstNode, we can define how Antlr4 should shape parser output to our desired AST.
For example, /tests/test_expr_ast.py
defines the following classes for the grammar above.
from antlr_ast import AstNode
class SubExpr(AstNode):
_fields = ['expr->expression']
class BinaryExpr(AstNode):
_fields = ['left', 'right', 'op']
class NotExpr(AstNode):
_fields = ['NOT->op', 'expr']
Note that _fields
allows you to redefine rule names.
For example, in the grammar, the rule for matching expressions is named "expr".
In this case, using 'expr->expression'
says that the expr field should be called "expression" instead.
Connecting to Visitor
Once you have defined your AST classes, you can connect them to the visitor, using their _from_fields
method.
For example..
AstVisitor(ExprVisitor): # antlr visitor subclass
def visitBinaryExpr(self, ctx): # method on visitor
return BinaryExpr._from_fields(self, ctx) # _from_fields method
Then, whenever the visitor passes through a BinaryExpr node in the parse tree, it will convert it to the BinaryExpr AST Node.
Note that the parse tree has a node named BinaryExpr
, with the fields left
, right
, and expr
, due to this line of the grammar:
expr: left=expr op=('+'|'-') right=expr #BinaryExpr
For a full example, see the parse function of /tests/test_expr_ast.py
.
Unshaped nodes
When parts of the parse tree do not have instructions for mapping to an AST node, they are returned as an Unshaped
node.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file antlr-ast-0.4.0.tar.gz
.
File metadata
- Download URL: antlr-ast-0.4.0.tar.gz
- Upload date:
- Size: 17.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.21.0 setuptools/40.6.3 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.5.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | d1cd16aaff3e721e78c451db3a74311ee2a021164c6b384008672efe6a49d8b4 |
|
MD5 | 80adc686c533e1bebbf21e641a94e7c6 |
|
BLAKE2b-256 | 5d818e20e2e6c218a952f03a165a0541ec01f6d4bea2370d59f9270c141a323f |