compyler
Tools to make a compiler in Python
Requirements
Lexer
Registering tokens
A Lexer object can be used to register tokens.
>>> from compyler import Lexer
>>> lexer = Lexer()
>>> lexer.add_token(name='INT', regex=r'0|[1-9][0-9]*')
The tokens are registered in order of importance.
>>> from compyler import Lexer
>>> lexer = Lexer()
>>> lexer.add_token('ID', r'[a-zA-Z_$][a-zA-Z0-9_$]*')
>>> lexer.add_token('STRING', r'\"(.|[ \t])*\"')
In the example above, two tokens are registered. Even though text matched by ID can also appear inside a STRING, the lexer's greedy search looks for the biggest match first, so STRING: "spam" is caught before ID: eggs.
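This greedy behaviour also means that a string whose contents would match ID is still captured as a single STRING token. A small sketch, continuing the lexer above (Lexer.tokenize is covered in the next section, and the output shown is assumed to follow the same formatting as the later examples):
>>> lexer.tokenize('"spam eggs"')
[STRING: "spam eggs"]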
Scanning a string
To scan a string, the Lexer.tokenize method can be invoked.
>>> from compyler import Lexer
>>> lexer = Lexer()
>>> lexer.add_token(name='INT', regex=r'0|[1-9][0-9]*')
>>> lexer.add_token('ID', r'[a-zA-Z_$][a-zA-Z0-9_$]*')
>>> lexer.add_token('STRING', r'\"(.|[ \t])*\"')
>>> lexer.tokenize('123 "spam" eggs')
[INT: 123, STRING: "spam", ID: eggs]
Filtering
A tokenized string can also be filtered to remove unwanted tokens:
>>> from compyler import Lexer
>>> lexer = Lexer()
>>> lexer.add_token(name='INT', regex=r'0|[1-9][0-9]*')
>>> lexer.add_token('ID', r'[a-zA-Z_$][a-zA-Z0-9_$]*')
>>> lexer.add_token('STRING', r'\"(.|[ \t])*\"')
>>> lexer.add_token("COMMENT", r"#[^\n]*\n*$")
>>> buffer = lexer.tokenize('123 "spam" eggs')
>>> lexer.filter({"COMMENT"}, buffer)
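The input above contains no comment, so filtering it leaves the buffer unchanged. With a comment in the source, the COMMENT tokens are dropped; a minimal sketch, assuming filter returns the filtered buffer rather than modifying it in place:
>>> buffer = lexer.tokenize('123 # a comment')
>>> buffer = lexer.filter({"COMMENT"}, buffer)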
Shift Reduce Parser
Registering productions
Productions can be created and registered using the LALRParser class:
>>> from compyler import LALRParser
>>> lalr_parser = LALRParser()
>>> lalr_parser.add_production(
... "ProductionName",
... {
... ("Token1", "EOF"): (0,),
... ("Token1", "Token2", "EOF"): (0,1)
... }
... )
A production must also include the indices of the tokens or other productions that will be used as children in the AST. For example, if the production is:
Vardecl: ID EQ INT PLUS INT SEMICOLON
and the indices are (2, 4), the result in the AST would be:
Vardecl
| INT
| INT
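Registered through the same add_production call, the Vardecl production above would look like this (the ID, EQ, INT, PLUS and SEMICOLON token names are assumed to be registered on a lexer):
>>> lalr_parser.add_production(
...     "Vardecl",
...     {
...         ("ID", "EQ", "INT", "PLUS", "INT", "SEMICOLON"): (2, 4)
...     }
... )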
Registered productions can be inspected by indexing the parser. Going back to the ProductionName example above:
>>> from compyler import LALRParser
>>> lalr_parser = LALRParser()
>>> lalr_parser.add_production(
... "ProductionName",
... {
... ("Token1", "EOF"): (0,),
... ("Token1", "Token2", "EOF"): (0,1)
... }
... )
>>> lalr_parser[0]
ProductionName: Token1 EOF -> $0
| Token1 Token2 EOF -> $0 $1
Parsing a tokenized string
After registering the productions on the parser, a tokenized string can be parsed:
>>> from compyler import Lexer, LALRParser
>>> lexer = Lexer()
>>> lexer.add_token("ID", r"[a-zA-Z_$][a-zA-Z0-9_$]*")
>>> lexer.add_token("ASSIGN", r"[ \t]*=[ \t]")
>>> lexer.add_token("INT", r"0|[1-9][0-9]*")
>>> lexer.add_token("SEMICOLON", r"[ \t]*;")
>>> lalr_parser = LALRParser()
>>> lalr_parser.add_production(
... "VarDecl", {
... ("ID", "ASSIGN", "INT", "SEMICOLON"): (0,2)
... }
... )
>>> buffer = lexer.tokenize("var = 1;")
>>> lalr_parser.parse(buffer)
VarDecl
The result of the parsing process will either be an ASTNode object on success or None if the parsing fails.
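Because of this, the result can simply be tested against None before it is used. A small sketch, continuing the example above (the failing input is an assumption, chosen because it does not match the VarDecl production):
>>> lalr_parser.parse(lexer.tokenize("var = 1;")) is not None
True
>>> lalr_parser.parse(lexer.tokenize("var = 1")) is None
True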
Accessing an AST node's children
After parsing is complete and an ASTNode object has been generated, its children can be accessed by indexing the object.
Continuing the example above:
>>> parsed_ast = lalr_parser.parse(buffer)
>>> parsed_ast[0]
ID: var
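Since the VarDecl production keeps the tokens at indices 0 and 2, the second child should be the INT token (the output shown here is assumed to follow the same formatting as the other examples):
>>> parsed_ast[1]
INT: 1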
Getting the AST's representation
The parsed AST can also be shown using a basic text representation.
This is returned by calling the representation() method.
Continuing the example above:
>>> parsed_ast.representation()
VarDecl
| ID: var
| INT: 1
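Putting the pieces together, a minimal end-to-end sketch using the same grammar (the COMMENT token is optional and, as above, filter is assumed to return the filtered buffer; the output formatting follows the examples in this document):
>>> from compyler import Lexer, LALRParser
>>> lexer = Lexer()
>>> lexer.add_token("ID", r"[a-zA-Z_$][a-zA-Z0-9_$]*")
>>> lexer.add_token("ASSIGN", r"[ \t]*=[ \t]")
>>> lexer.add_token("INT", r"0|[1-9][0-9]*")
>>> lexer.add_token("SEMICOLON", r"[ \t]*;")
>>> lexer.add_token("COMMENT", r"#[^\n]*\n*$")
>>> parser = LALRParser()
>>> parser.add_production("VarDecl", {("ID", "ASSIGN", "INT", "SEMICOLON"): (0, 2)})
>>> buffer = lexer.filter({"COMMENT"}, lexer.tokenize("var = 1;# a note"))
>>> ast = parser.parse(buffer)
>>> ast.representation()
VarDecl
| ID: var
| INT: 1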