
A Parsing Expression Grammar (PEG) Parser Written in Pure Python


peg.py

Peg.py is a Parsing Expression Grammar (PEG) parser written in pure Python.

Install

You can install peg.py by running:

$ pip install peg.py

Example Usage

You can build a Grammar and call its parse method to get the parse tree.

from peg import Grammar

greeter = Grammar("""
    greet   <- "Hello, " someone
    someone <- [a-zA-Z0-9]+
""")

text = "Hello, world"  # named `text` to avoid shadowing the built-in input()
tree = greeter.parse(text)

Peg.py provides a tree visitor, NodeVisitor. You can inherit from this class and define your own visit_ methods to process the parse tree. Each visit_ method handles one rule, for example visit_greet or visit_someone.

from peg import NodeVisitor  # assumed to be importable from peg, like Grammar

class GreeterVisitor(NodeVisitor):

    def visit_greet(self, node, children):
        return ('Hello', children[0])

    def visit_someone(self, node, _):
        return node.text

visitor = GreeterVisitor(greeter)
result = visitor.visit(tree)

print(result)

The program should print:

('Hello', 'world')
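The visit_ naming convention can be illustrated without peg.py itself. The following is a minimal sketch, not peg.py's implementation: the Node and SketchVisitor classes, their fields, and the generic_visit fallback are all hypothetical names introduced here, and the tree is built by hand rather than parsed.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Node:
    """Hypothetical parse-tree node; peg.py's real node type may differ."""
    rule: str
    text: str
    children: List["Node"] = field(default_factory=list)


class SketchVisitor:
    """Hypothetical visitor that dispatches on the rule name, bottom-up."""

    def visit(self, node: Node) -> Any:
        # Visit the children first so each handler receives their results.
        results = [self.visit(child) for child in node.children]
        handler = getattr(self, "visit_" + node.rule, self.generic_visit)
        return handler(node, results)

    def generic_visit(self, node: Node, children: List[Any]) -> Any:
        # Fallback for rules that have no dedicated visit_ method.
        return children or node.text


class GreeterSketch(SketchVisitor):
    def visit_greet(self, node, children):
        return ("Hello", children[0])

    def visit_someone(self, node, _):
        return node.text


# Hand-built stand-in for the tree a parser would produce.
tree = Node("greet", "Hello, world", [Node("someone", "world")])
print(GreeterSketch().visit(tree))  # ('Hello', 'world')
```

Looking up the handler with getattr keeps the visitor open to new rules: adding a rule to the grammar only requires adding one method.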

Syntax

A grammar consists of a set of rules.

A rule consists of a name, a left arrow, and a pattern.

name <- pattern

A name starts with a letter or an underscore, followed by any number of letters, digits, or underscores.

valid
Valid
_valid
0nay    # invalid
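The naming rule corresponds to the IdentStart and IdentCont rules in the PEG grammar further down. As a rough check outside of peg.py, it can be written as a regular expression; the helper name is_valid_name is hypothetical and not part of the library.

```python
import re

# Mirrors IdentStart IdentCont*: [a-zA-Z_] followed by [a-zA-Z0-9_]*.
NAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def is_valid_name(name: str) -> bool:
    """Return True if `name` satisfies the rule-name syntax described above."""
    return NAME_RE.fullmatch(name) is not None


for candidate in ["valid", "Valid", "_valid", "0nay"]:
    print(candidate, is_valid_name(candidate))
```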

A pattern is built from one or more of the elements described below.

A literal is a string enclosed in double or single quotes, for example "hello" or 'hello'. Peg.py matches a literal against the input exactly as written.

A set is a group of characters enclosed in square brackets. Two characters joined by a dash (-) stand for every character from the first to the second, inclusive. For example: [a-z], [a-zA-Z0-9], [a-zA-Z$_].
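To make the range notation concrete, here is a small sketch that expands the text between the brackets into the characters it denotes. The function expand_set is hypothetical and deliberately ignores backslash escapes; it is not how peg.py represents sets internally.

```python
def expand_set(body: str) -> set:
    """Expand the text between [ and ] into the set of characters it denotes.

    Backslash escapes are deliberately ignored in this sketch.
    """
    chars = set()
    i = 0
    while i < len(body):
        # A dash with a character on each side denotes an inclusive range.
        if i + 2 < len(body) and body[i + 1] == "-":
            lo, hi = ord(body[i]), ord(body[i + 2])
            chars.update(chr(c) for c in range(lo, hi + 1))
            i += 3
        else:
            chars.add(body[i])
            i += 1
    return chars


print(sorted(expand_set("a-c")))    # ['a', 'b', 'c']
print(len(expand_set("a-zA-Z$_")))  # 54
```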

A dot (.) matches any single character. It fails only at the end of the input.

( pattern ) groups pattern with parentheses.

< pattern > captures the input text matched by pattern and associates it with an unnamed rule.

pattern? means pattern is optional.

pattern+ matches pattern one or more times.

pattern* matches pattern zero or more times.
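The three suffix operators can be sketched as small matcher combinators. This is an illustration of standard PEG semantics, not peg.py's code: the Matcher type and the functions literal, optional, star, and plus are hypothetical names, and a matcher here simply returns the new input position or None on failure.

```python
from typing import Callable, Optional

# A matcher takes (text, pos) and returns the new position, or None on failure.
Matcher = Callable[[str, int], Optional[int]]


def literal(s: str) -> Matcher:
    return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None


def optional(p: Matcher) -> Matcher:
    # pattern?: never fails; consumes input only if p matches.
    def match(text, pos):
        end = p(text, pos)
        return pos if end is None else end
    return match


def star(p: Matcher) -> Matcher:
    # pattern*: greedily repeat p; zero matches is still a success.
    def match(text, pos):
        while True:
            end = p(text, pos)
            if end is None or end == pos:  # stop on failure or no progress
                return pos
            pos = end
    return match


def plus(p: Matcher) -> Matcher:
    # pattern+: one mandatory match, then behave like pattern*.
    def match(text, pos):
        end = p(text, pos)
        return None if end is None else star(p)(text, end)
    return match


ab = literal("ab")
print(star(ab)("ababx", 0))   # 4
print(plus(ab)("x", 0))       # None
print(optional(ab)("x", 0))   # 0
```

Repetition in a PEG is greedy and never gives back input, which is why star simply loops until the inner matcher fails.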

&pattern succeeds if pattern matches at the current position, without consuming any input. Otherwise, the parse fails.

!pattern succeeds if pattern does not match at the current position, without consuming any input. Otherwise, the parse fails.
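The two predicates only look ahead: on success they return the position they started from. A minimal sketch under the same assumptions as any hand-written matcher (the names literal, and_pred, and not_pred are hypothetical, and a matcher returns the new position or None on failure):

```python
from typing import Callable, Optional

Matcher = Callable[[str, int], Optional[int]]


def literal(s: str) -> Matcher:
    return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None


def and_pred(p: Matcher) -> Matcher:
    # &pattern: succeed without consuming input if p matches here.
    return lambda text, pos: pos if p(text, pos) is not None else None


def not_pred(p: Matcher) -> Matcher:
    # !pattern: succeed without consuming input if p does NOT match here.
    return lambda text, pos: pos if p(text, pos) is None else None


ab = literal("ab")
print(and_pred(ab)("abc", 0))  # 0 (matched, nothing consumed)
print(and_pred(ab)("xbc", 0))  # None
print(not_pred(ab)("xbc", 0))  # 0
print(not_pred(ab)("abc", 0))  # None
```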

Several patterns can be written one after another, like pattern1 pattern2 pattern3. The sequence matches only when each underlying pattern matches.

Several patterns joined by slashes (/), like pattern1 / pattern2 / pattern3, form an ordered choice. The alternatives are tried from left to right, and the choice matches as soon as one of them matches.
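Because the choice is ordered, the position of an alternative matters. The sketch below illustrates this with hypothetical combinators (literal, sequence, and choice are names introduced here, not peg.py functions); a matcher returns the new position or None on failure.

```python
from typing import Callable, Optional

Matcher = Callable[[str, int], Optional[int]]


def literal(s: str) -> Matcher:
    return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None


def sequence(*parts: Matcher) -> Matcher:
    # pattern1 pattern2 ...: every part must match, one after another.
    def match(text, pos):
        for part in parts:
            pos = part(text, pos)
            if pos is None:
                return None
        return pos
    return match


def choice(*alternatives: Matcher) -> Matcher:
    # pattern1 / pattern2 / ...: the first alternative that matches wins.
    def match(text, pos):
        for alt in alternatives:
            end = alt(text, pos)
            if end is not None:
                return end
        return None
    return match


a, ab, c = literal("a"), literal("ab"), literal("c")

print(choice(a, ab)("abc", 0))               # 1: "a" wins, "ab" is never tried
print(sequence(choice(a, ab), c)("abc", 0))  # None: the choice does not retry "ab"
print(sequence(choice(ab, a), c)("abc", 0))  # 3: longer alternative listed first
```

In practice this means a longer alternative should be listed before any alternative that is a prefix of it.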

A hash sign (#) starts a comment. Everything from the # to the end of the line is ignored.

PEG Grammar For PEG Grammars

Grammar         <- Spacing Definition+ EndOfFile

Definition      <- Identifier LEFTARROW Expression
Expression      <- Sequence ( SLASH Sequence )*
Sequence        <- Prefix*
Prefix          <- ( AND / NOT )? Suffix
Suffix          <- Primary ( QUERY / STAR / PLUS )?
Primary         <- Identifier !LEFTARROW
                 / OPEN Expression CLOSE
                 / Literal
                 / Class
                 / DOT
                 / Action
                 / BEGIN
                 / END

Identifier      <- < IdentStart IdentCont* > Spacing
IdentStart      <- [a-zA-Z_]
IdentCont       <- IdentStart / [0-9]
Literal         <- ['] < ( !['] Char  )* > ['] Spacing
                 / ["] < ( !["] Char  )* > ["] Spacing
Class           <- '[' < ( !']' Range )* > ']' Spacing
Range           <- Char '-' Char / Char
Char            <- '\\' [abefnrtv'"\[\]\\]
                 / '\\' [0-3][0-7][0-7]
                 / '\\' [0-7][0-7]?
                 / '\\' '-'
                 / !'\\' .
LEFTARROW       <- '<-' Spacing
SLASH           <- '/' Spacing
AND             <- '&' Spacing
NOT             <- '!' Spacing
QUERY           <- '?' Spacing
STAR            <- '*' Spacing
PLUS            <- '+' Spacing
OPEN            <- '(' Spacing
CLOSE           <- ')' Spacing
DOT             <- '.' Spacing
Spacing         <- ( Space / Comment )*
Comment         <- '#' ( !EndOfLine . )* EndOfLine
Space           <- ' ' / '\t' / EndOfLine
EndOfLine       <- '\r\n' / '\n' / '\r'
EndOfFile       <- !.
BEGIN           <- '<' Spacing
END             <- '>' Spacing

References

  • Peg.py uses the same PEG syntax as described in Bryan Ford's PEG paper.
  • Peg.py implements a simplified VM similar to the one in lpeg.
  • Peg.py provides an API (Grammar, Grammar.parse(), NodeVisitor, etc.) similar to that of parsimonious.
