Another PEG Parsing Tool
More Parsing!
A fork of pyparsing for faster parsing
Installation
This is a pypi package
pip install mo-parsing
Usage
This module allows you to define a PEG parser using predefined patterns and Python operators. Here is an example:
>>> from mo_parsing import Word
>>> from mo_parsing.utils import alphas
>>>
>>> greet = Word(alphas)("greeting") + "," + Word(alphas)("person") + "!"
>>> result = greet.parse_string("Hello, World!")
The result can be accessed as a nested list
>>> list(result)
['Hello', ',', 'World', '!']
The result can also be accessed as a dictionary
>>> dict(result)
{'greeting': 'Hello', 'person': 'World'}
Read the pyparsing documentation for more details.
The Whitespace Context
mo_parsing.whitespaces.CURRENT is used during parser creation: it effectively defines what "whitespace" to skip during parsing, with additional features to simplify the language definition. You declare "standard" Whitespace like so:
with Whitespace() as whitespace:
    # PUT YOUR LANGUAGE DEFINITION HERE (space, tab and newline are "whitespace")
If you are declaring a large language, and you want to minimize indentation, and you are careful, you may also use this pattern:
whitespace = Whitespace().use()
# PUT YOUR LANGUAGE DEFINITION HERE
whitespace.release()
The whitespace context can be used to set global parsing parameters, such as:
- set_whitespace() - set the ignored characters (default: "\t\n ")
- add_ignore() - include whole patterns that are ignored (like comments)
- set_literal() - set the definition for what Literal() means
- set_keyword_chars() - for default Keyword() (important for defining word boundary)
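To make the first two settings concrete, here is a minimal plain-Python sketch of what "skip the ignored characters and ignored patterns before each match" could look like. It is a conceptual model only, not mo-parsing's implementation; `skip_ignored`, `white_chars`, and `ignores` are hypothetical names chosen for illustration.

```python
import re
from typing import List


def skip_ignored(text: str, pos: int, white_chars: str, ignores: List[re.Pattern]) -> int:
    """Advance past whitespace characters and ignored patterns; return the new position."""
    while True:
        start = pos
        # Skip single characters that count as whitespace.
        while pos < len(text) and text[pos] in white_chars:
            pos += 1
        # Skip whole ignored patterns (for example, comments) anchored at pos.
        for pattern in ignores:
            match = pattern.match(text, pos)
            if match and match.end() > pos:
                pos = match.end()
        # Stop once a full pass makes no progress.
        if pos == start:
            return pos


comment = re.compile(r"#[^\n]*")  # hypothetical comment syntax for this sketch
source = "  # a comment\n\t value"
position = skip_ignored(source, 0, "\t\n ", [comment])
remaining = source[position:]
```

With the comment pattern registered, `remaining` is the text that a parser would see next; without it, skipping stops at the `#`.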
Navigating ParseResults
The results of parsing are in ParseResults and are in the form of an n-ary tree, with the children found in ParseResults.tokens. Each ParseResults.type points to the ParserElement that made it. In general, if you want to get fancy with post-processing (or in a parse_action), you will need to navigate the raw tokens to generate a final result.
There are some convenience methods:
- __iter__() - allows you to iterate through parse results in depth first search. Empty results are skipped, and Grouped results are treated as atoms (which can be further iterated if required)
- name is a convenient property for ParseResults.type.token_name
- __getitem__() - allows you to jump into the parse tree to the given name. This is blocked by any names found inside Grouped results (because groups are considered atoms).
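The iteration rule described above (depth first, empty results skipped, grouped results treated as atoms) can be sketched in plain Python. This is an illustrative model under stated assumptions, not the library's code: `Node`, its `grouped` flag, and `iterate` are hypothetical stand-ins for ParseResults and its `__iter__()`.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class Node:
    """Hypothetical stand-in for a parse result: a name, child tokens, a grouped flag."""
    name: str
    tokens: List[Union["Node", str]] = field(default_factory=list)
    grouped: bool = False


def iterate(node: Node) -> Iterator[Union[Node, str]]:
    """Depth-first walk that skips empty results and yields grouped results whole."""
    for token in node.tokens:
        if isinstance(token, Node):
            if not token.tokens:
                continue               # empty results are skipped
            if token.grouped:
                yield token            # a group is an atom; iterate it separately if needed
            else:
                yield from iterate(token)
        else:
            yield token


tree = Node("greet", [
    Node("greeting", ["Hello"]),
    Node("empty"),
    Node("names", [Node("person", ["World"]), "and", "Mars"], grouped=True),
    "!",
])

flat = list(iterate(tree))        # the group appears as a single item
inner = list(iterate(flat[1]))    # iterating the group exposes its contents
```

In this model a lookup by name would stop at the same boundary: anything inside the grouped node is only reachable by first stepping into the group.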
Parse Actions
Parse actions are methods that run after a ParserElement finds a match.
- Parameters must be accepted in (tokens, index, string) order (the opposite of pyparsing)
- Parse actions are wrapped to ensure the output is a legitimate ParseResults
  - If your parse action returns None then the result is the original tokens
  - If your parse action returns an object, or list, or tuple, then it will be packaged in a ParseResults with the same type as tokens
  - If your parse action returns a ParseResults then it is accepted even if it belongs to some other pattern
Simple example:
from mo_parsing import Word

integer = Word("0123456789").add_parse_action(lambda t, i, s: int(t[0]))
result = integer.parse_string("42")
assert (result[0] == 42)
For a slightly shorter specification, you may use the / operator and accept only the parameters you need:
integer = Word("0123456789") / (lambda t: int(t[0]))
result = integer.parse_string("42")
assert (result[0] == 42)
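The return-value rules described in this section can be modelled in a few lines of plain Python. This is a sketch of the stated behaviour, not the wrapper mo-parsing actually uses: `Result` and `wrap_action` are hypothetical names, and `Result.type` is a plain string standing in for the ParserElement.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Result:
    """Hypothetical stand-in for a parse result: the pattern that made it, plus tokens."""
    type: str
    tokens: List[Any]


def wrap_action(action: Callable[[Result, int, str], Any]) -> Callable[[Result, int, str], Result]:
    """Wrap an action so that calling it always yields a Result."""
    def wrapped(tokens: Result, index: int, string: str) -> Result:
        out = action(tokens, index, string)
        if out is None:
            return tokens                           # None: keep the original tokens
        if isinstance(out, Result):
            return out                              # a result: accepted as-is, whatever its type
        if isinstance(out, (list, tuple)):
            return Result(tokens.type, list(out))   # a sequence: repackaged with the same type
        return Result(tokens.type, [out])           # a single object: repackaged the same way
    return wrapped


matched = Result("integer", ["42"])
to_int = wrap_action(lambda t, i, s: int(t.tokens[0]))
no_op = wrap_action(lambda t, i, s: None)
other = wrap_action(lambda t, i, s: Result("other", ["x"]))

converted = to_int(matched, 0, "42")
unchanged = no_op(matched, 0, "42")
foreign = other(matched, 0, "42")
```

Note the parameter order `(tokens, index, string)` in every lambda, matching the order this section requires.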
Debugging
The PEG style of mo-parsing (from pyparsing) makes for a very expressive and readable specification, but debugging a parser is still hard. To look deeper into what the parser is doing, use the Debugger:
with Debugger():
    expr.parse_string("my new language")
The debugger will print out details of what's happening:
- Each attempt, and if it matched or failed
- A small number of bytes to show you the current position
- Location, line and column for more info about the current position
- Whitespace indicating stack depth
- Printout of the ParserElement performing the attempt
This should help to isolate the exact position where your grammar is failing.
Regular Expressions
mo-parsing can parse and generate regular expressions. ParserElement has a __regex__() function that returns the regular expression for the given grammar; this works up to a limit, and is used internally to accelerate parsing. The Regex class parses regular expressions into a grammar; it is used to optimize parsing, and you may find it useful for decomposing regular expressions that look like line noise.
Differences from PyParsing
This fork was originally created to support faster parsing for mo-sql-parsing. Since then it has deviated sufficiently to be its own collection of parser specification functions. Here are the differences:
- Added Whitespace, which controls parsing context and whitespace. It replaces the whitespace modifying methods of pyparsing
- the wildcard ("*") could be used in pyparsing to indicate multi-values are expected; this is not allowed in mo-parsing: all values are multi-values
- ParserElements are static: For example, expr.add_parse_action(action) creates a new ParserElement, so it must be assigned to a variable or it is lost. This is the biggest source of bugs when converting from pyparsing
- removed all backward-compatibility settings
- no support for binary serialization (no pickle)
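Because the "ParserElements are static" point is called out as the biggest source of conversion bugs, here is a small plain-Python model of the pitfall. It does not use mo-parsing itself: `Element` is a hypothetical immutable stand-in whose `add_parse_action` returns a modified copy, which is the behaviour the list above describes.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple


@dataclass(frozen=True)
class Element:
    """Hypothetical immutable stand-in for a parser element."""
    pattern: str
    actions: Tuple[Callable, ...] = ()

    def add_parse_action(self, action: Callable) -> "Element":
        # Return a modified copy; self is left untouched.
        return replace(self, actions=self.actions + (action,))


integer = Element("0123456789")

# Bug when porting from pyparsing: the returned element is discarded,
# so `integer` still has no parse action attached.
integer.add_parse_action(int)
actions_after_discard = integer.actions

# Correct: keep the element that add_parse_action returns.
converted = integer.add_parse_action(int)
```

The same habit applies to any method that modifies a parser element: assign the result, or chain it directly into the expression where it is used.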
Faster Parsing
- faster infix operator parsing (main reason for this fork)
- ParseResults point to ParserElement for reduced size
- regex used to reduce the number of failed parse attempts
- packrat parser is not needed
- less stack used
Contributing
If you plan to extend or enhance this code, please see the README in the tests directory.