Another PEG Parsing Tool

More Parsing!

An experimental fork of pyparsing

Summary of Differences

This has been forked to experiment with faster parsing in the moz-sql-parser.

  • Added Whitespace, which controls parsing context and whitespace. It replaces the whitespace-modifying methods of pyparsing.
  • in pyparsing, the wildcard ("*") could be used to indicate that multi-values are expected; this is no longer allowed, because all values are multi-values
  • all actions are in f(tokens, index, string) form, which is the opposite of pyparsing's f(string, index, tokens) form
  • ParserElements are static: for example, expr.addParseAction(action) creates a new ParserElement, so the result must be assigned to a variable or it is lost. This is the biggest source of bugs when converting from pyparsing
  • removed all backward-compatibility settings
  • no support for binary serialization (no pickle)
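
The "static ParserElement" rule is the easiest one to trip over, so here is a minimal sketch of the pattern in plain Python. The `Element` class and its `add_parse_action` method are hypothetical stand-ins written for this illustration; they are not the mo-parsing classes, and only mimic the "returns a new object" behaviour described above.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple


@dataclass(frozen=True)
class Element:
    """Hypothetical immutable stand-in for a ParserElement."""

    name: str
    actions: Tuple[Callable, ...] = ()

    def add_parse_action(self, action):
        # Return a NEW element; `self` is never modified
        return replace(self, actions=self.actions + (action,))


def to_int(tokens, index, string):
    # Actions take (tokens, index, string), in that order
    return int(tokens[0])


number = Element("number")

# BUG: the returned element is discarded, so `number` has no action
number.add_parse_action(to_int)
assert number.actions == ()

# Correct: keep the element that was returned
number = number.add_parse_action(to_int)
assert number.actions == (to_int,)
```

When porting pyparsing code, any bare `expr.addParseAction(...)` statement whose result is not assigned is worth a second look for this reason.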

Faster Parsing

  • faster infix operator parsing (main reason for this fork)
  • ParseResults point to ParserElement for reduced size
  • regex used to reduce the number of failed parse attempts
  • packrat parser is not needed
  • less stack used

Details

The Whitespace Skipper

The mo_parsing.engines.CURRENT is used during parser creation: it effectively defines "whitespace" for skipping, with additional features to simplify the language definition. You declare "standard" Whitespace like so:

with Whitespace() as whitespace:
    # PUT YOUR LANGUAGE DEFINITION HERE (space, tab and newline are "whitespace")

If you are declaring a large language, and you want to minimize indentation, and you are careful, you may also use this pattern:

whitespace = Whitespace().use()
# PUT YOUR LANGUAGE DEFINITION HERE
whitespace.release()
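
To show why the second pattern needs care, here is a rough sketch of how a context object can support both `with` and `use()`/`release()`. The `Context` class, the `_active` stack and `current()` are hypothetical names for this illustration; this is an assumption about the general technique, not the mo-parsing implementation.

```python
_active = []  # stack of active contexts; the last entry is the current one


def current():
    # The context that newly created elements would consult
    return _active[-1] if _active else None


class Context:
    """Hypothetical stand-in for Whitespace."""

    def __init__(self, white_chars=" \t\n"):
        self.white_chars = white_chars

    def use(self):
        # Make this context the current one
        _active.append(self)
        return self

    def release(self):
        # Only the innermost context may be released
        assert _active and _active[-1] is self, "release() called out of order"
        _active.pop()

    def __enter__(self):
        return self.use()

    def __exit__(self, exc_type, exc_value, traceback):
        self.release()
        return False  # never swallow exceptions


# Pattern 1: the context is released automatically, even on error
with Context() as outer:
    assert current() is outer
assert current() is None

# Pattern 2: less indentation, but release() must be called by hand
ctx = Context().use()
assert current() is ctx
ctx.release()
assert current() is None
```

In this sketch, forgetting `release()` leaves the context active for everything defined afterwards, which is the kind of mistake the "if you are careful" warning points at.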

The whitespace can be used to set global parsing parameters, like:

  • set_whitespace() - set the ignored characters (default: "\t\n ")
  • add_ignore() - include whole patterns that are ignored (like comments)
  • set_literal() - Set the definition for what Literal() means
  • set_keyword_chars() - For default Keyword() (important for defining word boundary)
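
As an illustration of what "ignored characters" plus "ignored patterns" amount to, the sketch below builds a skipper from a set of whitespace characters and a list of regular expressions such as comments. `make_skipper` is a hypothetical helper using only the standard `re` module; it is an assumption about the idea behind set_whitespace() and add_ignore(), not the library's code.

```python
import re


def make_skipper(white_chars=" \t\n", ignore_patterns=()):
    """Return skip(string, index) -> index of the next significant character."""
    parts = []
    if white_chars:
        # Character class matching any single ignored character
        parts.append("[" + re.escape(white_chars) + "]")
    # Each ignored pattern (for example a comment) is one alternative
    parts.extend("(?:" + p + ")" for p in ignore_patterns)
    if parts:
        regex = re.compile("(?:" + "|".join(parts) + ")*")
    else:
        regex = re.compile("")  # nothing to skip

    def skip(string, index):
        # The pattern can match the empty string, so match() never returns None
        return regex.match(string, index).end()

    return skip


text = "  -- note\n x"

plain = make_skipper()
assert plain(text, 0) == 2  # stops at the comment

with_comments = make_skipper(ignore_patterns=[r"--[^\n]*"])
assert with_comments(text, 0) == 11  # skips the comment too, lands on "x"
```

Combining everything into one compiled pattern means a single regex call moves past any mix of whitespace and comments.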

Navigating ParseResults

The results of parsing are held in ParseResults, which form an n-ary tree, with the children found in ParseResults.tokens. Each ParseResults.type points to the ParserElement that made it. In general, if you want to get fancy with post-processing (or in a parse action), you will need to navigate the raw tokens to generate a final result.

There are some convenience methods:

  • __iter__() - allows you to iterate through parse results in depth first search. Empty results are skipped, and Grouped results are treated as atoms (which can be further iterated if required)
  • name is a convenient property for ParseResults.type.token_name
  • __getitem__() - allows you to jump into the parse tree to the given name. This is blocked by any names found inside Grouped results (because groups are considered atoms).
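
The traversal rules above can be sketched on a toy tree. The `Node` class below is a hypothetical stand-in for ParseResults, with a `grouped` flag playing the role of a Grouped result; the behaviour follows the descriptions in the list, but the details are assumptions rather than the library's code.

```python
class Node:
    """Hypothetical stand-in for ParseResults: a name plus child tokens."""

    def __init__(self, name, tokens, grouped=False):
        self.name = name
        self.tokens = tokens
        self.grouped = grouped

    def __iter__(self):
        # Depth-first walk over the children
        for token in self.tokens:
            if not isinstance(token, Node):
                yield token            # plain value
            elif token.grouped:
                yield token            # groups are atoms: do not descend
            elif token.tokens:
                yield from token       # descend into ordinary results
            # empty results are skipped

    def __getitem__(self, name):
        # Find a named result without looking inside groups
        for token in self.tokens:
            if not isinstance(token, Node):
                continue
            if token.name == name:
                return token
            if not token.grouped:
                try:
                    return token[name]
                except KeyError:
                    pass
        raise KeyError(name)


tree = Node("select", [
    "SELECT",
    Node("columns", ["a", Node("empty", []), "b"]),
    Node("source", ["FROM", Node("table", ["t"])], grouped=True),
])

flat = list(tree)
assert flat[:3] == ["SELECT", "a", "b"]   # the empty result is skipped
assert flat[3].name == "source"           # the group arrives as one atom
assert list(flat[3]) == ["FROM", "t"]     # iterate the group to go deeper
assert tree["source"]["table"].tokens == ["t"]
```

In this sketch `tree["table"]` raises `KeyError`, because the name is hidden inside the group; it has to be reached through `tree["source"]` first.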

addParseAction

Parse actions are methods that are run after a ParserElement finds a match.

  • Parameters must be accepted in (tokens, index, string) order (the opposite of pyparsing)
  • Parse actions are wrapped to ensure the output is a legitimate ParseResults
    • If your parse action returns None then the result is the original tokens
    • If your parse action returns an object, or list, or tuple, then it will be packaged in a ParseResults with the same type as tokens.
    • If your parse action returns a ParseResults then it is accepted even if it belongs to some other pattern
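
The three return-value rules can be sketched as a small wrapper. `Result`, `wrap_action` and `from_pyparsing_order` are hypothetical names for this illustration, and the way a list is unpacked into tokens is an assumption; the real wrapper in mo-parsing may differ in detail.

```python
class Result:
    """Hypothetical stand-in for ParseResults."""

    def __init__(self, type_, tokens):
        self.type = type_
        self.tokens = list(tokens)


def wrap_action(action):
    def wrapped(tokens, index, string):
        out = action(tokens, index, string)
        if out is None:
            return tokens                      # rule 1: keep the original tokens
        if isinstance(out, Result):
            return out                         # rule 3: accepted as-is
        if isinstance(out, (list, tuple)):
            return Result(tokens.type, out)    # rule 2: package the sequence
        return Result(tokens.type, [out])      # rule 2: package a single object

    return wrapped


def from_pyparsing_order(action):
    # Adapt an action written as f(string, index, tokens) to the new order
    return lambda tokens, index, string: action(string, index, tokens)


original = Result("number", ["42"])

same = wrap_action(lambda tokens, index, string: None)(original, 0, "42")
assert same is original

packed = wrap_action(lambda tokens, index, string: int(tokens.tokens[0]))(original, 0, "42")
assert packed.type == "number" and packed.tokens == [42]

old_style = lambda string, index, tokens: string.upper()
adapted = wrap_action(from_pyparsing_order(old_style))(original, 0, "abc")
assert adapted.tokens == ["ABC"]
```

An adapter like `from_pyparsing_order` is one way to reuse existing pyparsing actions unchanged while converting a grammar.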

Debugging

The PEG style of mo-parsing (inherited from pyparsing) makes for a very expressive and readable specification, but debugging a parser is still hard. To look deeper into what the parser is doing, use the Debugger:

with Debugger():
    expr.parseString("my new language")

The debugger will print out details of what's happening:

  • Each attempt, and if it matched or failed
  • A small number of bytes to show you the current position
  • location, line and character for more info about the current position
  • whitespace indicating stack depth
  • print out of the ParserElement performing the attempt

This should help to isolate the exact position where your grammar is failing.
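
To give a feel for that kind of output, here is a toy tracer that records one line per attempt, indented by stack depth. The `Tracer` class and its line format are invented for this sketch; the real Debugger output will look different.

```python
import re


class Tracer:
    """Hypothetical tracer: records one line per match attempt."""

    def __init__(self):
        self.depth = 0
        self.lines = []

    def attempt(self, name, matcher, string, index):
        self.depth += 1
        try:
            end = matcher(string, index)  # new index, or None on failure
        finally:
            self.depth -= 1
        line_no = string.count("\n", 0, index) + 1
        col = index - (string.rfind("\n", 0, index) + 1) + 1
        status = "match" if end is not None else "fail"
        self.lines.append(
            f"{'  ' * self.depth}{status:5} {name} at {index} "
            f"(line {line_no}, col {col}): {string[index:index + 10]!r}"
        )
        return end


tracer = Tracer()
WORD = re.compile(r"[a-z]+")


def word(string, index):
    m = WORD.match(string, index)
    return m.end() if m else None


def two_words(string, index):
    end = tracer.attempt("word", word, string, index)
    if end is None or string[end:end + 1] != " ":
        return None
    return tracer.attempt("word", word, string, end + 1)


result = tracer.attempt("two_words", two_words, "my 9ew", 0)
print("\n".join(tracer.lines))
```

In this sketch each line is written when its attempt finishes, so the inner attempts appear before the outer one that contains them; the indentation shows which attempt was nested in which.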

Contributing

If you plan to extend or enhance this code, please see the README in the tests directory.
