
High-performance PEG parsing (a port of TatSu to Rust)

Project description


铁修 TieXiu

A high-performance port of TatSu to Rust.

TieXiu (铁修) is a PEG (Parsing Expression Grammar) engine that brings the flexibility and power of the original TatSu lineage into a memory-safe, high-concurrency architecture optimized for modern CPU caches.

Why Still Alpha?

Although TieXiu is functionally complete, extending the alpha period leaves room to adjust the API and its signatures based on user experience. The plan is to follow with a beta period to flush out any remaining quirks or bugs.

About

TieXiu is a tool that takes grammars in extended EBNF as input and outputs memoizing (Packrat) PEG parsers as a Rust model. The classic variations of EBNF (Tomassetti, EasyExtend, Wirth) and ISO EBNF are supported as input grammar formats.
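To illustrate what "memoizing (Packrat)" means, here is a toy Python sketch; it is not TieXiu's implementation. Each rule is a function of the input position, PEG ordered choice tries alternatives in order and commits to the first success, and each rule's results are cached per position, so backtracking never re-parses the same rule at the same offset. The grammar (sums of integers), `make_parser`, and the call counter are all hypothetical.

```python
from functools import lru_cache
from typing import Callable, Dict, Optional, Tuple

# Each rule maps an input position to (value, next_position), or None on failure.
Result = Optional[Tuple[int, int]]


def make_parser(text: str) -> Tuple[Callable[[int], Result], Dict[str, int]]:
    """Build a toy packrat parser for the hypothetical PEG grammar

        sum    <- number '+' sum / number
        number <- [0-9]+

    Returns the start rule and a counter of real (non-cached) `number` calls.
    """
    calls = {"number": 0}

    @lru_cache(maxsize=None)  # memo table for `number`, keyed by position
    def number(pos: int) -> Result:
        calls["number"] += 1
        end = pos
        while end < len(text) and text[end] in "0123456789":
            end += 1
        if end == pos:
            return None
        return int(text[pos:end]), end

    @lru_cache(maxsize=None)  # memo table for `sum`, keyed by position
    def sum_(pos: int) -> Result:
        # First alternative of the ordered choice: number '+' sum
        first = number(pos)
        if first is not None:
            value, after = first
            if after < len(text) and text[after] == "+":
                rest = sum_(after + 1)
                if rest is not None:
                    return value + rest[0], rest[1]
        # Second alternative: backtrack to `pos` and try a bare number.
        # This re-reads number(pos), which is now a cache hit, not a re-parse.
        return number(pos)

    return sum_, calls


parse, calls = make_parser("12+")
print(parse(0))         # → (12, 2): the trailing '+' fails, so only "12" is consumed
print(calls["number"])  # → 2: number ran once at position 0 and once at 3
```

The memo trades memory for the guarantee that each rule runs at most once per input position, which is what gives packrat parsing its linear-time behavior despite unlimited backtracking.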

The TatSu Documentation provides a vision of where the TieXiu project is heading. A copy of the grammar syntax can be accessed locally in the SYNTAX document.

TieXiu is foremost a Rust library that is also published as a Python library with the help of PyO3/Maturin. The Rust API may return objects whose types belong to the internal parser or tree model. The Python API takes strings as input and returns json.dumps()-compatible Python objects as output.

TatSu is a mature project with an important user base, so it's difficult to make certain changes even when they are improvements or fixes for long-standing quirks (as experienced software engineers know well, a long-lived quirk becomes a feature). TieXiu is an opportunity to start from scratch with a modern approach, while preserving the grammar syntax and its semantics.

Non-Features

Most features of TatSu are available in TieXiu. Some features have not yet been implemented, and a few never will:

  • Generation of synthetic classes from grammar parameters will not be implemented in Rust.
  • Generation of source code with an object model for definitions in the grammar may be implemented if a way is found to make the parser or post-processing bind the Tree output of a parse to the model (serde_json provides the infrastructure for trying).
  • In TatSu, parser code generation was recently replaced by loading a model of the grammar and using that model as the parser. Although a generated procedural parser may deliver 1.3x the throughput in Python, supporting generated code is hard and it complicates the internal interfaces. For Rust, TieXiu already knows how to load a Grammar model quickly from TatSu JSON. A generated copy of the grammar model constructor could also be precompiled by the Rust compiler.
  • In TatSu, boolean and numeric values are parsed through synthetic actions, which call the constructors for those types, passing the parsed strings. For TieXiu, the preferred way of transforming a tree (semantics) is through post-processing (folding).
  • Semantic actions (transformations) during the parse are not implemented. Python is friendly to objects of type Any, so semantic actions during a TatSu parse can produce a tree of any type. Rust is different, and structures of an "any" type are not idiomatic (rustacean) Rust. The result of a parse is a well-defined Tree, an enum small enough that writing a walker for it is easy, so type transformations can be done in post-processing by folding. See the fold modules in TieXiu for examples and useful trait definitions.
  • Interpolation and evaluation of `constant` expressions have had no known use cases in TatSu. They will not be implemented in TieXiu until a use case appears.
  • The @@include directive for textual includes was always a bad idea.
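The post-processing approach can be sketched in Python over the JSON-compatible values that the Python API returns. This is a hypothetical helper, not part of tiexiu (the fold modules mentioned above are Rust): `fold` rebuilds a nested dict/list tree bottom-up, and `convert_literal` turns boolean and numeric strings into Python values, which is the job TatSu does with synthetic actions during the parse. The tree shape in the example is made up for illustration.

```python
import re
from typing import Any, Callable

# Hypothetical literal patterns: ASCII digits only, no underscores or spaces.
INT_RE = re.compile(r"[+-]?[0-9]+")
FLOAT_RE = re.compile(r"[+-]?(?:[0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)(?:[eE][+-]?[0-9]+)?")


def fold(tree: Any, leaf: Callable[[Any], Any]) -> Any:
    """Rebuild a JSON-compatible tree bottom-up, applying `leaf` to every scalar.

    Dict keys are left unchanged; only values and list items are folded.
    """
    if isinstance(tree, dict):
        return {key: fold(value, leaf) for key, value in tree.items()}
    if isinstance(tree, list):
        return [fold(item, leaf) for item in tree]
    return leaf(tree)


def convert_literal(value: Any) -> Any:
    """Turn 'true'/'false' and numeric strings into bool/int/float; leave the rest alone."""
    if not isinstance(value, str):
        return value
    if value in ("true", "false"):
        return value == "true"
    if INT_RE.fullmatch(value):
        return int(value)
    if FLOAT_RE.fullmatch(value):
        return float(value)
    return value


tree = {"name": "x", "args": ["1", "-2.5e3", "true", "foo", None]}
print(fold(tree, convert_literal))
# → {'name': 'x', 'args': [1, -2500.0, True, 'foo', None]}
```

Because the walker only knows about dicts, lists, and scalars, it works unchanged for any grammar; the grammar-specific knowledge lives entirely in the `leaf` function. The explicit patterns keep strings such as "inf" or "1_000", which Python's float() and int() would otherwise accept, as plain text.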

API

The needs of most users are met by parsing input with the rules in a grammar and receiving the structured output as a JSON-compatible value. For other use cases, TieXiu exposes its internal model and APIs (to be documented).

The Python API

Return values typed as Any are basic Python types, as defined in the json module documentation (see Encoders and Decoders).

JSON             Python
object           dict
array            list
string           str
number (int)     int
number (real)    float
true             True
false            False
null             None
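The table is the standard conversion used by Python's json module, so one way to check that a value has the shape the Python API promises is to round-trip it through json.dumps() and json.loads(). The helper `is_json_compatible` below is hypothetical and not part of tiexiu; the sample document is made up.

```python
import json
from typing import Any


def is_json_compatible(value: Any) -> bool:
    """True if `value` survives a json.dumps()/json.loads() round trip unchanged."""
    try:
        return json.loads(json.dumps(value)) == value
    except (TypeError, ValueError):  # e.g. sets, custom objects, circular references
        return False


# Each JSON type decodes to the Python type listed in the table.
decoded = json.loads('{"a": [1, 2.5, "s", true, false, null]}')
print(decoded)                            # → {'a': [1, 2.5, 's', True, False, None]}
print(is_json_compatible(decoded))        # → True
print(is_json_compatible({"t": (1, 2)}))  # → False: the tuple comes back as a list
```

Checking round-trip equality is stricter than checking that json.dumps() succeeds: tuples and non-string dict keys serialize without error, but they do not come back as the same Python values.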

Keyword arguments can be passed for runtime configuration. At the time of writing, the only recognized argument is trace=.

These functions are available from the tiexiu package.

def parse(grammar: str, text: str, **kwargs: Any) -> Any: ...
def parse_grammar(grammar: str, **kwargs: Any) -> Any: ...
def parse_grammar_to_json(grammar: str, **kwargs: Any) -> Any: ...
def parse_to_json(grammar: str, text: str, **kwargs: Any) -> Any: ...
def pretty(grammar: str, **kwargs: Any) -> str: ...
def compile_to_json(grammar: str, **kwargs: Any) -> Any: ...

The Rust API

pub fn parse_grammar(grammar: &str, cfg: &CfgA) -> Result<Tree>;
pub fn parse_grammar_to_json(grammar: &str, cfg: &CfgA) -> Result<serde_json::Value>;
pub fn parse_grammar_to_json_string(grammar: &str, cfg: &CfgA) -> Result<String>;
pub fn parse_grammar_with<U>(cursor: U, cfg: &CfgA) -> Result<Tree>
pub fn parse_grammar_to_json_with<U>(cursor: U, cfg: &CfgA) -> Result<serde_json::Value>
pub fn compile(grammar: &str, cfg: &CfgA) -> Result<Grammar>;
pub fn compile_to_json(grammar: &str, cfg: &CfgA) -> Result<serde_json::Value>;
pub fn compile_to_json_string(grammar: &str, cfg: &CfgA) -> Result<String>;
pub fn compile_with<U>(cursor: U, cfg: &CfgA) -> Result<Grammar>
pub fn compile_to_json_with<U>(cursor: U, cfg: &CfgA) -> Result<serde_json::Value>
pub fn load(json: &str, _cfg: &CfgA) -> Result<Grammar>;
pub fn load_to_json(json: &str, cfg: &CfgA) -> Result<serde_json::Value>;
pub fn load_tree(json: &str, _cfg: &CfgA) -> Result<Tree>;
pub fn load_tree_to_json(json: &str, cfg: &CfgA) -> Result<serde_json::Value>;
pub fn grammar_pretty(grammar: &str, cfg: &CfgA) -> Result<String>;
pub fn pretty_tree(tree: &Tree, _cfg: &CfgA) -> Result<String>;
pub fn pretty_tree_json(tree: &Tree, _cfg: &CfgA) -> Result<String>;
pub fn parse(grammar: &str, text: &str, cfg: &CfgA) -> Result<Tree>;
pub fn parse_to_json(grammar: &str, text: &str, cfg: &CfgA) -> Result<serde_json::Value>;
pub fn parse_to_json_string(grammar: &str, text: &str, cfg: &CfgA) -> Result<String>;
pub fn parse_input(parser: &Grammar, text: &str, cfg: &CfgA) -> Result<Tree>;
pub fn parse_input_to_json(parser: &Grammar, text: &str, cfg: &CfgA) -> Result<serde_json::Value>;
pub fn parse_input_to_json_string(parser: &Grammar, text: &str, cfg: &CfgA) -> Result<String>;

Roadmap

The project is functionally complete. Comments about implementation strategies and possible improvements are now in ROADMAP.

License

Licensed under either of:

at your option.

Contribution

Unless explicitly stated otherwise, any contribution intentionally submitted for inclusion in the work, as defined in the Apache-2.0 license, shall be dual-licensed as above, without any additional terms or conditions.



Download files


Source Distribution

tiexiu-0.1.1a11.tar.gz (597.4 kB)

Uploaded Source

Built Distributions


tiexiu-0.1.1a11-cp312-abi3-win_amd64.whl (1.0 MB)

Uploaded CPython 3.12+Windows x86-64

tiexiu-0.1.1a11-cp312-abi3-manylinux_2_28_x86_64.whl (1.2 MB)

Uploaded CPython 3.12+manylinux: glibc 2.28+ x86-64

tiexiu-0.1.1a11-cp312-abi3-macosx_11_0_arm64.whl (1.1 MB)

Uploaded CPython 3.12+macOS 11.0+ ARM64

tiexiu-0.1.1a11-cp312-abi3-macosx_10_12_x86_64.whl (1.2 MB)

Uploaded CPython 3.12+macOS 10.12+ x86-64

File details

Details for the file tiexiu-0.1.1a11.tar.gz.

File metadata

  • Download URL: tiexiu-0.1.1a11.tar.gz
  • Upload date:
  • Size: 597.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for tiexiu-0.1.1a11.tar.gz
Algorithm Hash digest
SHA256 bd187823750f179ed95f762563c6f35920d83edf6215aa249c2b060c1f1ce8ac
MD5 10578796b65942029023217813cc8910
BLAKE2b-256 e8b4c10640f31ba05f66ce7126abb66847bc459017d3e82dc32e710fbff79378


Provenance

The following attestation bundles were made for tiexiu-0.1.1a11.tar.gz:

Publisher: release.yml on neogeny/TieXiu

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file tiexiu-0.1.1a11-cp312-abi3-win_amd64.whl.

File metadata

  • Download URL: tiexiu-0.1.1a11-cp312-abi3-win_amd64.whl
  • Upload date:
  • Size: 1.0 MB
  • Tags: CPython 3.12+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for tiexiu-0.1.1a11-cp312-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 e998c920658435756d931fdc7515806d33191533fd342f5ac1a32d32e0758897
MD5 73566b7255ae29ea7da50a9350b1730a
BLAKE2b-256 b6282fe6b2e0b07530bd37e067f5bbfc7cb0783c946d8ea8ea6d0ee731d9af40


Provenance

The following attestation bundles were made for tiexiu-0.1.1a11-cp312-abi3-win_amd64.whl:

Publisher: release.yml on neogeny/TieXiu


File details

Details for the file tiexiu-0.1.1a11-cp312-abi3-manylinux_2_28_x86_64.whl.


File hashes

Hashes for tiexiu-0.1.1a11-cp312-abi3-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 4210daeab4eb6c6ac2c6efb10e1538e4abbc1a8e397b47cac2d2343b71687c3e
MD5 6cdcd6b3669a58704178752b7421955b
BLAKE2b-256 7bd7fe07f022eed16a33375b58f3b24d176e68ab4e7555cb0493a0a20fdc2a8c


Provenance

The following attestation bundles were made for tiexiu-0.1.1a11-cp312-abi3-manylinux_2_28_x86_64.whl:

Publisher: release.yml on neogeny/TieXiu


File details

Details for the file tiexiu-0.1.1a11-cp312-abi3-macosx_11_0_arm64.whl.


File hashes

Hashes for tiexiu-0.1.1a11-cp312-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 a67cf183a9118c4c079a16ea2de4dcd31e3b2a12273776c51e22c054175ea847
MD5 07703612297d56ceb2b2e761aa7d53a6
BLAKE2b-256 df209bce2d800ff614f3ed550efd5a0496dc1aac5c497510e276c3753677b4ae


Provenance

The following attestation bundles were made for tiexiu-0.1.1a11-cp312-abi3-macosx_11_0_arm64.whl:

Publisher: release.yml on neogeny/TieXiu


File details

Details for the file tiexiu-0.1.1a11-cp312-abi3-macosx_10_12_x86_64.whl.


File hashes

Hashes for tiexiu-0.1.1a11-cp312-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 220e12e58deb63393c61ac221dadfb07e61fd194d22bc4d2b64d439c284dc2ef
MD5 b96e9fd88ebb5620a74f42e735f6e078
BLAKE2b-256 9387204014b6522eca660c9f44506a1261725f955bb665d443c911614bc50f48


Provenance

The following attestation bundles were made for tiexiu-0.1.1a11-cp312-abi3-macosx_10_12_x86_64.whl:

Publisher: release.yml on neogeny/TieXiu
