High-performance PEG parsing (a port of TatSu to Rust)

铁修 TieXiu

A high-performance port of TatSu to Rust.

TieXiu (铁修) is a PEG (Parsing Expression Grammar) engine that brings the flexibility and power of the original TatSu lineage to a memory-safe, high-concurrency architecture optimized for modern CPU caches.

Why Still Alpha?

Although TieXiu is functionally complete, extending the alpha period leaves room to adjust the API and its signatures based on user experience. The plan is to follow with a beta period to flush out any remaining quirks or bugs.

About

TieXiu is a tool that takes grammars in extended EBNF as input and outputs memoizing (Packrat) PEG parsers as a Rust model. The classic variations of EBNF (Tomassetti, EasyExtend, Wirth) and ISO EBNF are supported as input grammar formats.
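As an illustration of what "memoizing (Packrat)" means, here is a minimal sketch in Python. It is not TieXiu's implementation; the grammar, the function name parse_sum, and the use of functools.lru_cache as the memo table are assumptions made for the example. Each rule result is cached per input position, so when an ordered choice backtracks and asks for the same rule at the same position again, the answer comes from the cache.

```python
from functools import lru_cache


def parse_sum(text: str):
    """Parse `sum <- term '+' sum / term ; term <- [0-9]+`.

    Returns (value, number of times the body of `term` actually ran).
    Hypothetical example, not part of TieXiu.
    """
    term_calls = 0

    @lru_cache(maxsize=None)  # memo table: one entry per input position
    def term(pos: int):
        nonlocal term_calls
        term_calls += 1
        end = pos
        while end < len(text) and text[end].isdigit():
            end += 1
        # A successful match returns (value, next position); failure is None.
        return (int(text[pos:end]), end) if end > pos else None

    @lru_cache(maxsize=None)
    def sum_(pos: int):
        # First alternative: term '+' sum
        left = term(pos)
        if left is not None:
            value, after = left
            if after < len(text) and text[after] == "+":
                rest = sum_(after + 1)
                if rest is not None:
                    return value + rest[0], rest[1]
        # Second alternative: term. This asks for `term` at the same
        # position again, and the memo table answers without re-parsing.
        return term(pos)

    result = sum_(0)
    if result is None or result[1] != len(text):
        raise ValueError(f"cannot parse {text!r}")
    return result[0], term_calls


print(parse_sum("1+2+3"))
```

Without the cache, the last term of the input would be parsed twice (once per alternative); with it, each term is parsed once.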

The TatSu Documentation provides a vision of where the TieXiu project is heading. A copy of the grammar syntax can be accessed locally in the SYNTAX document.

TieXiu is foremost a Rust library that is also published as a Python library with the help of PyO3/Maturin. The Rust API may return objects of types in the internal parser or tree model. The Python API takes strings as input and returns json.dumps()-compatible Python objects as output.

TatSu is a mature project with an important user base, so it's difficult to make certain changes even if they are improvements or fixes for long-standing quirks (as is well known among experienced software engineers, a long-lived quirk becomes a feature). TieXiu is an opportunity to start from scratch, with a modern approach, even if the grammar syntax and its semantics are preserved.

Non-Features

Most features of TatSu are available in TieXiu. Some features have not yet been implemented, and a few never will:

  • Generation of synthetic classes from grammar parameters will not be implemented in Rust.
  • Generation of source code with an object model for definitions in the grammar may be implemented if a way is found to make the parser or postprocessing bind the Tree output of a parse to the model (serde_json provides the infrastructure for trying).
  • In TatSu, code generation of a parser recently moved to loading a model of the Grammar and using it as the parser. Although the generated procedural parser may produce 1.3x higher throughput in Python, supporting generated code is hard and it complicates the internal interfaces. For Rust, TieXiu already knows how to quickly load a Grammar model from TatSu JSON. A generated copy of the grammar model constructor could be precompiled by Rust.
  • Parsing of boolean and numeric values happens in TatSu through synthetic actions, which call the constructors for those types, passing the parsed strings. For TieXiu the preferred way of transforming a tree (semantics) is through post-processing (folding).
  • Semantic actions (transformations) during parse are not implemented. Python is friendly to objects. Python is OK with objects of type Any, so semantic actions during parse in TatSu can produce a tree of any type. Rust is different, and trying to have structures of an any type is not rustacean. The result of a parse is a well-defined Tree which is a small-enough enum that writing a walker for it is easy, so type transformations can be done in postprocessing by folding. See the fold modules in TieXiu for examples and useful trait definitions.
  • Interpolation and evaluation of `constant` expressions have had no known use cases with TatSu. They will not be implemented in TieXiu until a use case appears.
  • The @@include directive for textual includes was always a bad idea.
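To make the idea of post-processing by folding concrete, the following is a minimal sketch in Python over the kind of JSON-compatible value the Python API is described as returning. It does not use TieXiu's fold modules or traits; the names fold and to_scalar and the sample data are hypothetical. The fold rebuilds the tree bottom-up and converts parsed strings into booleans and numbers after the parse, instead of through semantic actions during it.

```python
from typing import Any, Callable


def fold(tree: Any, leaf: Callable[[Any], Any]) -> Any:
    """Rebuild a JSON-compatible tree, applying `leaf` to every non-container value."""
    if isinstance(tree, dict):
        return {key: fold(value, leaf) for key, value in tree.items()}
    if isinstance(tree, list):
        return [fold(item, leaf) for item in tree]
    return leaf(tree)


def to_scalar(value: Any) -> Any:
    """Convert a string to bool, int, or float when Python accepts it as one."""
    if not isinstance(value, str):
        return value
    if value in ("true", "false"):
        return value == "true"
    for convert in (int, float):
        try:
            return convert(value)
        except ValueError:
            pass  # not this type; try the next conversion
    return value


# Hypothetical parse output: every leaf is still a string or None.
parsed = {"name": "x", "values": ["1", "2.5", "true"], "next": None}
typed = fold(parsed, to_scalar)
print(typed)
```

Keeping the conversion in a separate pass means the parser only ever produces one well-defined tree shape, and each application chooses its own target types.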

API

The needs of most users are met by parsing input with the rules in a grammar and receiving the structured output as a JSON-compatible value. For other use cases, TieXiu exposes its internal model and APIs (to be documented).

The Python API

Return values typed as Any are basic Python types, as defined in the json module documentation (see Encoders and Decoders).

JSON             Python
object           dict
array            list
string           str
number (int)     int
number (real)    float
true             True
false            False
null             None
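The mapping in the table is the one used by Python's standard json module, so it can be checked without TieXiu. The sketch below decodes a small JSON document (the document itself is made up for the example) and prints the Python type of each value.

```python
import json

# A made-up JSON document that contains one value of each JSON kind.
document = '{"rule": "start", "items": [1, 2.5, "text", true, false, null]}'
value = json.loads(document)

# Collect the Python type name of each decoded item.
kinds = [type(item).__name__ for item in value["items"]]
print(type(value).__name__, type(value["items"]).__name__, kinds)

# Values built only from these types survive a json.dumps()/json.loads() round trip.
assert json.loads(json.dumps(value)) == value
```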

Keyword arguments can be passed for runtime configuration. The only recognized argument as of writing is trace=.

These functions are available from package tiexiu.

def parse(grammar: str, text: str, **kwargs: Any) -> Any: ...
def parse_grammar(grammar: str, **kwargs: Any) -> Any: ...
def parse_grammar_to_json(grammar: str, **kwargs: Any) -> Any: ...
def parse_to_json(grammar: str, text: str, **kwargs: Any) -> Any: ...
def pretty(grammar: str, **kwargs: Any) -> str: ...
def compile_to_json(grammar: str, **kwargs: Any) -> Any: ...
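A possible way to call parse() is sketched below. Only the parse(grammar, text, **kwargs) signature comes from the list above; the grammar text assumes TatSu-style syntax is accepted unchanged, the helper name parse_and_dump is hypothetical, and the shape of the returned value is not specified here, so no output is shown. The import is guarded so the sketch still loads when the package is not installed.

```python
import json
from typing import Any, Callable, Optional

try:
    from tiexiu import parse as tiexiu_parse  # signature as listed above
except ImportError:  # keeps the sketch importable without the package
    tiexiu_parse = None

# Assumption: a TatSu-style grammar that matches a run of digits up to end of input.
GRAMMAR = r"""
    start = number $ ;
    number = /\d+/ ;
"""


def parse_and_dump(text: str, parse: Optional[Callable[..., Any]] = tiexiu_parse) -> str:
    """Parse `text` with GRAMMAR and render the result as JSON text.

    `parse` is injectable so the helper can be exercised with a stand-in.
    """
    if parse is None:
        raise RuntimeError("tiexiu is not installed")
    result = parse(GRAMMAR, text)
    # The documented output is json.dumps()-compatible, so this should not fail.
    return json.dumps(result, indent=2)
```

With the package installed, parse_and_dump("42") would return the parse result rendered as JSON text. The trace= keyword is not exercised because its accepted values are not stated above.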

The Rust API

pub fn parse_grammar(grammar: &str, cfg: &CfgA) -> Result<Tree>;
pub fn parse_grammar_to_json(grammar: &str, cfg: &CfgA) -> Result<serde_json::Value>;
pub fn parse_grammar_to_json_string(grammar: &str, cfg: &CfgA) -> Result<String>;
pub fn parse_grammar_with<U>(cursor: U, cfg: &CfgA) -> Result<Tree>;
pub fn parse_grammar_to_json_with<U>(cursor: U, cfg: &CfgA) -> Result<serde_json::Value>;
pub fn compile(grammar: &str, cfg: &CfgA) -> Result<Grammar>;
pub fn compile_to_json(grammar: &str, cfg: &CfgA) -> Result<serde_json::Value>;
pub fn compile_to_json_string(grammar: &str, cfg: &CfgA) -> Result<String>;
pub fn compile_with<U>(cursor: U, cfg: &CfgA) -> Result<Grammar>;
pub fn compile_to_json_with<U>(cursor: U, cfg: &CfgA) -> Result<serde_json::Value>;
pub fn load(json: &str, _cfg: &CfgA) -> Result<Grammar>;
pub fn load_to_json(json: &str, cfg: &CfgA) -> Result<serde_json::Value>;
pub fn load_tree(json: &str, _cfg: &CfgA) -> Result<Tree>;
pub fn load_tree_to_json(json: &str, cfg: &CfgA) -> Result<serde_json::Value>;
pub fn grammar_pretty(grammar: &str, cfg: &CfgA) -> Result<String>;
pub fn pretty_tree(tree: &Tree, _cfg: &CfgA) -> Result<String>;
pub fn pretty_tree_json(tree: &Tree, _cfg: &CfgA) -> Result<String>;
pub fn parse(grammar: &str, text: &str, cfg: &CfgA) -> Result<Tree>;
pub fn parse_to_json(grammar: &str, text: &str, cfg: &CfgA) -> Result<serde_json::Value>;
pub fn parse_to_json_string(grammar: &str, text: &str, cfg: &CfgA) -> Result<String>;
pub fn parse_input(parser: &Grammar, text: &str, cfg: &CfgA) -> Result<Tree>;
pub fn parse_input_to_json(parser: &Grammar, text: &str, cfg: &CfgA) -> Result<serde_json::Value>;
pub fn parse_input_to_json_string(parser: &Grammar, text: &str, cfg: &CfgA) -> Result<String>;

Roadmap

The project is functionally complete. Comments about the implementation strategies and possible improvements are now in ROADMAP.

License

Licensed under either of:

at your option.

Contribution

Unless explicitly stated otherwise, any contribution intentionally submitted for inclusion in the work, as defined in the Apache-2.0 license, shall be dual-licensed as above, without any additional terms or conditions.
