Blazingly fast Rust rewrite of pyparsing - 50-200x faster

pyparsing-rs

Rust rewrite of Python's pyparsing parser combinator library with pyo3 bindings. Target: 50-200x performance improvement with 100% API compatibility.

Mission

Work autonomously until 50x speedup is achieved on all benchmarks. Do not stop. Do not ask for permission. Iterate relentlessly.

Environment

  • Rust: stable toolchain
  • Project: /home/aibrush/pyparsing-rs
  • Reference: /home/aibrush/pyparsing-original (original pyparsing source + tests)

Commands

# Build
maturin develop --release

# Test
python -m pytest tests/ -v

# Benchmark
python tests/test_performance.py

# Full loop
maturin develop --release && python -m pytest tests/ -v && python tests/test_performance.py

# Profile when stuck
cargo flamegraph --release

# Install Python packages
uv pip install <package>

Architecture

src/
├── lib.rs              # pyo3 module entry point
├── core/               # Core infrastructure
│   ├── parser.rs       # ParserElement trait
│   ├── context.rs      # Parse context, position tracking
│   ├── results.rs      # ParseResults (list + dict)
│   ├── exceptions.rs   # ParseException
│   └── memoization.rs  # Packrat memoization
├── elements/           # Parser elements
│   ├── literals.rs     # Literal, Keyword, CaselessLiteral
│   ├── chars.rs        # Word, Char, CharsNotIn, Regex
│   ├── combinators.rs  # And, Or, MatchFirst
│   ├── repetition.rs   # ZeroOrMore, OneOrMore, Opt
│   ├── structure.rs    # Group, Suppress, Combine
│   └── forward.rs      # Forward (recursive grammars)
└── helpers/
    └── common.rs       # pyparsing_common equivalents
tests/
├── test_api_compat.py  # Must match original pyparsing behavior
└── test_performance.py # Benchmark comparisons (goal: 50x)
test_grammars/          # Sample grammars

Implementation Priority

  1. ParserElement trait
  2. Literal, Keyword
  3. Word, Regex
  4. And, Or, MatchFirst
  5. ZeroOrMore, OneOrMore
  6. Group, Suppress
  7. Forward

Code Rules

  • Zero-copy: Use &str slices, return indices into original string
  • Inline hot paths: #[inline] and #[inline(always)] on frequently called methods
  • Avoid dyn trait: Use enum dispatch or generics for hot paths
  • Fast hashing: Use FxHashMap from rustc-hash for memoization
  • API parity: Same class names, methods, operators as original pyparsing
  • Cargo.toml: Enable lto = true, codegen-units = 1 in release profile
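
The zero-copy and enum-dispatch rules above can be sketched together as follows. This is a minimal illustration, not the project's actual code: the `Element` enum, its variants, and the `parse` signature are hypothetical names chosen for the example. Each variant returns only an end offset into the original input, so no substring is allocated, and the `match` replaces a `dyn` call.

```rust
/// Hypothetical parser node. Dispatching through an enum avoids
/// virtual calls through `dyn` on hot paths.
enum Element<'g> {
    /// Match an exact string.
    Literal(&'g str),
    /// Match one or more ASCII digits.
    Digits,
    /// Match every child in order.
    And(Vec<Element<'g>>),
}

impl<'g> Element<'g> {
    /// Try to match at byte offset `start` and return the end offset.
    /// Only indices into `input` are returned, so nothing is copied.
    #[inline]
    fn parse(&self, input: &str, start: usize) -> Option<usize> {
        match self {
            Element::Literal(lit) => {
                let rest = input.as_bytes().get(start..)?;
                rest.starts_with(lit.as_bytes()).then(|| start + lit.len())
            }
            Element::Digits => {
                let rest = input.as_bytes().get(start..)?;
                let n = rest.iter().take_while(|b| b.is_ascii_digit()).count();
                (n > 0).then(|| start + n)
            }
            Element::And(children) => {
                // Thread the position through each child; stop at the first failure.
                let mut pos = start;
                for child in children {
                    pos = child.parse(input, pos)?;
                }
                Some(pos)
            }
        }
    }
}
```

A caller would slice the original string with the returned offsets, for example `&input[start..end]`, only when a token's text is actually needed.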

Python API to Match

import pyparsing_rs as pp

# Basic elements
lit = pp.Literal("hello")
word = pp.Word(pp.alphas, pp.alphanums)
regex = pp.Regex(r"\d+")

# Combinators (via operators)
sequence = lit + word        # And
first_match = lit | word     # MatchFirst  
longest_match = lit ^ word   # Or

# Repetition
zero_or_more = pp.ZeroOrMore(word)
one_or_more = pp.OneOrMore(word)
optional = pp.Opt(word)

# Result manipulation
grouped = pp.Group(word + word)
suppressed = pp.Suppress(lit)
combined = pp.Combine(word + word)

# Recursive (Forward reference)
expr = pp.Forward()
expr <<= word | "(" + expr + ")"

# Parse
grammar = one_or_more        # any element defined above works here
result = grammar.parse_string("input text")
result[0]          # List access
result["name"]     # Dict access (if named)
result.as_list()   # Convert to list
result.as_dict()   # Convert to dict

Testing Strategy

  1. Copy test files: cp -r /home/aibrush/pyparsing-original/tests/* tests/
  2. Run baseline: python baseline_benchmark.py → saves baseline_results.json
  3. Compare: Rust implementation must return identical data to original
  4. Benchmark: Track speedup in performance_results.json

Success Criteria

All must be true:

  • All benchmarks show ≥50x speedup
  • 100% of basic pyparsing tests pass
  • Drop-in replacement API (same classes, methods, operators)
  • Core elements: Literal, Word, Regex, And, Or, ZeroOrMore, Group, Forward

Key Performance Optimizations

Level 1 (do first):

  • LTO + release builds
  • &str instead of String
  • Inline small functions

Level 2 (when needed):

  • Bitset for character class membership (O(1) lookup)
  • Byte operations instead of char for ASCII
  • SIMD scanning with memchr crate
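
As a rough sketch of the bitset and byte-operation ideas above, a character class restricted to ASCII fits in a single `u128`, and membership becomes a shift and a mask. The names `AsciiSet` and `scan_word` are hypothetical and only serve the illustration; non-ASCII input would need a separate fallback path that is not shown here.

```rust
/// 128-bit set of ASCII bytes; a membership test is a shift and a mask.
#[derive(Clone, Copy)]
struct AsciiSet(u128);

impl AsciiSet {
    /// Build a set from the ASCII characters in `chars`.
    /// Bytes outside the ASCII range are ignored in this sketch.
    fn new(chars: &str) -> Self {
        let mut bits = 0u128;
        for b in chars.bytes().filter(|b| b.is_ascii()) {
            bits |= 1u128 << b;
        }
        AsciiSet(bits)
    }

    /// Constant-time membership test for a single byte.
    #[inline(always)]
    fn contains(self, b: u8) -> bool {
        b < 128 && (self.0 >> b) & 1 == 1
    }
}

/// Count how many bytes starting at `start` belong to `set`.
/// Works on bytes rather than `char`s, which is valid for ASCII classes.
fn scan_word(input: &str, start: usize, set: AsciiSet) -> usize {
    let rest = input.as_bytes().get(start..).unwrap_or(&[]);
    rest.iter().take_while(|&&b| set.contains(b)).count()
}
```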

Level 3 (if still slow):

  • Packrat memoization with FxHashMap
  • Arena allocation for ParseResults
  • Enum dispatch instead of dyn trait
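
Packrat memoization can be sketched as a table keyed by rule and input position. The example below uses the standard library `HashMap` so that it stays self-contained; `FxHashMap` from rustc-hash is intended as a drop-in replacement with a faster hasher. The `Memo` type, the numeric rule ids, and the `digits` rule are assumptions made for illustration, and the sketch only caches leaf rules (a recursive grammar would need the parse function itself to have access to the table).

```rust
use std::collections::HashMap;

/// Memo table keyed by (rule id, start offset).
/// The value is the cached outcome: an end offset, or `None` for a failed match.
struct Memo {
    table: HashMap<(u32, usize), Option<usize>>,
    hits: usize,
}

impl Memo {
    fn new() -> Self {
        Memo { table: HashMap::new(), hits: 0 }
    }

    /// Return the cached result for `(rule, pos)`, or run `parse` once and cache it.
    fn get_or_parse(
        &mut self,
        rule: u32,
        pos: usize,
        parse: impl FnOnce() -> Option<usize>,
    ) -> Option<usize> {
        if let Some(&cached) = self.table.get(&(rule, pos)) {
            self.hits += 1;
            return cached;
        }
        let result = parse();
        self.table.insert((rule, pos), result);
        result
    }
}

/// Example rule: one or more ASCII digits starting at `pos`.
fn digits(input: &str, pos: usize) -> Option<usize> {
    let rest = input.as_bytes().get(pos..)?;
    let n = rest.iter().take_while(|b| b.is_ascii_digit()).count();
    (n > 0).then(|| pos + n)
}
```

Failures are cached as well as successes, which is what keeps backtracking alternatives from re-running the same rule at the same position.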

Important Notes

  • Original pyparsing repo: https://github.com/pyparsing/pyparsing
  • Test files are in /home/aibrush/pyparsing-original/tests/
  • Original pyparsing is editable-installed; use import pyparsing for reference
  • import pyparsing_rs for your Rust implementation
  • Never sacrifice correctness for speed - tests must pass
  • Profile before optimizing - don't guess bottlenecks

pyparsing Key Concepts

Operator Overloading

a + b   # And (sequence)
a | b   # MatchFirst (first match wins)
a ^ b   # Or (longest match wins)
~a      # NotAny (negative lookahead)
a * 3   # Exactly 3 repetitions
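
The difference between `|` (MatchFirst) and `^` (Or) can be made concrete with a small Rust sketch. The function names and the `Alt` alias are hypothetical; each alternative is modelled as a plain function from an input and start offset to an optional end offset.

```rust
/// Each alternative maps (input, start) to an optional end offset.
type Alt = fn(&str, usize) -> Option<usize>;

/// MatchFirst semantics: return the result of the first alternative that matches.
fn match_first(alts: &[Alt], input: &str, start: usize) -> Option<usize> {
    alts.iter().find_map(|alt| alt(input, start))
}

/// Or semantics: try every alternative and keep the longest match.
fn match_longest(alts: &[Alt], input: &str, start: usize) -> Option<usize> {
    alts.iter().filter_map(|alt| alt(input, start)).max()
}

/// Helper: match the exact string `s` at `start`.
fn literal(input: &str, start: usize, s: &str) -> Option<usize> {
    input.get(start..)?.starts_with(s).then(|| start + s.len())
}

fn lit_if(input: &str, start: usize) -> Option<usize> {
    literal(input, start, "if")
}

fn lit_ifdef(input: &str, start: usize) -> Option<usize> {
    literal(input, start, "ifdef")
}
```

With the alternatives listed as `[lit_if, lit_ifdef]` and the input `"ifdef X"`, `match_first` stops after `"if"` while `match_longest` consumes `"ifdef"`. This is why alternative order matters for `|` but not for `^`.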

ParseResults

Dual list/dict access:

result[0]        # First element
result["key"]    # Named element
result.key       # Attribute access
for item in result:  # Iteration
    print(item)
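
One way to picture the dual list/dict access on the Rust side is a token vector plus a name-to-index map. This is a simplified sketch under stated assumptions: the struct layout and method names are hypothetical, tokens are borrowed slices of the input, and a repeated name simply overwrites the earlier entry.

```rust
use std::collections::HashMap;

/// Simplified results container: positional tokens plus optional names.
struct ParseResults<'a> {
    /// Tokens in match order, borrowed from the original input.
    tokens: Vec<&'a str>,
    /// Results name -> index into `tokens`.
    names: HashMap<&'a str, usize>,
}

impl<'a> ParseResults<'a> {
    fn new() -> Self {
        ParseResults { tokens: Vec::new(), names: HashMap::new() }
    }

    /// Append a token, optionally registering a results name for it.
    fn push(&mut self, token: &'a str, name: Option<&'a str>) {
        if let Some(n) = name {
            self.names.insert(n, self.tokens.len());
        }
        self.tokens.push(token);
    }

    /// List-style access by position.
    fn get(&self, index: usize) -> Option<&'a str> {
        self.tokens.get(index).copied()
    }

    /// Dict-style access by results name.
    fn get_named(&self, name: &str) -> Option<&'a str> {
        self.names.get(name).and_then(|&i| self.tokens.get(i).copied())
    }
}
```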

Whitespace

pyparsing auto-skips whitespace by default. Respect this behavior.
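
A minimal sketch of this behavior, assuming pyparsing's default whitespace set of space, tab, newline, and carriage return: leading whitespace is skipped before each element tries to match, and the match reports where the token actually begins. The function names are hypothetical.

```rust
/// Default whitespace bytes assumed in this sketch: space, tab, newline, carriage return.
const DEFAULT_WS: &[u8] = b" \t\n\r";

/// Advance past any whitespace starting at `start` and return the new offset.
fn skip_ws(input: &str, start: usize) -> usize {
    let bytes = input.as_bytes();
    let mut pos = start;
    while pos < bytes.len() && DEFAULT_WS.contains(&bytes[pos]) {
        pos += 1;
    }
    pos
}

/// Match `lit` after skipping leading whitespace.
/// Returns the (begin, end) offsets of the matched token, excluding the whitespace.
fn parse_literal(input: &str, start: usize, lit: &str) -> Option<(usize, usize)> {
    let begin = skip_ws(input, start);
    input
        .get(begin..)?
        .starts_with(lit)
        .then(|| (begin, begin + lit.len()))
}
```

Chaining two calls, with the second starting at the first call's end offset, parses `"  hello world"` as two tokens without either token containing whitespace.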

Parse Actions

User callbacks that transform results:

integer = pp.Word(pp.nums).set_parse_action(lambda t: int(t[0]))

NOTE: Our GitHub repository is https://github.com/aibrushcomputer/pyparsing-rs
