Blazingly fast Rust rewrite of pyparsing - 50-200x faster

pyparsing-rs

Rust rewrite of Python's pyparsing parser combinator library with pyo3 bindings. Target: 50-200x performance improvement with 100% API compatibility.

Mission

Work autonomously until 50x speedup is achieved on all benchmarks. Do not stop. Do not ask for permission. Iterate relentlessly.

Environment

  • Rust: stable toolchain
  • Project: /home/aibrush/pyparsing-rs
  • Reference: /home/aibrush/pyparsing-original (original pyparsing source + tests)

Commands

# Build
maturin develop --release

# Test
python -m pytest tests/ -v

# Benchmark
python tests/test_performance.py

# Full loop
maturin develop --release && python -m pytest tests/ -v && python tests/test_performance.py

# Profile when stuck
cargo flamegraph --release

# Install Python packages
uv pip install <package>

Architecture

src/
├── lib.rs              # pyo3 module entry point
├── core/               # Core infrastructure
│   ├── parser.rs       # ParserElement trait
│   ├── context.rs      # Parse context, position tracking
│   ├── results.rs      # ParseResults (list + dict)
│   ├── exceptions.rs   # ParseException
│   └── memoization.rs  # Packrat memoization
├── elements/           # Parser elements
│   ├── literals.rs     # Literal, Keyword, CaselessLiteral
│   ├── chars.rs        # Word, Char, CharsNotIn, Regex
│   ├── combinators.rs  # And, Or, MatchFirst
│   ├── repetition.rs   # ZeroOrMore, OneOrMore, Opt
│   ├── structure.rs    # Group, Suppress, Combine
│   └── forward.rs      # Forward (recursive grammars)
└── helpers/
    └── common.rs       # pyparsing_common equivalents
tests/
├── test_api_compat.py  # Must match original pyparsing behavior
└── test_performance.py # Benchmark comparisons (goal: 50x)
test_grammars/          # Sample grammars

Implementation Priority

  1. ParserElement trait
  2. Literal, Keyword
  3. Word, Regex
  4. And, Or, MatchFirst
  5. ZeroOrMore, OneOrMore
  6. Group, Suppress
  7. Forward

Code Rules

  • Zero-copy: Use &str slices, return indices into original string
  • Inline hot paths: #[inline] and #[inline(always)] on frequently called methods
  • Avoid dyn Trait objects: Use enum dispatch or generics for hot paths
  • Fast hashing: Use FxHashMap from rustc-hash for memoization
  • API parity: Same class names, methods, operators as original pyparsing
  • Cargo.toml: Set lto = true and codegen-units = 1 under [profile.release]
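
The zero-copy rule can be illustrated in Python terms. This is only a sketch of the idea, not the project's Rust code: match_word is a hypothetical helper that returns a (start, end) span into the original string instead of building a new substring, which mirrors returning indices next to a &str slice.

```python
from typing import Optional, Tuple

def match_word(text: str, pos: int, allowed: frozenset) -> Optional[Tuple[int, int]]:
    """Return the (start, end) span of the longest run of allowed characters
    starting at pos, or None if the character at pos is not allowed."""
    end = pos
    while end < len(text) and text[end] in allowed:
        end += 1
    return (pos, end) if end > pos else None

text = "hello world"
letters = frozenset("abcdefghijklmnopqrstuvwxyz")  # assumed character class
span = match_word(text, 0, letters)
# The caller slices only when it actually needs the token text.
token = text[span[0]:span[1]]
```

Keeping spans rather than strings means intermediate parse steps allocate nothing; the token text is materialised once, at the point where a result is handed back to the caller.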

Python API to Match

import pyparsing_rs as pp

# Basic elements
lit = pp.Literal("hello")
word = pp.Word(pp.alphas, pp.alphanums)  # alphas/alphanums are string constants in pyparsing
regex = pp.Regex(r"\d+")

# Combinators (via operators)
sequence = lit + word        # And
first_match = lit | word     # MatchFirst  
longest_match = lit ^ word   # Or

# Repetition
zero_or_more = pp.ZeroOrMore(word)
one_or_more = pp.OneOrMore(word)
optional = pp.Opt(word)

# Result manipulation
grouped = pp.Group(word + word)
suppressed = pp.Suppress(lit)
combined = pp.Combine(word + word)

# Recursive (Forward reference)
expr = pp.Forward()
expr <<= word | "(" + expr + ")"

# Parse (grammar stands for any expression built above, e.g. sequence)
grammar = sequence
result = grammar.parse_string("hello world")
result[0]          # List access
result["name"]     # Dict access (if named)
result.as_list()   # Convert to list
result.as_dict()   # Convert to dict

Testing Strategy

  1. Copy test files: cp -r /home/aibrush/pyparsing-original/tests/* tests/
  2. Run baseline: python baseline_benchmark.py → saves baseline_results.json
  3. Compare: Rust implementation must return identical data to original
  4. Benchmark: Track speedup in performance_results.json
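
The compare and benchmark steps can be sketched as a small harness. Everything here is an assumption made for illustration: check_identical, speedup, reference_parse and candidate_parse are hypothetical names, and the two stand-in parsers simply split on whitespace so the example runs without either library installed.

```python
import timeit
from typing import Callable, Iterable

def check_identical(reference: Callable[[str], list],
                    candidate: Callable[[str], list],
                    inputs: Iterable[str]) -> list:
    """Return the inputs on which the two implementations disagree."""
    return [s for s in inputs if reference(s) != candidate(s)]

def speedup(reference: Callable[[], object],
            candidate: Callable[[], object],
            number: int = 1000) -> float:
    """Ratio of reference time to candidate time; above 1 means faster."""
    t_ref = timeit.timeit(reference, number=number)
    t_new = timeit.timeit(candidate, number=number)
    return t_ref / t_new if t_new > 0 else float("inf")

# Stand-ins for the original and the rewritten parser.
def reference_parse(s: str) -> list:
    return [tok for tok in s.split(" ") if tok]

def candidate_parse(s: str) -> list:
    return s.split()

samples = ["hello world", "  a  b ", ""]
mismatches = check_identical(reference_parse, candidate_parse, samples)
ratio = speedup(lambda: reference_parse("hello world"),
                lambda: candidate_parse("hello world"))
```

In the project itself the stand-ins would presumably be replaced by the same grammar built with pyparsing and with pyparsing_rs, comparing result.as_list() from each. An empty mismatches list should be required before any ratio is recorded.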

Success Criteria

All must be true:

  • All benchmarks show ≥50x speedup
  • 100% of basic pyparsing tests pass
  • Drop-in replacement API (same classes, methods, operators)
  • Core elements implemented: Literal, Word, Regex, And, Or, ZeroOrMore, Group, Forward

Key Performance Optimizations

Level 1 (do first):

  • LTO + release builds
  • &str instead of String
  • Inline small functions

Level 2 (when needed):

  • Bitset for character class membership (O(1) lookup)
  • Byte operations instead of char for ASCII
  • SIMD scanning with memchr crate
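
The bitset and byte-scanning ideas can be sketched in Python as follows. This is an illustration of the technique only: build_ascii_bitset and in_bitset are hypothetical helpers, and a Rust version would more likely use a fixed-width integer or small array than Python's arbitrary-precision int.

```python
def build_ascii_bitset(chars: str) -> int:
    """Pack a set of ASCII characters into an integer mask, one bit per code."""
    mask = 0
    for ch in chars:
        code = ord(ch)
        if code >= 128:
            raise ValueError("this sketch handles ASCII only")
        mask |= 1 << code
    return mask

def in_bitset(mask: int, byte: int) -> bool:
    """Test membership with one shift and one AND, independent of set size."""
    return byte < 128 and ((mask >> byte) & 1) == 1

DIGITS = build_ascii_bitset("0123456789")
data = b"123abc"
# Scan bytes rather than characters, as the ASCII fast path suggests.
end = 0
while end < len(data) and in_bitset(DIGITS, data[end]):
    end += 1
```

The membership test costs the same whether the class holds one character or a hundred, which is why it suits elements like Word that test every input character.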

Level 3 (if still slow):

  • Packrat memoization with FxHashMap
  • Arena allocation for ParseResults
  • Enum dispatch instead of dyn trait
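
Packrat memoization caches the outcome of each (rule, position) pair so that backtracking never re-parses the same spot. The sketch below shows the idea with a plain dict; Memoized and digits are hypothetical names, and the cache is assumed to be cleared between input strings because the key does not include the text.

```python
from typing import Callable, Dict, Optional, Tuple

# A parse function takes (text, pos) and returns the end position or None.
ParseFn = Callable[[str, int], Optional[int]]
Cache = Dict[Tuple[str, int], Optional[int]]

class Memoized:
    """Wrap a parse function with a cache keyed by (rule name, position)."""

    def __init__(self, name: str, fn: ParseFn, cache: Cache):
        self.name = name
        self.fn = fn
        self.cache = cache
        self.calls = 0  # counts uncached invocations, for illustration only

    def __call__(self, text: str, pos: int) -> Optional[int]:
        key = (self.name, pos)
        if key in self.cache:
            return self.cache[key]
        self.calls += 1
        result = self.fn(text, pos)
        self.cache[key] = result  # failures (None) are cached too
        return result

def digits(text: str, pos: int) -> Optional[int]:
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    return end if end > pos else None

cache: Cache = {}
number = Memoized("number", digits, cache)

# Two alternatives that both begin with a number: the second attempt at
# position 0 is answered from the cache instead of re-scanning.
first = number("123+4", 0)
second = number("123+4", 0)
```

The cache trades memory for time, so it pays off mainly on grammars with heavy backtracking; that fits its placement in the last optimization tier.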

Important Notes

  • Original pyparsing repo: https://github.com/pyparsing/pyparsing
  • Test files are in /home/aibrush/pyparsing-original/tests/
  • Original pyparsing is editable-installed; use import pyparsing for reference
  • import pyparsing_rs for your Rust implementation
  • Never sacrifice correctness for speed - tests must pass
  • Profile before optimizing - don't guess bottlenecks

pyparsing Key Concepts

Operator Overloading

a + b   # And (sequence)
a | b   # MatchFirst (first match wins)
a ^ b   # Or (longest match wins)
~a      # NotAny (negative lookahead)
a * 3   # Exactly 3 repetitions
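
How operators map to combinator classes can be sketched with Python's special methods. This is a toy model, not pyparsing's source: the classes only record structure, and details such as promoting plain strings to literals or flattening chained sequences are left out.

```python
class Element:
    """Minimal stand-in for a parser element; it records structure only."""

    def __add__(self, other):
        return And([self, other])

    def __or__(self, other):
        return MatchFirst([self, other])

    def __xor__(self, other):
        return Or([self, other])

    def __invert__(self):
        return NotAny(self)

    def __mul__(self, count: int):
        return And([self] * count)

class Lit(Element):
    def __init__(self, text: str):
        self.text = text

    def __repr__(self):
        return f"Lit({self.text!r})"

class Compound(Element):
    def __init__(self, exprs):
        self.exprs = list(exprs)

    def __repr__(self):
        return f"{type(self).__name__}({self.exprs})"

class And(Compound): pass
class MatchFirst(Compound): pass
class Or(Compound): pass

class NotAny(Element):
    def __init__(self, expr):
        self.expr = expr

    def __repr__(self):
        return f"NotAny({self.expr})"

a, b = Lit("a"), Lit("b")
```

With these definitions, a + b builds an And node, a | b a MatchFirst node, and a * 3 an And with three copies of a. A Rust port exposed through pyo3 would need to provide the same special methods on its Python-facing classes.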

ParseResults

Dual list/dict access:

result[0]        # First element
result["key"]    # Named element
result.key       # Attribute access
for item in result:  # Iteration
    print(item)
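
The dual access pattern can be modelled with a small container. Results below is a hypothetical toy class written for illustration; the real ParseResults has considerably more behaviour than shown here.

```python
class Results:
    """Toy container with list-style and dict-style access to the same tokens."""

    def __init__(self, tokens, names=None):
        self._tokens = list(tokens)
        self._names = dict(names or {})  # result name -> token value

    def __getitem__(self, key):
        # Integers and slices index the token list; anything else is a name.
        if isinstance(key, (int, slice)):
            return self._tokens[key]
        return self._names[key]

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        names = self.__dict__.get("_names", {})
        if name in names:
            return names[name]
        raise AttributeError(name)

    def __iter__(self):
        return iter(self._tokens)

    def __len__(self):
        return len(self._tokens)

    def as_list(self):
        return list(self._tokens)

    def as_dict(self):
        return dict(self._names)

result = Results(["x", "=", "42"], names={"key": "x", "value": "42"})
```

Here result[0], result["key"] and result.key all return "x", while iteration walks the positional tokens only.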

Whitespace

pyparsing auto-skips whitespace by default. Respect this behavior.
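
Whitespace skipping amounts to advancing the position before each element tries to match. A minimal sketch, assuming space, tab, carriage return and newline as the skipped characters; skip_whitespace and match_literal are hypothetical helpers.

```python
WHITESPACE = " \t\r\n"  # assumed default set of skipped characters

def skip_whitespace(text: str, pos: int) -> int:
    """Advance pos past any whitespace characters at the current position."""
    while pos < len(text) and text[pos] in WHITESPACE:
        pos += 1
    return pos

def match_literal(text: str, pos: int, literal: str, skip: bool = True):
    """Return the end position after literal, or None if it does not match."""
    if skip:
        pos = skip_whitespace(text, pos)
    if text.startswith(literal, pos):
        return pos + len(literal)
    return None

text = "hello   world"
end_hello = match_literal(text, 0, "hello")
end_world = match_literal(text, end_hello, "world")
# With skipping disabled, the same match fails on the intervening spaces.
strict = match_literal(text, end_hello, "world", skip=False)
```

Because every element performs this step, the skip loop sits on the hot path and is a natural candidate for the byte-level optimizations listed earlier.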

Parse Actions

User callbacks that transform results:

from pyparsing import Word, nums
integer = Word(nums).set_parse_action(lambda t: int(t[0]))
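
The mechanism behind a parse action can be sketched without the library: run the matcher, then pass its tokens through the callback. word_of and with_action are hypothetical helpers; the real API is richer (for example, callbacks may also receive the input string and match location).

```python
from typing import Callable, List, Optional, Tuple

def word_of(chars: str):
    """Build a matcher that returns (tokens, end) or None."""
    def parse(text: str, pos: int) -> Optional[Tuple[List[object], int]]:
        end = pos
        while end < len(text) and text[end] in chars:
            end += 1
        return ([text[pos:end]], end) if end > pos else None
    return parse

def with_action(parser, action: Callable[[list], object]):
    """Run the parser, then replace its tokens with the action's return value."""
    def parse(text: str, pos: int):
        matched = parser(text, pos)
        if matched is None:
            return None  # the action never runs on a failed match
        tokens, end = matched
        return ([action(tokens)], end)
    return parse

integer = with_action(word_of("0123456789"), lambda t: int(t[0]))
tokens, end = integer("42 apples", 0)
```

For a Rust implementation this is the point where control returns to Python for every match, so the cost of calling user callbacks is likely to affect benchmarks of grammars that use many actions.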

Note: the GitHub repository is https://github.com/aibrushcomputer/pyparsing-rs

Download files

Source Distribution

  • pyparsing_rs-0.1.0.tar.gz (64.7 kB): Source

Built Distributions

  • pyparsing_rs-0.1.0-cp313-cp313-win_amd64.whl (729.7 kB): CPython 3.13, Windows x86-64
  • pyparsing_rs-0.1.0-cp313-cp313-manylinux_2_34_x86_64.whl (859.3 kB): CPython 3.13, manylinux: glibc 2.34+ x86-64
  • pyparsing_rs-0.1.0-cp313-cp313-macosx_11_0_arm64.whl (711.3 kB): CPython 3.13, macOS 11.0+ ARM64
  • pyparsing_rs-0.1.0-cp312-cp312-win_amd64.whl (729.8 kB): CPython 3.12, Windows x86-64
  • pyparsing_rs-0.1.0-cp312-cp312-manylinux_2_34_x86_64.whl (859.3 kB): CPython 3.12, manylinux: glibc 2.34+ x86-64
  • pyparsing_rs-0.1.0-cp312-cp312-macosx_11_0_arm64.whl (711.2 kB): CPython 3.12, macOS 11.0+ ARM64
  • pyparsing_rs-0.1.0-cp311-cp311-win_amd64.whl (728.9 kB): CPython 3.11, Windows x86-64
  • pyparsing_rs-0.1.0-cp311-cp311-manylinux_2_34_x86_64.whl (858.7 kB): CPython 3.11, manylinux: glibc 2.34+ x86-64
  • pyparsing_rs-0.1.0-cp311-cp311-macosx_11_0_arm64.whl (711.4 kB): CPython 3.11, macOS 11.0+ ARM64
  • pyparsing_rs-0.1.0-cp310-cp310-win_amd64.whl (729.2 kB): CPython 3.10, Windows x86-64
  • pyparsing_rs-0.1.0-cp310-cp310-manylinux_2_34_x86_64.whl (858.9 kB): CPython 3.10, manylinux: glibc 2.34+ x86-64
  • pyparsing_rs-0.1.0-cp310-cp310-macosx_11_0_arm64.whl (711.5 kB): CPython 3.10, macOS 11.0+ ARM64
  • pyparsing_rs-0.1.0-cp39-cp39-win_amd64.whl (729.3 kB): CPython 3.9, Windows x86-64
  • pyparsing_rs-0.1.0-cp39-cp39-manylinux_2_34_x86_64.whl (859.2 kB): CPython 3.9, manylinux: glibc 2.34+ x86-64
  • pyparsing_rs-0.1.0-cp39-cp39-macosx_11_0_arm64.whl (711.7 kB): CPython 3.9, macOS 11.0+ ARM64

File details

Details for the file pyparsing_rs-0.1.0.tar.gz.

File metadata

  • Download URL: pyparsing_rs-0.1.0.tar.gz
  • Upload date:
  • Size: 64.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pyparsing_rs-0.1.0.tar.gz
Algorithm Hash digest
SHA256 5abf9fe72e153d31165c1ba84f16bb6a9a382f2392706b5108e3e8c27cddeb05
MD5 05c2155fbbcf4c512f70f548a8147222
BLAKE2b-256 5a75438d9f6416fa0d9884641d0fe54e47cf57e476fc570fce870112bba42443

See more details on using hashes here.

File details

Details for the file pyparsing_rs-0.1.0-cp313-cp313-win_amd64.whl.

File hashes

Hashes for pyparsing_rs-0.1.0-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 f712c8e5078d68fedf63f040bfc855fee98832e3e3586e61231bc86295261aa9
MD5 86803cf0d3054f4da2b3dc803517eceb
BLAKE2b-256 fef6fa058be1b07b69a0228077387d639899e14bb8b31b2a96586620c362a2ca

File details

Details for the file pyparsing_rs-0.1.0-cp313-cp313-manylinux_2_34_x86_64.whl.

File hashes

Hashes for pyparsing_rs-0.1.0-cp313-cp313-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 a1d3acfb1e715d3cc9388a1e1a4adeeaededad29328af25022982959e49d4814
MD5 96f6c5d997e14c0b765e7f52546bb9ed
BLAKE2b-256 1c6b5ead2b0ae0fc06930f781a8ae06417f0084e9306537814ac06d36cefadbd

File details

Details for the file pyparsing_rs-0.1.0-cp313-cp313-macosx_11_0_arm64.whl.

File hashes

Hashes for pyparsing_rs-0.1.0-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 897f1b9441b6e240ef40b5676b3909cb03a2688f6dd26f0c14ecdd4727454b9e
MD5 bf5b2ae9a1635c21c68f5a06bfc1c139
BLAKE2b-256 754afc4f39d2bcd2d0dff7d0a34732dfafebf24940646b1b0471d05ea4763865

File details

Details for the file pyparsing_rs-0.1.0-cp312-cp312-win_amd64.whl.

File hashes

Hashes for pyparsing_rs-0.1.0-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 aa53ce61a5dbafa0ff36b778cac462735e8acdfddd6e5af826bb3fdb9a50d56f
MD5 fa85fc6d7c723058ec3be5e5d449418c
BLAKE2b-256 d060bb003f3b8c78e69051fd1480dcdbb22def586bc8b19324b732052e06cb4f

File details

Details for the file pyparsing_rs-0.1.0-cp312-cp312-manylinux_2_34_x86_64.whl.

File hashes

Hashes for pyparsing_rs-0.1.0-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 94299d989eb5a074eb023e7203b04f56fd091f3cbd022e5428503a364aba035f
MD5 4302f76f577b12a18b2cb6613539435e
BLAKE2b-256 aaf32bb251bc7dc87efdbeb8b132d5da96b19c3bbb775fb68f0cc72356703a04

File details

Details for the file pyparsing_rs-0.1.0-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for pyparsing_rs-0.1.0-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 caae37c43073b332241fe861b11defdc2fa08b40f64bb781a0151ab1b6718263
MD5 036903e5da7e9996487a7875097fcede
BLAKE2b-256 df109048613afba9aa21c31c82cbcace6a6396567c6aba61c57a646e0bfbf7a8

File details

Details for the file pyparsing_rs-0.1.0-cp311-cp311-win_amd64.whl.

File hashes

Hashes for pyparsing_rs-0.1.0-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 6297bf839b47a0be5cc9b932f41a7ef6d3b8cd60b6f21331390b5bf807384e69
MD5 07806defd7daf10030f2ed95263c8efa
BLAKE2b-256 21b2d9ec34ebae99a6ce8f08c680ca46ee9e24c2f4b1c5b3625d5412ce364cc7

File details

Details for the file pyparsing_rs-0.1.0-cp311-cp311-manylinux_2_34_x86_64.whl.

File hashes

Hashes for pyparsing_rs-0.1.0-cp311-cp311-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 91d6de8950747c9ef335277f25862ef19bc2e8b8fb6034f4749caeaa58dae823
MD5 0039388eec596cedac4c0c3898547ceb
BLAKE2b-256 6dd166c5aaf0afe31581deeb1f2fe719a935d706960d985089e3d598ddba2478

File details

Details for the file pyparsing_rs-0.1.0-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for pyparsing_rs-0.1.0-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 7003bfdafebb05cf0a5b42e76e1a36de4fa0d047404c67e1b85abafe1fd05430
MD5 ec4514756aa7e1bcb13c0a2eb96c4a72
BLAKE2b-256 5bf641f2ec3298870b2cee3ea587321582780bea33a67b5d513ba367b46ab007

File details

Details for the file pyparsing_rs-0.1.0-cp310-cp310-win_amd64.whl.

File hashes

Hashes for pyparsing_rs-0.1.0-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 b20b5036eb93906ac3e54e6b724be3d5c49c05ff2c9c9fe706590c7a3cbbe93e
MD5 0480f8f63b3f40dfa0385ea7fe3dad36
BLAKE2b-256 5a73dad57a71f5617fab940c922a6e27f5143e55d4786f06940cb8e45b450799

File details

Details for the file pyparsing_rs-0.1.0-cp310-cp310-manylinux_2_34_x86_64.whl.

File hashes

Hashes for pyparsing_rs-0.1.0-cp310-cp310-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 07c097383dbecb0f41a2aaedc3d62b1edfcc7a5d646e32953a4578bffeb9ee2e
MD5 360b5da50da7edd3799e9996e5cfe1ba
BLAKE2b-256 76aca6b95fbf9fb4ed390cf1515375002c48f31b8af1f38838fe5c199ccbd809

File details

Details for the file pyparsing_rs-0.1.0-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for pyparsing_rs-0.1.0-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 f4b931de0695104f0ef1e9961f3218d683caf42b11bdcd775bce4f1ed09fc2a0
MD5 e3f90e524935ef8ad4106cbed5b9ec0d
BLAKE2b-256 38e798a71c2b2e3c76a0339a6a779f76c7408af113329d318335acadc2e597d6

File details

Details for the file pyparsing_rs-0.1.0-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: pyparsing_rs-0.1.0-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 729.3 kB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pyparsing_rs-0.1.0-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 64d890409cab7e2839400fc56cbdc7d387ff876c43041655f690b650c193f02d
MD5 2aa0d5c989821b160da29a21989a4ea9
BLAKE2b-256 e546b770fdbe06fcf6bdc3036e3290687d0ecb0afff57b4320f8dc7e4e76f426

File details

Details for the file pyparsing_rs-0.1.0-cp39-cp39-manylinux_2_34_x86_64.whl.

File hashes

Hashes for pyparsing_rs-0.1.0-cp39-cp39-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 0fe3d9375e1cb617a4df408b3dddf8630cf66d7a4056781565eff09aa469440b
MD5 50ab2c130bcb1928fff920c60ca773c1
BLAKE2b-256 fa4c720d59086110a51da13c6cc1f2c2ead999700bb5c82709f13e9e9b89462c

File details

Details for the file pyparsing_rs-0.1.0-cp39-cp39-macosx_11_0_arm64.whl.

File hashes

Hashes for pyparsing_rs-0.1.0-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 4bb47beadfd80dae43bae0eb14ed22d512f9d8c07d27c2e32c23ff3fc14cef42
MD5 97b2fb69ff03f57f3040c341557b1d30
BLAKE2b-256 4c8cdec710dcf158b242fb4d6f0bff32486c393e062ed8812f7c2d893141dc7c
