Parse strings using a specification based on the Python format() syntax (Rust implementation)

formatparse

Requires Python 3.8+. License: MIT.

A high-performance, Rust-backed implementation of the parse library for Python. formatparse provides the same API as the original parse library but with significant performance improvements (up to 80x faster) thanks to Rust's zero-cost abstractions and optimized regex engine.

📖 Documentation

Full documentation is available at https://formatparse.readthedocs.io/

The documentation includes:

  • Getting Started Guide - Quick introduction and basic usage
  • User Guides - Comprehensive guides on patterns, datetime parsing, custom types, and bidirectional patterns
  • API Reference - Complete API documentation for all functions and classes
  • Examples & Cookbook - Practical examples and common use cases
  • Changelog - Release history in CHANGELOG.md in the repository

Features

  • 🚀 Blazing Fast: Up to 80x faster than the original Python implementation
  • 🔄 Drop-in Replacement: Compatible API with the original parse library
  • 🎯 Type-Safe: Rust backend ensures reliability and correctness
  • 🔍 Advanced Pattern Matching: Support for named fields, positional fields, and custom types
  • 📅 DateTime Parsing: Built-in support for various datetime formats (ISO 8601, RFC 2822, HTTP dates, etc.)
  • 🎨 Flexible: Case-sensitive and case-insensitive matching options
  • 💾 Optimized: Pattern caching, lazy evaluation, and batch operations for maximum performance

Installation

From PyPI

pip install formatparse

From Source

# Clone the repository
git clone https://github.com/eddiethedean/formatparse.git
cd formatparse

# Install maturin (build tool)
pip install "maturin>=1.13.3,<2.0"

# Build and install in development mode
maturin develop --manifest-path formatparse-pyo3/Cargo.toml --release

Quick Start

from formatparse import parse, search, findall

# Basic parsing with named fields
result = parse("{name}: {age:d}", "Alice: 30")
print(result.named['name'])  # 'Alice'
print(result.named['age'])   # 30

# Search for patterns in text
result = search("age: {age:d}", "Name: Alice, age: 30, City: NYC")
if result:
    print(result.named['age'])  # 30

# Find all matches
results = findall("ID:{id:d}", "ID:1 ID:2 ID:3")
for result in results:
    print(result.named['id'])
# Prints 1, 2, and 3 on separate lines

For more examples and detailed usage, see the documentation.

Malformed patterns: parse vs compile

For some invalid patterns (for example a missing } after a field), parse returns None while compile raises PatternParseMismatch, a subclass of ValueError. Other syntax errors may still raise plain ValueError from both APIs. This matches the behavior of the original parse package.
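
If you want compile to behave like parse for malformed patterns, one option is to catch ValueError and return None. The sketch below is a minimal illustration, not part of formatparse: compile_or_none is a hypothetical helper, and fake_compile is a stand-in validator that exists only so the example runs without the library installed. Because PatternParseMismatch is described above as a subclass of ValueError, a single except clause covers both cases.

```python
from typing import Any, Callable, Optional


def compile_or_none(compile_fn: Callable[[str], Any], pattern: str) -> Optional[Any]:
    """Return compile_fn(pattern), or None if the pattern is rejected."""
    try:
        return compile_fn(pattern)
    except ValueError:
        # PatternParseMismatch is described as a ValueError subclass,
        # so this clause handles it as well as a plain ValueError.
        return None


# Stand-in compiler, used only to make this sketch runnable on its own.
# It is NOT the library's validation logic.
def fake_compile(pattern: str) -> str:
    if pattern.count("{") != pattern.count("}"):
        raise ValueError(f"unbalanced braces in {pattern!r}")
    return pattern


print(compile_or_none(fake_compile, "{name}: {age:d}"))  # {name}: {age:d}
print(compile_or_none(fake_compile, "{name"))            # None
```

In real code you would pass formatparse's compile as compile_fn instead of the stand-in.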

Custom types (extra_types)

Map type names in your pattern to Python converter callables through the extra_types dict, and decorate each converter with @with_pattern to attach the regex it matches. The type name after the colon in the field (for example Number in {:Number}) must match a key in the extra_types dict.

from formatparse import parse, with_pattern

@with_pattern(r"\d+")
def parse_int(text: str) -> int:
    return int(text)

result = parse("n={:Number}", "n=42", extra_types={"Number": parse_int})
assert result.fixed[0] == 42

If your regex uses capturing parentheses, set regex_group_count on @with_pattern so the engine can align groups correctly. Full examples, search / findall usage, and pitfalls are in the Custom types user guide.
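
To see why the group count matters, the following sketch uses only the standard re module. It is an illustration of the general alignment problem, not formatparse's internal code, and the pattern and field names are made up for the example. When a converter's regex contains its own capturing groups, every later field shifts to a higher group index, so an engine has to know how many groups to skip.

```python
import re

# Hypothetical converter regex that contains two capturing groups of its own.
range_pattern = r"(\d+)-(\d+)"
inner_groups = re.compile(range_pattern).groups  # number of capturing groups: 2

# Assume each field is wrapped in one outer group, followed by the next field.
combined = re.compile(rf"span=({range_pattern}) id=(\d+)")
match = combined.match("span=10-20 id=7")

first_field_group = 1
# Skip the outer group plus the converter's inner groups to reach the next field.
second_field_group = first_field_group + 1 + inner_groups

print(match.group(first_field_group))   # 10-20
print(match.group(second_field_group))  # 7
```

Without the inner group count, the second field would be read from group 2 and yield "10" instead of "7".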

Caching: parse, search, findall, and compile share an internal LRU cache keyed by the pattern string and a fingerprint of extra_types (each converter's pattern and regex_group_count). Two dicts with the same keys and equivalent converters reuse the same compiled regex. However, changing a converter's pattern without changing the dict identity can still reuse a stale cache entry. If you change patterns at runtime, use a fresh dict or restart the process. See issue #29.
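
As a rough mental model of the cache key, the sketch below builds a key from the pattern string and a summary of each converter's pattern and regex_group_count. This is an assumption-laden illustration, not the library's implementation: fingerprint, cache_key, and make_converter are hypothetical names, and make_converter merely stands in for @with_pattern by attaching the two attributes directly.

```python
from typing import Callable, Dict, Hashable, Optional, Tuple


def fingerprint(extra_types: Optional[Dict[str, Callable]]) -> Hashable:
    """Build a hashable summary of each converter's pattern and group count."""
    if not extra_types:
        return ()
    return tuple(
        (name, getattr(fn, "pattern", None), getattr(fn, "regex_group_count", None))
        for name, fn in sorted(extra_types.items())
    )


def cache_key(pattern: str,
              extra_types: Optional[Dict[str, Callable]] = None) -> Tuple[str, Hashable]:
    """Combine the pattern string with the extra_types fingerprint."""
    return (pattern, fingerprint(extra_types))


def make_converter(regex: str) -> Callable[[str], int]:
    """Stand-in for @with_pattern: attach the attributes the fingerprint reads."""
    def convert(text: str) -> int:
        return int(text)
    convert.pattern = regex
    convert.regex_group_count = 0
    return convert


a = {"Number": make_converter(r"\d+")}
b = {"Number": make_converter(r"\d+")}       # different dict, equivalent converter
c = {"Number": make_converter(r"[0-9]{3}")}  # same key, different pattern

print(cache_key("n={:Number}", a) == cache_key("n={:Number}", b))  # True
print(cache_key("n={:Number}", a) == cache_key("n={:Number}", c))  # False
```

Under this model, equivalent converters in separate dicts share a key, while a fresh dict with a different pattern gets a new one.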

Pickling: A pickled FormatParser stores only the pattern string. After pickle.loads, pass extra_types again when calling parse / search / findall if your pattern uses custom types.

Performance

formatparse is significantly faster than the original Python parse library, with speedups ranging from 3x to 80x depending on the use case. The Rust backend provides:

  • Pattern caching to eliminate regex compilation overhead
  • Optimized type conversion paths for common types
  • Efficient memory management with pre-allocated data structures
  • Reduced Python GIL overhead through batched operations

For detailed benchmark results and performance analysis, see the documentation.
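
Speedups depend on the workload, so it can be worth timing your own patterns. The harness below is a minimal sketch using the standard timeit module. seconds_per_call is a hypothetical helper that accepts any function with a parse(pattern, text) shape, and regex_baseline is a stand-in so the example runs without either library installed. No timings are quoted here because they vary by machine.

```python
import re
import timeit
from typing import Any, Callable


def seconds_per_call(parse_fn: Callable[[str, str], Any], pattern: str, text: str,
                     number: int = 10_000, repeat: int = 5) -> float:
    """Return the best observed time per call, in seconds."""
    timer = timeit.Timer(lambda: parse_fn(pattern, text))
    return min(timer.repeat(repeat=repeat, number=number)) / number


# Stand-in parser so the sketch is self-contained; it ignores the pattern argument.
def regex_baseline(pattern: str, text: str):
    return re.fullmatch(r"(?P<name>.+?): (?P<age>\d+)", text)


elapsed = seconds_per_call(regex_baseline, "{name}: {age:d}", "Alice: 30", number=1_000)
print(f"{elapsed * 1e6:.2f} us per call")
```

To compare the two libraries, pass formatparse's parse and the original parse library's parse as parse_fn with the same pattern and text.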

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

For detailed contribution guidelines, including testing requirements and development setup, see CONTRIBUTING.md.

Testing

The project includes comprehensive test coverage:

  • Unit tests: ~710 Python tests (pytest tests/ --collect-only) and ~130 Rust tests (cargo test --workspace, formatparse-core + formatparse-pyo3)
  • Property-based tests: Hypothesis in tests/test_property.py and tests/test_fuzz.py
  • Performance Benchmarks: Automated regression testing
  • Stress Tests: Large input and scalability testing
  • Fuzz Tests: Verify that arbitrary input does not crash the parser
  • Coverage: >90% code coverage target

Run tests with:

# All tests
pytest tests/

# With coverage
pytest tests/ --cov=formatparse --cov-report=html

# Benchmarks
pytest tests/test_performance.py --benchmark-only

See CONTRIBUTING.md for more testing information.

License

MIT License - see LICENSE file for details

Credits

Based on the parse library by Richard Jones.
