
Fast Python bindings for the TOON format parser


toonpy

High-performance Python bindings for the TOON format parser, built with PyO3 and Rust.

On average 5.82x faster than toon-llm, a pure-Python implementation, in the project's benchmarks. Optimized for tabular data and LLM applications.


Features

  • Blazing Fast: 5.82x average speedup over toon-llm (2.98x to 9.68x range)
  • 🔧 Zero Python Dependencies: Pure PyO3/Rust implementation; the Rust crates are compiled into the wheel
  • 🎯 Optimized for Tabular Data: Inline primitive conversions for common patterns
  • 🔄 Async Support: Native asyncio integration via atoonpy module
  • 🐍 Python 3.8+: abi3 wheels for broad compatibility
  • 📦 Drop-in Replacement: Compatible API with other TOON libraries

Installation

pip install toonpy

Or build from source:

pip install maturin
maturin build --release
pip install target/wheels/toonpy-*.whl

Quick Start

Synchronous API

import toonpy

# Encode Python data to TOON
data = {"name": "Alice", "age": 30, "active": True}
toon_str = toonpy.encode(data)
# Output: 'active: true\nage: 30\nname: Alice\n'

# Decode TOON to Python
result = toonpy.decode(toon_str)
# Output: {'active': True, 'age': 30, 'name': 'Alice'}

# Batch operations
data_list = [{"id": i, "name": f"User{i}"} for i in range(100)]
toon_strs = toonpy.encode_batch(data_list)
results = toonpy.decode_batch(toon_strs)

Asynchronous API

import asyncio
import atoonpy

async def main():
    # Async encode/decode
    data = {"name": "Bob", "age": 25}
    toon_str = await atoonpy.encode(data)
    result = await atoonpy.decode(toon_str)
    
    # Concurrent batch operations
    data_list = [{"id": i} for i in range(1000)]
    toon_strs = await atoonpy.encode_batch(data_list)
    results = await atoonpy.decode_batch(toon_strs)

asyncio.run(main())

API Reference

Synchronous (toonpy)

encode(data, delimiter=None, strict=None) -> str

Encode Python data to TOON format string.

Parameters:

  • data: Python object (dict, list, str, int, float, bool, None)
  • delimiter: Optional delimiter ('comma', 'tab', 'pipe'). Default: 'comma'
  • strict: Optional strict mode. Default: False

Returns: TOON-formatted string

decode(toon_str, delimiter=None, strict=None) -> Any

Decode TOON format string to Python data.

Parameters:

  • toon_str: TOON-formatted string
  • delimiter: Optional delimiter hint ('comma', 'tab', 'pipe'). Auto-detected if not specified
  • strict: Optional strict mode. Default: False

Returns: Python object

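encode_batch(data_list, delimiter=None, strict=None) -> list

Encode multiple Python objects. Returns a list of TOON strings, one per input object.

decode_batch(toon_strs, delimiter=None, strict=None) -> list

Decode multiple TOON strings. Returns a list of Python objects, one per input string.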

dumps(data, **kwargs) -> str

Alias for encode().

loads(toon_str, **kwargs) -> Any

Alias for decode().

Asynchronous (atoonpy)

All functions have the same signature as the sync API but return coroutines.

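import asyncio
import atoonpy

async def main():
    # Each atoonpy function returns a coroutine, so await it inside an async function
    toon_str = await atoonpy.encode({"id": 1})
    obj = await atoonpy.decode(toon_str)
    toon_strs = await atoonpy.encode_batch([{"id": 1}, {"id": 2}])
    objs = await atoonpy.decode_batch(toon_strs)

asyncio.run(main())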

Performance

Benchmark Results

Tested against toon-llm v1.0.0b6 (November 2025):

| Test | toonpy | toon-llm | Speedup |
|---|---|---|---|
| Small Object Decode | 16.1 μs | 94.7 μs | 5.9x |
| Tabular Small Decode | 46.0 μs | 144.2 μs | 3.1x |
| Tabular Large Decode (1k rows) | 220.2 μs | 905.9 μs | 4.1x |
| Mixed Array Decode | 21.1 μs | 102.8 μs | 4.9x |
| Small Object Encode | 36.3 μs | 278.1 μs | 7.7x |
| Tabular Large Encode (1k rows) | 325.4 μs | 969.9 μs | 3.0x |

Average: 5.82x faster (range: 2.98x - 9.68x)
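The Speedup column is the toon-llm time divided by the toonpy time. The short check below recomputes it from the six rows in the table (the timings are copied from the table; nothing else is assumed). These six rows average about 4.77x with a maximum of about 7.7x, so the 5.82x average and 9.68x maximum quoted above evidently include benchmarks that the table does not list.

```python
# Timings in microseconds, copied from the benchmark table: (test, toonpy, toon-llm)
rows = [
    ("Small Object Decode", 16.1, 94.7),
    ("Tabular Small Decode", 46.0, 144.2),
    ("Tabular Large Decode (1k rows)", 220.2, 905.9),
    ("Mixed Array Decode", 21.1, 102.8),
    ("Small Object Encode", 36.3, 278.1),
    ("Tabular Large Encode (1k rows)", 325.4, 969.9),
]

# Speedup = baseline time / toonpy time (higher is better)
speedups = {name: llm / fast for name, fast, llm in rows}
for name, s in speedups.items():
    print(f"{name}: {s:.1f}x")

mean = sum(speedups.values()) / len(speedups)
print(f"mean of these rows: {mean:.2f}x, "
      f"range: {min(speedups.values()):.2f}x to {max(speedups.values()):.2f}x")
```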

See PERFORMANCE.md for detailed analysis.


Architecture

Core Components

Rust Core (src/lib.rs)

  • PyO3 bindings to the Python C API
  • Custom json_to_python() with inlined primitive conversions
  • Zero-copy operations where possible
  • Optimized for TOON's common patterns (tabular data)
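To make "tabular data" concrete: TOON can write a list of objects that share the same keys as a single header naming the fields, followed by one row of values per object. The pure-Python sketch below renders that layout. It assumes TOON's comma-delimited tabular syntax (`key[N]{field,...}:` followed by indented rows) and deliberately skips string quoting and nested values; `encode_tabular` is a hypothetical helper for illustration, not part of toonpy.

```python
from typing import Any, Dict, List

def encode_tabular(key: str, rows: List[Dict[str, Any]]) -> str:
    """Hypothetical helper: render uniform flat dicts as a TOON-style tabular array.

    Simplified sketch: comma delimiter only, no string quoting, no nested values.
    """
    if not rows:
        raise ValueError("need at least one row")
    fields = list(rows[0])
    # The tabular form needs every row to have the same keys in the same order
    if any(list(row) != fields for row in rows):
        raise ValueError("rows do not share the same keys")

    def scalar(value: Any) -> str:
        if value is None:
            return "null"
        if isinstance(value, bool):  # render Python booleans as lowercase true/false
            return "true" if value else "false"
        return str(value)

    names = ",".join(fields)
    header = f"{key}[{len(rows)}]{{{names}}}:"
    body = ["  " + ",".join(scalar(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + body)

print(encode_tabular("users", [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]))
# users[2]{id,name}:
#   1,Alice
#   2,Bob
```

The field names appear once in the header instead of once per object, which is where the format saves space on uniform records.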

Async Wrapper (python/atoonpy.py)

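  • Pure Python asyncio wrapper
  • Uses asyncio.to_thread() to run each blocking Rust call in a worker thread, so the event loop is not blocked (asyncio.to_thread() requires Python 3.9+)
  • Lets other coroutines, such as network or file I/O, proceed while encoding or decoding runs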
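A minimal sketch of this pattern, assuming the wrapper simply offloads each blocking call to a worker thread. `make_async` is a hypothetical name, and `json.dumps`/`json.loads` stand in for `toonpy.encode`/`toonpy.decode` so the example runs without toonpy installed. Where `asyncio.to_thread()` is unavailable (Python 3.8), the sketch falls back to `loop.run_in_executor()`.

```python
import asyncio
import functools
import json
from typing import Any, Awaitable, Callable

def make_async(func: Callable[..., Any]) -> Callable[..., Awaitable[Any]]:
    """Hypothetical helper: expose a blocking function as a coroutine function."""
    @functools.wraps(func)
    async def wrapper(*args: Any, **kwargs: Any) -> Any:
        if hasattr(asyncio, "to_thread"):  # Python 3.9+
            return await asyncio.to_thread(func, *args, **kwargs)
        # Python 3.8: run in the event loop's default thread pool instead
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, functools.partial(func, *args, **kwargs))
    return wrapper

# Stand-ins for toonpy.encode / toonpy.decode
encode = make_async(json.dumps)
decode = make_async(json.loads)

async def main() -> None:
    # Several calls can be in flight at once; gather returns results in input order
    texts = await asyncio.gather(*(encode({"id": i}) for i in range(3)))
    print(await asyncio.gather(*(decode(t) for t in texts)))

asyncio.run(main())
```

Offloading keeps the event loop responsive. Whether the worker threads also run in parallel depends on whether the wrapped call releases the GIL while it works.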

TOON Parser

  • Based on toon-rs by Jimmy Stridh
  • Features: SIMD string scanning (memchr), stack allocations (smallvec), fast float parsing

Optimization Techniques

  1. Inlined Primitive Conversions

    • About 85% of values in typical TOON data are primitives inside dicts/arrays
    • Avoid recursion overhead by inlining Null/Bool/Number/String conversions
    • Only recurse for nested structures
  2. Pre-allocated Collections

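    let mut items = Vec::with_capacity(arr.len()); // single allocation sized to the array
    for v in arr { items.push(json_to_python(py, v)?); } // convert each element (argument order assumed)
    Ok(PyList::new(py, items)?.into_any())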
    
  3. Type-specific Fast Paths

    • .is_instance_of::<T>() for O(1) type checking
    • Direct conversions without dynamic dispatch
  4. SIMD Acceleration

    • memchr for string scanning (6.5x faster than stdlib)
    • AVX2 support on x86_64
  5. Link-time Optimization

    [profile.release]
    opt-level = 3
    lto = true
    codegen-units = 1
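The effect of technique 1 can be illustrated in pure Python. The sketch below is not the Rust json_to_python(); it walks ordinary Python values and counts function calls, comparing a converter that recurses on every value with one that handles primitive children inline and recurses only into nested containers. `convert_naive` and `convert_inlined` are hypothetical names.

```python
from typing import Any, List

PRIMITIVES = (type(None), bool, int, float, str)

def convert_naive(value: Any, calls: List[int]) -> Any:
    """Recurse on every value, including primitive leaves."""
    calls[0] += 1
    if isinstance(value, dict):
        return {k: convert_naive(v, calls) for k, v in value.items()}
    if isinstance(value, list):
        return [convert_naive(v, calls) for v in value]
    return value

def convert_inlined(value: Any, calls: List[int]) -> Any:
    """Handle primitive children inline; recurse only into nested containers."""
    calls[0] += 1
    if isinstance(value, dict):
        out = {}
        for k, v in value.items():
            # Fast path: primitive leaves are copied without another function call
            out[k] = v if isinstance(v, PRIMITIVES) else convert_inlined(v, calls)
        return out
    if isinstance(value, list):
        return [v if isinstance(v, PRIMITIVES) else convert_inlined(v, calls) for v in value]
    return value

# Tabular-shaped data: 100 rows of 3 primitive fields each
rows = [{"id": i, "name": f"User{i}", "active": True} for i in range(100)]
naive, inlined = [0], [0]
assert convert_naive(rows, naive) == convert_inlined(rows, inlined) == rows
print(f"naive calls: {naive[0]}, inlined calls: {inlined[0]}")  # naive calls: 401, inlined calls: 101
```

For tabular data the saving grows with the number of fields per row, since only the list and the row dicts trigger a call.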
    

Dependencies

Production

  • pyo3 = "0.27" - Python bindings
  • serde_json = "1.0" - JSON handling
  • once_cell = "1.20" - Static defaults
  • smallvec = "1.13" - Stack allocations (transitive)
  • toon - TOON parser by Jimmy Stridh
    • perf_memchr - SIMD string scanning
    • perf_smallvec - Stack allocations
    • perf_lexical - Fast float parsing

Development

  • criterion = "0.5" - Micro-benchmarking

Building from Source

Requirements

  • Rust 1.70+
  • Python 3.8+
  • maturin

Build Steps

# Install maturin
pip install maturin

# Development build
maturin develop

# Release build
maturin build --release

# Install wheel
pip install target/wheels/toonpy-*.whl

# Run tests
python test_toonpy.py
python test_async.py

# Run benchmarks
python benchmark.py
cargo bench

Testing

# Unit tests
python test_toonpy.py

# Async tests
python test_async.py

# Benchmarks
python benchmark.py

# Micro-benchmarks
cargo bench

Credits

Core TOON Parser

Built on toon-rs by Jimmy Stridh.

The excellent TOON Rust implementation provides:

  • Fast TOON ↔ JSON conversion
  • SIMD-optimized string scanning
  • Efficient memory management
  • Robust error handling

toonpy Author

magi8101 (sharmamagi0@gmail.com)

Acknowledgments

  • PyO3 team for excellent Python-Rust bindings
  • TOON format creators for the readable data format
  • Rust community for performance-focused tools

License

MIT OR Apache-2.0


Related Projects

  • toon-rs - Rust TOON parser (core dependency)
  • toon-llm - Python TOON library with LLM features
  • toon-format - Official Python placeholder

Roadmap

  • PyO3 0.27 support
  • Async API via asyncio
  • Comprehensive benchmarking
  • Micro-optimization for tabular data
  • Streaming decoder for large files
  • Columnar output for pandas/polars
  • Python 3.13 free-threaded support

Contributing

Issues and PRs welcome! See PERFORMANCE.md for optimization internals.



Download files


Source Distributions

No source distribution files available for this release.

Built Distribution


toon_parser-0.1.0-cp38-abi3-win_amd64.whl (262.3 kB)

CPython 3.8+, Windows x86-64

File details

Details for the file toon_parser-0.1.0-cp38-abi3-win_amd64.whl.

File metadata

  • Download URL: toon_parser-0.1.0-cp38-abi3-win_amd64.whl
  • Upload date:
  • Size: 262.3 kB
  • Tags: CPython 3.8+, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for toon_parser-0.1.0-cp38-abi3-win_amd64.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | ab8f3505747012a8ac7cb934bba0cada313daa084a1a2838db346198569437f2 |
| MD5 | 43fd9caca5d6d9f3fd9a4815de0dae83 |
| BLAKE2b-256 | ef53eb098c3791aa90beee8d9e217493e1b0124e2750a47fe0a3f750cfacc037 |

