Skip to main content

PyToon: Effortless, Token-Efficient Data for LLMs — Python’s First Production-Ready TOON Implementation

Project description

toon-codec

Token-Oriented Object Notation (TOON) codec for Python — 30-60% token savings over JSON for LLM applications

PyPI version Python 3.8+ License: MIT Test Coverage

toon-codec is a production-ready Python library implementing the TOON v1.5+ specification, providing significant token savings for structured LLM input/output through bidirectional JSON-TOON conversion.

Key Features

  • 30-60% Token Savings - Reduce LLM API costs with compact data representation
  • Full TOON v1.5+ Compliance - Complete specification implementation with strict validation
  • Roundtrip Fidelity - Guaranteed decode(encode(data)) == data for all valid inputs
  • Zero Dependencies - Core functionality requires no external packages
  • Type Safe - Full type hints with mypy strict mode compliance
  • Intelligent Format Selection - Auto-detect optimal serialization format
  • Advanced Features - Reference tracking, graph encoding, sparse arrays

Installation

pip install toon-codec

With optional token counting support (requires tiktoken):

pip install toon-codec[tokenizer]

Quick Start

from pytoon import encode, decode

# Basic encoding
data = {"name": "Alice", "age": 30, "active": True}
toon = encode(data)
print(toon)
# name: Alice
# age: 30
# active: true

# Decoding back to Python
recovered = decode(toon)
assert recovered == data  # Roundtrip guaranteed

Why TOON?

TOON (Token-Oriented Object Notation) is designed specifically for LLM applications where every token counts. Compare JSON vs TOON:

JSON (56 tokens)

[
  {"id": 1, "name": "Alice", "email": "alice@example.com"},
  {"id": 2, "name": "Bob", "email": "bob@example.com"},
  {"id": 3, "name": "Charlie", "email": "charlie@example.com"}
]

TOON (23 tokens) - 59% savings

[3]{id,name,email}
1,Alice,alice@example.com
2,Bob,bob@example.com
3,Charlie,charlie@example.com

Performance Metrics

Data Type JSON Tokens TOON Tokens Savings
Tabular data (uniform arrays) 100 35-45 55-65%
Nested objects 100 60-70 30-40%
Simple key-value 100 70-80 20-30%
Mixed structures 100 50-65 35-50%

Performance Characteristics:

  • Time Complexity: O(n) for encoding and decoding
  • Space Complexity: O(n) for output
  • Speed: <100ms for 1-10KB datasets
  • Validation Overhead: <5% in strict mode

Usage Examples

Tabular Data (Maximum Savings)

from pytoon import encode, decode

# List of uniform objects - ideal for TOON
users = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
    {"id": 3, "name": "Charlie", "role": "user"},
]

toon = encode(users)
print(toon)
# [3]{id,name,role}
# 1,Alice,admin
# 2,Bob,user
# 3,Charlie,user

# Decode back
assert decode(toon) == users

Nested Objects

config = {
    "database": {
        "host": "localhost",
        "port": 5432,
        "credentials": {
            "user": "admin",
            "password": "secret"
        }
    },
    "cache": {
        "enabled": True,
        "ttl": 3600
    }
}

toon = encode(config)
print(toon)
# database:
#   host: localhost
#   port: 5432
#   credentials:
#     user: admin
#     password: secret
# cache:
#   enabled: true
#   ttl: 3600

Intelligent Format Selection

from pytoon import smart_encode

data = [{"id": 1}, {"id": 2}, {"id": 3}]
encoded, decision = smart_encode(data)

print(f"Recommended: {decision.recommended_format}")
print(f"Confidence: {decision.confidence:.2f}")
print(f"Reasoning: {decision.reasoning}")
# Recommended: toon
# Confidence: 0.95
# Reasoning: ['High uniformity (100.0%) strongly favors TOON', ...]

Custom Type Support

from pytoon import encode, decode, register_type_handler
from datetime import datetime
import uuid

# Built-in support for common types
data = {
    "id": uuid.uuid4(),
    "created": datetime.now(),
    "tags": ["python", "llm", "efficiency"]
}

toon = encode(data)
recovered = decode(toon)
# Types are preserved through encoding/decoding

Reference Tracking (Relational Data)

from pytoon import encode_refs, decode_refs

# Shared object references
user = {"id": 1, "name": "Alice"}
data = {
    "author": user,
    "reviewer": user,  # Same object referenced twice
}

toon = encode_refs(data)
# Efficiently encodes shared references with $1, $2 placeholders

recovered = decode_refs(toon, resolve=True)
assert recovered["author"] is recovered["reviewer"]  # Same Python object

Graph Encoding (Circular References)

from pytoon import encode_graph, decode_graph

# Handle circular references
node1 = {"id": 1, "value": "A"}
node2 = {"id": 2, "value": "B"}
node1["next"] = node2
node2["next"] = node1  # Circular!

toon = encode_graph({"nodes": [node1, node2]})
recovered = decode_graph(toon)
# Circular structure preserved

Validation Modes

from pytoon import decode

# Strict mode (default) - raises on validation errors
try:
    decode("[5]: 1,2,3", strict=True)  # Declared 5, got 3
except Exception as e:
    print(f"Validation error: {e}")

# Lenient mode - best-effort parsing
result = decode("[5]: 1,2,3", strict=False)
# Returns [1, 2, 3] with warning

API Reference

Core Functions

# Encoding
encode(value, *, indent=2, delimiter=",", key_folding="off",
       ensure_ascii=False, sort_keys=False) -> str

# Decoding
decode(toon_string, *, strict=True, expand_paths="off") -> Any

# Intelligent encoding
smart_encode(value, *, auto=True, ...) -> tuple[str, FormatDecision]

Advanced Functions

# Reference support (v1.1)
encode_refs(data, mode="schema", ...) -> str
decode_refs(toon_string, resolve=True) -> Any

# Graph support (v1.2)
encode_graph(data, ...) -> str
decode_graph(toon_string) -> Any

# Type system
register_type_handler(type_class, handler)
get_type_registry() -> TypeRegistry

Exceptions

from pytoon import TOONError, TOONEncodeError, TOONDecodeError, TOONValidationError

# TOONError - Base exception
# TOONEncodeError - Encoding failures
# TOONDecodeError - Parsing failures
# TOONValidationError - Validation failures

Configuration Options

Parameter Type Default Description
indent int 2 Spaces per indentation level
delimiter str "," Field delimiter: ",", "\t", or "|"
key_folding str "off" Key folding: "off" or "safe"
ensure_ascii bool False Escape non-ASCII characters
sort_keys bool False Sort dictionary keys
strict bool True Enable strict validation

Architecture

toon-codec/
├── pytoon/                    # Main package (import as pytoon)
│   ├── core/                 # Core encoder/decoder
│   ├── encoder/              # Encoding components
│   ├── decoder/              # Decoding components
│   ├── decision/             # Intelligent format selection
│   ├── references/           # Reference & graph support
│   ├── types/                # Type system & handlers
│   ├── sparse/               # Sparse/polymorphic arrays
│   └── utils/                # Utilities & error handling
└── tests/                    # Comprehensive test suite

Roadmap

Current (v1.0.0)

  • Full TOON v1.5+ specification compliance
  • Bidirectional JSON-TOON conversion
  • Intelligent format selection (DecisionEngine)
  • Pluggable type system with 12 built-in handlers
  • Reference tracking and graph encoding
  • Sparse and polymorphic array support
  • CLI interface

Planned (v1.1 - v1.3)

  • v1.1: Enhanced CLI with --auto-decide, --explain, --debug flags
  • v1.2: Performance optimizations and streaming support
  • v1.3: Visual diff tools and enhanced error reporting

Future (v2.0+)

  • Streaming API - Process large datasets without full memory load
  • Hybrid Format - Automatically mix TOON and JSON for optimal results
  • Cython Acceleration - Optional C extensions for 10x speedup
  • Schema Validation - JSON Schema-like validation for TOON
  • Language Bindings - JavaScript, Rust, Go implementations

Use Cases

  • LLM API Cost Reduction - Save 30-60% on token costs
  • Structured Output Parsing - Efficiently parse LLM responses
  • Data Pipeline Optimization - Compact intermediate representations
  • Configuration Files - Human-readable, token-efficient configs
  • API Response Compression - Reduce bandwidth for structured data
  • Prompt Engineering - Fit more context in limited token windows

Contributing

Contributions are welcome! Please see our Contributing Guidelines.

# Development setup
git clone https://github.com/AetherForge/PyToon.git
cd PyToon
pip install -e ".[dev]"

# Run tests
pytest --cov=pytoon --cov-fail-under=85

# Type checking
mypy --strict pytoon/

# Linting
ruff check pytoon/
black pytoon/

Testing

The project includes comprehensive testing:

  • 1887+ tests with property-based testing (Hypothesis)
  • 85%+ code coverage enforced
  • Roundtrip fidelity verification
  • Specification compliance testing
  • Performance benchmarking
# Run all tests
pytest

# Run with coverage
pytest --cov=pytoon --cov-report=html

# Run specific test file
pytest tests/unit/test_encoder.py

License

MIT License - see LICENSE for details.

Links


Note: Install as pip install toon-codec, import as from pytoon import encode, decode

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

toon_codec-1.0.1.tar.gz (86.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

toon_codec-1.0.1-py3-none-any.whl (104.0 kB view details)

Uploaded Python 3

File details

Details for the file toon_codec-1.0.1.tar.gz.

File metadata

  • Download URL: toon_codec-1.0.1.tar.gz
  • Upload date:
  • Size: 86.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for toon_codec-1.0.1.tar.gz
Algorithm Hash digest
SHA256 336b7ecc5ef00d7957847b143b06da6a3cfe016012bd8a80a903406aec95eff8
MD5 6538614f0c81a7d610e71457e180fec9
BLAKE2b-256 2324736b69e136653e111519f348363852ce361600036cbe33bfdd6335237906

See more details on using hashes here.

File details

Details for the file toon_codec-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: toon_codec-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 104.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for toon_codec-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 56ed74e7cc0aa2db98a96f681cda29f309c00dfe24abfb1643951913c8668334
MD5 20e62a3f77cc21e652a1b135844f4f48
BLAKE2b-256 c16be6f0fd82cba022c56a217101c95c18ef3216bcb54e6786f609dc54f2e05f

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page