TOON Python

Token-Oriented Object Notation (TOON) is a compact, human-readable format designed for passing structured data to Large Language Models with significantly reduced token usage. This Python implementation uses 30-60% fewer tokens than equivalent JSON while remaining compatible with the JSON data model.

This project is a Python port of toon and is currently based on toon v0.3.1.

Features

  • Token Efficiency: 30-60% reduction compared to JSON
  • LLM-Friendly: Explicit lengths and field lists help models validate output
  • Minimal Syntax: Removes redundant punctuation (braces, brackets, most quotes)
  • Pythonic API: Simple, intuitive interface following Python conventions
  • Type Support: Handles Python-specific types (datetime, Decimal, UUID, bytes)
  • Flexible Formatting: Configurable indentation, delimiters, and length markers
  • Pure Python: No runtime dependencies

Quick Start

from toon_python import encode

# Basic encoding
data = {
    "user": {
        "id": 123,
        "name": "Ada Lovelace",
        "active": True
    },
    "tags": ["python", "llm", "data"]
}

toon_output = encode(data)
print(toon_output)

Output:

user:
  id: 123
  name: Ada Lovelace
  active: true
tags[3]: python,llm,data

Installation

pip install toon-python

Usage

Basic Encoding

from toon_python import encode

# Simple objects
encode({"name": "Alice", "age": 30})
# → name: Alice
#   age: 30

# Arrays
encode({"items": [1, 2, 3]})
# → items[3]: 1,2,3

# Nested structures
encode({"user": {"id": 1, "roles": ["admin", "user"]}})
# → user:
#     id: 1
#     roles[2]: admin,user
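
The nesting rule in these examples can be illustrated with a small standalone sketch. `encode_sketch` and `format_primitive` are hypothetical helpers, not part of `toon_python`. The sketch assumes only dicts, primitives, and flat lists of primitives, and it skips quoting as well as the tabular and list formats.

```python
def format_primitive(value):
    """Render a primitive the way the examples above show it."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: True is also an int
        return "true" if value else "false"
    return str(value)


def encode_sketch(obj, indent=2, depth=0):
    """Encode a dict of primitives, nested dicts, and flat primitive lists."""
    pad = " " * (indent * depth)
    lines = []
    for key, value in obj.items():
        if isinstance(value, dict):
            # Nested object: emit the key, then the children one level deeper.
            lines.append(f"{pad}{key}:")
            nested = encode_sketch(value, indent, depth + 1)
            if nested:
                lines.append(nested)
        elif isinstance(value, list):
            # Flat list: the length goes in brackets, values are comma-joined.
            items = ",".join(format_primitive(v) for v in value)
            lines.append(f"{pad}{key}[{len(value)}]: {items}")
        else:
            lines.append(f"{pad}{key}: {format_primitive(value)}")
    return "\n".join(lines)


print(encode_sketch({"user": {"id": 1, "roles": ["admin", "user"]}}))
```

Each nesting level adds `indent` spaces, which is why `id` and `roles` sit two spaces under `user` in the output.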

Formatting Options

from toon_python import encode, EncodeOptions, Delimiter

options = EncodeOptions(
    indent=4,                    # 4 spaces instead of 2
    delimiter=Delimiter.PIPE,    # Use | as delimiter
    length_marker="#"            # Add # prefix to array lengths
)

data = {"tags": ["a", "b", "c"]}
encode(data, options)
# → tags[#3|]: a|b|c
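
To make the shape of that header easier to read, here is a minimal standalone sketch of how the length marker and delimiter could be combined. `inline_array_line` is a hypothetical helper, not the library's API, and it assumes that a non-comma delimiter is echoed inside the brackets, as the output above suggests.

```python
def inline_array_line(key, values, delimiter=",", length_marker=None):
    """Build one inline-array line such as 'tags[#3|]: a|b|c'."""
    marker = length_marker or ""
    # Assumption: the delimiter appears inside the brackets only when it
    # differs from the default comma.
    shown_delimiter = "" if delimiter == "," else delimiter
    header = f"{key}[{marker}{len(values)}{shown_delimiter}]"
    return f"{header}: {delimiter.join(str(v) for v in values)}"


print(inline_array_line("tags", ["a", "b", "c"]))
print(inline_array_line("tags", ["a", "b", "c"], delimiter="|", length_marker="#"))
```

The `indent` option has no visible effect here because the example has no nested levels.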

Type Support

The library automatically normalizes Python types to JSON-compatible representations:

from datetime import datetime, date
from decimal import Decimal
from uuid import UUID

from toon_python import encode

data = {
    "timestamp": datetime(2023, 1, 1, 12, 0, 0),
    "date_only": date(2023, 1, 1),
    "price": Decimal("19.99"),
    "id": UUID("12345678-1234-5678-1234-567812345678"),
    "binary": b"hello world"
}

encode(data)
# → timestamp: 2023-01-01T12:00:00
#   date_only: 2023-01-01
#   price: 19.99
#   id: 12345678-1234-5678-1234-567812345678
#   binary: aGVsbG8gd29ybGQ=
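
The normalization step can be approximated with the standard library alone. The sketch below is not the library's implementation; `normalize_value` is a hypothetical name, and converting `Decimal` to `float` is an assumption based on the unquoted `19.99` in the output above.

```python
import base64
from datetime import date, datetime
from decimal import Decimal
from uuid import UUID


def normalize_value(value):
    """Map Python-specific types onto JSON-compatible values."""
    if isinstance(value, (datetime, date)):
        return value.isoformat()  # ISO 8601 text for both dates and datetimes
    if isinstance(value, Decimal):
        return float(value)  # assumption: may lose precision for long decimals
    if isinstance(value, UUID):
        return str(value)  # canonical hyphenated form
    if isinstance(value, (bytes, bytearray)):
        return base64.b64encode(value).decode("ascii")
    if isinstance(value, dict):
        return {str(k): normalize_value(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalize_value(v) for v in value]
    return value  # str, int, float, bool, and None pass through unchanged


print(normalize_value({"binary": b"hello world", "price": Decimal("19.99")}))
```

Normalizing before encoding keeps the encoder itself limited to the JSON data model, so the Python-specific handling lives in one place.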

Array Optimization

TOON automatically chooses the best encoding strategy for arrays:

from toon_python import encode

# Primitive arrays (inline)
encode({"numbers": [1, 2, 3, 4, 5]})
# → numbers[5]: 1,2,3,4,5

# Tabular arrays (uniform objects)
encode({"users": [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"}
]})
# → users[2]{id,name}:
#     1,Alice
#     2,Bob

# Mixed arrays (list format)
encode({"mixed": [1, {"a": 2}, "three"]})
# → mixed[3]:
#     - 1
#     - a: 2
#     - three
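
One way to picture the strategy choice is as a classification step followed by a format-specific renderer. The sketch below is an assumption about how such a choice could be made, not the library's code. `classify_array` and `encode_tabular` are hypothetical names, and the tabular renderer handles only string and integer cells without quoting.

```python
PRIMITIVES = (str, int, float, bool, type(None))


def classify_array(items):
    """Return 'inline', 'tabular', or 'list' for a Python list."""
    if all(isinstance(item, PRIMITIVES) for item in items):
        return "inline"
    if all(isinstance(item, dict) for item in items):
        first_keys = set(items[0])
        same_keys = all(set(item) == first_keys for item in items)
        flat = all(
            isinstance(v, PRIMITIVES) for item in items for v in item.values()
        )
        # Uniform, non-empty, flat objects can share a single header row.
        if first_keys and same_keys and flat:
            return "tabular"
    return "list"


def encode_tabular(key, rows, indent=2):
    """Render uniform objects as a header line plus one row per object."""
    fields = list(rows[0])
    header = f"{key}[{len(rows)}]" + "{" + ",".join(fields) + "}:"
    pad = " " * indent
    body = [pad + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header, *body])


users = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
print(classify_array(users))
print(encode_tabular("users", users))
```

The tabular form names each field once in the header instead of once per row, which is where most of the savings on uniform data come from.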

API Reference

encode(data, options=None)

Convert any JSON-serializable value, or any of the Python types listed under Type Support, to TOON format.

Parameters:

  • data: Any Python data structure to encode
  • options: Optional EncodeOptions instance for configuration

Returns: TOON format string

Raises:

  • ToonEncodingError: If the data cannot be encoded
  • CircularReferenceError: If a circular reference is detected
  • DatasetTooLargeError: If the data exceeds the 10MB limit
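
For readers wondering what counts as a circular reference, the following standalone sketch shows one common way to detect one: track the containers on the current traversal path by identity. It is an illustration only; `has_circular_reference` is a hypothetical helper, and the library's actual check may work differently.

```python
def has_circular_reference(value, _path=None):
    """Return True if a dict, list, or tuple contains itself at any depth."""
    path = set() if _path is None else _path
    if isinstance(value, dict):
        children = value.values()
    elif isinstance(value, (list, tuple)):
        children = value
    else:
        return False  # primitives cannot form cycles
    marker = id(value)
    if marker in path:
        return True  # this container is already an ancestor of itself
    path.add(marker)
    try:
        return any(has_circular_reference(child, path) for child in children)
    finally:
        # Remove on the way out so shared, non-cyclic objects are allowed.
        path.remove(marker)


loop = {"name": "loop"}
loop["self"] = loop
print(has_circular_reference(loop))
```

Tracking only the current path, rather than every container seen so far, means a list that appears twice in the same structure is not reported as a cycle.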

EncodeOptions

Configuration options for TOON encoding:

from dataclasses import dataclass
from typing import Optional

@dataclass
class EncodeOptions:
    indent: int = 2                          # Spaces per nesting level
    delimiter: Delimiter = Delimiter.COMMA   # Array delimiter
    length_marker: Optional[str] = None      # '#' or None for array length prefix

Delimiter

Enum for array delimiters:

  • Delimiter.COMMA (default): ,
  • Delimiter.TAB: \t
  • Delimiter.PIPE: |

Development

Setup

# Clone repository
git clone https://github.com/your-username/toon-python.git
cd toon-python

# Install development dependencies
pip install -e ".[dev]"

Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=src/toon_python

# Run specific test file
pytest tests/test_encoder.py

Code Quality

# Format code
black src/ tests/

# Lint code
ruff check src/ tests/

# Type checking
mypy src/

Token Efficiency

TOON achieves significant token reduction through:

  1. Minimal Punctuation: Removes braces, brackets, and most quotes
  2. Smart Quoting: Only quotes when necessary for parsing
  3. Inline Arrays: Primitive arrays use comma-separated values
  4. Tabular Format: Uniform object arrays use table-like layout
  5. Compact Syntax: Eliminates redundant characters
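
As an illustration of the smart quoting item, the sketch below shows the kind of check that decides whether a string can be written bare. The rules are a simplified assumption, not the library's exact rule set, and `needs_quotes` is a hypothetical name.

```python
import re

_NUMBER = re.compile(r"-?\d+(\.\d+)?([eE][+-]?\d+)?")
_RESERVED = {"true", "false", "null"}


def needs_quotes(text, delimiter=","):
    """Return True if writing `text` unquoted would be ambiguous."""
    if text == "" or text != text.strip():
        return True  # empty strings and padding would be lost
    if text in _RESERVED or _NUMBER.fullmatch(text):
        return True  # would be read back as a boolean, null, or number
    if text.startswith("- "):
        return True  # would look like a list item
    # Structural characters that a parser would otherwise act on.
    return any(ch in text for ch in (delimiter, ":", '"', "\\", "\n", "\r", "\t"))


for sample in ["Ada Lovelace", "alice@example.com", "true", "123", "a,b"]:
    print(repr(sample), needs_quotes(sample))
```

Under these assumed rules, ordinary names and email addresses stay bare, while anything that could be mistaken for another type or for structure is quoted.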

Example comparison for a typical user object:

JSON (45 tokens):

{
  "id": 123,
  "name": "Alice Smith",
  "email": "alice@example.com",
  "active": true,
  "roles": ["user", "admin"]
}

TOON (28 tokens, 38% reduction):

id: 123
name: Alice Smith
email: alice@example.com
active: true
roles[2]: user,admin
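
The 38% figure follows directly from the two token counts quoted above. The sketch below reproduces that arithmetic and also compares character counts as a rough proxy. `percent_reduction` is a hypothetical helper, and character counts are not token counts: real numbers depend on the tokenizer used.

```python
import json


def percent_reduction(baseline, reduced):
    """Percentage saved when going from `baseline` to `reduced`."""
    return (baseline - reduced) / baseline * 100


# Token counts quoted in the comparison above.
print(f"{percent_reduction(45, 28):.0f}%")  # → 38%

record = {
    "id": 123,
    "name": "Alice Smith",
    "email": "alice@example.com",
    "active": True,
    "roles": ["user", "admin"],
}
json_text = json.dumps(record, indent=2)
toon_text = "\n".join([
    "id: 123",
    "name: Alice Smith",
    "email: alice@example.com",
    "active: true",
    "roles[2]: user,admin",
])

# Character-level saving, shown only as a rough stand-in for tokens.
print(f"{percent_reduction(len(json_text), len(toon_text)):.0f}% fewer characters")
```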

Limitations

  • Maximum dataset size: 10MB (configurable)
  • No circular reference support
  • Pure Python implementation (not optimized for speed)
  • Encoding only (no decoding functionality)

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Run the test suite and ensure all tests pass
  6. Submit a pull request

License

MIT License - see LICENSE file for details.

Related Projects

  • toon - Original TypeScript implementation (v0.3.1)
