TOON Python
Token-Oriented Object Notation (TOON) is a compact, human-readable format designed for passing structured data to Large Language Models with significantly reduced token usage. This Python implementation produces output that uses 30-60% fewer tokens than equivalent JSON while maintaining full compatibility.
This project is a Python port of toon and is currently based on toon v0.3.1.
Features
- Token Efficiency: 30-60% reduction compared to JSON
- LLM-Friendly: Explicit lengths and field lists help models validate output
- Minimal Syntax: Removes redundant punctuation (braces, brackets, most quotes)
- Pythonic API: Simple, intuitive interface following Python conventions
- Type Support: Handles Python-specific types (datetime, Decimal, UUID, bytes)
- Flexible Formatting: Configurable indentation, delimiters, and length markers
- Pure Python: No runtime dependencies
Quick Start
from toon_python import encode, EncodeOptions, Delimiter
# Basic encoding
data = {
    "user": {
        "id": 123,
        "name": "Ada Lovelace",
        "active": True
    },
    "tags": ["python", "llm", "data"]
}
toon_output = encode(data)
print(toon_output)
Output:
user:
  id: 123
  name: Ada Lovelace
  active: true
tags[3]: python,llm,data
Installation
pip install toon-python
Usage
Basic Encoding
from toon_python import encode
# Simple objects
encode({"name": "Alice", "age": 30})
# → name: Alice
#   age: 30
# Arrays
encode({"items": [1, 2, 3]})
# → items[3]: 1,2,3
# Nested structures
encode({"user": {"id": 1, "roles": ["admin", "user"]}})
# → user:
#     id: 1
#     roles[2]: admin,user
Formatting Options
from toon_python import encode, EncodeOptions, Delimiter
options = EncodeOptions(
    indent=4,                  # 4 spaces instead of 2
    delimiter=Delimiter.PIPE,  # Use | as delimiter
    length_marker="#"          # Add # prefix to array lengths
)
data = {"tags": ["a", "b", "c"]}
encode(data, options)
# → tags[#3|]: a|b|c
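The header layout in the example above can be reproduced with a few lines of plain Python. This is only a minimal sketch inferred from the outputs shown in this README, not the library's actual code: `encode_inline_array` is a hypothetical helper, it assumes the comma delimiter is left out of the bracket because it is the default, and it ignores quoting of values that contain the delimiter.

```python
def encode_inline_array(key, values, delimiter=",", length_marker=None):
    """Hypothetical helper: render a primitive array on one line."""
    marker = length_marker or ""
    # Assumption: only non-default delimiters are echoed inside the brackets.
    delimiter_suffix = "" if delimiter == "," else delimiter
    header = f"{key}[{marker}{len(values)}{delimiter_suffix}]"
    # No quoting or escaping is attempted in this sketch.
    return f"{header}: " + delimiter.join(str(v) for v in values)


print(encode_inline_array("tags", ["a", "b", "c"]))
print(encode_inline_array("tags", ["a", "b", "c"], delimiter="|", length_marker="#"))
```

The second call yields the same line as the `EncodeOptions` example above, which shows how the delimiter and length marker both end up in the array header.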
Type Support
The library automatically normalizes Python types to JSON-compatible representations:
from datetime import datetime, date
from decimal import Decimal
from uuid import UUID
data = {
    "timestamp": datetime(2023, 1, 1, 12, 0, 0),
    "date_only": date(2023, 1, 1),
    "price": Decimal("19.99"),
    "id": UUID("12345678-1234-5678-1234-567812345678"),
    "binary": b"hello world"
}
encode(data)
# → timestamp: 2023-01-01T12:00:00
#   date_only: 2023-01-01
#   price: 19.99
#   id: 12345678-1234-5678-1234-567812345678
#   binary: aGVsbG8gd29ybGQ=
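A normalization pass of this kind can be sketched with the standard library alone. The `normalize` function below is hypothetical and is not taken from the library's source. It assumes ISO 8601 text for dates, Base64 for bytes, and string conversion for `Decimal` and `UUID`; whether the library emits `Decimal` as a number or as a string is not stated here.

```python
import base64
from datetime import date, datetime
from decimal import Decimal
from uuid import UUID


def normalize(value):
    """Hypothetical sketch: map Python-specific types to JSON-compatible ones."""
    if isinstance(value, (datetime, date)):
        return value.isoformat()                         # ISO 8601 text
    if isinstance(value, (Decimal, UUID)):
        return str(value)                                # assumption: plain string form
    if isinstance(value, (bytes, bytearray)):
        return base64.b64encode(value).decode("ascii")   # Base64 text
    if isinstance(value, dict):
        return {str(k): normalize(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalize(v) for v in value]
    return value                                         # already JSON-compatible


print(normalize({"when": datetime(2023, 1, 1, 12, 0, 0), "binary": b"hello world"}))
```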
Array Optimization
TOON automatically chooses the best encoding strategy for arrays:
# Primitive arrays (inline)
encode({"numbers": [1, 2, 3, 4, 5]})
# → numbers[5]: 1,2,3,4,5
# Tabular arrays (uniform objects)
encode({"users": [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"}
]})
# → users[2]{id,name}:
#     1,Alice
#     2,Bob
# Mixed arrays (list format)
encode({"mixed": [1, {"a": 2}, "three"]})
# → mixed[3]:
#     - 1
#     - a: 2
#     - three
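The choice between the three layouts can be illustrated as follows. This is a sketch under stated assumptions rather than the library's algorithm: `classify_array`, `format_scalar`, and `encode_tabular` are hypothetical names, "uniform" is taken to mean that every element is a dict with the same keys and only primitive values, and quoting is ignored.

```python
PRIMITIVES = (str, int, float, bool, type(None))


def classify_array(items):
    """Hypothetical: pick 'inline', 'tabular', or 'list' for an array."""
    if all(isinstance(x, PRIMITIVES) for x in items):
        return "inline"
    if all(isinstance(x, dict) and x for x in items):
        keys = set(items[0])
        # Assumption: tabular needs identical keys and primitive-only values.
        if all(
            set(x) == keys and all(isinstance(v, PRIMITIVES) for v in x.values())
            for x in items
        ):
            return "tabular"
    return "list"


def format_scalar(value):
    """Render a primitive the way the README outputs show (no quoting)."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def encode_tabular(key, rows, indent=2):
    """Hypothetical: one header line with field names, then one line per row."""
    fields = list(rows[0])
    lines = [f"{key}[{len(rows)}]{{{','.join(fields)}}}:"]
    for row in rows:
        lines.append(" " * indent + ",".join(format_scalar(row[f]) for f in fields))
    return "\n".join(lines)


users = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
print(classify_array(users))
print(encode_tabular("users", users))
```

Listing the field names once in the header is where the tabular form saves the most, since the keys are not repeated for every row.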
API Reference
encode(data, options=None)
Convert any JSON-serializable value to TOON format.
Parameters:
- data: Any Python data structure to encode
- options: Optional EncodeOptions instance for configuration
Returns: TOON format string
Raises:
- ToonEncodingError: If data cannot be encoded
- CircularReferenceError: If circular references are detected
- DatasetTooLargeError: If data exceeds the 10MB limit
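To make the circular-reference case concrete, here is one common way such a check can be written. It is a sketch only: the `CircularReferenceError` class below is a local stand-in for the library's exception, `check_no_cycles` is a hypothetical helper, and the library's real detection logic may differ.

```python
class CircularReferenceError(ValueError):
    """Local stand-in for the library's exception of the same name."""


def check_no_cycles(value, _active=None):
    """Hypothetical: raise if a container contains itself, directly or indirectly."""
    active = set() if _active is None else _active
    if isinstance(value, (dict, list, tuple)):
        marker = id(value)
        if marker in active:
            raise CircularReferenceError("circular reference detected")
        active.add(marker)          # container is on the current path
        children = value.values() if isinstance(value, dict) else value
        for child in children:
            check_no_cycles(child, active)
        active.discard(marker)      # leaving the path; shared objects stay legal


looped = {"name": "node"}
looped["self"] = looped             # the dict now contains itself
try:
    check_no_cycles(looped)
except CircularReferenceError as exc:
    print(f"rejected: {exc}")
```

Tracking only the containers on the current path, instead of every container seen so far, lets the same object appear twice in the data without being reported as a cycle.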
EncodeOptions
Configuration options for TOON encoding:
@dataclass
class EncodeOptions:
    indent: int = 2                         # Spaces per nesting level
    delimiter: Delimiter = Delimiter.COMMA  # Array delimiter
    length_marker: Optional[str] = None     # '#' or None for array length prefix
Delimiter
Enum for array delimiters:
- Delimiter.COMMA (default): ,
- Delimiter.TAB: \t
- Delimiter.PIPE: |
Development
Setup
# Clone repository
git clone https://github.com/your-username/toon-python.git
cd toon-python
# Install development dependencies
pip install -e ".[dev]"
Testing
# Run all tests
pytest
# Run with coverage
pytest --cov=src/toon_python
# Run specific test file
pytest tests/test_encoder.py
Code Quality
# Format code
black src/ tests/
# Lint code
ruff check src/ tests/
# Type checking
mypy src/
Token Efficiency
TOON achieves significant token reduction through:
- Minimal Punctuation: Removes braces, brackets, and most quotes
- Smart Quoting: Only quotes when necessary for parsing
- Inline Arrays: Primitive arrays use comma-separated values
- Tabular Format: Uniform object arrays use table-like layout
- Compact Syntax: Eliminates redundant characters
Example comparison for a typical user object:
JSON (45 tokens):
{
  "id": 123,
  "name": "Alice Smith",
  "email": "alice@example.com",
  "active": true,
  "roles": ["user", "admin"]
}
TOON (28 tokens, 38% reduction):
id: 123
name: Alice Smith
email: alice@example.com
active: true
roles[2]: user,admin
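A rough way to check a comparison like this yourself, without installing a tokenizer, is to compare character counts. The sketch below uses only the standard library and hard-codes the TOON text from the example above. Character counts are only a proxy: actual token counts depend on the tokenizer of the model you target, so the percentage it prints will not match the token figures quoted here.

```python
import json

record = {
    "id": 123,
    "name": "Alice Smith",
    "email": "alice@example.com",
    "active": True,
    "roles": ["user", "admin"],
}

json_text = json.dumps(record, indent=2)

# TOON text copied from the example above, not produced by the library here.
toon_text = "\n".join([
    "id: 123",
    "name: Alice Smith",
    "email: alice@example.com",
    "active: true",
    "roles[2]: user,admin",
])

saving = 1 - len(toon_text) / len(json_text)
print(f"JSON: {len(json_text)} chars, TOON: {len(toon_text)} chars, saving: {saving:.0%}")
```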
Limitations
- Maximum dataset size: 10MB (configurable)
- No circular reference support
- Pure Python implementation (not optimized for speed)
- Encoding only (no decoding functionality)
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Run the test suite and ensure all tests pass
- Submit a pull request
License
MIT License - see LICENSE file for details.
Related Projects
- toon - Original TypeScript implementation (v0.3.1)