TOON (Token-Oriented Object Notation) - A compact, human-readable serialization format for LLMs
A compact, human-readable serialization format designed for passing structured data to Large Language Models with significantly reduced token usage.
Overview
TOON achieves CSV-like compactness while adding explicit structure, making it ideal for:
- Reducing token costs in LLM API calls
- Improving context window efficiency
- Maintaining human readability
- Preserving data structure and types
Key Features
- ✅ Compact: 30-60% smaller than JSON for structured data
- ✅ Readable: Clean, indentation-based syntax
- ✅ Structured: Preserves nested objects and arrays
- ✅ Type-safe: Supports strings, numbers, booleans, null
- ✅ Flexible: Multiple delimiter options (comma, tab, pipe)
- ✅ Smart: Automatic tabular format for uniform arrays
- ✅ Efficient: Key folding for deeply nested objects
Installation
pip install toonify
For development:
pip install "toonify[dev]"
Quick Start
Python API
from toon import encode, decode

# Encode Python dict to TOON
data = {
    'products': [
        {'sku': 'LAP-001', 'name': 'Gaming Laptop', 'price': 1299.99},
        {'sku': 'MOU-042', 'name': 'Wireless Mouse', 'price': 29.99}
    ]
}
toon_string = encode(data)
print(toon_string)
# Output:
# products[2]{sku,name,price}:
#   LAP-001,Gaming Laptop,1299.99
#   MOU-042,Wireless Mouse,29.99

# Decode TOON back to Python
result = decode(toon_string)
assert result == data
Command Line
# Encode JSON to TOON
toon input.json -o output.toon
# Decode TOON to JSON
toon input.toon -o output.json
# Use with pipes
cat data.json | toon -e > data.toon
# Show token statistics
toon data.json --stats
TOON Format Specification
Basic Syntax
# Simple key-value pairs
title: Machine Learning Basics
chapters: 12
published: true
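To make the mapping concrete, here is a minimal sketch of how a flat dict of scalars could be rendered in this syntax. The helpers format_scalar and encode_flat are hypothetical and not part of toonify's API; quoting is ignored here.

```python
# Hypothetical helpers (not toonify's API): render a flat dict of scalars
# as TOON "key: value" lines, following the basic syntax shown above.
def format_scalar(value):
    """Render a Python scalar the way the example above shows it."""
    # Check booleans by identity first, since bool is a subclass of int.
    if value is True:
        return "true"
    if value is False:
        return "false"
    if value is None:
        return "null"
    # Numbers and plain strings are written as-is (quoting ignored here).
    return str(value)


def encode_flat(obj):
    """Encode a flat dict of scalars as TOON key-value lines."""
    return "\n".join(f"{key}: {format_scalar(value)}" for key, value in obj.items())


doc = {"title": "Machine Learning Basics", "chapters": 12, "published": True}
print(encode_flat(doc))
```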
Arrays
Primitive arrays (inline):
temperatures: [72.5,68.3,75.1,70.8,73.2]
categories: [electronics,computers,accessories]
Tabular arrays (uniform objects with header):
inventory[3]{sku,product,stock}:
  KB-789,Mechanical Keyboard,45
  MS-456,RGB Mouse Pad,128
  HD-234,USB Headset,67
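The tabular form relies on every object in the array having the same keys. The sketch below, using a hypothetical encode_tabular helper (not toonify's implementation), shows how the header with array length and field list is derived and how each object becomes one delimited row. It ignores quoting and nested values, and always writes the header fields comma-separated.

```python
# A minimal sketch, not toonify's implementation: emit the tabular form
# shown above for a list of dicts that all share the same keys.
def encode_tabular(name, rows, indent=2, delimiter=","):
    if not rows:
        raise ValueError("tabular form needs at least one row")
    fields = list(rows[0].keys())
    # Tabular output only makes sense when every row has the same keys.
    if any(set(row) != set(fields) for row in rows):
        raise ValueError("rows are not uniform")
    # Header: name, declared length, then the field names in braces.
    header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
    pad = " " * indent
    lines = [header]
    for row in rows:
        # One indented, delimited row per object, in header field order.
        lines.append(pad + delimiter.join(str(row[f]) for f in fields))
    return "\n".join(lines)


inventory = [
    {"sku": "KB-789", "product": "Mechanical Keyboard", "stock": 45},
    {"sku": "MS-456", "product": "RGB Mouse Pad", "stock": 128},
    {"sku": "HD-234", "product": "USB Headset", "stock": 67},
]
print(encode_tabular("inventory", inventory))
```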
List arrays (non-uniform or nested):
tasks[2]:
  Complete documentation
  Review pull requests
Nested Objects
server:
  hostname: api-prod-01
  config:
    port: 8080
    region: us-east
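Because nesting is expressed purely through indentation, an encoder for nested objects can be a short recursion. This is a sketch with a hypothetical encode_nested function, assuming 2-space indentation by default (matching the documented default of the indent option) and no arrays or quoting.

```python
# A minimal recursive sketch (hypothetical, not toonify's code): each nested
# dict adds one indentation level below its "key:" line.
def encode_nested(obj, indent=2, level=0):
    pad = " " * (indent * level)
    lines = []
    for key, value in obj.items():
        if isinstance(value, dict):
            # A nested object gets a bare "key:" line, then its fields one level deeper.
            lines.append(f"{pad}{key}:")
            lines.extend(encode_nested(value, indent, level + 1))
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines


server = {"server": {"hostname": "api-prod-01",
                     "config": {"port": 8080, "region": "us-east"}}}
print("\n".join(encode_nested(server)))
```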
Quoting Rules
Strings are quoted only when necessary:
- Contains special characters (comma, colon, double quote, or newline)
- Has leading or trailing whitespace
- Looks like a literal (true, false, null)
- Is empty
simple: ProductName
quoted: "Product, Description"
escaped: "Size: 15\" display"
multiline: "First feature\nSecond feature"
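The rules above can be approximated with a small predicate. needs_quotes and quote are hypothetical helpers; the special-character set assumes the default comma delimiter, and toonify's actual rules may cover more cases.

```python
# Sketch of the quoting rules listed above (an approximation, not toonify's
# exact logic). Assumes the default comma delimiter.
LITERALS = {"true", "false", "null"}
SPECIAL = (",", ":", '"', "\n")


def needs_quotes(s):
    """Return True when a string value must be quoted."""
    if s == "":
        return True  # empty strings
    if s != s.strip():
        return True  # leading or trailing whitespace
    if s in LITERALS:
        return True  # would otherwise be read as a boolean or null
    return any(ch in s for ch in SPECIAL)


def quote(s):
    """Quote and escape a string only when needs_quotes() says so."""
    if not needs_quotes(s):
        return s
    # Escape backslashes first, then double quotes and newlines.
    escaped = s.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


print(quote("ProductName"))
print(quote('Size: 15" display'))
```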
API Reference
encode(data, options=None)
Convert Python object to TOON string.
Parameters:
- data: Python dict or list
- options: Optional dict with:
  - delimiter: 'comma' (default), 'tab', or 'pipe'
  - indent: Number of spaces per level (default: 2)
  - key_folding: 'off' (default) or 'safe'
  - flatten_depth: Max depth for key folding (default: None)
Example:
toon = encode(data, {
    'delimiter': 'tab',
    'indent': 4,
    'key_folding': 'safe'
})
decode(toon_string, options=None)
Convert TOON string to Python object.
Parameters:
- toon_string: TOON formatted string
- options: Optional dict with:
  - strict: Validate structure strictly (default: True)
  - expand_paths: 'off' (default) or 'safe'
  - default_delimiter: Default delimiter (default: ',')
Example:
data = decode(toon_string, {
    'expand_paths': 'safe',
    'strict': False
})
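Decoding reverses the tabular form: the header declares the array name, its length and field names, and each indented row supplies one object. A minimal sketch, assuming comma-delimited rows without quoted cells; decode_tabular and parse_cell are hypothetical, and the strict flag here is only loosely modeled on the strict option above (it just checks the declared length).

```python
import re

# Hypothetical sketch, not toonify's decoder. Matches headers such as
# "products[2]{sku,name,price}:".
HEADER = re.compile(r"^(\w+)\[(\d+)\]\{([^}]*)\}:$")


def parse_cell(cell):
    """Convert a cell to int, float, bool or None where possible."""
    for convert in (int, float):
        try:
            return convert(cell)
        except ValueError:
            pass
    return {"true": True, "false": False, "null": None}.get(cell, cell)


def decode_tabular(text, strict=True):
    lines = text.strip().splitlines()
    match = HEADER.match(lines[0].strip())
    if not match:
        raise ValueError("not a tabular header")
    name, count = match.group(1), int(match.group(2))
    fields = match.group(3).split(",")
    # Each indented row becomes one dict keyed by the header fields.
    rows = [dict(zip(fields, map(parse_cell, line.strip().split(","))))
            for line in lines[1:]]
    # Length check: the declared [N] must match the number of rows.
    if strict and len(rows) != count:
        raise ValueError(f"expected {count} rows, got {len(rows)}")
    return {name: rows}


text = "products[2]{sku,name,price}:\n  LAP-001,Gaming Laptop,1299.99\n  MOU-042,Wireless Mouse,29.99"
print(decode_tabular(text))
```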
CLI Usage
usage: toon [-h] [-o OUTPUT] [-e] [-d] [--delimiter {comma,tab,pipe}]
            [--indent INDENT] [--stats] [--no-strict]
            [--key-folding {off,safe}] [--flatten-depth DEPTH]
            [--expand-paths {off,safe}]
            [input]
TOON (Token-Oriented Object Notation) - Convert between JSON and TOON formats
positional arguments:
  input                 Input file path (or "-" for stdin)
optional arguments:
  -h, --help            show this help message and exit
  -o, --output OUTPUT   Output file path (default: stdout)
  -e, --encode          Force encode mode (JSON to TOON)
  -d, --decode          Force decode mode (TOON to JSON)
  --delimiter {comma,tab,pipe}
                        Array delimiter (default: comma)
  --indent INDENT       Indentation size (default: 2)
  --stats               Show token statistics
  --no-strict           Disable strict validation (decode only)
  --key-folding {off,safe}
                        Key folding mode (encode only)
  --flatten-depth DEPTH Maximum key folding depth (encode only)
  --expand-paths {off,safe}
                        Path expansion mode (decode only)
Advanced Features
Key Folding
Collapse single-key chains into dotted paths:
data = {
    'api': {
        'response': {
            'product': {
                'title': 'Wireless Keyboard'
            }
        }
    }
}
# With key_folding='safe'
toon = encode(data, {'key_folding': 'safe'})
# Output: api.response.product.title: Wireless Keyboard
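Key folding can be pictured as walking down each chain of single-key objects and joining the keys with dots. The fold_keys function below is a hypothetical sketch, not toonify's code; treating max_depth as the maximum number of segments in a folded key, and refusing to fold a key that already contains a dot, are assumptions about what flatten_depth and 'safe' mean.

```python
# Hypothetical sketch of key folding; the depth and "safe" semantics are assumptions.
def fold_keys(obj, max_depth=None):
    """Collapse chains of single-key dicts into dotted keys."""
    folded = {}
    for key, value in obj.items():
        path = [key]
        # Keep descending while the value is a dict with exactly one key,
        # that key contains no dot (so the fold stays reversible), and the
        # optional limit on path segments has not been reached.
        while (
            isinstance(value, dict)
            and len(value) == 1
            and "." not in next(iter(value))
            and (max_depth is None or len(path) < max_depth)
        ):
            (child_key, value), = value.items()
            path.append(child_key)
        # Whatever dict remains (multi-key, or cut off by the limit) is folded recursively.
        if isinstance(value, dict):
            value = fold_keys(value, max_depth)
        folded[".".join(path)] = value
    return folded


data = {'api': {'response': {'product': {'title': 'Wireless Keyboard'}}}}
print(fold_keys(data))
```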
Path Expansion
Expand dotted keys into nested objects:
toon = 'store.location.zipcode: 10001'
# With expand_paths='safe'
data = decode(toon, {'expand_paths': 'safe'})
# Result: {'store': {'location': {'zipcode': 10001}}}
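Path expansion is the inverse operation: each dotted key is split and its segments become nested objects. expand_paths is a hypothetical sketch, not toonify's code; raising an error on conflicting keys is an assumption about what 'safe' implies.

```python
# Hypothetical sketch of path expansion; conflict handling is an assumption.
def expand_paths(obj):
    """Expand dotted keys into nested dicts."""
    result = {}
    for key, value in obj.items():
        if isinstance(value, dict):
            value = expand_paths(value)
        *parents, leaf = key.split(".")
        node = result
        # Walk (or create) one nested dict per parent segment.
        for part in parents:
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                raise ValueError(f"key {key!r} conflicts with an existing value")
        if leaf in node:
            raise ValueError(f"key {key!r} conflicts with an existing value")
        node[leaf] = value
    return result


print(expand_paths({'store.location.zipcode': 10001}))
```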
Custom Delimiters
Choose the delimiter that best fits your data:
# Tab delimiter (better for spreadsheet-like data)
toon = encode(data, {'delimiter': 'tab'})
# Pipe delimiter (when data contains commas)
toon = encode(data, {'delimiter': 'pipe'})
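The delimiter matters because a cell that contains it would otherwise need quoting. The sketch assumes a hypothetical DELIMITERS mapping from the documented option names to characters and a hypothetical join_row helper that rejects cells containing the active delimiter instead of quoting them.

```python
# Hypothetical mapping from the documented option names to characters;
# toonify's internals may differ.
DELIMITERS = {"comma": ",", "tab": "\t", "pipe": "|"}


def join_row(values, delimiter="comma"):
    sep = DELIMITERS[delimiter]
    cells = [str(v) for v in values]
    # A cell containing the active delimiter would need quoting;
    # this sketch simply rejects it to keep the example short.
    for cell in cells:
        if sep in cell:
            raise ValueError(f"cell {cell!r} contains the delimiter")
    return sep.join(cells)


row = ["LAP-001", "Gaming Laptop, 15 inch", 1299.99]
print(join_row(row, "pipe"))  # LAP-001|Gaming Laptop, 15 inch|1299.99
```

With the comma delimiter, the same row would be rejected here (or need quoting in real output), which is why pipe or tab suits data that contains commas.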
Format Comparison
JSON vs TOON
JSON (247 bytes):
{
  "products": [
    {"id": 101, "name": "Laptop Pro", "price": 1299},
    {"id": 102, "name": "Magic Mouse", "price": 79},
    {"id": 103, "name": "USB-C Cable", "price": 19}
  ]
}
TOON (98 bytes, 60% reduction):
products[3]{id,name,price}:
  101,Laptop Pro,1299
  102,Magic Mouse,79
  103,USB-C Cable,19
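Byte counts like these depend on how the JSON is formatted. The sketch below measures the UTF-8 size of the same data as pretty-printed JSON, compact JSON and the TOON text shown above, using only the standard library; the variable names are illustrative. Bytes are not tokens: token counts depend on the model's tokenizer.

```python
import json

# The same three products as in the comparison above.
products = {"products": [
    {"id": 101, "name": "Laptop Pro", "price": 1299},
    {"id": 102, "name": "Magic Mouse", "price": 79},
    {"id": 103, "name": "USB-C Cable", "price": 19},
]}

# TOON text written out by hand with 2-space row indentation.
toon_text = (
    "products[3]{id,name,price}:\n"
    "  101,Laptop Pro,1299\n"
    "  102,Magic Mouse,79\n"
    "  103,USB-C Cable,19"
)

toon_bytes = len(toon_text.encode("utf-8"))
for label, text in [
    ("JSON, pretty (indent=2)", json.dumps(products, indent=2)),
    ("JSON, compact", json.dumps(products, separators=(",", ":"))),
]:
    size = len(text.encode("utf-8"))
    print(f"{label}: {size} bytes, TOON is {1 - toon_bytes / size:.0%} smaller")
print(f"TOON: {toon_bytes} bytes")
```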
When to Use TOON
Use TOON when:
- ✅ Passing data to LLM APIs (reduce token costs)
- ✅ Working with uniform tabular data
- ✅ Context window is limited
- ✅ Human readability matters
Use JSON when:
- ❌ Maximum compatibility is required
- ❌ Data is highly irregular/nested
- ❌ Working with existing JSON-only tools
Development
Setup
git clone https://github.com/ScrapeGraphAI/toonify.git
cd toonify
pip install -e ".[dev]"
Running Tests
pytest
pytest --cov=toon --cov-report=term-missing
Running Examples
python examples/basic_usage.py
python examples/advanced_features.py
Performance
TOON typically achieves:
- 30-60% size reduction vs JSON for structured data
- 40-70% token reduction with tabular data
- Minimal overhead in encoding/decoding (<1ms for typical payloads)
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes with tests
- Run tests (pytest)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
MIT License - see LICENSE file for details.
Credits
Python implementation inspired by the TypeScript TOON library at toon-format/toon.
Links
- GitHub: https://github.com/ScrapeGraphAI/toonify
- PyPI: https://pypi.org/project/toonify/
- Documentation: https://github.com/ScrapeGraphAI/toonify#readme
- Format Spec: https://github.com/toon-format/toon
Made with love by the ScrapeGraph team
Download files
File details
Details for the file toonify-1.1.1.tar.gz.
File metadata
- Download URL: toonify-1.1.1.tar.gz
- Upload date:
- Size: 13.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8e84cdf53b34b1598d89ed899aac83c2c79ed5198c33df47158092bd65920d99 |
| MD5 | 3441b108c032bd11b05f557aff4801d3 |
| BLAKE2b-256 | 4a75859593a387b01cea1a525a5ce2c5b8860b213077a9a6c7faf7fa03866510 |
File details
Details for the file toonify-1.1.1-py3-none-any.whl.
File metadata
- Download URL: toonify-1.1.1-py3-none-any.whl
- Upload date:
- Size: 16.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c0ab825f364df194514604418747d6c7cf2d0dd297d6666e4fcdfd2f5056c94c |
| MD5 | 09e9f0f973a96c88ecdd6637981dccd6 |
| BLAKE2b-256 | e53da725a99d9fd0b9a580ca852a4324c26123a61bd5dac11cf5ca185cb6aee9 |