Token-Oriented Object Notation – an optimised alternative to JSON serialization for LLMs
Token-Oriented Object Notation (TOON) for LLMs
Token-Oriented Object Notation (TOON) is an LLM-optimized data serialization format implemented in Python.
✨ Features
- 🎯 LLM-optimized and human-readable format: More compact and easier to read than JSON
- 🐍 Python-native: Automatic handling of datetime, dataclasses, Pydantic models
- 📊 Smart array formatting: Inline, tabular, or list formats chosen automatically
- ⚙️ Configurable: Custom delimiters, indentation, and length markers
- 🔒 Type-safe: Full type hints and Pydantic validation
- 📝 Data-science friendly: Works with JSON, Pandas and Pandas-like data tasks
Get more cognitive output and efficiency from LLMs with fewer tokens in prompts!
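To make the "Python-native" feature above concrete, here is a minimal standard-library sketch of the kind of normalisation a serializer has to perform before it can write dataclasses and datetimes as plain text. It is not the library's own code: `to_plain` and the `Invoice` dataclass are hypothetical names used only for illustration, and ISO 8601 output for datetimes is an assumption.

```python
from dataclasses import asdict, dataclass, is_dataclass
from datetime import date, datetime
from typing import Any


def to_plain(value: Any) -> Any:
    """Recursively convert common Python objects into JSON-style values."""
    if is_dataclass(value) and not isinstance(value, type):
        # asdict() recurses into nested dataclasses but leaves datetimes
        # untouched, so run the result through to_plain() again.
        return to_plain(asdict(value))
    if isinstance(value, (datetime, date)):
        # Assumption: timestamps are written as ISO 8601 strings.
        return value.isoformat()
    if isinstance(value, dict):
        return {str(key): to_plain(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(item) for item in value]
    return value


@dataclass
class Invoice:  # hypothetical example type, not part of toon-llm
    id: int
    issued: datetime
    paid: bool


plain = to_plain(Invoice(id=1, issued=datetime(2025, 1, 31, 9, 30), paid=False))
print(plain)  # {'id': 1, 'issued': '2025-01-31T09:30:00', 'paid': False}
```

With the library installed, `encode` is described as handling these types automatically, so a helper like this should not be needed in practice.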
🚀 Quick Start
Installation
# Using uv (recommended)
uv add toon-llm
# Using pip
pip install toon-llm
Basic Usage
from toon import encode, decode
# Encode Python data to TOON LLM format
data = {
    "username": "Alice",
    "age": 30,
    "tags": ["python", "coding", "llm"],
    "active": True,
    "invoices": [
        {"id": 1, "amount": 250.75, "paid": False},
        {"id": 2, "amount": 125.00, "paid": True},
        {"id": 3, "amount": 320.40, "paid": True},
        {"id": 4, "amount": 75.20, "paid": False},
        {"id": 5, "amount": 600.00, "paid": True},
    ],
}
encoded = encode(data)
# username: Alice
# age: 30
# tags[3]: python,coding,llm
# active: true
# invoices[5]{id,amount,paid}:
#   1,250.75,false
#   2,125,true
#   3,320.4,true
#   4,75.2,false
#   5,600,true
llm_prompt = f"""
Process the following structured data and return the invoices that have not been paid:
```
{encoded}
```
"""
# Call your LLM with llm_prompt...
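The tabular `invoices[5]{id,amount,paid}:` block above can be approximated with a few lines of standard-library Python, which may help when reasoning about what the header and rows mean. This is a sketch of the layout only, not the library's encoder: `format_scalar` and `encode_table` are hypothetical helpers, the two-space row indent is an assumption, and quoting of strings is left out.

```python
def format_scalar(value) -> str:
    """Render one table cell in the style of the sample output."""
    if isinstance(value, bool):
        # bool must be checked first because it is a subclass of int.
        return "true" if value else "false"
    if isinstance(value, float) and value.is_integer():
        # Whole-number floats are shown without a decimal part (125.0 -> 125).
        return str(int(value))
    return str(value)


def encode_table(name: str, rows: list[dict], indent: str = "  ") -> str:
    """Encode a non-empty list of same-shaped dicts as a header plus rows."""
    fields = list(rows[0])
    field_list = ",".join(fields)
    # Header: name, row count in brackets, field names in braces.
    lines = [f"{name}[{len(rows)}]{{{field_list}}}:"]
    for row in rows:
        lines.append(indent + ",".join(format_scalar(row[f]) for f in fields))
    return "\n".join(lines)


invoices = [
    {"id": 1, "amount": 250.75, "paid": False},
    {"id": 2, "amount": 125.00, "paid": True},
]
table = encode_table("invoices", invoices)
print(table)
```

The row count in the header gives the model an explicit length to check its reading against, and the field list is stated once instead of being repeated in every row.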
CLI Usage
TOON LLM includes a command-line interface for encoding and decoding data:
# Show help
uv run toon --help
# Encode JSON file to TOON format
uv run toon encode input.json -o output.toon
# Encode from stdin
echo '{"name": "Alice", "age": 30}' | uv run toon encode
# Decode TOON file to JSON
uv run toon decode input.toon -o output.json
# Decode with pretty printing
uv run toon decode input.toon --pretty
# Decode with validation
uv run toon decode input.toon --validate
# Custom formatting options
uv run toon encode input.json --indent 4 --delimiter "|"
# Show version
uv run toon --version
See uv run toon encode --help and uv run toon decode --help for all available options.
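If you want to drive the CLI from a Python script, one option is to build the argument list and hand it to `subprocess`. The sketch below uses only the flags shown above (`-o`, `--indent`, `--delimiter`); combining them in a single call is an assumption, and `build_encode_command` and `run_encode` are hypothetical helper names. Passing the arguments as a list avoids shell quoting problems with a delimiter such as `|`.

```python
import shutil
import subprocess


def build_encode_command(src: str, dest: str, indent: int, delimiter: str) -> list[str]:
    """Assemble the argv for `uv run toon encode` from the documented flags."""
    return [
        "uv", "run", "toon", "encode", src,
        "-o", dest,
        "--indent", str(indent),
        "--delimiter", delimiter,
    ]


def run_encode(src: str, dest: str, indent: int, delimiter: str) -> None:
    """Run the CLI; raises CalledProcessError if the command fails."""
    if shutil.which("uv") is None:
        raise RuntimeError("uv is not installed or not on PATH")
    subprocess.run(build_encode_command(src, dest, indent, delimiter), check=True)


command = build_encode_command("input.json", "output.toon", indent=4, delimiter="|")
print(command)
```

`run_encode` is defined but not called here, so the snippet can be read and run without the CLI installed.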
📖 Documentation
- Quick Start Guide - Examples and usage overview
- Format Specification - Token-Oriented Object Notation (TOON) specification (language agnostic)
- API Reference - Complete API documentation of the Python implementation
- LLM Prompts - Guidance for LLMs to understand and generate TOON format
- Coding Standards - For contributors
🎨 Why TOON LLM?
TOON LLM is a Python library that provides a clean, compact, and highly readable alternative to JSON for serializing Python data structures, with the aim of minimizing token usage with large language models (LLMs).
It is a Python-compatible specification and implementation of the Token-Oriented Object Notation format.
Concise, structured data formats can reduce the amount of syntax an LLM has to process. TOON LLM does this by minimizing syntax noise and improving readability, so the data is easier for both humans and machines to parse and understand.
Compare with JSON
Here is how the same data is represented in JSON and in TOON LLM, with token counts from OpenAI's cl100k_base tokenizer.
JSON:
{
"weather_observations": [
{ "high_temp": 75, "low_temp": 50, "average_temp": 62.5, "dew_point": 45, "wind_chill": 60 },
{ "high_temp": 78, "low_temp": 52, "average_temp": 65.0, "dew_point": 48, "wind_chill": 63 },
{ "high_temp": 72, "low_temp": 48, "average_temp": 60.0, "dew_point": 42, "wind_chill": 58 },
{ "high_temp": 80, "low_temp": 55, "average_temp": 67.5, "dew_point": 50, "wind_chill": 65 },
{ "high_temp": 76, "low_temp": 51, "average_temp": 63.5, "dew_point": 46, "wind_chill": 61 },
{ "high_temp": 74, "low_temp": 49, "average_temp": 61.5, "dew_point": 44, "wind_chill": 59 },
{ "high_temp": 79, "low_temp": 54, "average_temp": 66.5, "dew_point": 49, "wind_chill": 64 },
{ "high_temp": 73, "low_temp": 47, "average_temp": 60.0, "dew_point": 41, "wind_chill": 57 },
{ "high_temp": 77, "low_temp": 53, "average_temp": 65.0, "dew_point": 47, "wind_chill": 62 },
{ "high_temp": 81, "low_temp": 56, "average_temp": 68.5, "dew_point": 51, "wind_chill": 66 }
]
}
Token Count: 411
TOON LLM:
weather_observations[10]:
high_temp,low_temp,average_temp,dew_point,wind_chill
75,50,62.5,45,60
78,52,65.0,48,63
72,48,60.0,42,58
80,55,67.5,50,65
76,51,63.5,46,61
74,49,61.5,44,59
79,54,66.5,49,64
73,47,60.0,41,57
77,53,65.0,47,62
81,56,68.5,51,66
Token Count: 162
That is over a 60% reduction in token count compared to JSON!
Multiply that over large datasets and complex structures, and the savings become substantial.
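The percentage follows directly from the two token counts quoted above. The short sketch below only reproduces that arithmetic; it does not run a tokenizer, and `token_reduction` is a hypothetical helper name.

```python
def token_reduction(baseline_tokens: int, compact_tokens: int) -> float:
    """Return the percentage of tokens saved relative to the baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return (baseline_tokens - compact_tokens) / baseline_tokens * 100


json_tokens = 411  # JSON count quoted above
toon_tokens = 162  # TOON LLM count quoted above

saved = json_tokens - toon_tokens
summary = f"{saved} tokens saved, {token_reduction(json_tokens, toon_tokens):.1f}% reduction"
print(summary)  # 249 tokens saved, 60.6% reduction
```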
Benefits:
- ✨ Less syntax noise (no braces, fewer quotes)
- 📏 More compact (fewer lines and characters)
- 👁️ Easier to read and scan
- 🎯 Clear structure through indentation
- 📊 Smart array formatting (inline, tabular, or list)
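As a rough illustration of how a choice between inline, tabular, and list layouts could be made, here is a minimal sketch. The rules are assumptions chosen to match the examples in this README (primitive arrays inline, uniform arrays of flat objects tabular, everything else as a list); the library's actual decision logic may differ, and `choose_array_layout` is a hypothetical name.

```python
from typing import Any

PRIMITIVES = (str, int, float, bool, type(None))


def choose_array_layout(items: list[Any]) -> str:
    """Pick 'inline', 'tabular', or 'list' for an array (assumed rules)."""
    if all(isinstance(item, PRIMITIVES) for item in items):
        # e.g. tags[3]: python,coding,llm
        return "inline"
    if all(isinstance(item, dict) for item in items):
        first_keys = set(items[0])
        uniform = all(
            set(item) == first_keys
            and all(isinstance(value, PRIMITIVES) for value in item.values())
            for item in items
        )
        if uniform:
            # e.g. invoices[5]{id,amount,paid}: followed by one row per item
            return "tabular"
    # Mixed or nested content falls back to one entry per line.
    return "list"


print(choose_array_layout(["python", "coding", "llm"]))
print(choose_array_layout([{"id": 1, "paid": False}, {"id": 2, "paid": True}]))
print(choose_array_layout([{"id": 1}, {"name": "x"}]))
```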
🛠️ Configuration
TOON LLM provides flexible configuration options to customize the encoding format.
Read about them in the Specification and the API Documentation.
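One reason a custom delimiter can be useful is data that already contains commas. The sketch below shows the idea with a hypothetical `join_row` helper; the quoting rule it applies (wrap a cell in double quotes when it contains the delimiter) is an assumption for illustration, so check the Specification for the actual rules.

```python
def join_row(cells: list[str], delimiter: str) -> str:
    """Join string cells, quoting any cell that contains the delimiter."""
    rendered = []
    for cell in cells:
        if delimiter in cell:
            # Assumed rule: quote the cell and escape embedded double quotes.
            cell = '"' + cell.replace('"', '\\"') + '"'
        rendered.append(cell)
    return delimiter.join(rendered)


row = ["Smith, Jane", "Sydney", "42"]
print(join_row(row, ","))  # "Smith, Jane",Sydney,42
print(join_row(row, "|"))  # Smith, Jane|Sydney|42
```

With `|` as the delimiter, the first cell no longer needs quotes, which keeps rows shorter for comma-heavy text.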
🧪 Testing
# Run tests
uv run pytest tests/ -v
# Run with coverage
uv run coverage run -m pytest && uv run coverage report
# Current status
# 310 tests passing
# 80.52% coverage
🤝 Contributing
Contributions are welcome! Please read our Coding Standards before contributing.
Development Setup
# Clone repository
git clone https://github.com/davidpirogov/toon-llm.git
cd toon-llm
# Install dependencies
uv sync
# Run tests
uv run pytest
# Run linting
uv run ruff check src/toon/
# Format code
uv run ruff format src/toon/
Development Guidelines
- Follow PEP 8 and our Coding Standards
- Add tests for new features
- Update documentation
- Ensure all tests pass
- Maintain or improve coverage
📋 Requirements
- Python 3.11 or higher
- Pydantic 2.x
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
Inspired by Token-Oriented Object Notation by Johann Schopplich.
If you are looking for a TypeScript/JavaScript implementation, check out the toon repository.
🔗 Links
- GitHub: https://github.com/davidpirogov/toon-llm
- Documentation: ./docs/
- Issues: GitHub Issues
- PyPI: https://pypi.org/project/toon-llm/
File details
Details for the file toon_llm-1.0.0b6.tar.gz.
File metadata
- Download URL: toon_llm-1.0.0b6.tar.gz
- Upload date:
- Size: 24.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ae29bc3dc356be0e60e1dd9ca8931ad465c30ade46364fc2f792d11d6157d3c3 |
| MD5 | 21be7f69c2f137f5b50eeeb090e4654f |
| BLAKE2b-256 | df7b689dd6efc20309ab431fcb1a8582d7627ce9069c47f523c3bf8cc4d20418 |
File details
Details for the file toon_llm-1.0.0b6-py3-none-any.whl.
File metadata
- Download URL: toon_llm-1.0.0b6-py3-none-any.whl
- Upload date:
- Size: 27.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4646ea1c675f4486e2bf1f619220de46d15592cb489f21f5bb7c250329afa3c2 |
| MD5 | c43a4a92ede482448f51b9b1546a9416 |
| BLAKE2b-256 | 222988d802795a5a85582713e9e61b6a3b549c606b3bf47313d32a7a1a5ea314 |