Token-Oriented Object Notation – Optimised, JSON-like serialization for LLMs

Token-Oriented Object Notation (TOON) for LLMs

Token-Oriented Object Notation (TOON) is an LLM-optimized data serialization format implemented in Python.

✨ Features

  • 🎯 LLM-optimized and human-readable format: More compact and easier to read than JSON
  • 🐍 Python-native: Automatic handling of datetime, dataclasses, Pydantic models
  • 📊 Smart array formatting: Inline, tabular, or list formats chosen automatically
  • ⚙️ Configurable: Custom delimiters, indentation, and length markers
  • 🔒 Type-safe: Full type hints and Pydantic validation
  • 📝 Data science compatible: Works with JSON, Pandas, and Pandas-like data workflows

Get more output and efficiency from LLMs with fewer tokens in your prompts!

🚀 Quick Start

Installation

# Using uv (recommended)
uv add toon-llm

# Using pip
pip install toon-llm

Basic Usage

from toon import encode, decode

# Encode Python data to TOON LLM format
data = {
    "username": "Alice",
    "age": 30,
    "tags": ["python", "coding", "llm"],
    "active": True,
    "invoices": [
        {"id": 1, "amount": 250.75, "paid": False},
        {"id": 2, "amount": 125.00, "paid": True},
        {"id": 3, "amount": 320.40, "paid": True},
        {"id": 4, "amount": 75.20, "paid": False},
        {"id": 5, "amount": 600.00, "paid": True}
    ]
}

encoded = encode(data)

# username: Alice
# age: 30
# tags[3]: python,coding,llm
# active: true
# invoices[5]{id,amount,paid}:
#   1,250.75,false
#   2,125,true
#   3,320.4,true
#   4,75.2,false
#   5,600,true

llm_prompt = f"""
Process the following structured data and return the invoices that have not been paid:
```
{encoded}
```
"""

# Call your LLM with llm_prompt...

CLI Usage

TOON LLM includes a command-line interface for encoding and decoding data:

# Show help
uv run toon --help

# Encode JSON file to TOON format
uv run toon encode input.json -o output.toon

# Encode from stdin
echo '{"name": "Alice", "age": 30}' | uv run toon encode

# Decode TOON file to JSON
uv run toon decode input.toon -o output.json

# Decode with pretty printing
uv run toon decode input.toon --pretty

# Decode with validation
uv run toon decode input.toon --validate

# Custom formatting options
uv run toon encode input.json --indent 4 --delimiter "|"

# Show version
uv run toon --version

See uv run toon encode --help and uv run toon decode --help for all available options.

📖 Documentation

🎨 Why TOON LLM?

TOON LLM is a Python library that provides a clean, compact, and highly readable alternative to JSON. It serializes Python data structures in a way that minimises token usage with large language models (LLMs).

It is a Python-compatible specification and implementation of the Token-Oriented Object Notation format.

Cognitive load in LLMs can be significantly reduced by using more concise and structured data formats. TOON LLM achieves this by minimizing syntax noise and enhancing readability, making it easier for both humans and machines to parse and understand the data.

Compare with JSON

Using the cl100k_base tokenizer from OpenAI, here is a comparison of how the same data is represented in JSON vs TOON LLM.

JSON:

{
    "weather_observations": [
        { "high_temp": 75, "low_temp": 50, "average_temp": 62.5, "dew_point": 45, "wind_chill": 60 },
        { "high_temp": 78, "low_temp": 52, "average_temp": 65.0, "dew_point": 48, "wind_chill": 63 },
        { "high_temp": 72, "low_temp": 48, "average_temp": 60.0, "dew_point": 42, "wind_chill": 58 },
        { "high_temp": 80, "low_temp": 55, "average_temp": 67.5, "dew_point": 50, "wind_chill": 65 },
        { "high_temp": 76, "low_temp": 51, "average_temp": 63.5, "dew_point": 46, "wind_chill": 61 },
        { "high_temp": 74, "low_temp": 49, "average_temp": 61.5, "dew_point": 44, "wind_chill": 59 },
        { "high_temp": 79, "low_temp": 54, "average_temp": 66.5, "dew_point": 49, "wind_chill": 64 },
        { "high_temp": 73, "low_temp": 47, "average_temp": 60.0, "dew_point": 41, "wind_chill": 57 },
        { "high_temp": 77, "low_temp": 53, "average_temp": 65.0, "dew_point": 47, "wind_chill": 62 },
        { "high_temp": 81, "low_temp": 56, "average_temp": 68.5, "dew_point": 51, "wind_chill": 66 }
    ]
}

Token Count: 411

TOON LLM:

weather_observations[10]:
  high_temp,low_temp,average_temp,dew_point,wind_chill
  75,50,62.5,45,60
  78,52,65.0,48,63
  72,48,60.0,42,58
  80,55,67.5,50,65
  76,51,63.5,46,61
  74,49,61.5,44,59
  79,54,66.5,49,64
  73,47,60.0,41,57
  77,53,65.0,47,62
  81,56,68.5,51,66

Token Count: 162

That is over a 60% reduction in token count compared to JSON!

Multiply that over large datasets and complex structures, and the savings become substantial.
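The reduction quoted above can be checked with a few lines of arithmetic. This is a minimal sketch that only uses the two token counts reported in this comparison; the helper name `token_reduction` is made up for illustration and is not part of TOON LLM.

```python
def token_reduction(baseline_tokens: int, compact_tokens: int) -> float:
    """Return the percentage of tokens saved relative to the baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return (baseline_tokens - compact_tokens) / baseline_tokens * 100


# Token counts taken from the JSON vs TOON LLM comparison above.
json_tokens = 411
toon_tokens = 162

saved = token_reduction(json_tokens, toon_tokens)
print(f"{json_tokens - toon_tokens} tokens saved ({saved:.1f}% reduction)")
# 249 tokens saved (60.6% reduction)
```

The same helper can be applied to token counts measured on your own data to see whether the savings hold for your payloads.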

Benefits:

  • ✨ Less syntax noise (no braces, fewer quotes)
  • 📏 More compact (fewer lines and characters)
  • 👁️ Easier to read and scan
  • 🎯 Clear structure through indentation
  • 📊 Smart array formatting (inline, tabular, or list)
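To make the tabular idea concrete, here is a rough, standard-library-only sketch of how a list of uniform dictionaries could be rendered in the `name[N]{fields}:` layout shown in the Basic Usage output. It is an illustration only, not the library's implementation: the functions `format_value` and `to_tabular` are hypothetical, and the sketch does not quote or escape values that contain the delimiter.

```python
def format_value(value: object) -> str:
    """Render a scalar; booleans are lowercased as in the Basic Usage output."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def to_tabular(name: str, rows: list[dict], indent: int = 2, delimiter: str = ",") -> str:
    """Render uniform dict rows as one header line plus one line per row.

    Hypothetical helper for illustration; not part of the toon package.
    """
    if not rows:
        raise ValueError("this sketch needs at least one row")
    fields = list(rows[0])
    # A table only makes sense when every row has the same keys in the same order.
    if any(list(row) != fields for row in rows):
        raise ValueError("tabular layout needs every row to share the same keys")
    header = f"{name}[{len(rows)}]{{{delimiter.join(fields)}}}:"
    pad = " " * indent
    lines = [pad + delimiter.join(format_value(row[f]) for f in fields) for row in rows]
    return "\n".join([header, *lines])


invoices = [
    {"id": 1, "amount": 250.75, "paid": False},
    {"id": 2, "amount": 125.5, "paid": True},
]
print(to_tabular("invoices", invoices))
# invoices[2]{id,amount,paid}:
#   1,250.75,false
#   2,125.5,true
```

The field names appear once in the header instead of once per row, which is where most of the savings over JSON come from for arrays of similar objects.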

🛠️ Configuration

TOON LLM provides flexible configuration options to customize the encoding format.

Read about them in the Specification and the API Documentation.

🧪 Testing

# Run tests
uv run pytest tests/ -v

# Run with coverage
uv run coverage run -m pytest && uv run coverage report

# Current status
# 310 tests passing
# 80.52% coverage

🤝 Contributing

Contributions are welcome! Please read our Coding Standards before contributing.

Development Setup

# Clone repository
git clone https://github.com/davidpirogov/toon-llm.git
cd toon-llm

# Install dependencies
uv sync

# Run tests
uv run pytest

# Run linting
uv run ruff check src/toon/

# Format code
uv run ruff format src/toon/

Development Guidelines

  1. Follow PEP 8 and our Coding Standards
  2. Add tests for new features
  3. Update documentation
  4. Ensure all tests pass
  5. Maintain or improve coverage

📋 Requirements

  • Python 3.11 or higher
  • Pydantic 2.x

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Inspired by Token-Oriented Object Notation by Johann Schopplich.

If you are looking for a TypeScript/JavaScript implementation, check out the toon repository.
