TOON (Token-Oriented Object Notation) - A compact, human-readable serialization format for LLMs

TOON (Token-Oriented Object Notation)

A compact, human-readable serialization format designed for passing structured data to Large Language Models with significantly reduced token usage.

Overview

TOON achieves CSV-like compactness while adding explicit structure, making it ideal for:

  • Reducing token costs in LLM API calls
  • Improving context window efficiency
  • Maintaining human readability
  • Preserving data structure and types

Key Features

  • Compact: 64% smaller than JSON on average (tested on 50 datasets)
  • Readable: Clean, indentation-based syntax
  • Structured: Preserves nested objects and arrays
  • Type-safe: Supports strings, numbers, booleans, null
  • Flexible: Multiple delimiter options (comma, tab, pipe)
  • Smart: Automatic tabular format for uniform arrays
  • Efficient: Key folding for deeply nested objects

Installation

pip install toonify

For development:

# Quote the extra so shells such as zsh do not expand the brackets
pip install "toonify[dev]"

Quick Start

Python API

from toon import encode, decode

# Encode Python dict to TOON
data = {
    'products': [
        {'sku': 'LAP-001', 'name': 'Gaming Laptop', 'price': 1299.99},
        {'sku': 'MOU-042', 'name': 'Wireless Mouse', 'price': 29.99}
    ]
}

toon_string = encode(data)
print(toon_string)
# Output:
# products[2]{sku,name,price}:
#   LAP-001,Gaming Laptop,1299.99
#   MOU-042,Wireless Mouse,29.99

# Decode TOON back to Python
result = decode(toon_string)
assert result == data

Command Line

# Encode JSON to TOON
toon input.json -o output.toon

# Decode TOON to JSON
toon input.toon -o output.json

# Use with pipes
cat data.json | toon -e > data.toon

# Show token statistics
toon data.json --stats

TOON Format Specification

Basic Syntax

# Simple key-value pairs
title: Machine Learning Basics
chapters: 12
published: true

Arrays

Primitive arrays (inline):

temperatures: [72.5,68.3,75.1,70.8,73.2]
categories: [electronics,computers,accessories]

Tabular arrays (uniform objects with header):

inventory[3]{sku,product,stock}:
  KB-789,Mechanical Keyboard,45
  MS-456,RGB Mouse Pad,128
  HD-234,USB Headset,67
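
To make the tabular layout concrete, here is a minimal sketch that builds the header and rows by hand using only the standard library. `encode_table` is a hypothetical helper written for illustration, not toonify's implementation; it assumes every row has the same keys in the same order and it skips quoting entirely.

```python
def encode_table(key, rows):
    """Render a list of uniform dicts as a TOON-style tabular block (illustrative only)."""
    if not rows:
        raise ValueError("rows must be non-empty")
    fields = list(rows[0])
    # This sketch requires every row to have the same keys in the same order.
    if any(list(row) != fields for row in rows):
        raise ValueError("rows are not uniform")
    # Header: key, declared row count, then the shared field names.
    lines = [f"{key}[{len(rows)}]{{{','.join(fields)}}}:"]
    for row in rows:
        lines.append("  " + ",".join(str(row[name]) for name in fields))
    return "\n".join(lines)


inventory = [
    {"sku": "KB-789", "product": "Mechanical Keyboard", "stock": 45},
    {"sku": "MS-456", "product": "RGB Mouse Pad", "stock": 128},
    {"sku": "HD-234", "product": "USB Headset", "stock": 67},
]
print(encode_table("inventory", inventory))
```

The field names appear once in the header instead of once per row, which is where most of the savings over JSON come from for uniform data.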

List arrays (non-uniform or nested):

tasks[2]:
  Complete documentation
  Review pull requests

Nested Objects

server:
  hostname: api-prod-01
  config:
    port: 8080
    region: us-east
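
The indentation rule can be sketched as a small recursive function. `encode_nested` and `format_scalar` are hypothetical names for this illustration and do not come from toonify; the sketch handles only dicts and primitives, and ignores arrays and quoting.

```python
def format_scalar(value):
    """Render a primitive as the literals used in this section (true, false, null)."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def encode_nested(obj, indent=2, level=0):
    """Return TOON-style lines for a dict of primitives and nested dicts."""
    pad = " " * (indent * level)
    lines = []
    for key, value in obj.items():
        if isinstance(value, dict):
            # A nested object gets a bare "key:" line, then its children one level deeper.
            lines.append(f"{pad}{key}:")
            lines.extend(encode_nested(value, indent, level + 1))
        else:
            lines.append(f"{pad}{key}: {format_scalar(value)}")
    return lines


server = {
    "server": {
        "hostname": "api-prod-01",
        "config": {"port": 8080, "region": "us-east"},
    }
}
text = "\n".join(encode_nested(server))
print(text)
```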

Quoting Rules

Strings are quoted only when necessary:

  • Contains special characters (comma, colon, double quote, or newline)
  • Has leading or trailing whitespace
  • Looks like a literal (true, false, null)
  • Is empty

simple: ProductName
quoted: "Product, Description"
escaped: "Size: 15\" display"
multiline: "First feature\nSecond feature"
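
The four rules above can be sketched as a predicate plus an escaping step. `needs_quotes` and `quote_if_needed` are hypothetical helpers, not toonify functions. Escaping backslashes is an assumption of this sketch; the rules above only show `\"` and `\n`.

```python
SPECIAL_CHARS = (",", ":", '"', "\n")
LITERALS = {"true", "false", "null"}


def needs_quotes(text):
    """Apply the four quoting rules listed above."""
    return (
        text == ""                                     # empty string
        or text != text.strip()                        # leading/trailing whitespace
        or text in LITERALS                            # would be read as a literal
        or any(ch in text for ch in SPECIAL_CHARS)     # special characters
    )


def quote_if_needed(text):
    """Return text unchanged, or wrapped in double quotes with escapes applied."""
    if not needs_quotes(text):
        return text
    escaped = (
        text.replace("\\", "\\\\")   # assumption: backslashes are doubled first
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'


for sample in ["ProductName", "Product, Description", 'Size: 15" display', "true", ""]:
    print(quote_if_needed(sample))
```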

API Reference

encode(data, options=None)

Converts a Python object to a TOON string.

Parameters:

  • data: Python dict or list
  • options: Optional dict with:
    • delimiter: 'comma' (default), 'tab', or 'pipe'
    • indent: Number of spaces per level (default: 2)
    • key_folding: 'off' (default) or 'safe'
    • flatten_depth: Max depth for key folding (default: None)

Example:

toon = encode(data, {
    'delimiter': 'tab',
    'indent': 4,
    'key_folding': 'safe'
})

decode(toon_string, options=None)

Converts a TOON string to a Python object.

Parameters:

  • toon_string: TOON formatted string
  • options: Optional dict with:
    • strict: Validate structure strictly (default: True)
    • expand_paths: 'off' (default) or 'safe'
    • default_delimiter: Default delimiter (default: ',')

Example:

data = decode(toon_string, {
    'expand_paths': 'safe',
    'strict': False
})

CLI Usage

usage: toon [-h] [-o OUTPUT] [-e] [-d] [--delimiter {comma,tab,pipe}]
            [--indent INDENT] [--stats] [--no-strict]
            [--key-folding {off,safe}] [--flatten-depth DEPTH]
            [--expand-paths {off,safe}]
            [input]

TOON (Token-Oriented Object Notation) - Convert between JSON and TOON formats

positional arguments:
  input                 Input file path (or "-" for stdin)

optional arguments:
  -h, --help            show this help message and exit
  -o, --output OUTPUT   Output file path (default: stdout)
  -e, --encode          Force encode mode (JSON to TOON)
  -d, --decode          Force decode mode (TOON to JSON)
  --delimiter {comma,tab,pipe}
                        Array delimiter (default: comma)
  --indent INDENT       Indentation size (default: 2)
  --stats               Show token statistics
  --no-strict           Disable strict validation (decode only)
  --key-folding {off,safe}
                        Key folding mode (encode only)
  --flatten-depth DEPTH Maximum key folding depth (encode only)
  --expand-paths {off,safe}
                        Path expansion mode (decode only)

Advanced Features

Key Folding

Collapse single-key chains into dotted paths:

data = {
    'api': {
        'response': {
            'product': {
                'title': 'Wireless Keyboard'
            }
        }
    }
}

# With key_folding='safe'
toon = encode(data, {'key_folding': 'safe'})
# Output: api.response.product.title: Wireless Keyboard
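
For intuition, the folding step can be sketched in plain Python. `fold_keys` is a hypothetical helper, not the library's code, and the library's 'safe' mode may apply extra checks (for example on keys that already contain dots) that this sketch does not model.

```python
def fold_keys(obj):
    """Collapse chains of single-key dicts into dotted keys (illustrative only)."""
    folded = {}
    for key, value in obj.items():
        path = key
        # Follow the chain while the value is a dict with exactly one key.
        while isinstance(value, dict) and len(value) == 1:
            child_key, value = next(iter(value.items()))
            path = f"{path}.{child_key}"
        # Dicts with zero or several keys are kept as objects and folded recursively.
        folded[path] = fold_keys(value) if isinstance(value, dict) else value
    return folded


data = {"api": {"response": {"product": {"title": "Wireless Keyboard"}}}}
print(fold_keys(data))
```

Folding stops as soon as an object has more than one key, so sibling fields stay grouped under their parent.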

Path Expansion

Expand dotted keys into nested objects:

toon = 'store.location.zipcode: 10001'

# With expand_paths='safe'
data = decode(toon, {'expand_paths': 'safe'})
# Result: {'store': {'location': {'zipcode': 10001}}}
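
The reverse direction can be sketched the same way. `expand_paths` below is a hypothetical stand-alone function that works on an already-parsed flat dict; it is not the decoder's implementation, and raising on conflicting paths is this sketch's own choice.

```python
def expand_paths(flat):
    """Turn dotted keys into nested dicts (illustrative only)."""
    nested = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            # Create intermediate objects on demand.
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                raise ValueError(f"conflicting path at {part!r} in {dotted_key!r}")
        node[leaf] = value
    return nested


print(expand_paths({"store.location.zipcode": 10001}))
```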

Custom Delimiters

Choose the delimiter that best fits your data:

# Tab delimiter (better for spreadsheet-like data)
toon = encode(data, {'delimiter': 'tab'})

# Pipe delimiter (when data contains commas)
toon = encode(data, {'delimiter': 'pipe'})
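
One way to choose is to count how often each delimiter character occurs in your string values and pick the rarest, so fewer values need quoting. `pick_delimiter` is a hypothetical helper for this README, not part of toonify; only the option names 'comma', 'tab', and 'pipe' come from the API above.

```python
# Ordered so that ties prefer comma (the default), then pipe, then tab.
DELIMITERS = {"comma": ",", "pipe": "|", "tab": "\t"}


def pick_delimiter(rows):
    """Return the delimiter name whose character appears least in string values."""
    counts = {name: 0 for name in DELIMITERS}
    for row in rows:
        for value in row.values():
            if isinstance(value, str):
                for name, char in DELIMITERS.items():
                    counts[name] += value.count(char)
    # min() returns the first minimal key, so ties follow the dict order above.
    return min(counts, key=counts.get)


rows = [{"name": "Laptop, 15 inch", "price": 1299}]
print(pick_delimiter(rows))
```

The returned name can then be passed as the 'delimiter' option to encode.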

Format Comparison

JSON vs TOON

JSON (182 bytes as formatted here):

{
  "products": [
    {"id": 101, "name": "Laptop Pro", "price": 1299},
    {"id": 102, "name": "Magic Mouse", "price": 79},
    {"id": 103, "name": "USB-C Cable", "price": 19}
  ]
}

TOON (91 bytes, 50% reduction):

products[3]{id,name,price}:
  101,Laptop Pro,1299
  102,Magic Mouse,79
  103,USB-C Cable,19
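
Byte counts depend on whitespace and encoding, so it is worth measuring your own payloads. The sketch below uses only the standard library and a hand-written TOON string, so it runs without toonify installed; `byte_size` and `reduction` are hypothetical helper names.

```python
import json

data = {
    "products": [
        {"id": 101, "name": "Laptop Pro", "price": 1299},
        {"id": 102, "name": "Magic Mouse", "price": 79},
        {"id": 103, "name": "USB-C Cable", "price": 19},
    ]
}

# The TOON text from the comparison above, written out by hand.
toon_text = (
    "products[3]{id,name,price}:\n"
    "  101,Laptop Pro,1299\n"
    "  102,Magic Mouse,79\n"
    "  103,USB-C Cable,19"
)


def byte_size(text):
    """Size of the text in bytes when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def reduction(before, after):
    """Fraction of bytes saved going from `before` to `after`."""
    return 1 - after / before


pretty = json.dumps(data, indent=2)                  # one key per line
compact = json.dumps(data, separators=(",", ":"))    # no optional whitespace

for label, text in [("pretty JSON", pretty), ("compact JSON", compact)]:
    saved = reduction(byte_size(text), byte_size(toon_text))
    print(f"{label}: {byte_size(text)} bytes, TOON saves {saved:.0%}")
```

Comparing against compact JSON gives the more conservative figure, since pretty-printing inflates the JSON side.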

When to Use TOON

Use TOON when:

  • ✅ Passing data to LLM APIs (reduce token costs)
  • ✅ Working with uniform tabular data
  • ✅ Context window is limited
  • ✅ Human readability matters

Use JSON when:

  • ❌ Maximum compatibility is required
  • ❌ Data is highly irregular/nested
  • ❌ Working with existing JSON-only tools

Development

Setup

git clone https://github.com/ScrapeGraphAI/toonify.git
cd toonify
pip install -e ".[dev]"

Running Tests

pytest
pytest --cov=toon --cov-report=term-missing

Running Examples

python examples/basic_usage.py
python examples/advanced_features.py

Performance

Benchmarked across 50 diverse, real-world datasets:

  • 63.9% average size reduction vs JSON for structured data
  • 54.1% average token reduction (directly lowers LLM API costs)
  • Up to 73.4% savings for optimal use cases (tabular data, surveys, analytics)
  • 98% of datasets achieve 40%+ savings
  • Minimal overhead in encoding/decoding (<1ms for typical payloads)

💰 Cost Impact: At GPT-4 pricing, TOON saves $2,147 per million API requests and $5,408 per billion tokens.
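
To estimate savings for your own workload, the arithmetic is tokens saved times price per token. The sketch below is an assumption-laden illustration: `estimated_savings` is a hypothetical helper, and the price of $10 per million tokens is a placeholder, since the pricing behind the figures above is not stated here.

```python
def estimated_savings(total_tokens, token_reduction, usd_per_million_tokens):
    """Dollars saved when a fraction `token_reduction` (0..1) of tokens is no longer sent."""
    tokens_saved = total_tokens * token_reduction
    return tokens_saved / 1_000_000 * usd_per_million_tokens


# 54.1% is the average token reduction reported above; the price is a placeholder.
savings = estimated_savings(1_000_000_000, 0.541, 10.0)
print(f"${savings:,.0f} saved per billion tokens at the assumed price")
```

Substitute your provider's current input-token price and your own measured reduction to get a figure that applies to you.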

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes with tests
  4. Run tests (pytest)
  5. Commit your changes (git commit -m 'Add amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

License

MIT License - see LICENSE file for details.

Credits

Python implementation inspired by the TypeScript TOON library at toon-format/toon.

Made with love by the ScrapeGraph team
