Python implementation of TOON (Token-Oriented Object Notation) - a token-efficient JSON alternative for LLM prompts

TOON Python Implementation

A Python implementation of TOON (Token-Oriented Object Notation) - a token-efficient JSON alternative for LLM prompts, following the original @byjohann/toon TypeScript/JavaScript implementation.

This implementation adheres to the TOON v1.1 specification and maintains compatibility with the original implementation.

Features

  • Token-efficient: 30-60% fewer tokens than JSON
  • LLM-friendly: Explicit lengths and field lists help models validate output
  • Minimal syntax: Removes redundant punctuation (braces, brackets, most quotes)
  • Indentation-based: Uses whitespace for structure (like YAML)
  • Tabular arrays: Declare keys once, stream rows without repetition
  • Spec-compliant: Follows TOON v1.1 specification
  • Compatible: Works with original TypeScript implementation

Installation

uv add pytoon-core

or

pip install pytoon-core

Or install from source:

git clone https://github.com/Alg0rix/toon-py.git
cd toon-py
uv sync

Quick Start

from toon_py import encode, decode

# Encode Python values to TOON
data = {
    "name": "Ada",
    "age": 30,
    "active": True,
    "tags": ["admin", "developer", "python"]
}

toon_str = encode(data)
print(toon_str)
# Output:
# name: Ada
# age: 30
# active: true
# tags[3]: admin,developer,python

# Decode TOON back to Python
decoded = decode(toon_str)
print(decoded)
# Output:
# {'name': 'Ada', 'age': 30, 'active': True, 'tags': ['admin', 'developer', 'python']}

Advanced Usage

Tabular Arrays (Most Efficient)

When you have arrays of objects with the same structure, TOON uses an efficient tabular format:

data = {
    "users": [
        {"id": 1, "name": "Ada", "role": "admin"},
        {"id": 2, "name": "Bob", "role": "user"},
        {"id": 3, "name": "Charlie", "role": "user"}
    ]
}

toon_str = encode(data)
print(toon_str)
# Output:
# users[3]{id,name,role}:
#   1,Ada,admin
#   2,Bob,user
#   3,Charlie,user
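
To make the tabular layout concrete, here is a minimal standard-library sketch of how such a block could be produced. It is an illustration only, not this library's implementation: `encode_tabular` is a hypothetical helper, it handles only the comma delimiter, and it skips quoting and the `true`/`false`/`null` normalization that a real encoder needs.

```python
def encode_tabular(key, rows):
    """Render a list of uniform dicts as a TOON-style tabular array (sketch)."""
    if not rows:
        raise ValueError("this sketch only handles non-empty arrays")
    fields = list(rows[0].keys())
    for row in rows:
        # Every row must expose the same set of keys...
        if set(row.keys()) != set(fields):
            raise ValueError("rows are not uniform")
        # ...and hold only primitive values (no nested containers).
        if any(isinstance(v, (dict, list, tuple, set)) for v in row.values()):
            raise ValueError("tabular rows must hold primitive values only")
    # Header: key, explicit row count, then the field list declared once.
    lines = [f"{key}[{len(rows)}]{{{','.join(fields)}}}:"]
    for row in rows:
        # Each row repeats only the values, in the declared field order.
        lines.append("  " + ",".join(str(row[f]) for f in fields))
    return "\n".join(lines)


users = [
    {"id": 1, "name": "Ada", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
    {"id": 3, "name": "Charlie", "role": "user"},
]
print(encode_tabular("users", users))
```

The saving comes from the header: field names appear once instead of once per row, so the cost of each additional row is just its values and delimiters.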

Alternative Delimiters

Use tabs or pipes for even better token efficiency:

# Tab delimiter
toon_str = encode(data, options={'delimiter': 'tab'})
print(toon_str)
# Output (fields are separated by tab characters, shown here as spaces):
# users[3 ]{id name role}:
#   1 Ada admin
#   2 Bob user
#   3 Charlie user

# Pipe delimiter
toon_str = encode(data, options={'delimiter': 'pipe'})
print(toon_str)
# Output:
# users[3|]{id|name|role}:
#   1|Ada|admin
#   2|Bob|user
#   3|Charlie|user

Nested Structures

TOON handles nested objects and arrays naturally:

data = {
    "company": "Tech Corp",
    "employees": [
        {
            "name": "Ada",
            "contact": {"email": "ada@tech.com", "phone": "555-0101"},
            "skills": ["Python", "ML", "Data Science"]
        },
        {
            "name": "Bob",
            "contact": {"email": "bob@tech.com", "phone": "555-0102"},
            "skills": ["JavaScript", "React", "Node.js"]
        }
    ]
}

toon_str = encode(data)
print(toon_str)
# Output:
# company: Tech Corp
# employees[2]:
#   - name: Ada
#     contact:
#       email: ada@tech.com
#       phone: 555-0101
#     skills[3]: Python,ML,Data Science
#   - name: Bob
#     contact:
#       email: bob@tech.com
#       phone: 555-0102
#     skills[3]: JavaScript,React,Node.js

Encoding Options

encode(data, options={
    'indent': 2,           # Spaces per indent level (default: 2)
    'delimiter': ',',      # 'comma', 'tab', 'pipe', or actual char (default: ',')
    'length_marker': False  # Add '#' prefix to array lengths (default: False)
})

Decoding Options

decode(toon_str, options={
    'indent': 2,   # Expected indent size (default: 2)
    'strict': True # Strict validation (default: True)
})

Format Comparison

JSON (Verbose)

{
  "users": [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"}
  ]
}

TOON (Compact)

users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user

Token savings: ~40-50% fewer tokens than JSON!
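
A rough way to check the size difference for your own data is to compare lengths directly. The sketch below uses only the standard library and counts characters, which is an assumption-laden proxy: real token counts depend on the tokenizer of the model you target, so treat the result as indicative only. The TOON text is copied from the example above rather than generated.

```python
import json

data = {
    "users": [
        {"id": 1, "name": "Alice", "role": "admin"},
        {"id": 2, "name": "Bob", "role": "user"},
    ]
}

# Two JSON renderings: pretty-printed and fully compact.
json_pretty = json.dumps(data, indent=2)
json_compact = json.dumps(data, separators=(",", ":"))

# The TOON form of the same data, as shown in the comparison above.
toon_text = "users[2]{id,name,role}:\n  1,Alice,admin\n  2,Bob,user"


def saving(before, after):
    """Fractional reduction in character count going from `before` to `after`."""
    return 1 - len(after) / len(before)


print(f"pretty JSON : {len(json_pretty)} chars")
print(f"compact JSON: {len(json_compact)} chars")
print(f"TOON        : {len(toon_text)} chars")
print(f"saving vs compact JSON: {saving(json_compact, toon_text):.0%}")
```

Comparing against compact JSON is the fairer baseline, since pretty-printing inflates the JSON side with whitespace.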

API Reference

encode(value, options=None)

Encode a Python value to TOON format.

Parameters:

  • value: Any JSON-serializable value (dict, list, or primitive)
  • options (optional): Encoding options dict
    • indent (int): Spaces per indentation level (default: 2)
    • delimiter (str): Array delimiter - 'comma', 'tab', 'pipe', or the char (default: ',')
    • length_marker (bool): Prefix array lengths with '#' (default: False)

Returns: TOON-formatted string

decode(text, options=None)

Decode TOON text to a Python value.

Parameters:

  • text: TOON-formatted string
  • options (optional): Decoding options dict
    • indent (int): Expected indent size (default: 2)
    • strict (bool): Enable strict validation (default: True)

Returns: Python value (dict, list, or primitive)

Raises:

  • ValueError: If input is malformed or validation fails (in strict mode)

Type Handling

The encoder automatically handles Python-specific types:

Python Type                   TOON Output
----------------------------  -------------------------------------------
int, float                    Number (normalized, no scientific notation)
bool                          true/false
None                          null
str                           String (quoted if needed)
datetime                      ISO 8601 string
set                           Array
dict                          Object
list, tuple                   Array
float('nan'), float('inf')    null
Decimal                       String (if outside safe integer range)
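
The type mapping listed above can be pictured as a normalization pass that runs before encoding. The following is a minimal sketch under stated assumptions, not the library's code: `normalize` is a hypothetical name, sets are sorted here purely to get deterministic output (which requires mutually comparable elements), and `Decimal` handling is left out.

```python
import math
from datetime import datetime


def normalize(value):
    """Map Python-specific values onto JSON-like ones (illustrative sketch)."""
    # NaN and infinities have no JSON equivalent, so they become null (None).
    if isinstance(value, float) and (math.isnan(value) or math.isinf(value)):
        return None
    # Datetimes become ISO 8601 strings.
    if isinstance(value, datetime):
        return value.isoformat()
    # Dicts stay objects; values are normalized recursively.
    if isinstance(value, dict):
        return {str(k): normalize(v) for k, v in value.items()}
    # Lists and tuples both become arrays.
    if isinstance(value, (list, tuple)):
        return [normalize(v) for v in value]
    # Sets become arrays; sorting is an assumption of this sketch.
    if isinstance(value, set):
        return [normalize(v) for v in sorted(value)]
    # int, bool, str, None and finite floats pass through unchanged.
    return value


record = {
    "when": datetime(2024, 1, 2, 3, 4, 5),
    "tags": {"b", "a"},
    "pair": (1, 2),
    "score": float("nan"),
}
print(normalize(record))
```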

Whitespace Rules

TOON follows strict whitespace invariants:

  • No trailing spaces on any line
  • No trailing newline at end of document
  • One space after : in key-value pairs
  • Consistent indentation (configurable, default 2 spaces)
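
These invariants are simple enough to check mechanically. The sketch below is a hypothetical standalone checker written with the standard library only; it covers trailing spaces, the trailing newline, and indentation width, and deliberately leaves the "one space after :" rule out because checking it properly requires parsing quoted strings.

```python
def check_whitespace(doc, indent=2):
    """Return a list of whitespace-rule violations found in `doc` (sketch)."""
    problems = []
    # Rule: no trailing newline at the end of the document.
    if doc.endswith("\n"):
        problems.append("document ends with a newline")
    for number, line in enumerate(doc.split("\n"), start=1):
        # Rule: no trailing spaces on any line.
        if line != line.rstrip(" "):
            problems.append(f"line {number}: trailing space")
        # Rule: indentation must be a whole number of indent levels.
        leading = len(line) - len(line.lstrip(" "))
        if leading % indent != 0:
            problems.append(f"line {number}: indentation is not a multiple of {indent}")
    return problems


print(check_whitespace("users[2]{id,name}:\n  1,Ada\n  2,Bob"))
print(check_whitespace("name: Ada \n"))
```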

Why TOON?

TOON is designed for passing structured data to LLMs with minimal token usage. While JSON is a great general-purpose format, it's verbose and token-expensive when used with LLMs. TOON solves this by:

  1. Removing redundant syntax - No quotes around strings that don't need them, no braces for objects
  2. Using tabular format - For arrays of uniform objects, declare fields once
  3. Explicit lengths - Help LLMs track array bounds
  4. Deterministic formatting - Always produces the same output for the same input
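
The value of explicit lengths and field lists is that a consumer can verify a block against its own header. The sketch below shows one way that check might look for a comma-delimited tabular block. `check_table` and the `HEADER` pattern are hypothetical and written for illustration; they do not handle quoted values that contain commas, and they are not the validation this library performs in strict mode.

```python
import re

# Matches headers such as: users[2]{id,name,role}:
HEADER = re.compile(r"^(?P<key>\w+)\[(?P<count>\d+)\]\{(?P<fields>[^}]*)\}:$")


def check_table(block):
    """Verify that a tabular block agrees with its declared length and fields."""
    header, *rows = block.split("\n")
    match = HEADER.match(header)
    if match is None:
        raise ValueError(f"not a tabular header: {header!r}")
    declared = int(match["count"])
    fields = match["fields"].split(",")
    # The declared length must equal the number of rows that follow.
    if len(rows) != declared:
        raise ValueError(f"header declares {declared} rows, found {len(rows)}")
    # Every row must supply one value per declared field.
    for row in rows:
        if len(row.strip().split(",")) != len(fields):
            raise ValueError(f"row {row.strip()!r} does not have {len(fields)} fields")
    return declared, fields


print(check_table("users[2]{id,name,role}:\n  1,Alice,admin\n  2,Bob,user"))
```

A truncated or padded model response fails the row-count check immediately, which is harder to detect in plain JSON without parsing the whole document.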

Compatibility

This Python implementation is designed to be compatible with the original TypeScript/JavaScript implementation. TOON documents encoded with this library can be decoded by the original library, and vice versa.

Specification

This implementation follows the TOON v1.1 Specification, which defines:

  • Data model (JSON-compatible)
  • Encoding normalization rules
  • Concrete syntax
  • Decoding semantics
  • Conformance requirements

Examples

See the examples/ directory for comprehensive usage examples.

Testing

The project includes a comprehensive test suite covering all TOON v1.1 specification features:

# Run all tests
uv run pytest tests/

# Run tests with coverage
uv run pytest tests/ --cov=toon_py --cov-report=html

# Run specific test categories
uv run pytest tests/test_basic_encoding.py     # Core functionality
uv run pytest tests/test_tabular_arrays.py     # Tabular format
uv run pytest tests/test_advanced_features.py  # Advanced features
uv run pytest tests/test_normalization.py      # Data normalization

Test Coverage

  • Core TOON Features - Primitives, objects, arrays, nesting
  • Tabular Optimization - Uniform object arrays
  • Advanced Features - Alternative delimiters, length markers
  • Edge Cases - Unicode, special characters, large data
  • Normalization - Python type handling
  • Compliance - TOON v1.1 specification
  • Compatibility - Reference implementation compatibility

Benchmarks

TOON typically achieves 30-60% token reduction compared to JSON, depending on the data structure. See the original benchmarks for detailed comparisons.

License

MIT License - see LICENSE file for details.

Contributing

Contributions welcome! Please read the contributing guidelines and ensure all tests pass.

Acknowledgments

  • Original implementation by Johann Schopplich
  • Specification based on TOON v1.1
  • Python port following the original TypeScript/JavaScript implementation

Note: TOON is designed for LLM input (passing data to models), not as a general-purpose serialization format like JSON. For APIs, databases, and other applications, JSON is still the better choice.

