Python implementation of TOON (Token-Oriented Object Notation) - a token-efficient JSON alternative for LLM prompts

Project description

TOON Python Implementation

A Python implementation of TOON (Token-Oriented Object Notation) - a token-efficient JSON alternative for LLM prompts, following the original @byjohann/toon TypeScript/JavaScript implementation.

This implementation adheres to the TOON v1.1 specification and maintains compatibility with the original implementation.

Features

✅ Token-efficient: 30-60% fewer tokens than JSON
✅ LLM-friendly: Explicit lengths and field lists help models validate output
✅ Minimal syntax: Removes redundant punctuation (braces, brackets, most quotes)
✅ Indentation-based: Uses whitespace for structure (like YAML)
✅ Tabular arrays: Declare keys once, stream rows without repetition
✅ Spec-compliant: Follows TOON v1.1 specification
✅ Compatible: Works with original TypeScript implementation

Installation

uv add pytoon-core

pip install pytoon-core

Or install from source:

git clone https://github.com/Alg0rix/toon-py.git
cd toon-py
uv sync

Quick Start

from toon_py import encode, decode

# Encode Python values to TOON
data = {
    "name": "Ada",
    "age": 30,
    "active": True,
    "tags": ["admin", "developer", "python"]
}

toon_str = encode(data)
print(toon_str)
# Output:
# name: Ada
# age: 30
# active: true
# tags[3]: admin,developer,python

# Decode TOON back to Python
decoded = decode(toon_str)
print(decoded)
# Output:
# {'name': 'Ada', 'age': 30, 'active': True, 'tags': ['admin', 'developer', 'python']}

Advanced Usage

Tabular Arrays (Most Efficient)

When you have arrays of objects with the same structure, TOON uses an efficient tabular format:

data = {
    "users": [
        {"id": 1, "name": "Ada", "role": "admin"},
        {"id": 2, "name": "Bob", "role": "user"},
        {"id": 3, "name": "Charlie", "role": "user"}
    ]
}

toon_str = encode(data)
print(toon_str)
# Output:
# users[3]{id,name,role}:
#   1,Ada,admin
#   2,Bob,user
#   3,Charlie,user

Alternative Delimiters

Use tabs or pipes for even better token efficiency:

# Tab delimiter
toon_str = encode(data, delimiter='tab')
print(toon_str)
# Output:
# users[3 ]{id name role}:
#   1 Ada admin
#   2 Bob user
#   3 Charlie user

# Pipe delimiter
toon_str = encode(data, delimiter='pipe')
print(toon_str)
# Output:
# users[3|]{id|name|role}:
#   1|Ada|admin
#   2|Bob|user
#   3|Charlie|user

Nested Structures

TOON handles nested objects and arrays naturally:

data = {
    "company": "Tech Corp",
    "employees": [
        {
            "name": "Ada",
            "contact": {"email": "ada@tech.com", "phone": "555-0101"},
            "skills": ["Python", "ML", "Data Science"]
        },
        {
            "name": "Bob",
            "contact": {"email": "bob@tech.com", "phone": "555-0102"},
            "skills": ["JavaScript", "React", "Node.js"]
        }
    ]
}

toon_str = encode(data)
print(toon_str)
# Output:
# company: Tech Corp
# employees[2]:
#   - name: Ada
#     contact:
#       email: ada@tech.com
#       phone: 555-0101
#     skills[3]: Python,ML,Data Science
#   - name: Bob
#     contact:
#       email: bob@tech.com
#       phone: 555-0102
#     skills[3]: JavaScript,React,Node.js

Encoding Options

encode(data, options={
    'indent': 2,           # Spaces per indent level (default: 2)
    'delimiter': ',',      # 'comma', 'tab', 'pipe', or actual char (default: ',')
    'length_marker': False  # Add '#' prefix to array lengths (default: False)
})

Decoding Options

decode(toon_str, options={
    'indent': 2,   # Expected indent size (default: 2)
    'strict': True # Strict validation (default: True)
})

Format Comparison

JSON (Verbose)

{
  "users": [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"}
  ]
}

TOON (Compact)

users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user

Token savings: ~40-50% fewer tokens than JSON!

API Reference

`encode(value, options=None)`

Encode a Python value to TOON format.

Parameters:

value: Any JSON-serializable value (dict, list, or primitive)
options (optional): Encoding options dict
- indent (int): Spaces per indentation level (default: 2)
- delimiter (str): Array delimiter - 'comma', 'tab', 'pipe', or the char (default: ',')
- length_marker (bool): Prefix array lengths with '#' (default: False)

Returns: TOON-formatted string

`decode(text, options=None)`

Decode TOON text to a Python value.

Parameters:

text: TOON-formatted string
options (optional): Decoding options dict
- indent (int): Expected indent size (default: 2)
- strict (bool): Enable strict validation (default: True)

Returns: Python value (dict, list, or primitive)

Raises:

ValueError: If input is malformed or validation fails (in strict mode)

Type Handling

The encoder automatically handles Python-specific types:

Python Type	TOON Output
`int`, `float`	Number (normalized, no scientific notation)
`bool`	`true`/`false`
`None`	`null`
`str`	String (quoted if needed)
`datetime`	ISO 8601 string
`set`	Array
`dict`	Object
`list`, `tuple`	Array
`float('nan')`, `float('inf')`	`null`
`Decimal`	String (if outside safe integer range)

Whitespace Rules

TOON follows strict whitespace invariants:

No trailing spaces on any line
No trailing newline at end of document
One space after : in key-value pairs
Consistent indentation (configurable, default 2 spaces)

Why TOON?

TOON is designed for passing structured data to LLMs with minimal token usage. While JSON is a great general-purpose format, it's verbose and token-expensive when used with LLMs. TOON solves this by:

Removing redundant syntax - No quotes on unquoted strings, no braces for objects
Using tabular format - For arrays of uniform objects, declare fields once
Explicit lengths - Help LLMs track array bounds
Deterministic formatting - Always produces the same output for the same input

Compatibility

This Python implementation is designed to be compatible with the original TypeScript/JavaScript implementation. TOON documents encoded with this library can be decoded by the original library, and vice versa.

Specification

This implementation follows the TOON v1.1 Specification, which defines:

Data model (JSON-compatible)
Encoding normalization rules
Concrete syntax
Decoding semantics
Conformance requirements

Examples

See the examples/ directory for comprehensive usage examples:

basic_usage.py - Core functionality demonstration
advanced_features.py - Advanced TOON features
llm_integration.py - LLM integration scenarios
performance_comparison.py - Performance benchmarking

Testing

The project includes a comprehensive test suite covering all TOON v1.1 specification features:

# Run all tests
uv run pytest tests/

# Run tests with coverage
uv run pytest tests/ --cov=toon_py --cov-report=html

# Run specific test categories
uv run pytest tests/test_basic_encoding.py     # Core functionality
uv run pytest tests/test_tabular_arrays.py     # Tabular format
uv run pytest tests/test_advanced_features.py  # Advanced features
uv run pytest tests/test_normalization.py      # Data normalization

Test Coverage

✅ Core TOON Features - Primitives, objects, arrays, nesting
✅ Tabular Optimization - Uniform object arrays
✅ Advanced Features - Alternative delimiters, length markers
✅ Edge Cases - Unicode, special characters, large data
✅ Normalization - Python type handling
✅ Compliance - TOON v1.1 specification
✅ Compatibility - Reference implementation compatibility

Benchmarks

TOON typically achieves 30-60% token reduction compared to JSON, depending on the data structure. See the original benchmarks for detailed comparisons.

License

MIT License - see LICENSE file for details.

Contributing

Contributions welcome! Please read the contributing guidelines and ensure all tests pass.

Acknowledgments

Original implementation by Johann Schopplich
Specification based on TOON v1.1
Python port following the original TypeScript/JavaScript implementation

Note: TOON is designed for LLM input (passing data to models), not as a general-purpose serialization format like JSON. For APIs, databases, and other applications, JSON is still the better choice.

Project details

Release history Release notifications | RSS feed

0.1.1

Oct 30, 2025

This version

0.1.0

Oct 29, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pytoon_core-0.1.0.tar.gz (43.4 kB view details)

Uploaded Oct 29, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

pytoon_core-0.1.0-py3-none-any.whl (23.1 kB view details)

Uploaded Oct 29, 2025 Python 3

File details

Details for the file pytoon_core-0.1.0.tar.gz.

File metadata

Download URL: pytoon_core-0.1.0.tar.gz
Upload date: Oct 29, 2025
Size: 43.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.9.5

File hashes

Hashes for pytoon_core-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`0c3c5fe198d17da1ac6a93fb8658e772c1be382371133c063a5f993402394f62`
MD5	`a89bb90aad0289d186d63860ba40fca7`
BLAKE2b-256	`2acae40b00853d064e32756a4a8db15b982c4b1050a2655c006ffb7379e866e5`

See more details on using hashes here.

File details

Details for the file pytoon_core-0.1.0-py3-none-any.whl.

File metadata

Download URL: pytoon_core-0.1.0-py3-none-any.whl
Upload date: Oct 29, 2025
Size: 23.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.9.5

File hashes

Hashes for pytoon_core-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`701278770315f8903a151cab0123ab351d57984d603722b601961ad3c527edf4`
MD5	`6663e57a20895ca5d3807409822153cf`
BLAKE2b-256	`c4541d125800882055daafdd711211c27fb4e1182d6bb14e5c9698b0433b15b0`

See more details on using hashes here.

pytoon-core 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

TOON Python Implementation

Features

Installation

Quick Start

Advanced Usage

Tabular Arrays (Most Efficient)

Alternative Delimiters

Nested Structures

Encoding Options

Decoding Options

Format Comparison

JSON (Verbose)

TOON (Compact)

API Reference

encode(value, options=None)

decode(text, options=None)

Type Handling

Whitespace Rules

Why TOON?

Compatibility

Specification

Examples

Testing

Test Coverage

Benchmarks

License

Contributing

Acknowledgments

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

`encode(value, options=None)`

`decode(text, options=None)`