A compact, human-readable serialization format designed for passing structured data to Large Language Models with significantly reduced token usage

TOON Format for Python

⚠️ Beta Status (v0.9.x): This library is in active development and working towards spec compliance. Not yet published to PyPI. API may change before 1.0.0 release.

Compact, human-readable serialization format for LLM contexts with 30-60% token reduction vs JSON. Combines YAML-like indentation with CSV-like tabular arrays. Working towards full compatibility with the official TOON specification.

Key Features: Minimal syntax • Tabular arrays for uniform data • Array length validation • Python 3.8+ • Comprehensive test coverage.

# Not yet published to PyPI - install from source:
git clone https://github.com/toon-format/toon-python.git
cd toon-python
uv sync

# Or install directly from GitHub:
pip install git+https://github.com/toon-format/toon-python.git

Quick Start

from toon_format import encode, decode

# Simple object
encode({"name": "Alice", "age": 30})
# name: Alice
# age: 30

# Tabular array (uniform objects)
encode([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}])
# [2,]{id,name}:
#   1,Alice
#   2,Bob

# Decode back to Python
decode("items[2]: apple,banana")
# {'items': ['apple', 'banana']}
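
To make the tabular form above more concrete, here is a minimal, standard-library-only sketch of how a list of uniform objects could be rendered as a header plus rows. It is not the library's implementation: `sketch_tabular` is a hypothetical helper, it mirrors the `[2,]{id,name}:` header shown in this README, and it ignores quoting and nested values entirely.

```python
def sketch_tabular(rows, delimiter=",", indent=2):
    """Illustrative only: render a list of uniform dicts as a TOON-style table."""
    if not rows:
        raise ValueError("expected at least one row")
    fields = list(rows[0].keys())
    # This sketch requires every row to have the same keys in the same order.
    if any(list(row.keys()) != fields for row in rows):
        raise ValueError("rows are not uniform")
    # Header: declared length, delimiter, then the shared field names.
    header = f"[{len(rows)}{delimiter}]{{{delimiter.join(fields)}}}:"
    pad = " " * indent
    # One indented line per row, values joined by the delimiter (no quoting here).
    lines = [pad + delimiter.join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + lines)


print(sketch_tabular([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]))
# [2,]{id,name}:
#   1,Alice
#   2,Bob
```

Field names appear once in the header instead of once per row, which is where most of the saving over JSON would come from for uniform data.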

CLI Usage

# Auto-detect format by extension
toon input.json -o output.toon      # Encode
toon data.toon -o output.json       # Decode
echo '{"x": 1}' | toon -            # Stdin/stdout

# Options
toon data.json --encode --delimiter "\t" --length-marker
toon data.toon --decode --no-strict --indent 4

Options: -e/--encode -d/--decode -o/--output --delimiter --indent --length-marker --no-strict
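
The extension-based auto-detection can be pictured roughly as follows. This is a sketch under assumptions, not the CLI's actual code: `detect_mode` and its `force` parameter are hypothetical names, and the behaviour for stdin (`-`) or unknown extensions is not documented above, so the sketch simply asks for an explicit mode in those cases.

```python
from pathlib import Path


def detect_mode(path, force=None):
    """Illustrative only: pick 'encode' or 'decode' from a file extension."""
    # An explicit choice (standing in for --encode / --decode) always wins.
    if force in ("encode", "decode"):
        return force
    suffix = Path(path).suffix.lower()
    if suffix == ".json":
        return "encode"  # JSON input is converted to TOON
    if suffix == ".toon":
        return "decode"  # TOON input is converted back to JSON
    # Assumption: anything else needs an explicit mode in this sketch.
    raise ValueError(f"cannot infer mode from {path!r}; pass an explicit mode")


print(detect_mode("input.json"))  # encode
print(detect_mode("data.toon"))   # decode
```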

API Reference

encode(value, options=None) → str

encode({"id": 123}, {"delimiter": "\t", "indent": 4, "lengthMarker": "#"})

Options:

  • delimiter: "," (default), "\t", "|"
  • indent: Spaces per level (default: 2)
  • lengthMarker: "" (default) or "#" to prefix array lengths

decode(input_str, options=None) → Any

decode("id: 123", {"indent": 2, "strict": True})

Options:

  • indent: Expected indent size (default: 2)
  • strict: Validate syntax, lengths, delimiters (default: True)
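
As a rough illustration of the length check that strict mode implies, the sketch below validates a single inline array line such as `items[2]: apple,banana`. It uses only the standard library; `check_inline_array` and the regular expression are assumptions made for this example, not the library's parser, and quoted values containing the delimiter are not handled.

```python
import re

# key, declared length in brackets, then the delimited values after the colon
HEADER = re.compile(r"^(?P<key>[^\[\]:]*)\[(?P<length>\d+)\]:\s*(?P<values>.*)$")


def check_inline_array(line, delimiter=","):
    """Illustrative only: compare the declared length with the item count."""
    match = HEADER.match(line)
    if match is None:
        raise ValueError(f"not an inline array line: {line!r}")
    declared = int(match["length"])
    raw = match["values"]
    values = [v.strip() for v in raw.split(delimiter)] if raw else []
    if len(values) != declared:
        raise ValueError(f"declared {declared} items, found {len(values)}")
    return match["key"], values


print(check_inline_array("items[2]: apple,banana"))
# ('items', ['apple', 'banana'])
```

A line such as `items[3]: apple,banana` raises `ValueError` in this sketch, which is the kind of truncated or malformed input a declared length is meant to catch.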

Token Counting & Comparison

Measure token efficiency and compare formats:

from toon_format import encode, estimate_savings, compare_formats, count_tokens

# Measure savings
data = {"users": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]}
result = estimate_savings(data)
print(f"Saves {result['savings_percent']:.1f}% tokens")  # e.g. Saves 37.8% tokens

# Visual comparison
print(compare_formats(data))
# Format Comparison
# ────────────────────────────────────────────────
# Format      Tokens    Size (chars)
# JSON            45             123
# TOON            28              85
# ────────────────────────────────────────────────
# Savings: 17 tokens (37.8%)

# Count tokens directly
toon_str = encode(data)
tokens = count_tokens(toon_str)  # Uses tiktoken (gpt5/gpt5-mini)

Requires tiktoken: uv add tiktoken (benchmark features are optional)
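
The savings line in the comparison output is plain arithmetic on the two token counts. The sketch below reproduces it with the sample figures from the table above (45 JSON tokens, 28 TOON tokens); `savings_percent` is a hypothetical helper written for this example and does not call the library or tiktoken.

```python
def savings_percent(json_tokens, toon_tokens):
    """Relative token reduction of TOON versus JSON, in percent."""
    if json_tokens <= 0:
        raise ValueError("json_tokens must be positive")
    return (json_tokens - toon_tokens) / json_tokens * 100


json_tokens, toon_tokens = 45, 28  # sample counts from the comparison table
saved = json_tokens - toon_tokens
print(f"Savings: {saved} tokens ({savings_percent(json_tokens, toon_tokens):.1f}%)")
# Savings: 17 tokens (37.8%)
```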

Format Specification

Type             Example Input                                       TOON Output
Object           {"name": "Alice", "age": 30}                        name: Alice
                                                                     age: 30
Primitive Array  [1, 2, 3]                                           [3]: 1,2,3
Tabular Array    [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]    [2,]{id,name}:
                                                                       1,A
                                                                       2,B
Mixed Array      [{"x": 1}, 42, "hi"]                                [3]:
                                                                       - x: 1
                                                                       - 42
                                                                       - hi

Quoting: Only when necessary (empty, keywords, numeric strings, whitespace, structural chars, delimiters)
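
The quoting rule can be read as a predicate over string values. The following is a sketch under assumptions, not the library's logic: `needs_quoting`, the keyword set, the numeric pattern, and the exact set of structural characters are all guesses made for illustration, and "whitespace" is interpreted here as leading or trailing whitespace plus tabs and newlines.

```python
import re

KEYWORDS = {"true", "false", "null"}  # assumed keyword set
NUMERIC = re.compile(r"^-?\d+(\.\d+)?([eE][+-]?\d+)?$")  # assumed numeric shape
STRUCTURAL = set(':[]{}"\\')  # assumed structural characters


def needs_quoting(text, delimiter=","):
    """Illustrative only: would this string be ambiguous if written bare?"""
    if text == "":
        return True  # empty string
    if text in KEYWORDS:
        return True  # would be read back as a boolean or null
    if NUMERIC.match(text):
        return True  # would be read back as a number
    if text != text.strip():
        return True  # leading or trailing whitespace
    # structural characters, the active delimiter, or control whitespace
    return any(ch in STRUCTURAL or ch == delimiter or ch in "\n\r\t" for ch in text)


for sample in ["Alice", "hello world", "", "true", "42", "a,b", "key: value"]:
    print(repr(sample), needs_quoting(sample))
```

Under these assumptions only the first two samples can be written bare. Everything else would need quotes to survive a round trip.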

Type Normalization: Infinity/NaN/Functions → null • Decimal → float • datetime → ISO 8601 • -0 → 0
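
The normalization rules for scalar values can be sketched with the standard library alone. `normalize` below is a hypothetical helper, not the library's function: it handles single values only (no recursion into lists or dicts), and it maps both `0.0` and `-0.0` to the integer `0`, which is a simplification chosen for this example.

```python
import math
from datetime import datetime
from decimal import Decimal


def normalize(value):
    """Illustrative only: apply the scalar normalization rules listed above."""
    if callable(value):
        return None  # functions have no data representation
    if isinstance(value, Decimal):
        value = float(value)  # Decimal becomes float, then falls through
    if isinstance(value, float):
        if math.isnan(value) or math.isinf(value):
            return None  # NaN and Infinity become null
        if value == 0:
            return 0  # collapses -0.0 (and 0.0) to 0 in this sketch
        return value
    if isinstance(value, datetime):
        return value.isoformat()  # ISO 8601 string
    return value


print(normalize(float("inf")))                     # None
print(normalize(Decimal("1.5")))                   # 1.5
print(normalize(datetime(2024, 1, 2, 3, 4, 5)))    # 2024-01-02T03:04:05
print(normalize(-0.0))                             # 0
```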

Development

# Setup (requires uv: https://docs.astral.sh/uv/)
git clone https://github.com/toon-format/toon-python.git
cd toon-python
uv sync

# Run tests (792 tests, 91% coverage, 85% enforced)
uv run pytest --cov=toon_format --cov-report=term

# Code quality
uv run ruff check src/ tests/        # Lint
uv run ruff format src/ tests/       # Format
uv run mypy src/                     # Type check

CI/CD: GitHub Actions • Python 3.8-3.14 • Coverage enforcement • PR coverage comments

Project Status & Roadmap

Following semantic versioning towards 1.0.0:

  • v0.8.x - Initial code set, tests, documentation ✅
  • v0.9.x - Serializer improvements, spec compliance testing, publishing setup (current)
  • v1.0.0-rc.x - Release candidates for production readiness
  • v1.0.0 - First stable release with full spec compliance

See CONTRIBUTING.md for detailed guidelines.

License

MIT License - see LICENSE
