A compact, human-readable serialization format designed for passing structured data to Large Language Models with significantly reduced token usage

TOON Format for Python

⚠️ Beta Status (v0.9.x): This library is in active development and working towards spec compliance. A beta release is published to PyPI. The API may change before the 1.0.0 release.

Compact, human-readable serialization format for LLM contexts with 30-60% token reduction vs JSON. Combines YAML-like indentation with CSV-like tabular arrays. Working towards full compatibility with the official TOON specification.

Key Features: Minimal syntax • Tabular arrays for uniform data • Array length validation • Python 3.8+ • Comprehensive test coverage.

# Beta published to PyPI - install from source:
git clone https://github.com/toon-format/toon-python.git
cd toon-python
uv sync

# Or install directly from GitHub:
pip install git+https://github.com/toon-format/toon-python.git

Quick Start

from toon_format import encode, decode

# Simple object
encode({"name": "Alice", "age": 30})
# name: Alice
# age: 30

# Tabular array (uniform objects)
encode([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}])
# [2,]{id,name}:
#   1,Alice
#   2,Bob

# Decode back to Python
decode("items[2]: apple,banana")
# {'items': ['apple', 'banana']}
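
To make the tabular form above concrete, here is a minimal sketch of how a uniform list of objects could be collapsed into one header line plus one row per object. It is not the library's implementation: `is_uniform` and `sketch_tabular` are hypothetical names, the header layout simply mirrors the output shown above, and quoting and boolean/null spelling are not handled.

```python
from typing import Any, Dict, List

# Value types allowed inside a tabular row in this sketch (an assumption).
_PRIMITIVES = (str, int, float, bool, type(None))


def is_uniform(rows: List[Any]) -> bool:
    """True if rows is a non-empty list of dicts with identical keys and primitive values."""
    if not rows or not all(isinstance(r, dict) for r in rows):
        return False
    keys = list(rows[0].keys())
    return all(
        list(r.keys()) == keys and all(isinstance(v, _PRIMITIVES) for v in r.values())
        for r in rows
    )


def sketch_tabular(rows: List[Dict[str, Any]], delimiter: str = ",", indent: int = 2) -> str:
    """Render uniform rows as a header line followed by delimiter-separated rows."""
    if not is_uniform(rows):
        raise ValueError("rows are not uniform; a real encoder would use another array form")
    keys = list(rows[0].keys())
    # Header: declared length, then the field names, e.g. [2,]{id,name}:
    header = f"[{len(rows)}{delimiter}]{{{delimiter.join(keys)}}}:"
    pad = " " * indent
    # Values are stringified naively; no quoting is applied in this sketch.
    body = [pad + delimiter.join(str(r[k]) for k in keys) for r in rows]
    return "\n".join([header] + body)


rows = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
print(sketch_tabular(rows))
```

The field names are written once in the header instead of once per object, which is where most of the size reduction for uniform data comes from.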

CLI Usage

# Auto-detect format by extension
toon input.json -o output.toon      # Encode
toon data.toon -o output.json       # Decode
echo '{"x": 1}' | toon -            # Stdin/stdout

# Options
toon data.json --encode --delimiter "\t" --length-marker
toon data.toon --decode --no-strict --indent 4

Options: -e/--encode -d/--decode -o/--output --delimiter --indent --length-marker --no-strict
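
The extension-based auto-detection can be pictured with the following sketch. It only illustrates the idea: `detect_mode` is a hypothetical helper, not the CLI's actual code, and how the real tool treats stdin (`-`) or unknown extensions is not modeled here.

```python
from pathlib import Path
from typing import Optional


def detect_mode(path: str, force: Optional[str] = None) -> str:
    """Return "encode" or "decode" for an input path.

    An explicit choice (standing in for --encode / --decode) wins;
    otherwise the file extension decides.
    """
    if force in ("encode", "decode"):
        return force
    suffix = Path(path).suffix.lower()
    if suffix == ".json":
        return "encode"  # JSON in, TOON out
    if suffix == ".toon":
        return "decode"  # TOON in, JSON out
    raise ValueError(f"cannot infer mode from {path!r}; pass an explicit mode")


print(detect_mode("input.json"))
print(detect_mode("data.toon"))
```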

API Reference

encode(value, options=None) → str

encode({"id": 123}, {"delimiter": "\t", "indent": 4, "lengthMarker": "#"})

Options:

  • delimiter: "," (default), "\t", "|"
  • indent: Spaces per level (default: 2)
  • lengthMarker: "" (default) or "#" to prefix array lengths

decode(input_str, options=None) → Any

decode("id: 123", {"indent": 2, "strict": True})

Options:

  • indent: Expected indent size (default: 2)
  • strict: Validate syntax, lengths, delimiters (default: True)
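
As an illustration of what length validation in strict mode means, the sketch below checks the declared count of an inline primitive array against the values actually present. It is a simplified stand-in, not the library's parser: `parse_inline_array` is a hypothetical name, only the comma delimiter is handled, and quoted values are not supported.

```python
import re
from typing import List

# Matches lines such as "items[2]: apple,banana" (key, declared count, values).
_INLINE_ARRAY = re.compile(r"^(?P<key>[^\[\]:]*)\[(?P<count>\d+)\]:\s*(?P<values>.*)$")


def parse_inline_array(line: str, strict: bool = True) -> List[str]:
    """Split an inline array line and, in strict mode, verify its declared length."""
    match = _INLINE_ARRAY.match(line)
    if match is None:
        raise ValueError(f"not an inline array line: {line!r}")
    declared = int(match.group("count"))
    raw = match.group("values")
    values = [v.strip() for v in raw.split(",")] if raw else []
    if strict and len(values) != declared:
        raise ValueError(f"declared {declared} items, found {len(values)}")
    return values


print(parse_inline_array("items[2]: apple,banana"))
```

With `strict=False` the mismatch is tolerated and the values found are returned as-is, which is the trade-off the `strict` option expresses.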

Token Counting & Comparison

Measure token efficiency and compare formats:

from toon_format import estimate_savings, compare_formats, count_tokens

# Measure savings
data = {"users": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]}
result = estimate_savings(data)
print(f"Saves {result['savings_percent']:.1f}% tokens")  # e.g. Saves 37.8% tokens (illustrative)

# Visual comparison
print(compare_formats(data))
# Format Comparison
# ────────────────────────────────────────────────
# Format      Tokens    Size (chars)
# JSON            45             123
# TOON            28              85
# ────────────────────────────────────────────────
# Savings: 17 tokens (37.8%)

# Count tokens directly
toon_str = encode(data)
tokens = count_tokens(toon_str)  # Uses tiktoken (gpt5/gpt5-mini)

Requires tiktoken: uv add tiktoken (benchmark features are optional)
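
The savings figure in the comparison output is plain arithmetic on the two token counts. A minimal sketch, using the illustrative counts from the table above (45 and 28); the function name and the returned keys are assumptions for illustration, not the library's return schema:

```python
from typing import Dict


def savings(json_tokens: int, toon_tokens: int) -> Dict[str, float]:
    """Absolute and relative token savings of TOON versus JSON."""
    saved = json_tokens - toon_tokens
    # Guard against an empty JSON baseline to avoid dividing by zero.
    percent = (saved / json_tokens * 100.0) if json_tokens else 0.0
    return {"saved": saved, "percent": percent}


result = savings(45, 28)
print(f"Savings: {result['saved']} tokens ({result['percent']:.1f}%)")
# Savings: 17 tokens (37.8%)
```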

Format Specification

Type             Example Input                                     TOON Output
Object           {"name": "Alice", "age": 30}                      name: Alice
                                                                   age: 30
Primitive Array  [1, 2, 3]                                         [3]: 1,2,3
Tabular Array    [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]  [2,]{id,name}:
                                                                     1,A
                                                                     2,B
Mixed Array      [{"x": 1}, 42, "hi"]                              [3]:
                                                                     - x: 1
                                                                     - 42
                                                                     - hi

Quoting: Only when necessary (empty, keywords, numeric strings, whitespace, structural chars, delimiters)
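
The quoting rule can be read as a predicate over string values. The sketch below is one approximation of it, not the library's logic: `needs_quotes` is a hypothetical name, the keyword and structural-character sets are assumptions, and only leading or trailing whitespace is treated as requiring quotes.

```python
import re

_KEYWORDS = {"true", "false", "null"}          # assumed keyword set
_NUMERIC = re.compile(r"^-?\d+(\.\d+)?([eE][+-]?\d+)?$")
_STRUCTURAL = set(':[]{}"\\')                  # assumed structural characters


def needs_quotes(value: str, delimiter: str = ",") -> bool:
    """Decide whether a string must be quoted to survive a round trip."""
    if value == "":
        return True   # empty string would otherwise vanish
    if value in _KEYWORDS:
        return True   # would be read back as a boolean or null
    if _NUMERIC.match(value):
        return True   # would be read back as a number
    if value != value.strip():
        return True   # leading or trailing whitespace would be lost
    if any(ch in _STRUCTURAL for ch in value):
        return True   # collides with the format's own syntax
    if delimiter in value:
        return True   # would split into several values
    return False


for sample in ["Alice", "", "true", "42", " padded", "a:b", "a,b"]:
    print(repr(sample), needs_quotes(sample))
```

Because the check depends on the active delimiter, a value such as `a|b` is safe unquoted under the default comma but not when `"|"` is the delimiter.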

Type Normalization: Infinity/NaN/Functions → null • Decimal → float • datetime → ISO 8601 • -0 → 0
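
A sketch of what this normalization step might look like before encoding, covering non-finite floats and functions to null, Decimal to float, datetime to an ISO 8601 string, and negative zero to zero. `normalize` is a hypothetical helper written for illustration; the library's own handling may differ in detail.

```python
import math
from datetime import datetime
from decimal import Decimal
from typing import Any


def normalize(value: Any) -> Any:
    """Map values with no direct representation onto plain primitives."""
    if callable(value):
        return None                      # functions have no serialized form
    if isinstance(value, Decimal):
        value = float(value)             # Decimal is emitted as a float
    if isinstance(value, float):
        if math.isnan(value) or math.isinf(value):
            return None                  # NaN and +/-Infinity become null
        if value == 0.0 and math.copysign(1.0, value) < 0:
            return 0                     # negative zero collapses to 0
        return value
    if isinstance(value, datetime):
        return value.isoformat()         # ISO 8601 string
    return value


print(normalize(Decimal("1.5")))
print(normalize(datetime(2024, 1, 2, 3, 4, 5)))
print(normalize(float("nan")))
```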

Development

# Setup (requires uv: https://docs.astral.sh/uv/)
git clone https://github.com/toon-format/toon-python.git
cd toon-python
uv sync

# Run tests (792 tests, 91% coverage, 85% enforced)
uv run pytest --cov=toon_format --cov-report=term

# Code quality
uv run ruff check src/ tests/        # Lint
uv run ruff format src/ tests/       # Format
uv run mypy src/                     # Type check

CI/CD: GitHub Actions • Python 3.8-3.14 • Coverage enforcement • PR coverage comments

Project Status & Roadmap

Following semantic versioning towards 1.0.0:

  • v0.8.x - Initial code set, tests, documentation ✅
  • v0.9.x - Serializer improvements, spec compliance testing, publishing setup (current)
  • v1.0.0-rc.x - Release candidates for production readiness
  • v1.0.0 - First stable release with full spec compliance

See CONTRIBUTING.md for detailed guidelines.

License

MIT License – see LICENSE for details
