Smart structured-data-to-TOON gateway with pragmatic auto-gating for LLM prompts.

Maintainers

andrii-su

datoon

smart structured-data→TOON gateway — converts only when it actually saves tokens

Raw structured data is often verbose in LLM prompts. TOON can save tokens — but blind conversion can also make payloads worse. datoon adds a decision layer: convert when structure and savings justify it, skip when they don't, and always explain why.

Supports JSON, CSV, JSONL, YAML, XML, Parquet, Avro, ORC, Excel, and Apple Numbers — auto-detected from file extension.

Before / After

JSON in the prompt (43 tokens)

{"users":[
  {"id":1,"name":"Ada","role":"admin"},
  {"id":2,"name":"Lin","role":"analyst"},
  {"id":3,"name":"Grace","role":"viewer"}
]}

datoon converts → TOON (24 tokens)

users[3]{id,name,role}:
  1,Ada,admin
  2,Lin,analyst
  3,Grace,viewer

{"decision":"convert","reason":"Estimated savings 44.19% (threshold 15.00%)."}
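The tabular layout above can be reproduced by hand for flat, uniform rows. The following is a minimal sketch, not datoon's implementation or the TOON CLI: `to_toon_table` is a hypothetical helper, and it assumes values contain no commas or newlines, so it does no quoting or escaping.

```python
def to_toon_table(key: str, rows: list[dict]) -> str:
    """Render a uniform list of flat dicts in the tabular layout shown above.

    Illustrative only: no quoting or escaping is applied.
    """
    if not rows:
        raise ValueError("need at least one row")
    fields = list(rows[0])
    if any(list(row) != fields for row in rows):
        raise ValueError("rows must share the same keys in the same order")
    # Header: key, row count in brackets, field names in braces.
    header = f"{key}[{len(rows)}]{{{','.join(fields)}}}:"
    lines = ["  " + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header, *lines])


users = [
    {"id": 1, "name": "Ada", "role": "admin"},
    {"id": 2, "name": "Lin", "role": "analyst"},
    {"id": 3, "name": "Grace", "role": "viewer"},
]
print(to_toon_table("users", users))
```

The field names appear once in the header instead of once per row, which is where the token savings on uniform data come from.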

CSV from a data pipeline (111 tokens as JSON)

id,name,role
1,Ada,admin
2,Lin,analyst
3,Grace,viewer

datoon auto-converts → TOON (24 tokens)

datoon data.csv --report-stdout

Same result. Zero JSON serialization in your code.

Non-uniform payload (26 tokens)

{"config":{"debug":true},"tags":["a","b"]}

datoon skips → keeps JSON

{"decision":"skip","reason":"No uniform object arrays found with at least 3 rows."}
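The skip reason above refers to uniform object arrays. One way to picture that check is the sketch below. It is an assumption about what "uniform" means (every item is an object with the same keys in the same order), and `find_uniform_arrays` is a hypothetical name, not part of datoon's API.

```python
import json


def find_uniform_arrays(data, min_rows: int = 3, path: str = "$"):
    """Yield paths of arrays whose items are all dicts with identical keys."""
    if isinstance(data, dict):
        for key, value in data.items():
            yield from find_uniform_arrays(value, min_rows, f"{path}.{key}")
    elif isinstance(data, list):
        if (
            len(data) >= min_rows
            and all(isinstance(item, dict) for item in data)
            and len({tuple(item) for item in data}) == 1  # one shared key tuple
        ):
            yield path
        # Keep walking: nested rows may contain candidate arrays of their own.
        for index, item in enumerate(data):
            yield from find_uniform_arrays(item, min_rows, f"{path}[{index}]")


tabular = json.loads(
    '{"users":[{"id":1,"name":"Ada"},{"id":2,"name":"Lin"},{"id":3,"name":"Grace"}]}'
)
mixed = json.loads('{"config":{"debug":true},"tags":["a","b"]}')

print(list(find_uniform_arrays(tabular)))  # ['$.users']
print(list(find_uniform_arrays(mixed)))    # []
```

The second payload has no array of objects at all, and its only array has two items, so nothing qualifies under a three-row minimum.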

No Node.js call. No silent corruption.

Same data. Right format. Always explained.

┌──────────────────────────────────────────────────┐
│  PAYLOAD SAVINGS (auto avg)    ████░░░░░░   28%  │
│  PAYLOAD SAVINGS (agent skill) ████████░░   62%  │
│  DECISION ACCURACY             ██████████  100%  │
│  HARMFUL CONVERSIONS BLOCKED   ██████████  100%  │
└──────────────────────────────────────────────────┘

Important: datoon saves payload tokens, meaning the structured-data portion of your prompt. Token savings depend on payload shape: uniform tabular data converts well; deeply nested or non-uniform structures are skipped. Every decision includes a reason so pipelines can log, debug, and trust the outcome.

Install

# core (JSON, CSV, JSONL, XML — no extra deps)
uv add datoon
pip install datoon

# with YAML support
pip install "datoon[yaml]"

# with Excel support
pip install "datoon[excel]"

# with Parquet / ORC / Avro support
pip install "datoon[columnar]"

# with Apple Numbers support
pip install "datoon[numbers]"

# with tiktoken-based token counting
pip install "datoon[tokens]"

# with MCP server
pip install "datoon[mcp]"

# everything
pip install "datoon[all]"

Requires Python 3.12+. TOON conversion requires Node.js with npx in PATH — analysis and format reading work without it.
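A pipeline can check for the Node.js prerequisite before attempting a conversion. This is a small sketch using only the standard library. `toon_cli_available` is a hypothetical helper, and it only confirms that an `npx` executable is on PATH, not that the TOON package can actually be fetched or run.

```python
import shutil


def toon_cli_available() -> bool:
    """Return True when an `npx` executable is discoverable on PATH."""
    return shutil.which("npx") is not None


if toon_cli_available():
    print("npx found: conversion can run")
else:
    print("npx not found: only analysis and format reading will work")
```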

For Claude Code plugin, Codex, and MCP config → INSTALL.md.

What You Get

Component	What
`datoon` CLI	Auto-gate any supported format → TOON from terminal or scripts
Python API	`convert_json_for_llm()` + `read_tabular()` for any LLM pipeline
MCP Server	`convert_json`, `convert_text`, `analyze_json` tools for Claude Desktop, Cursor, Windsurf
Claude Code Plugin	`/datoon` in-session trigger, installs from GitHub in one command
Codex Plugin	Marketplace plugin — structured-data mode for Codex

Supported input formats

Format	Extension	Extra needed
JSON	`.json`	—
JSONL	`.jsonl`, `.ndjson`	—
CSV	`.csv`	—
XML	`.xml`	—
YAML	`.yaml`, `.yml`	`datoon[yaml]`
Excel	`.xlsx`, `.xls`	`datoon[excel]`
Parquet	`.parquet`	`datoon[columnar]`
Avro	`.avro`	`datoon[columnar]`
ORC	`.orc`	`datoon[columnar]`
Apple Numbers	`.numbers`	`datoon[numbers]`

How It Works

1. Detect format: from the --format flag, the file extension, or the JSON default for stdin
2. Read + normalize: parse the source into a list of row dicts, then serialize to compact JSON
3. Analyze structure: uniform object arrays? acceptable depth? minimum rows?
4. Gate early: non-candidates skip before any CLI call, so no Node.js overhead
5. Convert + estimate: the TOON CLI runs and token savings are calculated
6. Gate savings: below threshold, return JSON; above, return TOON with report

Every path returns a ConversionReport with decision, reason, and token estimates. Pipelines never get silent surprises.
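The final savings gate is simple arithmetic on two token estimates. A minimal sketch follows, assuming a plain dict in place of the real ConversionReport. `gate_on_savings` is a hypothetical function, and treating savings exactly equal to the threshold as "convert" is an assumption.

```python
def gate_on_savings(json_tokens: int, toon_tokens: int, min_savings: float = 0.15) -> dict:
    """Return a report-like dict with a decision, a reason, and the token estimates."""
    if json_tokens <= 0:
        raise ValueError("json_tokens must be positive")
    savings = (json_tokens - toon_tokens) / json_tokens
    # Assumption: savings equal to the threshold count as sufficient.
    decision = "convert" if savings >= min_savings else "skip"
    return {
        "decision": decision,
        "reason": f"Estimated savings {savings:.2%} (threshold {min_savings:.2%}).",
        "json_tokens": json_tokens,
        "toon_tokens": toon_tokens,
    }


print(gate_on_savings(43, 24))  # the users example: 19 of 43 tokens saved
print(gate_on_savings(93, 91))  # a nested payload with marginal savings
```

With the Before / After numbers (43 JSON tokens, 24 TOON tokens) the savings are 19/43, about 44.19%, which clears the default 15% threshold.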

Quick Start

JSON (stdin):

echo '{"users":[{"id":1,"name":"Ada"},{"id":2,"name":"Lin"},{"id":3,"name":"Grace"}]}' | datoon --report-stdout

CSV (auto-detected from extension):

datoon data.csv --report-stdout

JSONL:

datoon data.jsonl -o output.toon

YAML (requires datoon[yaml]):

datoon data.yaml --report-stdout

Parquet (requires datoon[columnar]):

datoon data.parquet --report ./report.json

Explicit format override:

datoon --format csv < data.csv --report-stdout

Force conversion (bypass gating — for experiments):

datoon data.json --force --report-stdout

Python API

JSON conversion:

from datoon import convert_json_for_llm, ConversionConfig, DatoonError

config = ConversionConfig(min_savings_ratio=0.15, max_depth=6, min_uniform_rows=3)

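try:
    # raw_json is the JSON text you would otherwise place in the prompt
    outcome = convert_json_for_llm(raw_json, config)
except DatoonError:
    # log or handle the failure here; re-raised so callers still see it
    raise

# outcome.payload_text: TOON or the original JSON
# outcome.report.decision: "convert" | "skip"
# outcome.report.reason: human-readable explanation
send_to_model(outcome.payload_text)  # send_to_model stands in for your own LLM call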

Any format via read_tabular:

import json
from pathlib import Path
from datoon import read_tabular, convert_json_for_llm, ConversionConfig

# text formats: csv, jsonl, yaml, xml
rows = read_tabular("csv", text=csv_string)

# binary formats: excel, parquet, orc, avro, numbers
rows = read_tabular("parquet", path=Path("data.parquet"))

json_text = json.dumps(rows, separators=(",", ":"))
outcome = convert_json_for_llm(json_text, ConversionConfig())
send_to_model(outcome.payload_text)

Structure-only analysis (no Node.js required):

from datoon.analyzer import analyze_payload
from datoon.models import ConversionConfig

analysis = analyze_payload(parsed_data, ConversionConfig())
print(analysis.is_candidate, analysis.reason)

MCP Server

datoon ships an MCP server with three tools:

Tool	Description
`convert_json`	Full JSON conversion with policy gating
`convert_text`	Converts CSV, YAML, XML, or JSONL text with policy gating
`analyze_json`	Structure analysis only — no Node.js needed

Claude Desktop / Cursor / Windsurf config:

{
  "mcpServers": {
    "datoon": {
      "command": "uvx",
      "args": ["--from", "datoon[mcp]", "datoon", "mcp"]
    }
  }
}
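If the client already has an MCP config file, the datoon entry has to be merged in rather than pasted over it. The sketch below shows one way to do that on an in-memory dict. `add_datoon_server` is a hypothetical helper, the `other` server is made up for the example, and locating and rewriting the actual config file is left out because its path differs per client.

```python
import json


def add_datoon_server(config: dict) -> dict:
    """Return a copy of an MCP client config with the datoon server entry added."""
    servers = dict(config.get("mcpServers", {}))
    servers["datoon"] = {
        "command": "uvx",
        "args": ["--from", "datoon[mcp]", "datoon", "mcp"],
    }
    return {**config, "mcpServers": servers}


# Hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}
print(json.dumps(add_datoon_server(existing), indent=2))
```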

Run locally:

datoon mcp     # or the standalone script: datoon-mcp

Listed on the MCP Registry, Smithery, and Glama. See MARKETPLACES.md.

Claude Code Plugin

Install directly from GitHub:

claude plugin marketplace add andrii-su/datoon
claude plugin install datoon@datoon

Trigger in-session:

/datoon
convert this JSON to TOON if it saves tokens
use datoon mode for structured data

CLI Reference

Flag	Default	Description
`--format`	auto	Input format: `json`, `csv`, `jsonl`, `yaml`, `xml`, `excel`, `parquet`, `avro`, `orc`, `numbers`
`--force`	`false`	Bypass gating and minimum savings threshold
`--min-savings`	`0.15`	Minimum relative token savings required
`--max-depth`	`6`	Maximum nesting depth for auto-conversion
`--min-uniform-rows`	`3`	Minimum rows in uniform object arrays
`--timeout`	`30`	Seconds before TOON CLI call is aborted
`--report <path>`	—	Write JSON conversion report to file
`--report-stdout`	—	Print JSON conversion report to stderr
`-o <path>`	stdout	Output file path
`--version`	—	Print version and exit

Format is auto-detected from file extension. Use --format to override or when reading from stdin.

Benchmarks

PYTHONPATH=src python benchmarks/run.py --dry-run
PYTHONPATH=src python benchmarks/run.py
PYTHONPATH=src python benchmarks/run.py --update-readme

Why auto mode outperforms forced conversion

Auto mode avoids low-benefit and high-risk payloads (orders-nested, mixed-non-uniform) while matching forced TOON's average token count on suitable ones. Every decision comes with a reasoned report.

Scenario	JSON Baseline	Forced TOON	`datoon` Auto
Average tokens	103	67	67
Avg tokens saved	0.0%	27.5%	28.2%
Decision quality	n/a	Converts all	Converts `3/5`, skips harmful cases

Dataset	JSON	TOON (forced)	Raw Saved	Auto	Auto Tokens	Auto Saved
users-small	56	31	44.6%	convert	31	44.6%
events-medium	198	111	43.9%	convert	111	43.9%
orders-nested	93	91	2.2%	skip	93	0.0%
mixed-non-uniform	35	37	-5.7%	skip	35	0.0%
metrics-wide	133	63	52.6%	convert	63	52.6%
Average	103	67	27.5%	3/5 convert	67	28.2%

Forced conversion succeeded for 5/5 payloads.
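The Average row of the per-dataset table can be recomputed from the rows themselves. The sketch below assumes that "saved" is averaged per dataset (a mean of percentages, not a ratio of totals), which is the reading that reproduces the published figures.

```python
# (dataset, JSON tokens, forced-TOON tokens, auto decision), copied from the table above
ROWS = [
    ("users-small", 56, 31, "convert"),
    ("events-medium", 198, 111, "convert"),
    ("orders-nested", 93, 91, "skip"),
    ("mixed-non-uniform", 35, 37, "skip"),
    ("metrics-wide", 133, 63, "convert"),
]


def saved(before: int, after: int) -> float:
    """Relative savings; negative when the output is larger than the input."""
    return (before - after) / before


def mean(values: list[float]) -> float:
    return sum(values) / len(values)


json_tokens = [js for _, js, _, _ in ROWS]
forced_tokens = [toon for _, _, toon, _ in ROWS]
# Auto mode keeps the JSON token count whenever the decision is "skip".
auto_tokens = [toon if decision == "convert" else js for _, js, toon, decision in ROWS]

forced_saved = mean([saved(js, out) for js, out in zip(json_tokens, forced_tokens)])
auto_saved = mean([saved(js, out) for js, out in zip(json_tokens, auto_tokens)])

print(f"average tokens: JSON {mean(json_tokens):.0f}, forced {mean(forced_tokens):.0f}, auto {mean(auto_tokens):.0f}")
print(f"average saved:  forced {forced_saved:.1%}, auto {auto_saved:.1%}")
```

Forced and auto end up with the same average token count because the 2 tokens forced conversion saves on orders-nested are cancelled by the 2 it adds on mixed-non-uniform. Auto still shows the higher average saving because it never records a negative percentage.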

Format conversion benchmark

Token savings when converting from common structured formats (CSV, JSONL, XML, YAML). Baseline is the JSON representation of the same data — what an LLM would receive without datoon.

Dataset	Format	JSON Tokens	TOON (forced)	Auto	Auto Tokens	Auto Saved
users-csv	csv	53	29	convert	29	45.3%
events-jsonl	jsonl	194	109	convert	109	43.8%
catalog-xml	xml	96	50	convert	50	47.9%
metrics-yaml	yaml	129	61	convert	61	52.7%
Average	—	118	62	4/4 convert	62	47.4%

Forced conversion succeeded for 4/4 payloads.

Agent skill evaluation

Artifact-based subagent comparison — identical analysis tasks, two modes:

with_skill: agent received the datoon skill and followed the conversion workflow.
without_skill: agent used JSON directly, no TOON or datoon.

3 payload sizes × 3 iterations × 2 modes = 18 total agent runs. Both modes: 100% correct answers.

Scenario	Avg JSON Tokens	Avg TOON Tokens	Avg Payload Saved
small	225	118	47.6%
medium	2,972	1,138	61.7%
large	17,757	6,673	62.4%

Full report and raw outputs: benchmarks/agent_skill_eval/. Savings are payload-token estimates, not full end-to-end model-token usage.

Development

Contributor workflow: CONTRIBUTING.md. Maintainer/agent notes: CLAUDE.md.

Setup:

uv sync --extra dev
uvx pre-commit install

Tests:

pytest -m "not integration"   # unit only (102 tests)
pytest                        # with integration (requires Node.js + npx)

Skill sync + plugin metadata:

python scripts/validate_skill_sync.py
python scripts/validate_plugin_metadata.py

License

MIT

Project details

Release history

1.7.1 (this version): Jun 22, 2026
1.7.0: Jun 22, 2026

Download files

Source Distribution

datoon-1.7.1.tar.gz (193.1 kB)

Uploaded Jun 22, 2026 Source

Built Distribution

datoon-1.7.1-py3-none-any.whl (24.0 kB)

Uploaded Jun 22, 2026 Python 3

File details

Details for the file datoon-1.7.1.tar.gz.

File metadata

Download URL: datoon-1.7.1.tar.gz
Upload date: Jun 22, 2026
Size: 193.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for datoon-1.7.1.tar.gz
Algorithm	Hash digest
SHA256	`f095eee4c972129cf769bbe2a6a34446410b5a267b2932aaa711fff0d752aba0`
MD5	`5ac5c9b7d3712e4dc98ae2436fde8386`
BLAKE2b-256	`db5ec6aab1b29e82779b7981584bd7f9fa9c60f1c48a53e56738ecef150e475b`
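A downloaded archive can be checked against the SHA256 digest listed above with the standard library. This is a generic sketch, not a datoon or PyPI tool. `sha256_of` and `matches_published_hash` are hypothetical helper names, and the local file path is whatever you saved the download as.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for datoon-1.7.1.tar.gz.
EXPECTED_SHA256 = "f095eee4c972129cf769bbe2a6a34446410b5a267b2932aaa711fff0d752aba0"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives are never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a local file's digest with the published one (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_published_hash(Path("datoon-1.7.1.tar.gz"))` should return True for an intact download of that file.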

Provenance

The following attestation bundles were made for datoon-1.7.1.tar.gz:

Publisher: publish.yml on andrii-su/datoon

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: datoon-1.7.1.tar.gz
- Subject digest: f095eee4c972129cf769bbe2a6a34446410b5a267b2932aaa711fff0d752aba0
- Sigstore transparency entry: 1915763699
- Sigstore integration time: Jun 22, 2026
Source repository:
- Permalink: andrii-su/datoon@7a983c8dcb2664bc8656790578bfc4a48a7f47c3
- Branch / Tag: refs/heads/main
- Owner: https://github.com/andrii-su
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@7a983c8dcb2664bc8656790578bfc4a48a7f47c3
- Trigger Event: workflow_run

File details

Details for the file datoon-1.7.1-py3-none-any.whl.

File metadata

Download URL: datoon-1.7.1-py3-none-any.whl
Upload date: Jun 22, 2026
Size: 24.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for datoon-1.7.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`93f682e97059486dc06a686458b850569418e1f4f9c8d795af33e071f198dd86`
MD5	`250f4d8827f8ee52970b55e536502cc1`
BLAKE2b-256	`23ee095224737ca85c357d4fcc4720d57c4a71b226714a67e91bfaf3ac0a37bc`

Provenance

The following attestation bundles were made for datoon-1.7.1-py3-none-any.whl:

Publisher: publish.yml on andrii-su/datoon

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: datoon-1.7.1-py3-none-any.whl
- Subject digest: 93f682e97059486dc06a686458b850569418e1f4f9c8d795af33e071f198dd86
- Sigstore transparency entry: 1915763811
- Sigstore integration time: Jun 22, 2026
Source repository:
- Permalink: andrii-su/datoon@7a983c8dcb2664bc8656790578bfc4a48a7f47c3
- Branch / Tag: refs/heads/main
- Owner: https://github.com/andrii-su
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@7a983c8dcb2664bc8656790578bfc4a48a7f47c3
- Trigger Event: workflow_run
