Smart structured-data-to-TOON gateway with pragmatic auto-gating for LLM prompts.
datoon
smart structured-data→TOON gateway — converts only when it actually saves tokens
Raw structured data is often verbose in LLM prompts. TOON can save tokens — but blind conversion can also make payloads worse. datoon adds a decision layer: convert when structure and savings justify it, skip when they don't, and always explain why.
Supports JSON, CSV, JSONL, YAML, XML, Parquet, Avro, ORC, Excel, and Apple Numbers — auto-detected from file extension.
Before / After
Same data. Right format. Always explained.
┌──────────────────────────────────────────────────┐
│ PAYLOAD SAVINGS (auto avg) ████░░░░░░ 28% │
│ PAYLOAD SAVINGS (agent skill) ████████░░ 62% │
│ DECISION ACCURACY ██████████ 100% │
│ HARMFUL CONVERSIONS BLOCKED ██████████ 100% │
└──────────────────────────────────────────────────┘
[!IMPORTANT] datoon saves payload tokens — the structured data portion of your prompt. Token savings depend on payload shape: uniform tabular data converts well; deeply nested or non-uniform structures are skipped. Every decision includes a reason so pipelines can log, debug, and trust the outcome.
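To see why uniform tabular data converts well, here is a minimal sketch of a TOON-style tabular layout. The `to_tabular` helper is hypothetical and is not part of datoon or the TOON CLI: it ignores quoting, escaping, and nesting, and it compares character counts rather than tokens.

```python
import json


def to_tabular(name: str, rows: list[dict]) -> str:
    """Render a uniform list of flat dicts in a TOON-style tabular layout.

    Hypothetical helper for illustration only: no quoting, escaping, or nesting.
    """
    fields = list(rows[0])
    if any(list(row) != fields for row in rows):
        raise ValueError("rows are not uniform")
    # Field names appear once in the header instead of once per row.
    header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
    lines = ["  " + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header, *lines])


rows = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Lin"}, {"id": 3, "name": "Grace"}]
as_json = json.dumps({"users": rows}, separators=(",", ":"))
as_tabular = to_tabular("users", rows)

print(as_tabular)
print(len(as_json), "chars as JSON vs", len(as_tabular), "chars as tabular text")
```

JSON repeats every key on every row, so the gap widens as rows are added. When rows do not share the same keys there is no single header to factor out, which is the kind of payload datoon skips.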
Install
# core (JSON, CSV, JSONL, XML — no extra deps)
uv add datoon
pip install datoon
# with YAML support
pip install "datoon[yaml]"
# with Excel support
pip install "datoon[excel]"
# with Parquet / ORC / Avro support
pip install "datoon[columnar]"
# with Apple Numbers support
pip install "datoon[numbers]"
# with tiktoken-based token counting
pip install "datoon[tokens]"
# with MCP server
pip install "datoon[mcp]"
# everything
pip install "datoon[all]"
Requires Python 3.12+. TOON conversion requires Node.js with npx in PATH — analysis and format reading work without it.
For Claude Code plugin, Codex, and MCP config → INSTALL.md.
What You Get
| What | Description |
|---|---|
| datoon CLI | Auto-gate any supported format → TOON from terminal or scripts |
| Python API | convert_json_for_llm() + read_tabular() for any LLM pipeline |
| MCP Server | convert_json, convert_text, analyze_json tools for Claude Desktop, Cursor, Windsurf |
| Claude Code Plugin | /datoon in-session trigger, installs from GitHub in one command |
| Codex Plugin | Marketplace plugin — structured-data mode for Codex |
Supported input formats
| Format | Extension | Extra needed |
|---|---|---|
| JSON | .json | — |
| JSONL | .jsonl, .ndjson | — |
| CSV | .csv | — |
| XML | .xml | — |
| YAML | .yaml, .yml | datoon[yaml] |
| Excel | .xlsx, .xls | datoon[excel] |
| Parquet | .parquet | datoon[columnar] |
| Avro | .avro | datoon[columnar] |
| ORC | .orc | datoon[columnar] |
| Apple Numbers | .numbers | datoon[numbers] |
How It Works
- Detect format — from --format flag, file extension, or default to JSON for stdin
- Read + normalize — parse source into list of row dicts; serialize to compact JSON
- Analyze structure — uniform object arrays? acceptable depth? minimum rows?
- Gate early — non-candidates skip before any CLI call; no Node.js overhead
- Convert + estimate — TOON CLI runs, token savings calculated
- Gate savings — below threshold → return JSON; above → return TOON with report
Every path returns a ConversionReport with decision, reason, and token estimates. Pipelines never get silent surprises.
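The two gates described above can be sketched as follows. This is a simplified model under stated assumptions, not datoon's code: `Decision`, `depth`, `has_uniform_rows`, and `gate` are hypothetical names, and `estimate` stands in for the TOON CLI call plus token counting. The default thresholds mirror the CLI defaults (0.15, 6, 3).

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Decision:
    decision: str  # "convert" or "skip"
    reason: str


def depth(value: Any) -> int:
    """Nesting depth: scalars are 0, each dict or list level adds 1."""
    if isinstance(value, dict):
        return 1 + max((depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((depth(v) for v in value), default=0)
    return 0


def has_uniform_rows(value: Any, min_rows: int) -> bool:
    """True if any list holds at least min_rows dicts sharing one key set."""
    if isinstance(value, list):
        if len(value) >= min_rows and all(isinstance(v, dict) for v in value):
            if len({frozenset(v) for v in value}) == 1:
                return True
        return any(has_uniform_rows(v, min_rows) for v in value)
    if isinstance(value, dict):
        return any(has_uniform_rows(v, min_rows) for v in value.values())
    return False


def gate(
    data: Any,
    estimate: Callable[[], tuple[int, int]],
    *,
    min_savings: float = 0.15,
    max_depth: int = 6,
    min_rows: int = 3,
) -> Decision:
    # Structural gates run first, so estimate() is never called for non-candidates.
    if depth(data) > max_depth:
        return Decision("skip", "nesting too deep")
    if not has_uniform_rows(data, min_rows):
        return Decision("skip", "no uniform object array")
    json_tokens, toon_tokens = estimate()
    savings = (json_tokens - toon_tokens) / json_tokens
    if savings < min_savings:
        return Decision("skip", f"savings {savings:.1%} below threshold")
    return Decision("convert", f"savings {savings:.1%}")
```

Passing the estimate as a callable keeps the expensive step lazy, which is the point of gating early: payloads that fail the structural checks never pay for a conversion attempt.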
Quick Start
JSON (stdin):
echo '{"users":[{"id":1,"name":"Ada"},{"id":2,"name":"Lin"},{"id":3,"name":"Grace"}]}' | datoon --report-stdout
CSV (auto-detected from extension):
datoon data.csv --report-stdout
JSONL:
datoon data.jsonl -o output.toon
YAML (requires datoon[yaml]):
datoon data.yaml --report-stdout
Parquet (requires datoon[columnar]):
datoon data.parquet --report ./report.json
Explicit format override:
datoon --format csv < data.csv --report-stdout
Force conversion (bypass gating — for experiments):
datoon data.json --force --report-stdout
Python API
JSON conversion:
from datoon import convert_json_for_llm, ConversionConfig, DatoonError
config = ConversionConfig(min_savings_ratio=0.15, max_depth=6, min_uniform_rows=3)
try:
    outcome = convert_json_for_llm(raw_json, config)
except DatoonError as exc:
    # Conversion failed: log exc or fall back to raw_json here, then re-raise.
    raise
# outcome.payload_text — TOON or original JSON
# outcome.report.decision — "convert" | "skip"
# outcome.report.reason — human-readable explanation
send_to_model(outcome.payload_text)
Any format via read_tabular:
import json
from pathlib import Path
from datoon import read_tabular, convert_json_for_llm, ConversionConfig
# text formats: csv, jsonl, yaml, xml
rows = read_tabular("csv", text=csv_string)
# binary formats: excel, parquet, orc, avro, numbers
rows = read_tabular("parquet", path=Path("data.parquet"))
json_text = json.dumps(rows, separators=(",", ":"))
outcome = convert_json_for_llm(json_text, ConversionConfig())
send_to_model(outcome.payload_text)
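To make the intermediate shape concrete, the normalization of a text format can be approximated with the standard library alone. This sketch is an assumption about the result (a list of row dicts serialized as compact JSON), not datoon's reader. In particular, `csv.DictReader` leaves every value as a string; whether `read_tabular` coerces types is not stated here.

```python
import csv
import io
import json


def csv_to_compact_json(csv_text: str) -> str:
    """Parse CSV text into row dicts and serialize them without extra whitespace."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, separators=(",", ":"))


csv_text = "id,name\n1,Ada\n2,Lin\n"
print(csv_to_compact_json(csv_text))
# → [{"id":"1","name":"Ada"},{"id":"2","name":"Lin"}]
```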
Structure-only analysis (no Node.js required):
from datoon.analyzer import analyze_payload
from datoon.models import ConversionConfig
analysis = analyze_payload(parsed_data, ConversionConfig())
print(analysis.is_candidate, analysis.reason)
MCP Server
datoon ships an MCP server with three tools:
| Tool | Description |
|---|---|
| convert_json | Full JSON conversion with policy gating |
| convert_text | Converts CSV, YAML, XML, or JSONL text with policy gating |
| analyze_json | Structure analysis only — no Node.js needed |
Claude Desktop / Cursor / Windsurf config:
{
"mcpServers": {
"datoon": {
"command": "uvx",
"args": ["--from", "datoon[mcp]", "datoon", "mcp"]
}
}
}
Run locally:
datoon mcp # or the standalone script: datoon-mcp
Listed on the MCP Registry, Smithery, and Glama. See MARKETPLACES.md.
Claude Code Plugin
Install directly from GitHub:
claude plugin marketplace add andrii-su/datoon
claude plugin install datoon@datoon
Trigger in-session:
/datoon
convert this JSON to TOON if it saves tokens
use datoon mode for structured data
CLI Reference
| Flag | Default | Description |
|---|---|---|
| --format | auto | Input format: json, csv, jsonl, yaml, xml, excel, parquet, avro, orc, numbers |
| --force | false | Bypass gating and minimum savings threshold |
| --min-savings | 0.15 | Minimum relative token savings required |
| --max-depth | 6 | Maximum nesting depth for auto-conversion |
| --min-uniform-rows | 3 | Minimum rows in uniform object arrays |
| --timeout | 30 | Seconds before TOON CLI call is aborted |
| --report <path> | — | Write JSON conversion report to file |
| --report-stdout | — | Print JSON conversion report to stderr |
| -o <path> | stdout | Output file path |
| --version | — | Print version and exit |
Format is auto-detected from file extension. Use --format to override or when reading from stdin.
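For scripts that shell out to the CLI, a thin wrapper might look like the sketch below. The flag names come from the table above; `build_datoon_argv` and `run_datoon` are hypothetical helpers, and the sketch assumes that flag values are passed as separate space-delimited arguments.

```python
import subprocess


def build_datoon_argv(path, *, input_format=None, min_savings=0.15, force=False, report_path=None):
    """Assemble a datoon command line from the flags in the table above."""
    argv = ["datoon", str(path), "--min-savings", str(min_savings)]
    if input_format is not None:
        argv += ["--format", input_format]
    if force:
        argv.append("--force")
    if report_path is not None:
        argv += ["--report", str(report_path)]
    return argv


def run_datoon(argv, timeout=60):
    """Run the command and return the payload written to stdout (the default output)."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True, timeout=timeout)
    return result.stdout
```

Building the argument list separately from running it keeps the command easy to log and to test without Node.js or datoon installed.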
Benchmarks
PYTHONPATH=src python benchmarks/run.py --dry-run
PYTHONPATH=src python benchmarks/run.py
PYTHONPATH=src python benchmarks/run.py --update-readme
Why auto mode outperforms forced conversion
Auto mode skips the low-benefit payload (orders-nested, 2.2% saved) and the harmful one (mixed-non-uniform, 5.7% larger as TOON), and converts the other three exactly as forced mode does. Every decision comes with a reasoned report.
| Scenario | JSON Baseline | Forced TOON | datoon Auto |
|---|---|---|---|
| Average tokens | 103 | 67 | 67 |
| Avg tokens saved | 0.0% | 27.5% | 28.2% |
| Decision quality | n/a | Converts all | Converts 3/5, skips harmful cases |
| Dataset | JSON | TOON (forced) | Raw Saved | Auto | Auto Tokens | Auto Saved |
|---|---|---|---|---|---|---|
| users-small | 56 | 31 | 44.6% | convert | 31 | 44.6% |
| events-medium | 198 | 111 | 43.9% | convert | 111 | 43.9% |
| orders-nested | 93 | 91 | 2.2% | skip | 93 | 0.0% |
| mixed-non-uniform | 35 | 37 | -5.7% | skip | 35 | 0.0% |
| metrics-wide | 133 | 63 | 52.6% | convert | 63 | 52.6% |
| Average | 103 | 67 | 27.5% | 3/5 convert | 67 | 28.2% |
Forced conversion succeeded for 5/5 payloads.
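The averages in the per-dataset table can be rechecked with a few lines of arithmetic. The token counts below are copied from that table. As a simplifying assumption, the sketch applies only the default savings threshold (0.15); datoon may skip some payloads earlier on structural grounds, but for these five rows the threshold alone reproduces the Auto column.

```python
# (dataset, JSON tokens, forced-TOON tokens) copied from the table above.
ROWS = [
    ("users-small", 56, 31),
    ("events-medium", 198, 111),
    ("orders-nested", 93, 91),
    ("mixed-non-uniform", 35, 37),
    ("metrics-wide", 133, 63),
]
MIN_SAVINGS = 0.15  # default --min-savings


def saved(json_tokens: int, toon_tokens: int) -> float:
    """Relative savings of TOON over JSON; negative means TOON is larger."""
    return (json_tokens - toon_tokens) / json_tokens


def mean(values: list[float]) -> float:
    return sum(values) / len(values)


raw = [saved(j, t) for _, j, t in ROWS]
# A skipped payload stays as JSON, so its savings are exactly zero.
auto = [s if s >= MIN_SAVINGS else 0.0 for s in raw]

print(f"forced: {mean(raw):.1%}, auto: {mean(auto):.1%}")
# → forced: 27.5%, auto: 28.2%
```

Auto mode comes out ahead on average savings only because it replaces the one negative result (-5.7%) with zero; on the three converted payloads the two modes are identical.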
Format conversion benchmark
Token savings when converting from common structured formats (CSV, JSONL, XML, YAML). Baseline is the JSON representation of the same data — what an LLM would receive without datoon.
| Dataset | Format | JSON Tokens | TOON (forced) | Auto | Auto Tokens | Auto Saved |
|---|---|---|---|---|---|---|
| users-csv | csv | 53 | 29 | convert | 29 | 45.3% |
| events-jsonl | jsonl | 194 | 109 | convert | 109 | 43.8% |
| catalog-xml | xml | 96 | 50 | convert | 50 | 47.9% |
| metrics-yaml | yaml | 129 | 61 | convert | 61 | 52.7% |
| Average | — | 118 | 62 | 4/4 convert | 62 | 47.4% |
Forced conversion succeeded for 4/4 payloads.
Agent skill evaluation
Artifact-based subagent comparison — identical analysis tasks, two modes:
- with_skill: agent received the datoon skill and followed the conversion workflow.
- without_skill: agent used JSON directly, no TOON or datoon.

3 payload sizes × 3 iterations × 2 modes = 18 total agent runs. Both modes: 100% correct answers.
| Scenario | Avg JSON Tokens | Avg TOON Tokens | Avg Payload Saved |
|---|---|---|---|
| small | 225 | 118 | 47.6% |
| medium | 2,972 | 1,138 | 61.7% |
| large | 17,757 | 6,673 | 62.4% |
Full report and raw outputs: benchmarks/agent_skill_eval/. Savings are payload-token estimates, not full end-to-end model-token usage.
Development
Contributor workflow: CONTRIBUTING.md. Maintainer/agent notes: CLAUDE.md.
Setup:
uv sync --extra dev
uvx pre-commit install
Tests:
pytest -m "not integration" # unit only (102 tests)
pytest # with integration (requires Node.js + npx)
Skill sync + plugin metadata:
python scripts/validate_skill_sync.py
python scripts/validate_plugin_metadata.py
Links
- INSTALL.md — full install matrix, all targets, per-agent detail
- CONTRIBUTING.md — contributor workflow
- CLAUDE.md — maintainer guide for agents
- CHANGELOG.md — release history
- SECURITY.md — vulnerability reporting
- Live docs — docs/
- Issues — bugs, features, questions
License
MIT