TOON (Token-Oriented Object Notation) encoder/decoder for Python - Bidirectional JSON-to-TOON converter optimized for LLMs
python-toon encoder/decoder
Token-Oriented Object Notation for Python
A compact data format optimized for transmitting structured information to Large Language Models (LLMs) with 30-60% fewer tokens than JSON.
Installation
pip install python-toon
What is TOON?
TOON (Token-Oriented Object Notation) combines YAML's indentation-based structure for nested objects and CSV's tabular format for uniform data rows, optimized specifically for token efficiency in LLM contexts.
This is a faithful Python implementation maintaining 100% output compatibility with the official TOON specification.
Key Features
- 30-60% token reduction compared to standard JSON
- Minimal syntax: Eliminates redundant punctuation (braces, brackets, most quotes)
- Tabular arrays: CSV-like row format for uniform object collections
- Explicit metadata: Array length indicators [N] for validation
- LLM-friendly: Maintains semantic clarity while reducing token count
- 100% compatible with original TypeScript implementation
Quick Start
from toon import encode
# Simple object
data = {"name": "Alice", "age": 30}
print(encode(data))
# Output:
# name: Alice
# age: 30
# Tabular array (uniform objects)
users = [
{"id": 1, "name": "Alice", "age": 30},
{"id": 2, "name": "Bob", "age": 25},
{"id": 3, "name": "Charlie", "age": 35},
]
print(encode(users))
# Output:
# [3,]{id,name,age}:
# 1,Alice,30
# 2,Bob,25
# 3,Charlie,35
# Complex nested structure
data = {
"metadata": {"version": 1, "author": "test"},
"items": [
{"id": 1, "name": "Item1"},
{"id": 2, "name": "Item2"},
],
"tags": ["alpha", "beta", "gamma"],
}
print(encode(data))
# Output:
# metadata:
# version: 1
# author: test
# items[2,]{id,name}:
# 1,Item1
# 2,Item2
# tags[3]: alpha,beta,gamma
CLI Usage
Command-line tool for converting between JSON and TOON formats.
# Encode JSON to TOON (auto-detected by .json extension)
toon input.json -o output.toon
# Decode TOON to JSON (auto-detected by .toon extension)
toon data.toon -o output.json
# Use stdin/stdout
echo '{"name": "Ada"}' | toon -
# Output: name: Ada
# Force encode mode
toon data.json --encode
# Force decode mode
toon data.toon --decode
# Custom delimiter
toon data.json --delimiter "\t" -o output.toon
# With length markers
toon data.json --length-marker -o output.toon
# Lenient decoding (disable strict validation)
toon data.toon --no-strict -o output.json
CLI Options
| Option | Description |
|---|---|
| -o, --output <file> | Output file path (prints to stdout if omitted) |
| -e, --encode | Force encode mode (overrides auto-detection) |
| -d, --decode | Force decode mode (overrides auto-detection) |
| --delimiter <char> | Array delimiter: , (comma), \t (tab), \| (pipe) |
| --indent <number> | Indentation size (default: 2) |
| --length-marker | Add # prefix to array lengths (e.g., items[#3]) |
| --no-strict | Disable strict validation when decoding |
API Reference
encode(value, options=None)
Converts a Python value to TOON format.
Parameters:
- value (Any): JSON-serializable value to encode
- options (dict, optional): Encoding options
Returns: str - TOON-formatted string
Example:
from toon import encode
data = {"id": 123, "name": "Ada"}
toon_str = encode(data)
print(toon_str)
# Output:
# id: 123
# name: Ada
decode(input_str, options=None)
Converts a TOON-formatted string back to Python values.
Parameters:
- input_str (str): TOON-formatted string to parse
- options (DecodeOptions, optional): Decoding options
Returns: Python value (dict, list, or primitive)
Example:
from toon import decode
toon_str = """items[2]{sku,qty,price}:
A1,2,9.99
B2,1,14.5"""
data = decode(toon_str)
print(data)
# Output: {'items': [{'sku': 'A1', 'qty': 2, 'price': 9.99}, {'sku': 'B2', 'qty': 1, 'price': 14.5}]}
Encoding Options
from toon import encode
encode(data, {
"indent": 2, # Spaces per indentation level (default: 2)
"delimiter": ",", # Delimiter for arrays: "," | "\t" | "|" (default: ",")
"lengthMarker": "#" # Optional marker prefix: "#" | False (default: False)
})
Decoding Options
from toon import decode, DecodeOptions
options = DecodeOptions(
indent=2, # Expected number of spaces per indentation level (default: 2)
strict=True # Enable strict validation (default: True)
)
data = decode(toon_str, options)
Strict Mode:
By default, the decoder validates input strictly:
- Invalid escape sequences: Throws on "\x", unterminated strings
- Syntax errors: Throws on missing colons, malformed headers
- Array length mismatches: Throws when declared length doesn't match actual count
- Delimiter mismatches: Throws when row delimiters don't match header
Set strict=False to allow lenient parsing.
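To make the array length check concrete, here is a minimal standalone sketch of the idea, written without the library. `HEADER` and `check_row_count` are hypothetical names, the regular expression covers only simple tabular headers, and the real decoder's lenient behavior may differ from what is assumed here.

```python
import re

# Matches headers such as "items[2]{sku,qty,price}:" or "[#2,]{id,name}:".
HEADER = re.compile(
    r"^(?P<key>[^\[\]:]*)\[#?(?P<count>\d+)(?P<delim>[,|\t]?)\]\{(?P<fields>[^}]*)\}:$"
)


def check_row_count(block: str, strict: bool = True) -> int:
    """Compare the declared length of one tabular block with its actual rows."""
    lines = [line for line in block.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty input")
    match = HEADER.match(lines[0].strip())
    if match is None:
        raise ValueError(f"malformed tabular header: {lines[0]!r}")
    declared = int(match.group("count"))
    found = len(lines) - 1  # every non-blank line after the header is a row
    if strict and found != declared:
        raise ValueError(f"header declares {declared} rows, found {found}")
    return found


print(check_row_count("items[2]{sku,qty,price}:\n  A1,2,9.99\n  B2,1,14.5"))
```

With `strict=False` the sketch simply returns the number of rows it found instead of raising.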
Delimiter Options
You can use string literals directly:
data = [1, 2, 3, 4, 5]
# Comma (default)
print(encode(data))
# [5]: 1,2,3,4,5
# Tab
print(encode(data, {"delimiter": "\t"}))
# [5 ]: 1 2 3 4 5
# Pipe
print(encode(data, {"delimiter": "|"}))
# [5|]: 1|2|3|4|5
Or use the string keys:
encode(data, {"delimiter": "comma"}) # Default
encode(data, {"delimiter": "tab"}) # Tab-separated
encode(data, {"delimiter": "pipe"}) # Pipe-separated
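One way to picture how literals and string keys could map to the same delimiter is a small lookup table. This is a sketch only: `DELIMITERS` and `resolve_delimiter` are hypothetical names, and raising `ValueError` for unsupported values is an assumption, not documented library behavior.

```python
# Hypothetical alias table: string keys on the left, literal characters on the right.
DELIMITERS = {"comma": ",", "tab": "\t", "pipe": "|"}


def resolve_delimiter(value: str = ",") -> str:
    """Accept either a literal delimiter or one of the string keys."""
    if value in DELIMITERS:
        return DELIMITERS[value]
    if value in DELIMITERS.values():
        return value
    # Assumption: anything else is rejected.
    raise ValueError(f"unsupported delimiter: {value!r}")


print(repr(resolve_delimiter("tab")))
print(repr(resolve_delimiter("|")))
```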
Length Markers
Add the # prefix to array length indicators:
users = [
{"id": 1, "name": "Alice"},
{"id": 2, "name": "Bob"},
]
# Without marker (default)
print(encode(users))
# [2,]{id,name}:
# 1,Alice
# 2,Bob
# With marker
print(encode(users, {"lengthMarker": "#"}))
# [#2,]{id,name}:
# 1,Alice
# 2,Bob
Format Rules
Objects
Key-value pairs with primitives or nested structures:
{"name": "Alice", "age": 30}
# =>
# name: Alice
# age: 30
Primitive Arrays
Arrays always include length [N]:
[1, 2, 3, 4, 5]
# => [5]: 1,2,3,4,5
["alpha", "beta", "gamma"]
# => [3]: alpha,beta,gamma
Tabular Arrays
Uniform objects with identical primitive-only fields use CSV-like format:
[
{"id": 1, "name": "Alice"},
{"id": 2, "name": "Bob"},
]
# =>
# [2,]{id,name}:
# 1,Alice
# 2,Bob
Note: The delimiter appears in the length bracket [2,] for tabular arrays.
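The tabular rule can be sketched in a few lines of plain Python. This is an illustration, not the library's encoder: `is_tabular`, `format_cell`, and `encode_tabular` are hypothetical helpers, string quoting is omitted, rows are assumed to be indented by two spaces, and "identical fields" is interpreted here as the same set of keys.

```python
PRIMITIVES = (str, int, float, bool, type(None))


def is_tabular(rows) -> bool:
    """True when every item is a dict with the same keys and only primitive values."""
    if not rows or not all(isinstance(row, dict) for row in rows):
        return False
    keys = set(rows[0])
    if not keys:
        return False
    return all(
        set(row) == keys and all(isinstance(v, PRIMITIVES) for v in row.values())
        for row in rows
    )


def format_cell(value) -> str:
    """Render one primitive; quoting of special strings is deliberately left out."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: True is also an int
        return "true" if value else "false"
    return str(value)


def encode_tabular(rows, delimiter: str = ",") -> str:
    """Emit a header with length and field names, then one delimited line per row."""
    fields = list(rows[0])  # field order is taken from the first row
    header = f"[{len(rows)}{delimiter}]{{{delimiter.join(fields)}}}:"
    body = ["  " + delimiter.join(format_cell(row[f]) for f in fields) for row in rows]
    return "\n".join([header, *body])


users = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
if is_tabular(users):
    print(encode_tabular(users))
```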
Mixed Arrays
Non-uniform data using list format with - markers:
[{"name": "Alice"}, 42, "hello"]
# =>
# [3]:
# - name: Alice
# - 42
# - hello
Array Length Format
The length bracket format depends on the array type:
Tabular arrays (with fields):
- Delimiter always shown: [2,]{fields}: or [2|]{fields}: or [2\t]{fields}:
Primitive arrays (no fields):
- Comma: [3]: (delimiter hidden)
- Other: [3|]: or [3\t]: (delimiter shown)
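The bracket rules above, together with the optional # length marker, fit in one small function. `length_bracket` is a hypothetical helper written for illustration and is not part of the package's API.

```python
def length_bracket(count: int, delimiter: str = ",", has_fields: bool = False,
                   length_marker: bool = False) -> str:
    """Build the [N] part of an array header (hypothetical helper)."""
    marker = "#" if length_marker else ""
    # Tabular arrays always show the delimiter; primitive arrays hide a comma.
    show_delimiter = has_fields or delimiter != ","
    return f"[{marker}{count}{delimiter if show_delimiter else ''}]"


print(length_bracket(3))                        # [3]
print(length_bracket(3, "|"))                   # [3|]
print(length_bracket(2, ",", has_fields=True))  # [2,]
```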
Quoting Rules
Strings are quoted only when necessary (following the TOON specification):
- Empty strings
- Keywords: null, true, false
- Numeric strings: 42, -3.14
- Leading or trailing whitespace
- Contains structural characters: :, [, ], {, }, -, "
- Contains current delimiter (,, |, or tab)
- Contains control characters (newline, carriage return, tab) or a backslash
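The list above can be read as a single predicate. The sketch below follows it literally and is not the library's implementation: `needs_quotes` is a hypothetical name, the numeric pattern is a simplification, and the real encoder may be more selective, for example about where a hyphen appears.

```python
import re

NUMERIC = re.compile(r"^-?\d+(\.\d+)?([eE][+-]?\d+)?$")  # simplified number pattern
KEYWORDS = {"null", "true", "false"}
STRUCTURAL = set(':[]{}-"')
CONTROL = set("\n\r\t\\")  # newline, carriage return, tab, backslash


def needs_quotes(text: str, delimiter: str = ",") -> bool:
    """Return True when a string value would have to be quoted."""
    if text == "" or text in KEYWORDS:
        return True
    if NUMERIC.match(text):
        return True
    if text != text.strip():  # leading or trailing whitespace
        return True
    return any(
        ch in STRUCTURAL or ch in CONTROL or ch == delimiter for ch in text
    )


for sample in ["hello", "hello world", " hello", "null", "42", ""]:
    print(repr(sample), needs_quotes(sample))
```

Note that the active delimiter matters: a value containing a pipe is only flagged when the pipe is the delimiter in use.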
"hello" # => hello (no quotes)
"hello world" # => hello world (internal spaces OK)
" hello" # => " hello" (leading space requires quotes)
"null" # => "null" (keyword)
"42" # => "42" (looks like number)
"" # => "" (empty)
Type Conversions
Non-JSON types are normalized automatically:
- Numbers: Decimal form (no scientific notation)
- Dates/DateTime: ISO 8601 strings (quoted)
- Decimal: Converted to float
- Infinity/NaN: Converted to null
- Functions/Callables: Converted to null
- -0: Normalized to 0
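A normalization pass along these lines can be sketched with the standard library alone. `normalize` is a hypothetical helper, not the package's internal function; in particular, raising `TypeError` for types not listed above is an assumption.

```python
import math
from datetime import date, datetime
from decimal import Decimal


def normalize(value):
    """Map Python values onto JSON-compatible ones, following the list above."""
    if value is None or isinstance(value, (bool, str)):
        return value
    if isinstance(value, (datetime, date)):
        return value.isoformat()  # ISO 8601 string
    if isinstance(value, Decimal):
        value = float(value)
    if isinstance(value, float):
        if math.isnan(value) or math.isinf(value):
            return None  # Infinity/NaN become null
        if value == 0:
            return 0  # folds -0.0 into 0
        return value
    if isinstance(value, int):
        return value
    if callable(value):
        return None  # functions become null
    if isinstance(value, dict):
        return {str(k): normalize(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalize(v) for v in value]
    # Assumption: anything else is rejected rather than silently converted.
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(normalize({"when": date(2024, 1, 2), "price": Decimal("1.5"), "bad": float("nan")}))
```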
LLM Integration Best Practices
When using TOON with LLMs:
- Wrap in code blocks for clarity:
  ```toon
  name: Alice
  age: 30
  ```
- Instruct the model about the format:
  "Respond using TOON format (Token-Oriented Object Notation). Use key: value syntax, indentation for nesting, and tabular format [N,]{fields}: for uniform arrays."
- Leverage length markers for validation:
  encode(data, {"lengthMarker": "#"})
  Tell the model: "Array lengths are marked with [#N]. Ensure your response matches these counts."
- Acknowledge tokenizer variance: Token savings depend on the specific tokenizer and model being used.
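The first two practices can be combined into a small prompt builder. This is a sketch under stated assumptions: `build_prompt` is a hypothetical helper, the instruction wording is illustrative, and the TOON payload is passed in as a plain string so the example runs without the package installed.

```python
FENCE = "`" * 3  # built indirectly so this example can itself sit inside a code fence


def build_prompt(toon_payload: str, question: str) -> str:
    """Wrap a TOON payload in a labeled code block and prepend format instructions."""
    return "\n".join([
        "The data below is in TOON format (Token-Oriented Object Notation):",
        "key: value pairs, indentation for nesting, and [N,]{fields}: headers",
        "followed by N delimiter-separated rows for uniform arrays.",
        "",
        f"{FENCE}toon",
        toon_payload,
        FENCE,
        "",
        question,
    ])


print(build_prompt("name: Alice\nage: 30", "How old is Alice?"))
```

In real use the payload would typically come from encode(data), optionally with length markers enabled so the model can be asked to respect the declared counts.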
Token Efficiency Example
import json
from toon import encode
data = {
"users": [
{"id": 1, "name": "Alice", "age": 30, "active": True},
{"id": 2, "name": "Bob", "age": 25, "active": True},
{"id": 3, "name": "Charlie", "age": 35, "active": False},
]
}
json_str = json.dumps(data)
toon_str = encode(data)
print(f"JSON: {len(json_str)} characters")
print(f"TOON: {len(toon_str)} characters")
print(f"Reduction: {100 * (1 - len(toon_str) / len(json_str)):.1f}%")
# Output:
# JSON: 177 characters
# TOON: 85 characters
# Reduction: 52.0%
JSON output:
{"users": [{"id": 1, "name": "Alice", "age": 30, "active": true}, {"id": 2, "name": "Bob", "age": 25, "active": true}, {"id": 3, "name": "Charlie", "age": 35, "active": false}]}
TOON output:
users[3,]{id,name,age,active}:
1,Alice,30,true
2,Bob,25,true
3,Charlie,35,false
Development
This project uses uv for fast, reliable package and environment management.
Setup with uv (Recommended)
# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh
# Clone the repository
git clone https://github.com/toon-format/toon-python.git
cd toon-python
# Create virtual environment and install dependencies
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install package in editable mode with dev dependencies
uv pip install -e ".[dev]"
Setup with pip (Alternative)
# Clone the repository
git clone https://github.com/toon-format/toon-python.git
cd toon-python
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode
pip install -e .
# Install development dependencies
pip install -r requirements-dev.txt
Running Tests
# Run all tests
pytest
# Run with coverage
pytest --cov=toon --cov-report=term
Type Checking
mypy src/toon
Linting
ruff check src/toon tests
Credits
This project is a Python implementation of the TOON format.
License
MIT License - see LICENSE file for details
Related
- TOON Format Specification - Official specification with normative encoding rules
- TOON Format Organization - Official TOON format organization
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
When contributing, please:
- Add tests for new features
- Update documentation as needed
- Ensure compatibility with the TOON specification
Support
For bugs and feature requests, please open an issue.