TOON (Token-Oriented Object Notation) Python API - A compact data format optimized for LLM token usage

TOON Python API

TOON (Token-Oriented Object Notation) is a compact, human-readable data format designed to reduce token usage when passing data to large language models. Compared with the equivalent JSON, TOON can reduce token usage by 30-60%.

This project provides a Python API library with a Rust backend, delivering high-performance Python bindings through PyO3.

Features

  • Encode and Decode: Bidirectional conversion between Python objects and TOON format
  • Table Format Optimization: Automatically detects uniform object arrays and compresses them using table format
  • Multiple Array Formats: Supports inline arrays, table arrays, list arrays, and arrays of arrays
  • Nested Structures: Full support for nested objects and arrays
  • Custom Options: Supports custom indentation, delimiters, and length markers
  • High Performance: Rust backend provides fast encoding/decoding performance

Installation

pip install tost

Requirements:

  • Python 3.8+

Development Installation

To install from source or set up a development environment:

# Install Rust (if not already installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Install maturin
pip install maturin

# Install from source
pip install .

# Or install in development mode (recommended for development)
maturin develop

# Or build wheel files
maturin build --release

Usage Examples

Basic Encoding

from tost import encode

# Simple object
obj = {
    "id": 123,
    "name": "Ada Lovelace",
    "email": "ada@example.com",
    "active": True
}

result = encode(obj)
print(result)
# Output:
# id: 123
# name: Ada Lovelace
# email: ada@example.com
# active: true

Table Format Arrays

from tost import encode

# Table format array (auto-optimized)
products = {
    "items": [
        {"sku": "LAPTOP-15", "qty": 5, "price": 899.99},
        {"sku": "MOUSE-BT", "qty": 25, "price": 29.99},
        {"sku": "KEYBOARD-MX", "qty": 12, "price": 149.00}
    ]
}

result = encode(products)
print(result)
# Output:
# items[3]{sku,qty,price}:
#   LAPTOP-15,5,899.99
#   MOUSE-BT,25,29.99
#   KEYBOARD-MX,12,149
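
To get a rough feel for the savings on this example, the sketch below compares the character count of the JSON serialization with that of the TOON output shown above. It does not call `tost`; the TOON text is copied from the expected output. Character count is only a proxy, and real token counts depend on the tokenizer of the model you use.

```python
import json

products = {
    "items": [
        {"sku": "LAPTOP-15", "qty": 5, "price": 899.99},
        {"sku": "MOUSE-BT", "qty": 25, "price": 29.99},
        {"sku": "KEYBOARD-MX", "qty": 12, "price": 149.00},
    ]
}

# TOON text copied from the example output above (not produced by this script).
toon_text = "\n".join([
    "items[3]{sku,qty,price}:",
    "  LAPTOP-15,5,899.99",
    "  MOUSE-BT,25,29.99",
    "  KEYBOARD-MX,12,149",
])

json_text = json.dumps(products)

# Fraction of characters saved relative to JSON (a proxy, not a token count).
saving = 1 - len(toon_text) / len(json_text)
print(f"JSON: {len(json_text)} chars, TOON: {len(toon_text)} chars, saving: {saving:.0%}")
```

For an actual token measurement, run both strings through the tokenizer of your target model.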

Inline Arrays

from tost import encode

# Inline array (primitive type array)
tags = {
    "tags": ["javascript", "typescript", "nodejs", "llm"]
}

result = encode(tags)
print(result)
# Output:
# tags[4]: javascript,typescript,nodejs,llm

Nested Structures

from tost import encode

order = {
    "orderId": "ORD-2025-001",
    "customer": {
        "name": "John Smith",
        "email": "john@example.com"
    },
    "items": [
        {"product": "Widget A", "quantity": 2, "price": 19.99},
        {"product": "Widget B", "quantity": 1, "price": 34.50}
    ],
    "total": 74.48,
    "tags": ["priority", "gift-wrap"]
}

result = encode(order)
print(result)
# Output:
# orderId: ORD-2025-001
# customer:
#   name: John Smith
#   email: john@example.com
# items[2]{product,quantity,price}:
#   Widget A,2,19.99
#   Widget B,1,34.5
# total: 74.48
# tags[2]: priority,gift-wrap

Decoding

from tost import decode

tost_str = """
id: 123
name: Ada Lovelace
active: true
items[2]{sku,qty}:
  A1,2
  B2,1
"""

result = decode(tost_str)
print(result)
# Output:
# {
#     'id': 123,
#     'name': 'Ada Lovelace',
#     'active': True,
#     'items': [
#         {'sku': 'A1', 'qty': 2},
#         {'sku': 'B2', 'qty': 1}
#     ]
# }
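
A common sanity check when adopting a new serialization format is a round trip: encode, decode, and compare with the original. The helper below is a hypothetical sketch, not part of `tost`; it takes the encoder and decoder as arguments so it can be run without the package. Here it is exercised with `json.dumps` and `json.loads` as stand-ins.

```python
import json
from typing import Any, Callable


def round_trips(obj: Any, encode: Callable[[Any], str], decode: Callable[[str], Any]) -> bool:
    """Return True if decoding the encoded form yields an equal object."""
    return decode(encode(obj)) == obj


sample = {
    "id": 123,
    "name": "Ada Lovelace",
    "active": True,
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}],
}

# JSON stand-ins keep the sketch runnable without tost installed.
print(round_trips(sample, json.dumps, json.loads))
```

With the package installed, the same check would be `round_trips(sample, tost.encode, tost.decode)`. Whether every value survives a round trip unchanged is not stated here, so treat the result as something to verify rather than assume.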

Custom Options

from tost import encode

obj = {
    "items": [
        {"sku": "A1", "qty": 2},
        {"sku": "B2", "qty": 1}
    ]
}

# Custom indentation, delimiter, and length marker
result = encode(
    obj,
    indent=4,            # 4-space indentation
    delimiter="|",       # Use pipe as delimiter
    length_marker="#"    # Use # as length marker
)
print(result)
# Output:
# items[#2|]{sku|qty}:
#     A1|2
#     B2|1

API Reference

encode(obj, indent=2, delimiter=",", length_marker=None)

Encode a Python object as a TOON-format string.

Parameters:

  • obj: Python object to encode (dict, list, primitive types, etc.)
  • indent (int, optional): Number of spaces per indentation level (default: 2)
  • delimiter (str, optional): Delimiter for array values and table rows (default: ',')
  • length_marker (str, optional): Prefix marker for array length, e.g. '#' (default: None, no marker)

Returns:

  • str: TOON format string

Examples:

from tost import encode

result = encode({"id": 123, "name": "Alice"})
result = encode({"tags": ["a", "b"]}, indent=4, delimiter="|", length_marker="#")

decode(tost_str)

Decode a TOON-format string into a Python object.

Parameters:

  • tost_str (str): TOON format string

Returns:

  • Python object (dict, list, or primitive type)

Examples:

from tost import decode
obj = decode("id: 123\nname: Alice")

TOON Format Specification

Object Format

key: value

Table Array Format

When all objects in an array have the same keys and all values are primitive types, table format is used:

items[N]{field1,field2,field3}:
  value1,value2,value3
  value4,value5,value6
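
The layout above can be sketched in a few lines of pure Python. This is an illustration of the table shape only, not the library's Rust implementation: `render_table` and `format_primitive` are hypothetical names, the delimiter is fixed to a comma, and quoting of values that contain the delimiter as well as canonical number formatting are left out.

```python
from typing import Any, Dict, List


def format_primitive(value: Any) -> str:
    """Render one cell; booleans become lowercase as in the examples above."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def render_table(key: str, rows: List[Dict[str, Any]], indent: int = 2) -> str:
    """Render a uniform list of flat dicts as a table array (simplified sketch)."""
    if not rows:
        raise ValueError("render_table needs at least one row")
    fields = list(rows[0])
    if any(set(row) != set(fields) for row in rows):
        raise ValueError("all rows must share the same keys")
    # Header: key, row count in brackets, field names in braces.
    header = f"{key}[{len(rows)}]{{{','.join(fields)}}}:"
    pad = " " * indent
    body = [pad + ",".join(format_primitive(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + body)


rows = [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}]
print(render_table("items", rows))
```

The field names are written once in the header instead of once per row, which is where the format saves space on uniform arrays.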

Inline Array Format

Arrays of primitive values use inline format:

tags[N]: value1,value2,value3
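
One thing the declared length `N` makes possible is a consistency check when reading a line back. The parser below is a minimal sketch under several assumptions: comma delimiter, no length marker, no quoted values, and all values kept as strings. `parse_inline_array` is a hypothetical helper, not a `tost` function, and whether the library itself validates `N` is not covered here.

```python
import re
from typing import List, Tuple

# key (possibly empty for root-level arrays), [N], colon, then the values
INLINE_HEADER = re.compile(r"^(?P<key>[^\[\]:]*)\[(?P<count>\d+)\]:\s*(?P<values>.*)$")


def parse_inline_array(line: str) -> Tuple[str, List[str]]:
    """Split an inline array line into (key, values) and check the declared length."""
    match = INLINE_HEADER.match(line)
    if match is None:
        raise ValueError(f"not an inline array line: {line!r}")
    text = match.group("values")
    values = text.split(",") if text else []
    declared = int(match.group("count"))
    if len(values) != declared:
        raise ValueError(f"declared {declared} values, found {len(values)}")
    return match.group("key"), values


print(parse_inline_array("tags[4]: javascript,typescript,nodejs,llm"))
```

A mismatch between the header and the actual values, for example a truncated line, is reported as a `ValueError` instead of passing silently.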

List Format

Mixed or non-uniform arrays use list format:

items[N]:
  - value1
  - key: value
    other: value2
  - value3

Array of Arrays Format

pairs[N]:
  - [M]: value1,value2
  - [M]: value3,value4
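
The rules for picking among the four array formats can be summarized as a small decision function. This is a sketch of the rules as described in this section, not the library's actual detection code; `choose_array_format` is a hypothetical name, and treating `None` as a primitive is an assumption.

```python
from typing import Any, List

# None is assumed to count as a primitive alongside str, numbers, and bool.
PRIMITIVES = (str, int, float, bool, type(None))


def is_primitive(value: Any) -> bool:
    return isinstance(value, PRIMITIVES)


def choose_array_format(items: List[Any]) -> str:
    """Name the array format the rules in this section would select."""
    if all(is_primitive(x) for x in items):
        return "inline"
    if all(isinstance(x, dict) for x in items):
        keys = set(items[0])
        # Table format needs identical keys and only primitive values.
        if all(set(x) == keys and all(is_primitive(v) for v in x.values()) for x in items):
            return "table"
    if all(isinstance(x, list) and all(is_primitive(v) for v in x) for x in items):
        return "array of arrays"
    return "list"


print(choose_array_format(["a", "b"]))
print(choose_array_format([{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}]))
print(choose_array_format([[1, 2], [3, 4]]))
print(choose_array_format([1, {"key": "value"}, 3]))
```

Anything that fails the stricter checks falls through to list format, which is the most general of the four.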

Root-Level Arrays

When the root-level value is an array, use a header form without a key name:

[N]{field1,field2}:
  value1,value2
  value3,value4

Or, for arrays of primitive values:

[N]: value1,value2,value3

Project Structure

tost/
├── Cargo.toml              # Rust workspace configuration
├── pyproject.toml          # Python package configuration
├── README.md               # Project documentation
├── rust/                   # Rust core library
│   ├── Cargo.toml
│   └── src/
│       ├── lib.rs          # Main library file (contains PyO3 bindings)
│       ├── encode.rs       # TOON encoding implementation
│       └── decode.rs       # TOON decoding implementation
└── python/                 # Python package
    ├── src/
    │   └── tost/           # Python package
    │       ├── __init__.py
    │       └── tost.py     # Python interface wrapper
    └── tests/              # Python tests
        └── test_tost.py

Development

Running Tests

# Rust tests
cd rust
cargo test

# Python tests (python/ is a sibling of rust/)
cd ../python
pytest tests/

Building

# Development mode
maturin develop

# Release mode
maturin build --release

License

MIT License
