
Token-Oriented Object Notation - Efficient data serialization for LLMs with TOON and TRON formats


🎨 ToonStream

Token-Oriented Object Notation (TOON) & Token Reduced Object Notation (TRON) - Reduce LLM token usage by up to 73% with lossless data serialization

Python 3.8+ | License: MIT


📖 What is ToonStream?

ToonStream is a Python library for encoding structured data in a token-efficient format designed for Large Language Models (LLMs). It converts repetitive JSON structures into compact, tabular representations that dramatically reduce token count while maintaining 100% lossless conversion.

The Problem

LLM APIs charge per token, so verbose JSON wastes tokens and money:

[
  {"id": 1, "name": "Alice", "dept": "Engineering", "salary": 95000},
  {"id": 2, "name": "Bob", "dept": "Sales", "salary": 75000},
  {"id": 3, "name": "Carol", "dept": "Engineering", "salary": 105000}
]

Cost: 80 tokens

The Solution

TOON format eliminates redundancy:

employees[3]{id,name,dept,salary}:
1,Alice,Engineering,95000
2,Bob,Sales,75000
3,Carol,Engineering,105000

Cost: 38 tokens (52.5% fewer than JSON)

TRON format (NEW in v2.0.0) - ultra-compact:

@id,name,dept,salary|1,Alice,Engineering,95000|2,Bob,Sales,75000|3,Carol,Engineering,105000

Cost: 28 tokens (65% fewer than JSON)
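
The savings come from key repetition: JSON restates every field name in every row, while TOON states them once in a header. A stdlib-only sketch of that comparison (character counts serve as a rough proxy here; the figures above are token counts from a real tokenizer):

```python
import json

employees = [
    {"id": 1, "name": "Alice", "dept": "Engineering", "salary": 95000},
    {"id": 2, "name": "Bob", "dept": "Sales", "salary": 75000},
    {"id": 3, "name": "Carol", "dept": "Engineering", "salary": 105000},
]

# Verbose JSON repeats every key in every row.
as_json = json.dumps(employees)

# A TOON-style rendering states the keys once in a header row.
fields = list(employees[0])
header = f"employees[{len(employees)}]{{{','.join(fields)}}}:"
rows = [",".join(str(e[f]) for f in fields) for e in employees]
as_toon = "\n".join([header, *rows])

print(len(as_json), len(as_toon))  # the TOON-style string is much shorter
```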

Why ToonStream?

✅ Save Money - Reduce API costs by up to 73% on structured data
✅ Two Formats - TOON (tabular) and TRON (ultra-compact)
✅ 100% Lossless - Perfect round-trip conversion, no data loss
✅ Zero Dependencies - Pure Python, no external packages required
✅ Fast - Sub-millisecond encoding/decoding
✅ Smart - Automatic optimization, only improves when beneficial
✅ Simple API - encode(data, format='tron') and decode(data, format='tron')


🚀 Installation

pip install toonstream

Or from source:

git clone https://github.com/vivekpandian08/toonstream.git
cd toonstream
pip install -e .

Requirements:

  • Python 3.8 or higher
  • No external dependencies (tiktoken optional for benchmarks)

Basic Usage

import toonstream

# Your data
data = {
    "name": "Alice",
    "age": 30,
    "skills": ["Python", "JavaScript", "SQL"]
}

# Encode to TOON
toon_str = toonstream.encode(data)
print(toon_str)

Output:

name: "Alice"
age: 30
skills: [
  - "Python"
  - "JavaScript"
  - "SQL"
]

🖥️ CLI Tool

ToonStream includes a command-line interface for easy file conversion.

# Convert JSON to TOON (default)
toonstream encode input.json -o output.toon

# Convert JSON to TRON
toonstream encode input.json --format tron -o output.tron

# Decode TOON/TRON back to JSON
toonstream decode output.toon -o restored.json

⚡ Quick Start

TOON Format (Tabular)

from toonstream import encode, decode

# Your data
data = {
    "employees": [
        {"id": 1, "name": "Alice", "dept": "Engineering"},
        {"id": 2, "name": "Bob", "dept": "Sales"},
        {"id": 3, "name": "Carol", "dept": "Engineering"}
    ]
}

# Encode to TOON format (default)
toon_str = encode(data)
print(toon_str)
# Output:
# employees[3]{id,name,dept}:
# 1,Alice,Engineering
# 2,Bob,Sales
# 3,Carol,Engineering

# Decode back to Python
decoded = decode(toon_str)
assert decoded == data  # ✓ Perfect round-trip!

TRON Format (Ultra-Compact) - NEW in v2.0.0

from toonstream import encode, decode

# Simple object
data = {"name": "Alice", "age": 30, "active": True}

# Encode to TRON format
tron_str = encode(data, format='tron')
print(tron_str)
# Output: name=Alice;age=30;active=1

# Nested objects use dot notation
nested = {"user": {"profile": {"name": "Bob", "city": "NYC"}}}
print(encode(nested, format='tron'))
# Output: user.profile.name=Bob;user.profile.city=NYC

# Tabular data uses @header|row format
employees = [
    {"id": 1, "name": "Alice", "dept": "Engineering"},
    {"id": 2, "name": "Bob", "dept": "Sales"},
]
print(encode(employees, format='tron'))
# Output: @id,name,dept|1,Alice,Engineering|2,Bob,Sales

# Decode TRON back to Python
decoded = decode(tron_str, format='tron')
assert decoded == data  # ✓ Perfect round-trip!

Smart Mode Selection with auto_mode

New in v1.1.0: Single parameter for intelligent mode detection

# Auto mode - automatically detects tensor data
toon_str = encode(data, auto_mode=True)
decoded = decode(toon_str, auto_mode=True)

# With PyTorch tensors (auto_mode detects and preserves them)
import torch
data_with_tensors = {
    'embeddings': torch.randn(10, 768),
    'labels': [0, 1, 0],
    'metadata': {'model': 'bert-base'}
}

# auto_mode automatically handles tensor serialization
encoded = encode(data_with_tensors, auto_mode=True)
decoded = decode(encoded, auto_mode=True)
# ✓ Tensors preserved with metadata (dtype, device, shape)

Advanced Options

# Compact mode (minimize whitespace)
compact = encode(data, compact=True)

# Disable smart optimization (always use tabular)
always_tabular = encode(data, smart_optimize=False)

# Pretty print with indentation
pretty = encode(data, indent=2)

# Sort dictionary keys
sorted_output = encode(data, sort_keys=True)

# Combine with auto_mode
combined = encode(data, auto_mode=True, compact=True)

🤔 Which Format Should I Use?

| Feature | JSON | TOON | TRON |
|---|---|---|---|
| Human Readable | ⭐⭐⭐ | ⭐⭐ | ⭐ |
| Token Efficiency | ⭐ | ⭐⭐ | ⭐⭐⭐ |
| Parsing Speed | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ |
| Typical Savings | 0% | 40-55% | 50-70% |

📋 JSON

Use when:

  • Human readability is the #1 priority.
  • Debugging raw payloads manually.
  • Interoperating with systems that strictly require JSON.

📊 TOON (Tabular)

Use when:

  • You have lists of objects (e.g., database rows, logs, CSV-like data).
  • You want significant compression but still want to be able to read the data reasonably well.
  • Example: User lists, product catalogs, transaction histories.

🚀 TRON (Compact)

Use when:

  • Token cost or Context Window space is critical.
  • Sending large contexts to LLMs (RAG, long histories).
  • Data has deep nesting or mixed types.
  • Example: Complex configs, API responses, arbitrary object trees.
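
For intuition, the "@header|row" layout described above can be sketched in a few lines for uniform, flat records. This is an illustration, not toonstream's implementation; for real data (nesting, quoting, mixed keys) use encode(data, format='tron'):

```python
# Minimal sketch of TRON's tabular "@header|row" layout for uniform flat
# records. Illustrative only -- not the library's encoder.
def tron_table(records):
    fields = list(records[0])
    header = "@" + ",".join(fields)
    rows = [",".join(str(r[f]) for f in fields) for r in records]
    return "|".join([header, *rows])

docs = [
    {"id": 1, "name": "Alice", "dept": "Engineering"},
    {"id": 2, "name": "Bob", "dept": "Sales"},
]
print(tron_table(docs))  # @id,name,dept|1,Alice,Engineering|2,Bob,Sales
```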

📊 Performance Benchmarks

Real-world results using tiktoken (GPT-3.5/GPT-4 tokenizer):

Format Comparison (45 Complex Examples)

| Format | Total Tokens | vs JSON | Best For |
|---|---|---|---|
| JSON (pretty) | 16,583 | -- | Human readability |
| JSON Compact | 9,713 | -41% | Network transmission |
| TOON | 7,144 | -57% | Large tabular data |
| TRON | 6,431 | -61% | Most use cases |

Detailed Results by Data Type

| Data Type | JSON | JSON Compact | TOON | TRON | TRON Savings (vs JSON) |
|---|---|---|---|---|---|
| Simple Object | 41 | 26 | 28 | 24 | 41.5% |
| Employee Records (5) | 168 | 88 | 56 | 52 | 69.0% |
| User Records (20) | 842 | 482 | 252 | 288 | 65.8% |
| Nested Object | 58 | 29 | 38 | 27 | 53.4% |
| Mixed Structure | 81 | 40 | 33 | 28 | 65.4% |
| Products (50) | 2,116 | 1,216 | 677 | 723 | 65.8% |

When to Use Each Format

| Use Case | Best Format | Token Savings |
|---|---|---|
| Simple key-value objects | TRON | 40%+ |
| Nested configurations | TRON | 50%+ |
| Large tabular data (50+ rows) | TOON | 45%+ |
| Mixed nested + arrays | TRON | 65%+ |
| Streaming to LLMs | TRON | 60%+ |

🟢 TRON wins 87% of benchmarks - use it for most LLM applications!

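
The guidance above can be folded into a small heuristic. The threshold and checks here are illustrative assumptions, not a toonstream API:

```python
def pick_format(data):
    """Illustrative heuristic based on the guidance above (not a toonstream API)."""
    if isinstance(data, list) and len(data) >= 50:
        # Large uniform lists of objects compress well as TOON tables.
        if data and all(isinstance(r, dict) and r.keys() == data[0].keys() for r in data):
            return "toon"
    # Nesting, mixed types, or token-critical contexts: prefer TRON.
    return "tron"

rows = [{"id": i, "name": f"u{i}"} for i in range(100)]
print(pick_format(rows))             # toon
print(pick_format({"a": {"b": 1}}))  # tron
```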

Speed

All operations complete in under 1 millisecond for typical datasets:

  • 50 records: 0.41ms
  • 100 records: 0.83ms
  • Decode: <1ms
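
To reproduce timings on your own data, a perf_counter loop like the one below works; json.dumps stands in as the encoder so the snippet runs without toonstream installed (swap in toonstream.encode to benchmark TOON/TRON):

```python
import json
import time

data = [{"id": i, "name": f"user{i}", "dept": "Engineering"} for i in range(100)]

def time_ms(fn, *args, repeats=100):
    # Average wall-clock time per call, in milliseconds.
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - start) * 1000 / repeats

# json.dumps is a stand-in; replace with toonstream.encode on real data.
print(f"{time_ms(json.dumps, data):.3f} ms per encode")
```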

🎯 Use Cases


1. LLM Context Optimization

import toonstream

# Pass structured data to LLM
context = {
    "users": [...],  # 100 user records
    "products": [...],  # 50 products
    "orders": [...]  # 200 orders
}

# Reduce prompt tokens by 40%
toon_context = toonstream.encode(context)
response = llm.complete(f"Analyze this data:\n{toon_context}")

2. Pickle Integration

Save data with TOON encoding for additional compression:

from toonstream import save_toon_pickle, load_toon_pickle

# Save with TOON encoding
data = {"users": [...], "logs": [...]}
save_toon_pickle(data, 'data.toon.pkl')

# Load back
loaded = load_toon_pickle('data.toon.pkl')

# 11.4% smaller than regular pickle!

3. API Response Optimization

from toonstream import encode
from flask import Flask, Response

app = Flask(__name__)

@app.route('/api/employees')
def get_employees():
    employees = db.query("SELECT * FROM employees")
    toon_data = encode(employees)
    return Response(toon_data, mimetype='text/plain')

# Clients get 55% smaller responses

4. Configuration Files

import toonstream

config = {
    "database": {"host": "localhost", "port": 5432},
    "cache": {"ttl": 3600, "max_size": 1000}
}

# Save human-readable config
with open('config.toon', 'w') as f:
    f.write(toonstream.encode(config, indent=2))

# Load config
with open('config.toon') as f:
    config = toonstream.decode(f.read())

๐Ÿ› ๏ธ API Reference

Core Functions

encode(obj, format='toon', auto_mode=False, compact=False, smart_optimize=True, indent=None, sort_keys=False)

Convert Python object to TOON or TRON format.

Parameters:

  • obj (Any): Python object (dict, list, primitive)
  • format (str): Output format - 'toon' (default) or 'tron'. New in v2.0.0!
  • auto_mode (bool): Auto-detect mode (tensor vs normal). (default: False)
  • compact (bool): Minimize whitespace (default: False)
  • smart_optimize (bool): Auto-detect best format (default: True)
  • indent (int): Indentation spaces, None for compact (default: None)
  • sort_keys (bool): Sort dictionary keys alphabetically (default: False)

Returns: str - TOON or TRON formatted string

Raises: ToonEncodeError - If encoding fails

# Basic encoding (TOON format - default)
toon = encode(data)

# TRON format - ultra-compact (New in v2.0.0!)
tron = encode(data, format='tron')

# Auto mode - automatically detects and handles tensors
toon = encode(data, auto_mode=True)

# Compact output
toon = encode(data, compact=True)

# Sort dictionary keys
toon = encode(data, sort_keys=True)

# Always use tabular (no optimization)
toon = encode(data, smart_optimize=False)

# Pretty print with 2-space indent
toon = encode(data, indent=2)

# Combine parameters
toon = encode(data, auto_mode=True, compact=True, sort_keys=True)

decode(data_str, format='toon', auto_mode=False, strict=True)

Convert TOON or TRON format to Python object.

Parameters:

  • data_str (str): TOON or TRON formatted string
  • format (str): Input format - 'toon' (default), 'tron', or 'auto'. New in v2.0.0!
  • auto_mode (bool): Auto-detect mode for decoding. (default: False)
  • strict (bool): Enforce strict validation (default: True)

Returns: Any - Python object

Raises: ToonDecodeError or TronDecodeError - If decoding fails

# Decode TOON string (default)
data = decode(toon_str)

# Decode TRON string (New in v2.0.0!)
data = decode(tron_str, format='tron')

# Auto mode - automatically detects and reconstructs tensors
data = decode(toon_str, auto_mode=True)

# Lenient mode (allows minor format issues)
data = decode(toon_str, strict=False)

# Combine parameters
data = decode(toon_str, auto_mode=True, strict=True)

TRON-Specific Functions (New in v2.0.0!)

tron_encode(obj)

Encode Python object directly to TRON format.

from toonstream import tron_encode

tron = tron_encode({"name": "Alice", "age": 30})
# Output: name=Alice;age=30

tron_decode(tron_str)

Decode TRON string directly to Python object.

from toonstream import tron_decode

data = tron_decode("name=Alice;age=30")
# Output: {'name': 'Alice', 'age': 30}

Pickle Functions

save_toon_pickle(data, filepath, smart_optimize=True, protocol=HIGHEST_PROTOCOL)

Save data as TOON-encoded pickle file.

Parameters:

  • data (Any): Python object to save
  • filepath (str): Output file path
  • smart_optimize (bool): Use TOON optimization (default: True)
  • protocol (int): Pickle protocol version (default: HIGHEST_PROTOCOL)
from toonstream import save_toon_pickle

save_toon_pickle(data, 'data.toon.pkl')

load_toon_pickle(filepath, strict=True)

Load TOON-encoded pickle file.

Parameters:

  • filepath (str): Input file path
  • strict (bool): Enforce strict TOON validation (default: True)

Returns: Any - Loaded Python object

from toonstream import load_toon_pickle

data = load_toon_pickle('data.toon.pkl')

Exceptions

  • ToonError - Base exception
  • ToonEncodeError - Encoding failures (unsupported types, NaN, Infinity)
  • ToonDecodeError - Decoding failures (invalid format, syntax errors)
  • ToonValidationError - Validation failures
  • ToonPickleError - Pickle operation failures
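
A common pattern is to decode defensively and fall back to a default on failure. The sketch below uses json.loads as a stand-in decoder so it runs on its own; with toonstream you would call decode() and catch ToonDecodeError / TronDecodeError from the hierarchy above:

```python
import json

# Pattern sketch: return a default when decoding fails, instead of crashing.
# json.loads and ValueError stand in for toonstream.decode and its
# ToonDecodeError/TronDecodeError exceptions.
def safe_decode(text, default=None, decoder=json.loads, errors=(ValueError,)):
    try:
        return decoder(text)
    except errors:
        return default

print(safe_decode('{"ok": true}'))           # {'ok': True}
print(safe_decode('not valid', default={}))  # {}
```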

🧪 Development & Testing

Running Tests

# Install dev dependencies
pip install -e ".[dev]"

# Run all tests (186 tests, all passing)
pytest tests/ -v

# Run specific test file
pytest tests/test_both_modes.py -v

# Run with coverage
pytest tests/ --cov=toonstream --cov-report=html

# Open coverage report
open htmlcov/index.html

Running Benchmarks

# Run all benchmarks
python benchmarks/run_all_comparisons.py

# Results appear in terminal and save to results/

Project Structure

toonstream/
├── toonstream/               # Core library
│   ├── __init__.py           # Public API exports
│   ├── encoder.py            # TOON encoder
│   ├── decoder.py            # TOON decoder
│   ├── tron_encoder.py       # TRON encoder (NEW v2.0.0)
│   ├── tron_decoder.py       # TRON decoder (NEW v2.0.0)
│   ├── tensor_utils.py       # PyTorch tensor support
│   ├── pickle_utils.py       # Pickle integration
│   ├── exceptions.py         # Exception hierarchy
│   └── unified_api.py        # Unified encode/decode with format parameter
├── benchmarks/               # Performance benchmarks
│   ├── run_all_comparisons.py
│   ├── compare_tron_formats.py  # 4-format comparison (NEW v2.0.0)
│   ├── complex_benchmarks.py    # 45 test cases (NEW v2.0.0)
│   ├── token_counters.py        # tiktoken integration (NEW v2.0.0)
│   └── config.json
├── tests/                    # Test suite (186 tests, 100% passing)
│   ├── test_toonstream.py    # Core functionality
│   ├── test_tron.py          # TRON format tests (56 tests - NEW v2.0.0)
│   └── ...
├── examples/                 # Usage examples
│   ├── basic_example.py      # Simple encoding/decoding
│   ├── tron_example.py       # TRON format usage (NEW v2.0.0)
│   ├── tron_tutorial.ipynb   # Interactive TRON tutorial (NEW v2.0.0)
│   ├── tensor_example.py     # PyTorch integration
│   └── README.md
├── .github/workflows/        # CI/CD workflows
│   ├── tests.yml             # Automated testing
│   ├── publish.yml           # Release & PyPI publishing
│   └── release-checklist.yml # Pre-release validation
├── data/                     # Benchmark datasets
├── results/                  # Benchmark results
├── README.md                 # This file
├── PICKLE_USAGE.md           # Pickle utilities guide
├── pyproject.toml            # Modern package configuration
├── setup.py                  # Package configuration
└── requirements.txt          # Dependencies

📖 Examples

See the examples/ directory for complete examples:

  • basic_example.py - Getting started guide
  • tron_example.py - Using TRON format (NEW in v2.0.0)
  • tron_tutorial.ipynb - Interactive TRON tutorial (NEW in v2.0.0)
  • tensor_example.py - PyTorch tensor integration
  • README.md - Examples documentation

Run them:

python examples/basic_example.py
python examples/tron_example.py
python examples/tensor_example.py  # Requires PyTorch

What's New in v2.0.0?

TRON Format (Token Reduced Object Notation):

  • 🚀 Ultra-compact serialization reducing token usage by 50-70%.
  • 📊 65% savings vs JSON for typical datasets.
  • 🔄 100% lossless round-trip conversion.

New CLI Tool:

  • 🖥️ Process files directly: toonstream encode data.json -o data.tron
  • 🛠️ Easy integration into data pipelines.

Production Reliability:

  • ✅ Expanded test suite (186 tests, 100% passing).
  • 🔒 Full type safety with strict mypy compliance.
  • ⚡ Sub-millisecond performance.

Key Features:

  • tron_encode() / tron_decode() specific APIs.
  • Tabular optimization for arrays (@col1,col2|val1,val2).
  • Dot notation for nested structures (key.subkey=value).
  • Full backward compatibility with v1.1.0.
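
The dot-notation idea from the feature list can be sketched as a small recursive flattener for nested dicts of scalars. This is an illustration, not toonstream's encoder, which also handles quoting, lists, and tabular arrays:

```python
# Illustration of TRON's key.subkey=value dot notation for nested dicts.
# Not the library's encoder -- illustrative only.
def flatten(obj, prefix=""):
    parts = []
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            parts.extend(flatten(value, path))
        else:
            parts.append(f"{path}={value}")
    return parts

print(";".join(flatten({"user": {"profile": {"name": "Bob", "city": "NYC"}}})))
# user.profile.name=Bob;user.profile.city=NYC
```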

Previous: v1.1.0 (Auto Mode):

  • Single auto_mode parameter (simpler API)
  • Automatic tensor mode detection
  • 130 tests, all passing

🤝 Contributing

Contributions welcome! Areas for improvement:

  1. Additional Features - Streaming encoder, additional format options
  2. Performance - C extension for faster encoding/decoding
  3. Documentation - More examples, integration guides
  4. Language Bindings - JavaScript, Go, Rust implementations

Development Setup

# Fork and clone
git clone https://github.com/vivekpandian08/toonstream.git
cd toonstream

# Create branch
git checkout -b feature/your-feature

# Install dev dependencies
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

# Make changes and test
pytest tests/

# Submit PR

📄 License

MIT License - see LICENSE file


🙏 Acknowledgments

  • Inspired by CSV efficiency for tabular data
  • Built for the LLM era where tokens = money
  • Tested with real-world production datasets

📞 Support


🔗 Links


Made with ❤️ for the LLM community

Save tokens. Save money. Build better.



Download files

Source Distribution

toonstream-2.0.0.tar.gz (81.2 kB)

Built Distribution

toonstream-2.0.0-py3-none-any.whl (33.8 kB)

File details

Details for the file toonstream-2.0.0.tar.gz.

  • Download URL: toonstream-2.0.0.tar.gz
  • Size: 81.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

Hashes for toonstream-2.0.0.tar.gz:

| Algorithm | Hash digest |
|---|---|
| SHA256 | 8e001877eb723eb7bec40492c4544693d240610a83cc606c1b8a554de19503b4 |
| MD5 | 4a3beebe1742d9a3f8cbcd99ebacb163 |
| BLAKE2b-256 | 54adb37b325aa0a741f4fdcd69bff9c07bb424ad9dabf77b45ed50901ff2be67 |

File details

Details for the file toonstream-2.0.0-py3-none-any.whl.

  • Download URL: toonstream-2.0.0-py3-none-any.whl
  • Size: 33.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

Hashes for toonstream-2.0.0-py3-none-any.whl:

| Algorithm | Hash digest |
|---|---|
| SHA256 | 89516f661e6db6cc8007dfce89c0c6ed3e5892a34a62c62bfbee23e42d2ad73d |
| MD5 | eb526f0f1d6bade55aa88f8a0f61d0d3 |
| BLAKE2b-256 | c2b15adad2c70fa2b6615a30feadb32cb022e9d783e35275d12f5dc906ac22ff |
