
Token Oriented Object Notation - Efficient data serialization for LLMs


🎨 ToonStream

Token-Oriented Object Notation (TOON) - Reduce LLM token usage by up to 55% with lossless data serialization



📖 What is ToonStream?

ToonStream is a Python library for encoding structured data in a token-efficient format designed for Large Language Models (LLMs). It converts repetitive JSON structures into compact, tabular representations that dramatically reduce token count while maintaining 100% lossless conversion.

The Problem

LLMs charge by tokens. Verbose JSON wastes tokens and money:

[
  {"id": 1, "name": "Alice", "dept": "Engineering", "salary": 95000},
  {"id": 2, "name": "Bob", "dept": "Sales", "salary": 75000},
  {"id": 3, "name": "Carol", "dept": "Engineering", "salary": 105000}
]

Cost: 80 tokens

The Solution

TOON format eliminates redundancy:

employees[3]{id,name,dept,salary}:
1,Alice,Engineering,95000
2,Bob,Sales,75000
3,Carol,Engineering,105000

Cost: 38 tokens (52.5% reduction)
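
The transformation above is mechanical for arrays of uniformly-keyed objects: emit a `name[count]{fields}:` header, then one CSV-style row per object. A simplified illustration of the idea (not the library's actual encoder; it skips quoting, escaping, and nesting):

```python
def to_toon_table(name, rows):
    """Encode a list of uniformly-keyed dicts in the TOON tabular style.

    Illustrative sketch only: no quoting or escaping, and it assumes
    every dict has the same keys in the same order.
    """
    fields = list(rows[0].keys())
    header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
    lines = [",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + lines)

employees = [
    {"id": 1, "name": "Alice", "dept": "Engineering", "salary": 95000},
    {"id": 2, "name": "Bob", "dept": "Sales", "salary": 75000},
]
print(to_toon_table("employees", employees))
# employees[2]{id,name,dept,salary}:
# 1,Alice,Engineering,95000
# 2,Bob,Sales,75000
```

The savings come from stating each field name once in the header instead of once per object.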

Why ToonStream?

✅ Save Money - Reduce API costs by up to 55% on structured data
✅ 100% Lossless - Perfect round-trip conversion, no data loss
✅ Zero Dependencies - Pure Python, no external packages required
✅ Fast - Sub-millisecond encoding/decoding
✅ Smart - Applies the tabular format only when it actually saves tokens
✅ Simple API - Two functions: encode() and decode()


🚀 Installation

pip install toonstream

Or from source:

git clone https://github.com/vivekpandian08/toonstream.git
cd toonstream
pip install -e .

Requirements:

  • Python 3.8 or higher
  • No external dependencies (tiktoken optional for benchmarks)

Basic Usage

import toonstream

# Your data
data = {
    "name": "Alice",
    "age": 30,
    "skills": ["Python", "JavaScript", "SQL"]
}

# Encode to TOON
toon_str = toonstream.encode(data)
print(toon_str)

Output:

name: "Alice"
age: 30
skills: [
  - "Python"
  - "JavaScript"
  - "SQL"
]

⚡ Quick Start

Basic Usage

from toonstream import encode, decode

# Your data
data = {
    "employees": [
        {"id": 1, "name": "Alice", "dept": "Engineering"},
        {"id": 2, "name": "Bob", "dept": "Sales"},
        {"id": 3, "name": "Carol", "dept": "Engineering"}
    ]
}

# Encode to TOON format (normal mode - default)
toon_str = encode(data)
print(toon_str)
# Output:
# employees[3]{id,name,dept}:
# 1,Alice,Engineering
# 2,Bob,Sales
# 3,Carol,Engineering

# Decode back to Python
decoded = decode(toon_str)
assert decoded == data  # ✓ Perfect round-trip!
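
Decoding the tabular form is the reverse operation: parse the `name[count]{fields}:` header, then zip each data row against the field list. A rough sketch of that idea (not the library's actual decoder; it handles only a single table with bare values):

```python
import re

def from_toon_table(text):
    """Parse one TOON-style table back into {name: [dict, ...]}.

    Illustrative sketch only: handles a single table with bare values,
    converting all-digit fields to int.
    """
    lines = text.strip().splitlines()
    m = re.match(r"(\w+)\[(\d+)\]\{([^}]*)\}:", lines[0])
    name, count, fields = m.group(1), int(m.group(2)), m.group(3).split(",")
    rows = []
    for line in lines[1:1 + count]:
        values = [int(v) if v.isdigit() else v for v in line.split(",")]
        rows.append(dict(zip(fields, values)))
    return {name: rows}

toon = "employees[2]{id,name,dept}:\n1,Alice,Engineering\n2,Bob,Sales"
print(from_toon_table(toon))
```

The declared count in the header is what makes the format safely streamable and losslessly reversible.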

Smart Mode Selection with auto_mode

New in v1.1.0: Single parameter for intelligent mode detection

# Auto mode - automatically detects tensor data
toon_str = encode(data, auto_mode=True)
decoded = decode(toon_str, auto_mode=True)

# With PyTorch tensors (auto_mode detects and preserves them)
import torch
data_with_tensors = {
    'embeddings': torch.randn(10, 768),
    'labels': [0, 1, 0],
    'metadata': {'model': 'bert-base'}
}

# auto_mode automatically handles tensor serialization
encoded = encode(data_with_tensors, auto_mode=True)
decoded = decode(encoded, auto_mode=True)
# ✓ Tensors preserved with metadata (dtype, device, shape)

Advanced Options

# Compact mode (minimize whitespace)
compact = encode(data, compact=True)

# Disable smart optimization (always use tabular)
always_tabular = encode(data, smart_optimize=False)

# Pretty print with indentation
pretty = encode(data, indent=2)

# Sort dictionary keys
sorted_output = encode(data, sort_keys=True)

# Combine with auto_mode
combined = encode(data, auto_mode=True, compact=True)

📊 Performance Benchmarks

Real-world results from production datasets:

| Data Type | JSON Tokens | TOON Tokens | Reduction | Use Case |
|---|---|---|---|---|
| Employee Records (50) | 3,914 | 1,733 | -55.7% | HR systems, payroll |
| GitHub Repos (100) | 14,102 | 8,712 | -38.2% | API responses |
| Order History (10) | 2,926 | 2,915 | -0.4% | E-commerce |
| Config Files (20) | 7,393 | 7,393 | 0.0% | Microservices |

When to Use TOON

🟢 Excellent Results (30-55% savings):

  • Arrays of similar objects (users, products, logs)
  • Tabular data (CSV-like structures)
  • Database query results
  • Time-series data

🟡 Good Results (10-30% savings):

  • Mixed nested structures
  • API responses with arrays
  • Semi-structured documents

🔴 Neutral Results (±5%):

  • Deeply nested JSON (5+ levels)
  • Unique object structures
  • Small datasets (<3 items)
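
A quick way to predict which bucket your data falls into is to check for arrays of uniformly-keyed objects, since that is where the tabular format pays off. The helper below is an illustrative heuristic, not part of the toonstream API:

```python
def is_tabular_friendly(value, min_items=3):
    """Heuristic: does this value contain a list of >= min_items dicts
    that all share the same keys? Those are where TOON saves the most."""
    if isinstance(value, list):
        if (len(value) >= min_items
                and all(isinstance(x, dict) for x in value)
                and len({frozenset(x) for x in value}) == 1):
            return True
        return any(is_tabular_friendly(x, min_items) for x in value)
    if isinstance(value, dict):
        return any(is_tabular_friendly(v, min_items) for v in value.values())
    return False

print(is_tabular_friendly({"users": [{"id": 1}, {"id": 2}, {"id": 3}]}))  # True
print(is_tabular_friendly({"config": {"host": "localhost"}}))             # False
```

If this returns False for your payload, expect results in the neutral band above.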

Speed

All operations complete in under 1 millisecond for typical datasets:

  • 50 records: 0.41ms
  • 100 records: 0.83ms
  • Decode: <1ms
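
Such timings are easy to reproduce for your own payloads with `time.perf_counter`. The sketch below times `json.dumps` as a stand-in so it runs anywhere; swapping in `toonstream.encode` (with the package installed) gives the comparable TOON figure:

```python
import json
import time

# Synthetic payload: 100 uniformly-keyed records.
records = [{"id": i, "name": f"user{i}", "dept": "Eng"} for i in range(100)]

start = time.perf_counter()
encoded = json.dumps(records)  # replace with toonstream.encode(records)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"encoded {len(records)} records in {elapsed_ms:.2f}ms")
```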

🎯 Use Cases

1. LLM Context Optimization

import toonstream

# Pass structured data to LLM
context = {
    "users": [...],  # 100 user records
    "products": [...],  # 50 products
    "orders": [...]  # 200 orders
}

# Reduce prompt tokens by 40%
toon_context = toonstream.encode(context)
response = llm.complete(f"Analyze this data:\n{toon_context}")

2. Pickle Integration

Save data with TOON encoding for additional compression:

from toonstream import save_toon_pickle, load_toon_pickle

# Save with TOON encoding
data = {"users": [...], "logs": [...]}
save_toon_pickle(data, 'data.toon.pkl')

# Load back
loaded = load_toon_pickle('data.toon.pkl')

# 11.4% smaller than regular pickle!
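
These helpers presumably pickle the TOON-encoded text rather than the raw object. The pattern looks roughly like the sketch below, which uses `json` as a stand-in encoder so it stays self-contained; the real helpers use TOON encoding and raise `ToonPickleError` on failure:

```python
import json
import os
import pickle
import tempfile

def save_encoded_pickle(data, filepath, encode=json.dumps):
    # Encode to a compact text form first, then pickle the resulting string.
    with open(filepath, "wb") as f:
        pickle.dump(encode(data), f, protocol=pickle.HIGHEST_PROTOCOL)

def load_encoded_pickle(filepath, decode=json.loads):
    # Unpickle the string, then decode it back to the original object.
    with open(filepath, "rb") as f:
        return decode(pickle.load(f))

path = os.path.join(tempfile.gettempdir(), "data.enc.pkl")
save_encoded_pickle({"users": [1, 2, 3]}, path)
print(load_encoded_pickle(path))  # {'users': [1, 2, 3]}
```

Pickling a compact text form is what yields the size reduction over pickling the object graph directly.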

3. API Response Optimization

from toonstream import encode
from flask import Flask, Response

app = Flask(__name__)

@app.route('/api/employees')
def get_employees():
    employees = db.query("SELECT * FROM employees")
    toon_data = encode(employees)
    return Response(toon_data, mimetype='text/plain')

# Clients get 55% smaller responses

4. Configuration Files

import toonstream

config = {
    "database": {"host": "localhost", "port": 5432},
    "cache": {"ttl": 3600, "max_size": 1000}
}

# Save human-readable config
with open('config.toon', 'w') as f:
    f.write(toonstream.encode(config, indent=2))

# Load config
with open('config.toon') as f:
    config = toonstream.decode(f.read())

๐Ÿ› ๏ธ API Reference

Core Functions

encode(obj, auto_mode=False, compact=False, smart_optimize=True, indent=None, sort_keys=False)

Convert Python object to TOON format.

Parameters:

  • obj (Any): Python object (dict, list, primitive)
  • auto_mode (bool): Auto-detect mode (tensor vs normal). New in v1.1.0! (default: False)
  • compact (bool): Minimize whitespace (default: False)
  • smart_optimize (bool): Auto-detect best format (default: True)
  • indent (int): Indentation spaces, None for compact (default: None)
  • sort_keys (bool): Sort dictionary keys alphabetically (default: False)

Returns: str - TOON formatted string

Raises: ToonEncodeError - If encoding fails

# Basic encoding (normal mode)
toon = encode(data)

# Auto mode - automatically detects and handles tensors
toon = encode(data, auto_mode=True)

# Compact output
toon = encode(data, compact=True)

# Sort dictionary keys
toon = encode(data, sort_keys=True)

# Always use tabular (no optimization)
toon = encode(data, smart_optimize=False)

# Pretty print with 2-space indent
toon = encode(data, indent=2)

# Combine parameters
toon = encode(data, auto_mode=True, compact=True, sort_keys=True)

decode(toon_str, auto_mode=False, strict=True)

Convert TOON format to Python object.

Parameters:

  • toon_str (str): TOON formatted string
  • auto_mode (bool): Auto-detect mode for decoding. New in v1.1.0! (default: False)
  • strict (bool): Enforce strict validation (default: True)

Returns: Any - Python object

Raises: ToonDecodeError - If decoding fails

# Decode TOON string (normal mode)
data = decode(toon_str)

# Auto mode - automatically detects and reconstructs tensors
data = decode(toon_str, auto_mode=True)

# Lenient mode (allows minor format issues)
data = decode(toon_str, strict=False)

# Combine parameters
data = decode(toon_str, auto_mode=True, strict=True)

Pickle Functions

save_toon_pickle(data, filepath, smart_optimize=True, protocol=HIGHEST_PROTOCOL)

Save data as TOON-encoded pickle file.

Parameters:

  • data (Any): Python object to save
  • filepath (str): Output file path
  • smart_optimize (bool): Use TOON optimization (default: True)
  • protocol (int): Pickle protocol version (default: HIGHEST_PROTOCOL)

from toonstream import save_toon_pickle

save_toon_pickle(data, 'data.toon.pkl')

load_toon_pickle(filepath, strict=True)

Load TOON-encoded pickle file.

Parameters:

  • filepath (str): Input file path
  • strict (bool): Enforce strict TOON validation (default: True)

Returns: Any - Loaded Python object

from toonstream import load_toon_pickle

data = load_toon_pickle('data.toon.pkl')

Exceptions

  • ToonError - Base exception
  • ToonEncodeError - Encoding failures (unsupported types, NaN, Infinity)
  • ToonDecodeError - Decoding failures (invalid format, syntax errors)
  • ToonValidationError - Validation failures
  • ToonPickleError - Pickle operation failures

🧪 Development & Testing

Running Tests

# Install dev dependencies
pip install -e ".[dev]"

# Run all tests (130 tests, all passing)
pytest tests/ -v

# Run specific test file
pytest tests/test_both_modes.py -v

# Run with coverage
pytest tests/ --cov=toonstream --cov-report=html

# Open coverage report
open htmlcov/index.html

Running Benchmarks

# Run all benchmarks
python benchmarks/run_all_comparisons.py

# Results appear in terminal and save to results/

Project Structure

toonstream/
├── toonstream/               # Core library
│   ├── __init__.py           # Public API exports
│   ├── encoder.py            # TOON encoder
│   ├── decoder.py            # TOON decoder
│   ├── tensor_utils.py       # PyTorch tensor support
│   ├── pickle_utils.py       # Pickle integration
│   ├── exceptions.py         # Exception hierarchy
│   └── unified_api.py        # Unified encode/decode with auto_mode (NEW v1.1.0)
├── benchmarks/               # Performance benchmarks
│   ├── run_all_comparisons.py
│   └── config.json
├── tests/                    # Test suite (130 tests, 100% passing)
│   ├── test_toonstream.py    # Core functionality (51 tests)
│   ├── test_auto_mode_api.py # Auto mode parameter (19 tests)
│   ├── test_both_modes.py    # Comparison tests (41 tests) - NEW v1.1.0
│   └── test_tensor_utils.py  # Tensor support (19 tests)
├── examples/                 # Usage examples
│   ├── basic_example.py      # Simple encoding/decoding
│   ├── auto_mode_example.py  # Auto mode usage (NEW v1.1.0)
│   ├── tensor_example.py     # PyTorch integration
│   └── README.md
├── .github/workflows/        # CI/CD workflows (NEW v1.1.0)
│   ├── tests.yml             # Automated testing
│   ├── publish.yml           # Release & PyPI publishing
│   └── release-checklist.yml # Pre-release validation
├── data/                     # Benchmark datasets
├── results/                  # Benchmark results
├── README.md                 # This file
├── RELEASE_NOTES_v1.1.0.md   # What's new in v1.1.0 (NEW)
├── PICKLE_USAGE.md           # Pickle utilities guide
├── pyproject.toml            # Modern package configuration
├── setup.py                  # Package configuration
└── requirements.txt          # Dependencies

📖 Examples

See the examples/ directory for complete examples:

  • basic_example.py - Getting started guide
  • auto_mode_example.py - Using auto_mode parameter (NEW in v1.1.0)
  • tensor_example.py - PyTorch tensor integration
  • README.md - Examples documentation

Run them:

python examples/basic_example.py
python examples/auto_mode_example.py
python examples/tensor_example.py  # Requires PyTorch

What's New in v1.1.0?

Key improvements:

  • ✅ Single auto_mode parameter (simpler API)
  • ✅ 41 new comprehensive tests
  • ✅ 130 total tests, all passing
  • ✅ Automatic tensor mode detection
  • ✅ Enhanced CI/CD workflows
  • ✅ Full backward compatibility

See RELEASE_NOTES_v1.1.0.md for full details.


๐Ÿค Contributing

Contributions welcome! Areas for improvement:

  1. Additional Features - CLI tool, streaming encoder, additional format options
  2. Performance - C extension for faster encoding/decoding
  3. Documentation - More examples, integration guides
  4. Language Bindings - JavaScript, Go, Rust implementations

Development Setup

# Fork and clone
git clone https://github.com/vivekpandian08/toonstream.git
cd toonstream

# Create branch
git checkout -b feature/your-feature

# Install dev dependencies
pip install -e ".[dev]"

# Make changes and test
pytest tests/

# Submit PR

📄 License

MIT License - see LICENSE file


๐Ÿ™ Acknowledgments

  • Inspired by CSV efficiency for tabular data
  • Built for the LLM era where tokens = money
  • Tested with real-world production datasets

Made with ❤️ for the LLM community

Save tokens. Save money. Build better.


Download files

Download the file for your platform.

Source Distribution

toonstream-1.1.0.tar.gz (68.2 kB)

Uploaded Source

Built Distribution


toonstream-1.1.0-py3-none-any.whl (25.2 kB)

Uploaded Python 3

File details

Details for the file toonstream-1.1.0.tar.gz.

File metadata

  • Download URL: toonstream-1.1.0.tar.gz
  • Upload date:
  • Size: 68.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for toonstream-1.1.0.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | 845edb9fbe7dbaec66255081bf0689072c90a7f50613b00f9e44c968af0c72a1 |
| MD5 | 104b8825de2ae73c72837983adfaece8 |
| BLAKE2b-256 | 51bcc1b457a03ed8149439dcfeda34c30dbb733320cc5c79091bb8f893b0f40e |


File details

Details for the file toonstream-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: toonstream-1.1.0-py3-none-any.whl
  • Upload date:
  • Size: 25.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for toonstream-1.1.0-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | f78f6da76657e9d2e2ec03668be7a8be16fbcb3e263e530533b097bbd24a7e5f |
| MD5 | 771f8738ce68298efe985a4260239156 |
| BLAKE2b-256 | b1b4f0c9d73ea53d7d8f947f2fc45fe69aaf1a0d43ea6cd366f2925b28d078c4 |

