ToonStream
Token-Oriented Object Notation (TOON) - Reduce LLM token usage by up to 55% with lossless data serialization
What is ToonStream?
ToonStream is a Python library for encoding structured data in a token-efficient format designed for Large Language Models (LLMs). It converts repetitive JSON structures into compact, tabular representations that dramatically reduce token count while maintaining 100% lossless conversion.
The Problem
LLMs charge by tokens. Verbose JSON wastes tokens and money:
[
{"id": 1, "name": "Alice", "dept": "Engineering", "salary": 95000},
{"id": 2, "name": "Bob", "dept": "Sales", "salary": 75000},
{"id": 3, "name": "Carol", "dept": "Engineering", "salary": 105000}
]
Cost: 80 tokens
The Solution
TOON format eliminates redundancy:
employees[3]{id,name,dept,salary}:
1,Alice,Engineering,95000
2,Bob,Sales,75000
3,Carol,Engineering,105000
Cost: 38 tokens (52.5% reduction)
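For uniform records the rewrite above is mechanical: hoist the shared keys into a header, then emit one CSV-style row per record. A simplified sketch of that transform in plain Python (`to_tabular` is a hypothetical helper for illustration, not the library's actual encoder; it assumes every row has the same keys and that values contain no commas or newlines):

```python
def to_tabular(key, rows):
    """Collapse a list of uniform dicts into a TOON-style tabular block.

    Simplified illustration: assumes identical keys per row and
    comma-free values (a real encoder must handle quoting/escaping).
    """
    fields = list(rows[0])
    header = f"{key}[{len(rows)}]{{{','.join(fields)}}}:"
    lines = [",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + lines)

employees = [
    {"id": 1, "name": "Alice", "dept": "Engineering", "salary": 95000},
    {"id": 2, "name": "Bob", "dept": "Sales", "salary": 75000},
    {"id": 3, "name": "Carol", "dept": "Engineering", "salary": 105000},
]
print(to_tabular("employees", employees))
```

The keys appear once in the header instead of once per record, which is exactly where the token savings come from.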
Why ToonStream?
- Save Money - Reduce API costs by up to 55% on structured data
- 100% Lossless - Perfect round-trip conversion, no data loss
- Zero Dependencies - Pure Python, no external packages required
- Fast - Sub-millisecond encoding/decoding
- Smart - Automatic optimization, applied only when it actually saves tokens
- Simple API - Two functions: encode() and decode()
Installation
pip install toonstream
Or from source:
git clone https://github.com/vivekpandian08/toonstream.git
cd toonstream
pip install -e .
Requirements:
- Python 3.8 or higher
- No external dependencies (tiktoken optional for benchmarks)
Basic Usage
import toonstream
# Your data
data = {
    "name": "Alice",
    "age": 30,
    "skills": ["Python", "JavaScript", "SQL"]
}
# Encode to TOON
toon_str = toonstream.encode(data)
print(toon_str)
Output:
name: "Alice"
age: 30
skills: [
- "Python"
- "JavaScript"
- "SQL"
]
Quick Start
Basic Usage
from toonstream import encode, decode
# Your data
data = {
    "employees": [
        {"id": 1, "name": "Alice", "dept": "Engineering"},
        {"id": 2, "name": "Bob", "dept": "Sales"},
        {"id": 3, "name": "Carol", "dept": "Engineering"}
    ]
}
# Encode to TOON format
toon_str = encode(data)
print(toon_str)
# Output:
# employees[3]{id,name,dept}:
# 1,Alice,Engineering
# 2,Bob,Sales
# 3,Carol,Engineering
# Decode back to Python
decoded = decode(toon_str)
assert decoded == data  # Perfect round-trip!
Advanced Options
# Compact mode (minimize whitespace)
compact = encode(data, compact=True)
# Disable smart optimization (always use tabular)
always_tabular = encode(data, smart_optimize=False)
# Pretty print with indentation
pretty = encode(data, indent=2)
Performance Benchmarks
Real-world results from production datasets:
| Data Type | JSON Tokens | TOON Tokens | Reduction | Use Case |
|---|---|---|---|---|
| Employee Records (50) | 3,914 | 1,733 | -55.7% | HR systems, payroll |
| GitHub Repos (100) | 14,102 | 8,712 | -38.2% | API responses |
| Order History (10) | 2,926 | 2,915 | -0.4% | E-commerce |
| Config Files (20) | 7,393 | 7,393 | 0.0% | Microservices |
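Numbers like these are easy to spot-check on your own data. A real benchmark should count tokens with a tokenizer such as tiktoken (optional, as noted above), but even raw character counts show where the redundancy goes; the records below are hypothetical:

```python
import json

# Hypothetical uniform records, similar in shape to the employee benchmark
employees = [{"id": i, "name": f"user{i}", "dept": "Engineering"} for i in range(50)]

json_len = len(json.dumps(employees))

# TOON-style tabular rendering of the same records (header once, then rows)
fields = ["id", "name", "dept"]
rows = [",".join(str(r[f]) for f in fields) for r in employees]
toon_len = len("employees[50]{id,name,dept}:\n" + "\n".join(rows))

print(f"JSON: {json_len} chars, TOON-style: {toon_len} chars")
```

Character counts are only a proxy, but the ratio tracks token counts closely for data this repetitive, since the repeated key names are what both measures eliminate.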
When to Use TOON
Excellent Results (30-55% savings):
- Arrays of similar objects (users, products, logs)
- Tabular data (CSV-like structures)
- Database query results
- Time-series data
Good Results (10-30% savings):
- Mixed nested structures
- API responses with arrays
- Semi-structured documents
Neutral Results (±5%):
- Deeply nested JSON (5+ levels)
- Unique object structures
- Small datasets (<3 items)
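A rough version of the "only when beneficial" decision can be expressed as a uniformity check: tabular form pays off when a list holds enough dicts that share the same keys. The sketch below illustrates the idea; it is not the library's actual smart_optimize logic, and the `min_rows=3` threshold is an assumption taken from the "<3 items" note above:

```python
def is_tabular(value, min_rows=3):
    """Heuristic: does this value look like it benefits from TOON's
    tabular form? True for a list of at least min_rows dicts that all
    share the same set of keys."""
    if not isinstance(value, list) or len(value) < min_rows:
        return False
    if not all(isinstance(item, dict) for item in value):
        return False
    first_keys = set(value[0])
    return all(set(item) == first_keys for item in value)
```

Values that fail the check (short lists, mixed shapes, deep nesting) are the cases in the "Neutral Results" bucket, where an encoder does better to fall back to a plain representation.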
Speed
All operations complete in under 1 millisecond for typical datasets:
- 50 records: 0.41ms
- 100 records: 0.83ms
- Decode: <1ms
Use Cases
1. LLM Context Optimization
import toonstream
# Pass structured data to LLM
context = {
    "users": [...],     # 100 user records
    "products": [...],  # 50 products
    "orders": [...]     # 200 orders
}
# Reduce prompt tokens by 40%
toon_context = toonstream.encode(context)
response = llm.complete(f"Analyze this data:\n{toon_context}")
2. Pickle Integration
Save data with TOON encoding for additional compression:
from toonstream import save_toon_pickle, load_toon_pickle
# Save with TOON encoding
data = {"users": [...], "logs": [...]}
save_toon_pickle(data, 'data.toon.pkl')
# Load back
loaded = load_toon_pickle('data.toon.pkl')
# 11.4% smaller than regular pickle!
3. API Response Optimization
from toonstream import encode
from flask import Flask, Response
app = Flask(__name__)
@app.route('/api/employees')
def get_employees():
    employees = db.query("SELECT * FROM employees")
    toon_data = encode(employees)
    return Response(toon_data, mimetype='text/plain')
# Clients get up to 55% smaller responses
4. Configuration Files
import toonstream
config = {
    "database": {"host": "localhost", "port": 5432},
    "cache": {"ttl": 3600, "max_size": 1000}
}
# Save human-readable config
with open('config.toon', 'w') as f:
    f.write(toonstream.encode(config, indent=2))
# Load config
with open('config.toon') as f:
    config = toonstream.decode(f.read())
API Reference
Core Functions
encode(obj, compact=False, smart_optimize=True, indent=None, sort_keys=False)
Convert Python object to TOON format.
Parameters:
- obj (Any): Python object (dict, list, primitive)
- compact (bool): Minimize whitespace (default: False)
- smart_optimize (bool): Auto-detect best format (default: True)
- indent (int): Indentation spaces, None for compact (default: None)
- sort_keys (bool): Sort dictionary keys alphabetically (default: False)
Returns: str - TOON formatted string
Raises: ToonEncodeError - If encoding fails
# Basic encoding
toon = encode(data)
# Compact output
toon = encode(data, compact=True)
# Sort dictionary keys
toon = encode(data, sort_keys=True)
# Always use tabular (no optimization)
toon = encode(data, smart_optimize=False)
# Pretty print with 2-space indent
toon = encode(data, indent=2)
decode(toon_str, strict=True)
Convert TOON format to Python object.
Parameters:
- toon_str (str): TOON formatted string
- strict (bool): Enforce strict validation (default: True)
Returns: Any - Python object
Raises: ToonDecodeError - If decoding fails
# Decode TOON string
data = decode(toon_str)
# Lenient mode (allows minor format issues)
data = decode(toon_str, strict=False)
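The tabular header `key[n]{fields}:` carries everything needed to reverse the encoding: the key name, the declared row count, and the field order. A simplified sketch of that decoding direction (an illustration, not the library's actual parser; it assumes unquoted, comma-free values, and converts digit-only fields back to int):

```python
import re

def parse_tabular(block):
    """Parse one TOON-style tabular block back into {key: [dict, ...]}.

    Simplified illustration: assumes unquoted, comma-free values; a real
    decoder must also handle quoting, nesting, and non-tabular forms.
    """
    header, *rows = block.strip().splitlines()
    m = re.match(r"(\w+)\[(\d+)\]\{([^}]*)\}:$", header)
    key, count, fields = m.group(1), int(m.group(2)), m.group(3).split(",")
    items = []
    for row in rows:
        values = [int(v) if v.lstrip("-").isdigit() else v for v in row.split(",")]
        items.append(dict(zip(fields, values)))
    assert len(items) == count, "row count must match the [n] in the header"
    return {key: items}
```

The declared `[n]` doubles as a validation hook, which is the kind of check `strict=True` mode can enforce and `strict=False` can relax.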
Pickle Functions
save_toon_pickle(data, filepath, smart_optimize=True, protocol=HIGHEST_PROTOCOL)
Save data as TOON-encoded pickle file.
Parameters:
- data (Any): Python object to save
- filepath (str): Output file path
- smart_optimize (bool): Use TOON optimization (default: True)
- protocol (int): Pickle protocol version (default: HIGHEST_PROTOCOL)
from toonstream import save_toon_pickle
save_toon_pickle(data, 'data.toon.pkl')
load_toon_pickle(filepath, strict=True)
Load TOON-encoded pickle file.
Parameters:
- filepath (str): Input file path
- strict (bool): Enforce strict TOON validation (default: True)
Returns: Any - Loaded Python object
from toonstream import load_toon_pickle
data = load_toon_pickle('data.toon.pkl')
Exceptions
- ToonError - Base exception
- ToonEncodeError - Encoding failures (unsupported types, NaN, Infinity)
- ToonDecodeError - Decoding failures (invalid format, syntax errors)
- ToonValidationError - Validation failures
- ToonPickleError - Pickle operation failures
Development
Running Tests
# Install dev dependencies (pytest, pytest-cov, tiktoken, black)
pip install -e ".[dev]"
# Run all tests
pytest tests/test_toonstream.py -v
# Run with coverage
pytest tests/ --cov=toonstream --cov-report=html
# Open coverage report
open htmlcov/index.html
Running Benchmarks
# Run all benchmarks
python benchmarks/run_all_comparisons.py
# Results appear in terminal and save to results/
Project Structure
toonstream/
├── toonstream/                  # Core library
│   ├── __init__.py              # Public API exports
│   ├── encoder.py               # TOON encoder (485 lines)
│   ├── decoder.py               # TOON decoder (533 lines)
│   ├── exceptions.py            # Exception hierarchy (60 lines)
│   └── pickle_utils.py          # Pickle integration (177 lines)
├── benchmarks/                  # Performance tests
│   ├── run_all_comparisons.py
│   └── config.json
├── tests/                       # Test suite (51 tests, 100% passing)
│   └── test_toonstream.py
├── examples/                    # Usage examples
│   ├── basic_example.py
│   ├── advanced_example.py
│   ├── pickle_example.py
│   └── toonstream_tutorial.ipynb
├── data/                        # Benchmark datasets
├── results/                     # Benchmark results
├── README.md                    # This file
├── PICKLE_USAGE.md              # Pickle utilities guide
├── pyproject.toml               # Modern package configuration
├── setup.py                     # Package configuration
└── requirements.txt             # Dependencies
Examples
See the examples/ directory for complete examples:
- basic_example.py - Getting started guide
- advanced_example.py - Smart optimization features
- pickle_example.py - Pickle integration demo
- toonstream_tutorial.ipynb - Interactive Jupyter notebook tutorial
Run them:
python examples/basic_example.py
python examples/advanced_example.py
python examples/pickle_example.py
Contributing
Contributions welcome! Areas for improvement:
- Additional Features - CLI tool, streaming encoder, additional format options
- Performance - C extension for faster encoding/decoding
- Documentation - More examples, integration guides
- Language Bindings - JavaScript, Go, Rust implementations
Development Setup
# Fork and clone
git clone https://github.com/vivekpandian08/toonstream.git
cd toonstream
# Create branch
git checkout -b feature/your-feature
# Install dev dependencies
pip install -e ".[dev]"
# Make changes and test
pytest tests/
# Submit PR
License
MIT License - see LICENSE file
Acknowledgments
- Inspired by CSV efficiency for tabular data
- Built for the LLM era where tokens = money
- Tested with real-world production datasets
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: see PICKLE_USAGE.md and results/OPTIMIZATION_GUIDE.md
Links
- PyPI: https://pypi.org/project/toonstream/
- GitHub: https://github.com/vivekpandian08/toonstream
- Issues: https://github.com/vivekpandian08/toonstream/issues
Made with ❤️ for the LLM community
Save tokens. Save money. Build better.