ToonStream
Token-Oriented Object Notation (TOON) - Reduce LLM token usage by up to 55% with lossless data serialization
What is ToonStream?
ToonStream is a Python library for encoding structured data in a token-efficient format designed for Large Language Models (LLMs). It converts repetitive JSON structures into compact, tabular representations that dramatically reduce token count while maintaining 100% lossless conversion.
The Problem
LLMs charge by tokens. Verbose JSON wastes tokens and money:
[
{"id": 1, "name": "Alice", "dept": "Engineering", "salary": 95000},
{"id": 2, "name": "Bob", "dept": "Sales", "salary": 75000},
{"id": 3, "name": "Carol", "dept": "Engineering", "salary": 105000}
]
Cost: 80 tokens
The Solution
TOON format eliminates redundancy:
employees[3]{id,name,dept,salary}:
1,Alice,Engineering,95000
2,Bob,Sales,75000
3,Carol,Engineering,105000
Cost: 38 tokens (52.5% reduction)
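For uniform records the rewrite above is mechanical: hoist the shared keys into a header, then emit one CSV-style row per record. A simplified sketch of that transform in plain Python (`to_tabular` is a hypothetical helper for illustration, not the library's actual encoder; it assumes every row has the same keys and that values contain no commas or newlines):

```python
def to_tabular(key, rows):
    """Collapse a list of uniform dicts into a TOON-style tabular block.

    Simplified illustration: assumes identical keys per row and
    comma-free values (a real encoder must handle quoting/escaping).
    """
    fields = list(rows[0])
    header = f"{key}[{len(rows)}]{{{','.join(fields)}}}:"
    lines = [",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + lines)

employees = [
    {"id": 1, "name": "Alice", "dept": "Engineering", "salary": 95000},
    {"id": 2, "name": "Bob", "dept": "Sales", "salary": 75000},
    {"id": 3, "name": "Carol", "dept": "Engineering", "salary": 105000},
]
print(to_tabular("employees", employees))
```

The keys appear once in the header instead of once per record, which is exactly where the token savings come from.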
Why ToonStream?
- Save Money - Reduce API costs by up to 55% on structured data
- 100% Lossless - Perfect round-trip conversion, no data loss
- Zero Dependencies - Pure Python, no external packages required
- Fast - Sub-millisecond encoding/decoding
- Smart - Automatic optimization, applied only when it actually saves tokens
- Simple API - Two functions: encode() and decode()
Installation
pip install toonstream
Or from source:
git clone https://github.com/vivekpandian08/toonstream.git
cd toonstream
pip install -e .
Requirements:
- Python 3.8 or higher
- No external dependencies (tiktoken optional for benchmarks)
Basic Usage
import toonstream
# Your data
data = {
    "name": "Alice",
    "age": 30,
    "skills": ["Python", "JavaScript", "SQL"]
}
# Encode to TOON
toon_str = toonstream.encode(data)
print(toon_str)
Output:
name: "Alice"
age: 30
skills: [
- "Python"
- "JavaScript"
- "SQL"
]
Quick Start
Basic Usage
from toonstream import encode, decode
# Your data
data = {
    "employees": [
        {"id": 1, "name": "Alice", "dept": "Engineering"},
        {"id": 2, "name": "Bob", "dept": "Sales"},
        {"id": 3, "name": "Carol", "dept": "Engineering"}
    ]
}
# Encode to TOON format
toon_str = encode(data)
print(toon_str)
# Output:
# employees[3]{id,name,dept}:
# 1,Alice,Engineering
# 2,Bob,Sales
# 3,Carol,Engineering
# Decode back to Python
decoded = decode(toon_str)
assert decoded == data  # Perfect round-trip!
Advanced Options
# Compact mode (minimize whitespace)
compact = encode(data, compact=True)
# Disable smart optimization (always use tabular)
always_tabular = encode(data, smart_optimize=False)
# Pretty print with indentation
pretty = encode(data, indent=2)
Performance Benchmarks
Real-world results from production datasets:
| Data Type | JSON Tokens | TOON Tokens | Reduction | Use Case |
|---|---|---|---|---|
| Employee Records (50) | 3,914 | 1,733 | -55.7% | HR systems, payroll |
| GitHub Repos (100) | 14,102 | 8,712 | -38.2% | API responses |
| Order History (10) | 2,926 | 2,915 | -0.4% | E-commerce |
| Config Files (20) | 7,393 | 7,393 | 0.0% | Microservices |
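Numbers like these are easy to spot-check on your own data. A real benchmark should count tokens with a tokenizer such as tiktoken (optional, as noted above), but even raw character counts show where the redundancy goes; the records below are hypothetical:

```python
import json

# Hypothetical uniform records, similar in shape to the employee benchmark
employees = [{"id": i, "name": f"user{i}", "dept": "Engineering"} for i in range(50)]

json_len = len(json.dumps(employees))

# TOON-style tabular rendering of the same records (header once, then rows)
fields = ["id", "name", "dept"]
rows = [",".join(str(r[f]) for f in fields) for r in employees]
toon_len = len("employees[50]{id,name,dept}:\n" + "\n".join(rows))

print(f"JSON: {json_len} chars, TOON-style: {toon_len} chars")
```

Character counts are only a proxy, but the ratio tracks token counts closely for data this repetitive, since the repeated key names are what both measures eliminate.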
When to Use TOON
Excellent Results (30-55% savings):
- Arrays of similar objects (users, products, logs)
- Tabular data (CSV-like structures)
- Database query results
- Time-series data
Good Results (10-30% savings):
- Mixed nested structures
- API responses with arrays
- Semi-structured documents
Neutral Results (±5%):
- Deeply nested JSON (5+ levels)
- Unique object structures
- Small datasets (<3 items)
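A rough version of the "only when beneficial" decision can be expressed as a uniformity check: tabular form pays off when a list holds enough dicts that share the same keys. The sketch below illustrates the idea; it is not the library's actual smart_optimize logic, and the `min_rows=3` threshold is an assumption taken from the "<3 items" note above:

```python
def is_tabular(value, min_rows=3):
    """Heuristic: does this value look like it benefits from TOON's
    tabular form? True for a list of at least min_rows dicts that all
    share the same set of keys."""
    if not isinstance(value, list) or len(value) < min_rows:
        return False
    if not all(isinstance(item, dict) for item in value):
        return False
    first_keys = set(value[0])
    return all(set(item) == first_keys for item in value)
```

Values that fail the check (short lists, mixed shapes, deep nesting) are the cases in the "Neutral Results" bucket, where an encoder does better to fall back to a plain representation.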
Speed
All operations complete in under 1 millisecond for typical datasets:
- 50 records: 0.41ms
- 100 records: 0.83ms
- Decode: <1ms
Use Cases
1. LLM Context Optimization
import toonstream
# Pass structured data to LLM
context = {
    "users": [...],     # 100 user records
    "products": [...],  # 50 products
    "orders": [...]     # 200 orders
}
# Reduce prompt tokens by 40%
toon_context = toonstream.encode(context)
response = llm.complete(f"Analyze this data:\n{toon_context}")
2. Pickle Integration
Save data with TOON encoding for additional compression:
from toonstream import save_toon_pickle, load_toon_pickle
# Save with TOON encoding
data = {"users": [...], "logs": [...]}
save_toon_pickle(data, 'data.toon.pkl')
# Load back
loaded = load_toon_pickle('data.toon.pkl')
# 11.4% smaller than regular pickle!
3. API Response Optimization
from toonstream import encode
from flask import Flask, Response
app = Flask(__name__)
@app.route('/api/employees')
def get_employees():
    employees = db.query("SELECT * FROM employees")
    toon_data = encode(employees)
    return Response(toon_data, mimetype='text/plain')
# Clients get up to 55% smaller responses
4. Configuration Files
import toonstream
config = {
    "database": {"host": "localhost", "port": 5432},
    "cache": {"ttl": 3600, "max_size": 1000}
}
# Save human-readable config
with open('config.toon', 'w') as f:
    f.write(toonstream.encode(config, indent=2))
# Load config
with open('config.toon') as f:
    config = toonstream.decode(f.read())
API Reference
Core Functions
encode(obj, compact=False, smart_optimize=True, indent=None, sort_keys=False)
Convert Python object to TOON format.
Parameters:
- obj (Any): Python object (dict, list, primitive)
- compact (bool): Minimize whitespace (default: False)
- smart_optimize (bool): Auto-detect best format (default: True)
- indent (int): Indentation spaces, None for compact (default: None)
- sort_keys (bool): Sort dictionary keys alphabetically (default: False)
Returns: str - TOON formatted string
Raises: ToonEncodeError - If encoding fails
# Basic encoding
toon = encode(data)
# Compact output
toon = encode(data, compact=True)
# Sort dictionary keys
toon = encode(data, sort_keys=True)
# Always use tabular (no optimization)
toon = encode(data, smart_optimize=False)
# Pretty print with 2-space indent
toon = encode(data, indent=2)
decode(toon_str, strict=True)
Convert TOON format to Python object.
Parameters:
- toon_str (str): TOON formatted string
- strict (bool): Enforce strict validation (default: True)
Returns: Any - Python object
Raises: ToonDecodeError - If decoding fails
# Decode TOON string
data = decode(toon_str)
# Lenient mode (allows minor format issues)
data = decode(toon_str, strict=False)
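The tabular header `key[n]{fields}:` carries everything needed to reverse the encoding: the key name, the declared row count, and the field order. A simplified sketch of that decoding direction (an illustration, not the library's actual parser; it assumes unquoted, comma-free values, and converts digit-only fields back to int):

```python
import re

def parse_tabular(block):
    """Parse one TOON-style tabular block back into {key: [dict, ...]}.

    Simplified illustration: assumes unquoted, comma-free values; a real
    decoder must also handle quoting, nesting, and non-tabular forms.
    """
    header, *rows = block.strip().splitlines()
    m = re.match(r"(\w+)\[(\d+)\]\{([^}]*)\}:$", header)
    key, count, fields = m.group(1), int(m.group(2)), m.group(3).split(",")
    items = []
    for row in rows:
        values = [int(v) if v.lstrip("-").isdigit() else v for v in row.split(",")]
        items.append(dict(zip(fields, values)))
    assert len(items) == count, "row count must match the [n] in the header"
    return {key: items}
```

The declared `[n]` doubles as a validation hook, which is the kind of check `strict=True` mode can enforce and `strict=False` can relax.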
Pickle Functions
save_toon_pickle(data, filepath, smart_optimize=True, protocol=HIGHEST_PROTOCOL)
Save data as TOON-encoded pickle file.
Parameters:
- data (Any): Python object to save
- filepath (str): Output file path
- smart_optimize (bool): Use TOON optimization (default: True)
- protocol (int): Pickle protocol version (default: HIGHEST_PROTOCOL)
from toonstream import save_toon_pickle
save_toon_pickle(data, 'data.toon.pkl')
load_toon_pickle(filepath, strict=True)
Load TOON-encoded pickle file.
Parameters:
- filepath (str): Input file path
- strict (bool): Enforce strict TOON validation (default: True)
Returns: Any - Loaded Python object
from toonstream import load_toon_pickle
data = load_toon_pickle('data.toon.pkl')
Exceptions
- ToonError - Base exception
- ToonEncodeError - Encoding failures (unsupported types, NaN, Infinity)
- ToonDecodeError - Decoding failures (invalid format, syntax errors)
- ToonValidationError - Validation failures
- ToonPickleError - Pickle operation failures
Development
Running Tests
# Install dev dependencies (pytest, pytest-cov, tiktoken, black)
pip install -e ".[dev]"
# Run all tests
pytest tests/test_toonstream.py -v
# Run with coverage
pytest tests/ --cov=toonstream --cov-report=html
# Open coverage report
open htmlcov/index.html
Running Benchmarks
# Run all benchmarks
python benchmarks/run_all_comparisons.py
# Results appear in terminal and save to results/
Project Structure
toonstream/
├── toonstream/                  # Core library
│   ├── __init__.py              # Public API exports
│   ├── encoder.py               # TOON encoder (485 lines)
│   ├── decoder.py               # TOON decoder (533 lines)
│   ├── exceptions.py            # Exception hierarchy (60 lines)
│   └── pickle_utils.py          # Pickle integration (177 lines)
├── benchmarks/                  # Performance tests
│   ├── run_all_comparisons.py
│   └── config.json
├── tests/                       # Test suite (51 tests, 100% passing)
│   └── test_toonstream.py
├── examples/                    # Usage examples
│   ├── basic_example.py
│   ├── advanced_example.py
│   ├── pickle_example.py
│   └── toonstream_tutorial.ipynb
├── data/                        # Benchmark datasets
├── results/                     # Benchmark results
├── README.md                    # This file
├── PICKLE_USAGE.md              # Pickle utilities guide
├── pyproject.toml               # Modern package configuration
├── setup.py                     # Package configuration
└── requirements.txt             # Dependencies
Examples
See the examples/ directory for complete examples:
- basic_example.py - Getting started guide
- advanced_example.py - Smart optimization features
- pickle_example.py - Pickle integration demo
- toonstream_tutorial.ipynb - Interactive Jupyter notebook tutorial
Run them:
python examples/basic_example.py
python examples/advanced_example.py
python examples/pickle_example.py
Contributing
Contributions welcome! Areas for improvement:
- Additional Features - CLI tool, streaming encoder, additional format options
- Performance - C extension for faster encoding/decoding
- Documentation - More examples, integration guides
- Language Bindings - JavaScript, Go, Rust implementations
Development Setup
# Fork and clone
git clone https://github.com/vivekpandian08/toonstream.git
cd toonstream
# Create branch
git checkout -b feature/your-feature
# Install dev dependencies
pip install -e ".[dev]"
# Make changes and test
pytest tests/
# Submit PR
License
MIT License - see LICENSE file
Acknowledgments
- Inspired by CSV efficiency for tabular data
- Built for the LLM era where tokens = money
- Tested with real-world production datasets
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: see PICKLE_USAGE.md and results/OPTIMIZATION_GUIDE.md
Links
- PyPI: https://pypi.org/project/toonstream/
- GitHub: https://github.com/vivekpandian08/toonstream
- Issues: https://github.com/vivekpandian08/toonstream/issues
Made with ❤️ for the LLM community
Save tokens. Save money. Build better.