Token-Optimized Object Notation (TOON) converter for efficient LLM data serialization

TOON Converter

Token-Optimized Object Notation (TOON) v2.0 - The most comprehensive Python library for the TOON format, featuring 100% spec compliance, 10 framework integrations, and production-ready tools that reduce LLM token usage by 30-60%.

💡 Why Use TOON Converter?

Real Benefits for Your LLM Applications

Benefit	Impact	Example
Faster Processing	Smaller payloads = faster responses	200ms → 80ms average latency
Better Context	More data in same token limit	Fit 10 docs instead of 6 in context
Works Everywhere	10 framework integrations	LangChain, Pandas, FastAPI, SQLAlchemy, MCP
Easy to Use	2 lines of code to get started	`import toonverter as toon; toon.encode(data)`
Production Ready	Battle-tested, type-safe	563 tests, 81% coverage
Smart Optimization	Auto-detects tabular data	Arrays → compact table format
Format Flexibility	Convert between 6 formats	JSON, YAML, TOML, CSV, XML, TOON
Built-in Analytics	Compare formats instantly	See token savings before you commit
Zero Config	Works out of the box	No setup, no config files needed

🚀 Key Features

Core Capabilities

100% TOON v2.0 Spec Compliant: All 26 specification tests passing
30-60% Token Savings: Verified with benchmarks on real-world data
Multi-Format Support: JSON, YAML, TOML, CSV, XML ↔ TOON
Vision Optimization: Reduce image token costs for multimodal models
Semantic Deduplication: Remove semantically identical content using embeddings
Tabular Optimization: Exceptional efficiency for DataFrame-like structures
Token Analysis: Compare token usage across formats using tiktoken
Type Inference: Automatic type detection and preservation
Strict Validation: Optional strict mode for production safety

Framework Integrations (10)

Pandas: DataFrame ↔ TOON with tabular optimization
Pydantic: BaseModel serialization with validation
LangChain: Document and Message support for RAG systems
FastAPI: Native TOON response class
SQLAlchemy: ORM model serialization and bulk operations
MCP: Model Context Protocol server with 4 tools
LlamaIndex: Node and Document support
Haystack: Document integration for pipelines
DSPy: Example and prediction support
Instructor: Response model integration

Installation

Basic Installation

pip install toonverter

Includes TOON encoding/decoding, JSON/YAML/TOML/CSV/XML support, and token analysis.

Individual Framework Integrations

# Data science
pip install toonverter[pandas]      # DataFrame support
pip install toonverter[sqlalchemy]  # ORM serialization

# AI/LLM frameworks
pip install toonverter[langchain]   # Document/Message support
pip install toonverter[llamaindex]  # Node support
pip install toonverter[haystack]    # Pipeline integration
pip install toonverter[dspy]        # Example support
pip install toonverter[instructor]  # Response models
pip install toonverter[redis]       # Redis JSON/Hash support

# Web frameworks
pip install toonverter[fastapi]     # TOONResponse class
pip install toonverter[pydantic]    # BaseModel serialization

# Model Context Protocol
pip install toonverter[mcp]         # MCP server with 4 tools

Grouped Integrations

pip install toonverter[ai]    # LlamaIndex, Haystack, DSPy, Instructor
pip install toonverter[data]  # Pandas, SQLAlchemy
pip install toonverter[web]   # FastAPI, Pydantic
pip install toonverter[llm]   # LangChain, MCP

CLI Tools

pip install toonverter[cli]  # Command-line interface with rich output

Complete Installation

pip install toonverter[all]  # All integrations + CLI

Development Installation

git clone https://github.com/yourusername/toonverter.git
cd toonverter
pip install -e ".[all]"
make install-dev  # Install dev dependencies

Quick Start

Simple Facade API (90% of users)

import toonverter as toon

# Encode a Python dict as TOON
data = {"name": "Alice", "age": 30, "city": "NYC"}
toon_str = toon.encode(data)
print(toon_str)
# Output: {name:Alice,age:30,city:NYC}

# Convert TOON back to Python dict
decoded = toon.decode(toon_str)
print(decoded)
# Output: {'name': 'Alice', 'age': 30, 'city': 'NYC'}

# Convert between formats
toon.convert(source='data.json', target='data.toon', from_format='json', to_format='toon')

# Analyze token usage
report = toon.analyze(data, compare_formats=['json', 'toon'])
print(f"Best format: {report.best_format}")
print(f"Token savings: {report.max_savings_percentage:.1f}%")
# Output:
# Best format: toon
# Token savings: 33.3%

# Load and save files
data = toon.load('config.json', format='json')
toon.save(data, 'config.toon', format='toon')

# Check supported formats
print(toon.list_formats())
# Output: ['csv', 'json', 'toml', 'toon', 'xml', 'yaml']

Object-Oriented API (Power Users)

from toonverter import Converter, Encoder, Decoder, Analyzer

# Stateful converter with custom options
converter = Converter(
    from_format='json',
    to_format='toon',
    compact=True,
    sort_keys=True
)
result = converter.convert_file('data.json', 'data.toon')

# Custom encoder configuration
encoder = Encoder(
    format='toon',
    delimiter=',',
    compact=True
)
encoded = encoder.encode(data)

# Token analyzer with specific model
analyzer = Analyzer(model='gpt-4')
report = analyzer.analyze_multi_format(data, formats=['json', 'toon'])
print(report.max_savings_percentage)

Advanced Capabilities

Vision Optimization

Reduce image token costs for multimodal models (GPT-4o, Claude 3.5).

from toonverter import optimize_vision

# raw_image_bytes holds the original image as bytes (for example, read from a file)
# Optimize image (resize, format, quality) for provider
img_bytes, mime = optimize_vision(
    raw_image_bytes,
    provider="openai"  # or 'anthropic'
)

Semantic Deduplication

Remove semantically identical items from lists using embeddings.

from toonverter import deduplicate

# Remove duplicates based on meaning (threshold 0.0-1.0)
clean_data = deduplicate(data, threshold=0.9)
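
How the library compares items internally is not described here. As a rough illustration of threshold-based deduplication over embeddings, the sketch below keeps an item only if its cosine similarity to every already-kept item stays below the threshold. The `dedupe_by_embedding` helper and the toy 2-D vectors are hypothetical and are not part of toonverter; a real pipeline would get its vectors from an embedding model.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dedupe_by_embedding(items, vectors, threshold=0.9):
    """Greedy pass: drop an item if it is >= threshold similar to a kept one."""
    kept_items, kept_vectors = [], []
    for item, vec in zip(items, vectors):
        if all(cosine(vec, kept) < threshold for kept in kept_vectors):
            kept_items.append(item)
            kept_vectors.append(vec)
    return kept_items


# Hypothetical data: the first two vectors point in nearly the same direction.
items = ["reset password", "password reset", "billing question"]
vectors = [(1.0, 0.0), (0.99, 0.05), (0.0, 1.0)]
print(dedupe_by_embedding(items, vectors, threshold=0.9))
# ['reset password', 'billing question']
```

A lower threshold removes more aggressively; a threshold above 1.0 keeps everything, since cosine similarity never exceeds 1.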

Schema Tools

Infer structure from data and validate new instances.

from toonverter import infer_schema, validate_schema

# Learn schema from existing data
schema = infer_schema(data)

# Validate new data against schema
errors = validate_schema(new_data, schema)
if not errors:
    print("Valid!")
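
To make the infer-then-validate idea concrete, here is a minimal sketch for flat records, assuming a schema is simply a mapping from key to type name. The helpers `infer_flat_schema` and `check_against_schema` are hypothetical stand-ins and do not reflect how `infer_schema` or `validate_schema` are implemented.

```python
def infer_flat_schema(record: dict) -> dict:
    """Map each key to the type name of its value, e.g. {'age': 'int'}."""
    return {key: type(value).__name__ for key, value in record.items()}


def check_against_schema(record: dict, schema: dict) -> list:
    """Return a list of readable problems; an empty list means the record is valid."""
    errors = []
    for key, expected in schema.items():
        if key not in record:
            errors.append(f"missing key: {key}")
            continue
        actual = type(record[key]).__name__
        if actual != expected:
            errors.append(f"{key}: expected {expected}, got {actual}")
    # Keys that the schema has never seen are reported as well.
    for key in sorted(record.keys() - schema.keys()):
        errors.append(f"unexpected key: {key}")
    return errors


schema = infer_flat_schema({"name": "Alice", "age": 30})
print(check_against_schema({"name": "Bob", "age": "25"}, schema))
# ['age: expected int, got str']
```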

Structural Diff

Compare complex objects to find semantic differences.

from toonverter import diff

# Get structural differences
result = diff(old_ver, new_ver)
print(f"Found {len(result.changes)} changes")
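
For intuition about what a structural diff reports, the following sketch walks two nested dicts and records every leaf that was added, removed, or changed. The function name, the `MISSING` sentinel, and the tuple layout are assumptions made for illustration; the shape of the library's `result.changes` is not documented here.

```python
MISSING = object()  # marks a key that is absent on one side


def structural_diff(old, new, path=""):
    """Return (path, old_value, new_value) tuples for every differing leaf."""
    if isinstance(old, dict) and isinstance(new, dict):
        changes = []
        for key in sorted(old.keys() | new.keys(), key=str):
            child = f"{path}.{key}" if path else str(key)
            if key not in old:
                changes.append((child, MISSING, new[key]))
            elif key not in new:
                changes.append((child, old[key], MISSING))
            else:
                changes.extend(structural_diff(old[key], new[key], child))
        return changes
    # Non-dict values (including lists) are compared as whole units.
    return [] if old == new else [(path, old, new)]


old_ver = {"name": "Alice", "address": {"city": "NYC"}, "age": 30}
new_ver = {"name": "Alice", "address": {"city": "LA"}, "email": "a@example.com"}
for change in structural_diff(old_ver, new_ver):
    print(change[0])
# address.city
# age
# email
```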

Smart Compression

Apply Smart Dictionary Compression (SDC) for maximum efficiency.

from toonverter import compress, decompress

# Compress large dataset
compressed = compress(large_data)

# Restore original
original = decompress(compressed)
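
The SDC algorithm itself is not specified in this README. As a generic illustration of dictionary coding (not the library's implementation), the sketch below replaces strings that occur more than once with integer references into a shared table and then restores them. `dict_compress` and `dict_decompress` are hypothetical names.

```python
from collections import Counter


def dict_compress(values: list) -> dict:
    """Replace strings that occur more than once with indexes into a table."""
    counts = Counter(v for v in values if isinstance(v, str))
    table = [v for v, n in counts.items() if n > 1]
    index = {v: i for i, v in enumerate(table)}
    encoded = []
    for v in values:
        if isinstance(v, str) and v in index:
            encoded.append(("ref", index[v]))  # pointer into the table
        else:
            encoded.append(("raw", v))         # stored as-is
    return {"table": table, "values": encoded}


def dict_decompress(packed: dict) -> list:
    """Invert dict_compress by resolving every reference."""
    table = packed["table"]
    return [table[payload] if kind == "ref" else payload
            for kind, payload in packed["values"]]


data = ["electronics", "books", "electronics", 42, "electronics"]
packed = dict_compress(data)
print(packed["table"])                  # ['electronics']
print(dict_decompress(packed) == data)  # True
```

The payoff grows with repetition: a long category name stored once and referenced by a small integer costs far fewer tokens than repeating it on every row.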

Context Optimization

Intelligently prune, truncate, or round data to fit within a strict token budget.

from toonverter.optimization import ContextOptimizer

# Optimize data to fit in 1000 tokens
optimizer = ContextOptimizer(budget=1000)
optimized_data = optimizer.optimize(large_data)

Integration Examples

Pandas DataFrame

import pandas as pd
from toonverter.integrations import pandas_to_toon, toon_to_pandas

# Convert DataFrame to TOON (optimized for tabular data)
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [30, 25, 35],
    'city': ['NYC', 'LA', 'SF']
})

toon_str = pandas_to_toon(df)
print(toon_str)
# Output:
# name,age,city
# Alice,30,NYC
# Bob,25,LA
# Charlie,35,SF

# Convert back to DataFrame
restored_df = toon_to_pandas(toon_str)

Pydantic Models

from pydantic import BaseModel
from toonverter.integrations import pydantic_to_toon, toon_to_pydantic

class User(BaseModel):
    name: str
    age: int
    email: str

user = User(name="Alice", age=30, email="alice@example.com")

# Serialize to TOON
toon_str = pydantic_to_toon(user)

# Deserialize from TOON
restored_user = toon_to_pydantic(toon_str, User)

LangChain RAG

from langchain_core.documents import Document
from langchain_core.messages import HumanMessage
from toonverter.integrations import langchain_to_toon, toon_to_langchain, messages_to_toon

# Convert LangChain documents to TOON (supports lists)
docs = [
    Document(page_content="Info 1...", metadata={"id": 1}),
    Document(page_content="Info 2...", metadata={"id": 2})
]
toon_str = langchain_to_toon(docs)

# Convert Chat Messages
messages = [HumanMessage(content="Hello")]
toon_msgs = messages_to_toon(messages)

# Restore documents
restored_docs = toon_to_langchain(toon_str)

FastAPI

from fastapi import FastAPI
from toonverter.integrations import TOONResponse

app = FastAPI()

@app.get("/data", response_class=TOONResponse)
async def get_data():
    return {"users": [...], "count": 100}
    # Automatically serialized as TOON with proper content-type

SQLAlchemy ORM

from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
from toonverter.integrations import sqlalchemy_to_toon, toon_to_sqlalchemy

class Base(DeclarativeBase):
    pass

class Product(Base):
    __tablename__ = 'products'
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str]
    price: Mapped[float]

# Serialize ORM instances
product = Product(id=1, name="Widget", price=29.99)
toon_str = sqlalchemy_to_toon(product)

# Bulk operations with query results
# (session is an existing sqlalchemy.orm.Session bound to your engine)
products = session.query(Product).all()
toon_str = sqlalchemy_to_toon(products)  # Optimized tabular format

LlamaIndex RAG

from llama_index.core.schema import Document, TextNode
from toonverter.integrations import llamaindex_to_toon, toon_to_llamaindex

# Convert LlamaIndex nodes for efficient storage
node = TextNode(
    text="Important context...",
    metadata={"source": "doc.pdf"}
)

toon_str = llamaindex_to_toon(node)
restored_node = toon_to_llamaindex(toon_str)

Haystack Pipelines

from haystack.dataclasses import Document
from toonverter.integrations import haystack_to_toon, toon_to_haystack

# Optimize Haystack documents
doc = Document(
    content="Search content...",
    meta={"title": "Article", "date": "2025-01-15"}
)

toon_str = haystack_to_toon(doc)
restored_doc = toon_to_haystack(toon_str)

DSPy Examples

from dspy import Example
from toonverter.integrations import dspy_to_toon, toon_to_dspy

# Serialize DSPy training examples
example = Example(
    question="What is TOON?",
    answer="A token-optimized format"
).with_inputs("question")

toon_str = dspy_to_toon(example)
restored = toon_to_dspy(toon_str)

Instructor Responses

from pydantic import BaseModel
from toonverter.integrations import to_toon_response, from_toon_response

class UserResponse(BaseModel):
    name: str
    age: int
    email: str

# Convert Instructor-structured responses
response = UserResponse(name="Alice", age=30, email="alice@example.com")
toon_str = to_toon_response(response)

Redis Integration

import redis
from toonverter.integrations import RedisToonWrapper

# Initialize Redis wrapper
r = redis.Redis(decode_responses=True)
wrapper = RedisToonWrapper(r)

# Retrieve JSON document as TOON (optimized)
# Automatically converts Redis JSON to TOON format
toon_str = wrapper.get_json("user:1001")

# Retrieve multiple documents (tabular optimization)
# Great for RAG retrieval - returns compact table
docs = wrapper.mget_json(["doc:1", "doc:2", "doc:3"])

Model Context Protocol (MCP)

# Use as MCP server for Claude Desktop or other MCP clients
# Add to claude_desktop_config.json:
{
  "mcpServers": {
    "toonverter": {
      "command": "python",
      "args": ["-m", "toonverter.integrations.mcp_server"]
    }
  }
}

# Available MCP tools:
# - convert: Convert between formats
# - analyze: Analyze token usage
# - validate: Validate TOON syntax
# - compress: Find most efficient format

CLI Usage

# Convert files
toonverter convert data.json data.toon --from json --to toon

# Encode to TOON
toonverter encode data.json --output data.toon

# Decode from TOON
toonverter decode data.toon --output data.json --format json

# Analyze token usage
toonverter analyze data.json --compare json toon

# List supported formats
toonverter formats

TOON Format Specification v2.0

TOON (Token-Optimized Object Notation) is designed for maximum token efficiency while maintaining readability.

Three Root Forms

# 1. Object (default) - key-value pairs
name: Alice
age: 30
city: NYC

# 2. Array - collection of items
users[3]:
  - Alice
  - Bob
  - Charlie

# 3. Primitive - single value
Hello World

Three Array Forms

# 1. Inline Array - primitives on one line
tags[3]: python,llm,optimization

# 2. Tabular Array - uniform objects with primitives only
users[3]{name,age,city}:
  Alice,30,NYC
  Bob,25,LA
  Charlie,35,SF

# 3. List Array - complex/mixed structures
items[2]:
  - name: Item1
    price: 19.99
    tags[2]: sale,new
  - name: Item2
    price: 29.99
    nested:
      key: value
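
The choice between the three array forms can be sketched as a small classifier plus a tabular renderer. This is a simplified illustration, assuming that "uniform" means identical keys in identical order; it ignores string quoting and the canonical spelling of booleans and null, and the helper names are hypothetical rather than part of toonverter.

```python
PRIMITIVES = (str, int, float, bool, type(None))


def choose_array_form(items: list) -> str:
    """Return 'inline', 'tabular', or 'list' for a Python list."""
    if all(isinstance(x, PRIMITIVES) for x in items):
        return "inline"
    if all(isinstance(x, dict) for x in items):
        first_keys = list(items[0])
        uniform = all(list(x) == first_keys for x in items)
        flat = all(isinstance(v, PRIMITIVES) for x in items for v in x.values())
        if uniform and flat:
            return "tabular"
    return "list"


def encode_tabular(key: str, rows: list) -> str:
    """Render uniform, flat dicts as a header line plus one indented row each."""
    fields = list(rows[0])
    field_list = ",".join(fields)
    header = f"{key}[{len(rows)}]{{{field_list}}}:"
    body = ["  " + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header, *body])


users = [
    {"name": "Alice", "age": 30, "city": "NYC"},
    {"name": "Bob", "age": 25, "city": "LA"},
    {"name": "Charlie", "age": 35, "city": "SF"},
]
print(choose_array_form(users))  # tabular
print(encode_tabular("users", users))
```

The tabular form saves tokens because field names appear once in the header instead of once per row.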

String Quoting Rules

Strings need quotes if they:

Are empty or whitespace-only
Start/end with whitespace
Match reserved words (true, false, null)
Look numeric (123, 3.14, -42)
Contain special chars (:[]{}|,)
Start with hyphen (-test)
Contain the delimiter

# Quoted strings
name: "true"           # Reserved word
id: "123"              # Looks numeric
path: "test:value"     # Contains colon
text: "  spaced  "     # Has whitespace
empty: ""              # Empty string

# Unquoted strings
simple: hello
snake_case: user_name
kebab-case: test-value
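
The quoting rules listed above translate almost directly into a predicate. The sketch below is one possible reading of those rules, not the library's encoder; `needs_quotes` is a hypothetical helper, and the regular expression for "looks numeric" is an assumption that also covers exponent notation.

```python
import re

RESERVED = {"true", "false", "null"}
SPECIAL = set(":[]{}|,")
NUMERIC = re.compile(r"-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?")


def needs_quotes(text: str, delimiter: str = ",") -> bool:
    """Return True if `text` must be written as a quoted string."""
    if text.strip() == "":            # empty or whitespace-only
        return True
    if text != text.strip():          # leading or trailing whitespace
        return True
    if text in RESERVED:              # true / false / null
        return True
    if NUMERIC.fullmatch(text):       # would be read back as a number
        return True
    if any(ch in SPECIAL for ch in text):
        return True
    if text.startswith("-"):          # would look like a list item
        return True
    return delimiter in text          # the active delimiter, e.g. a tab


for sample in ["true", "123", "test:value", "  spaced  ", "", "hello", "test-value"]:
    print(repr(sample), needs_quotes(sample))
```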

Number Canonical Form

# Valid numbers
count: 42
price: 19.99
negative: -3.14
zero: 0

# Normalized (not allowed in strict mode)
# 1.0 → 1
# 1e5 → 100000
# -0 → 0
# NaN → null
# Infinity → null
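
As a worked illustration of the normalizations above, the following sketch converts Python numbers to the canonical text form. It is an assumption-laden reading of the rules (for instance, it expands every exponent to plain decimal digits), and `canonical_number` is a hypothetical helper rather than a toonverter function.

```python
import math
from decimal import Decimal


def canonical_number(value) -> str:
    """Format an int or float following the normalization rules listed above."""
    if isinstance(value, bool):
        raise TypeError("booleans are not numbers here")
    if isinstance(value, int):
        return str(value)
    if math.isnan(value) or math.isinf(value):
        return "null"                      # NaN and Infinity have no numeric form
    if value == 0:
        return "0"                         # covers -0.0 as well
    if value.is_integer():
        return str(int(value))             # 1.0 -> 1, 1e5 -> 100000
    # repr() gives the shortest round-trip digits; "f" removes any exponent.
    return format(Decimal(repr(value)), "f")


for sample in [1.0, 1e5, -0.0, float("nan"), float("inf"), 19.99, -3.14]:
    print(sample, "->", canonical_number(sample))
```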

Delimiters

# Comma (default, no marker)
a: 1,b: 2,c: 3

# Tab (marked with {TAB})
{TAB}
a: 1\tb: 2\tc: 3

# Pipe (marked with {PIPE})
{PIPE}
a: 1|b: 2|c: 3

Escape Sequences

Only 5 escape sequences are allowed:

\\ - Backslash
\" - Double quote
\n - Newline
\r - Carriage return
\t - Tab
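
Because the set of escapes is closed, both directions fit in a pair of lookup tables. The sketch below shows one way to escape and unescape string content under that rule, rejecting any other backslash sequence; the function names are hypothetical and this is not the library's own code.

```python
# The five permitted escapes, keyed by the raw character they stand for.
ESCAPE = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}
# Reverse lookup: the character after the backslash -> the raw character.
UNESCAPE = {"\\": "\\", '"': '"', "n": "\n", "r": "\r", "t": "\t"}


def escape(text: str) -> str:
    """Replace each special character with its two-character escape."""
    return "".join(ESCAPE.get(ch, ch) for ch in text)


def unescape(text: str) -> str:
    """Decode escapes, raising ValueError on anything outside the five allowed."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if nxt not in UNESCAPE:
            raise ValueError(f"invalid escape sequence: \\{nxt}")
        out.append(UNESCAPE[nxt])
        i += 2
    return "".join(out)


raw = 'say "hi"\tC:\\temp\n'
print(escape(raw))
print(unescape(escape(raw)) == raw)  # True
```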

Token Savings Examples

Format	Tokens	Savings
JSON	24	0%
YAML	20	17%
TOON	16	33%

Actual savings vary by data structure. Tabular data sees 40-60% savings.
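
The savings column is the relative reduction against the JSON baseline, (baseline - tokens) / baseline. A short check of that arithmetic with the token counts from the table (the helper name is only illustrative):

```python
def savings_percent(baseline_tokens: int, tokens: int) -> float:
    """Relative token reduction versus the baseline, as a percentage."""
    return (baseline_tokens - tokens) / baseline_tokens * 100


counts = {"JSON": 24, "YAML": 20, "TOON": 16}
for name, tokens in counts.items():
    saved = savings_percent(counts["JSON"], tokens)
    print(f"{name}: {tokens} tokens, {saved:.1f}% saved")
# JSON: 24 tokens, 0.0% saved
# YAML: 20 tokens, 16.7% saved
# TOON: 16 tokens, 33.3% saved
```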

For full specification details, see TOON v2.0 Spec.

Advanced Features

Custom Format Adapters

from toonverter.core.interfaces import FormatAdapter
from toonverter.core.registry import registry

class CustomAdapter(FormatAdapter):
    def encode(self, data, options):
        # Custom encoding logic
        pass

    def decode(self, data_str, options):
        # Custom decoding logic
        pass

# Register adapter
registry.register('custom', CustomAdapter())

# Use it
import toonverter as toon
toon.convert(source='data.custom', target='data.toon', from_format='custom', to_format='toon')

Plugin Development

# my_plugin.py
from toonverter.plugins import Plugin

class MyFormatPlugin(Plugin):
    name = "myformat"
    version = "1.0.0"

    def register(self, registry):
        registry.register('myformat', MyFormatAdapter())

# setup.py entry point
entry_points={
    'toonverter.plugins': [
        'myformat = my_plugin:MyFormatPlugin',
    ]
}

Use Cases

RAG Systems: Reduce vector database storage and improve retrieval
LLM Prompts: Minimize token usage in context windows
API Responses: Efficient data transfer with FastAPI integration
Data Pipelines: Convert between formats in ETL workflows
Configuration Files: Token-efficient config serialization
LangChain Applications: Optimize document storage and retrieval

Requirements

Python 3.10+
Core: typing-extensions>=4.8.0, tiktoken>=0.5.0, PyYAML>=6.0, tomli>=2.0.0 (Python <3.11)
Optional integrations: pandas, pydantic, langchain, fastapi, sqlalchemy, mcp, llama-index, haystack-ai, dspy-ai, instructor
Optional CLI: click, rich

Contributing

We welcome contributions! Please see our Contributing Guide for details.

Key areas for contribution:

Additional format adapters
Performance optimizations
Documentation improvements
Bug fixes and features
Integration examples

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Inspired by the need for efficient data serialization in LLM applications
Built with modern Python packaging standards (PEP 517/518)
Follows SOLID principles and clean architecture
Designed for the LLM and AI community

Quick Links

Install: pip install toonverter
Import: import toonverter as toon
CLI: toonverter --help
Test: python3 -m pytest tests/
Examples: See examples/ directory

Package Name: toonverter | CLI Command: toonverter | Import: import toonverter

TheDataMigrationCompany@2025 BWI@2025

Release history

Version	Date
1.2.0 (this version)	Nov 26, 2025
1.1.2	Nov 18, 2025
1.0.5	Nov 17, 2025
1.0.4	Nov 17, 2025
1.0.0	Nov 17, 2025

Download files

Source Distribution

toonverter-1.2.0.tar.gz (111.0 kB)

Uploaded Nov 26, 2025 Source

Built Distribution

toonverter-1.2.0-py3-none-any.whl (130.1 kB)

Uploaded Nov 26, 2025 Python 3

File details

Details for the file toonverter-1.2.0.tar.gz.

File metadata

Download URL: toonverter-1.2.0.tar.gz
Upload date: Nov 26, 2025
Size: 111.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for toonverter-1.2.0.tar.gz
Algorithm	Hash digest
SHA256	`3d3908ff332ede2bef615dc6db7fe293ee64f7f46a388bb4fbeba0e2c33ec2b0`
MD5	`42a31ec53d5ed96d213607574d50d38f`
BLAKE2b-256	`3b67cfbefb6a771d99fa694aab50f233ec108149e41e346bf36bb2c150a5607b`

File details

Details for the file toonverter-1.2.0-py3-none-any.whl.

File metadata

Download URL: toonverter-1.2.0-py3-none-any.whl
Upload date: Nov 26, 2025
Size: 130.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for toonverter-1.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`39e39cf9feabee12000e9511735ba55e2b2e6498bd262c31db8a445c13796c2d`
MD5	`2e78112cc0e66e87ae4f304cece1edc3`
BLAKE2b-256	`6e160a506f3ed2931bdced33d5984da25da0c825101d8a12148c643113cea337`
