A Python library for working with the TOON format, a compact, human-readable serialization format optimized for LLM contexts.

toon_serializer: Token Oriented Object Notation for Python

toon_serializer is a high-performance Python serializer/deserializer for TOON (Token Oriented Object Notation).

TOON is a human-readable data format designed to minimize token usage for LLMs by removing redundant syntax (braces, quotes, repeated keys) while preserving structure. It excels at compressing lists of dictionaries into Tabular Arrays, often reducing payload sizes by 30-50% compared to JSON.

Features

  • 📉 Token Efficient: Replaces repetitive JSON keys with compact CSV-like headers.
  • 🧠 Adaptive Schema: The decoder "learns" column types from the first row, so it can parse large tables quickly.
  • ⚡ Fast Primitives: Optimized integer, float, and boolean parsing.
  • CSV-Compatible: Handles complex string escaping and quoting automatically.
  • Lazy Decoding: Iterates over lines lazily, which is efficient for large datasets.

Installation

pip install toon_serializer

Usage

toon_serializer mimics the standard Python json API with loads and dumps.

  1. Encoding Data (Serialization)

toon_serializer automatically detects Uniform Lists of Dictionaries and compresses them into a tabular format.

Input:

import toon_serializer

data = {
    "model": "gpt-4",
    "parameters": {
        "temperature": 0.7,
        "stream": True
    },
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum physics."},
        {"role": "assistant", "content": "Quantum physics studies..."}
    ]
}

toon_str = toon_serializer.dumps(data)
print(toon_str)

Output:

model: gpt-4
parameters:
  temperature: 0.7
  stream: true
messages[3]{role,content}:
  system,"You are a helpful assistant."
  user,"Explain quantum physics."
  assistant,"Quantum physics studies..."
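
The uniformity check and tabular layout described above can be sketched with the standard library alone. This is not toon_serializer's actual implementation: the helper names (is_uniform, format_cell, to_table) are hypothetical, and the quoting rule (bare identifiers stay unquoted, everything else is quoted JSON-style) is an assumption chosen to match the output shown above.

```python
import json
import re

# Assumption: strings that look like plain identifiers can be written bare.
_BARE = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")
_PRIMITIVES = (str, int, float, bool, type(None))


def is_uniform(rows):
    """True if rows is a non-empty list of dicts with identical keys
    (same order) and only primitive values."""
    if not isinstance(rows, list) or not rows:
        return False
    if not all(isinstance(row, dict) for row in rows):
        return False
    keys = list(rows[0])
    return all(
        list(row) == keys
        and all(isinstance(value, _PRIMITIVES) for value in row.values())
        for row in rows
    )


def format_cell(value):
    """Render one primitive value as a table cell."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if value is None:
        return "null"
    if isinstance(value, str):
        # Keep reserved words quoted so they stay distinct from literals.
        if _BARE.fullmatch(value) and value not in ("true", "false", "null"):
            return value
        return json.dumps(value, ensure_ascii=False)
    return str(value)


def to_table(name, rows):
    """Render a uniform list of dicts as a header line plus one line per row."""
    keys = list(rows[0])
    header = name + "[" + str(len(rows)) + "]{" + ",".join(keys) + "}:"
    body = ["  " + ",".join(format_cell(row[k]) for k in keys) for row in rows]
    return "\n".join([header] + body)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum physics."},
    {"role": "assistant", "content": "Quantum physics studies..."},
]

if is_uniform(messages):
    table = to_table("messages", messages)
    print(table)
    # Character counts only; token counts depend on the tokenizer.
    print(len(json.dumps(messages)), "characters as JSON vs", len(table))
```

The savings come from writing the keys once in the header instead of once per row, so they grow with the number of rows.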

  2. Decoding Data (Deserialization)

The decoder handles primitives, standard lists, and adaptive tabular arrays seamlessly.

import toon_serializer

toon_str = """
version: 1.0
users[2]{id,name,is_active}:
  1,Alice,true
  2,Bob,false
tags[3]: python, rust, go
"""

data = toon_serializer.loads(toon_str)

print(data["users"][0]) 
# {'id': 1, 'name': 'Alice', 'is_active': True}

print(data["tags"])
# ['python', 'rust', 'go']
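
To illustrate how a tabular array such as the users block above could be decoded, here is a minimal sketch that reads the header, infers one converter per column from the first row, and reuses those converters for every row. It is a sketch under stated assumptions, not the library's decoder: parse_table and infer_converter are hypothetical names, and row splitting is delegated to the standard csv module.

```python
import csv
import re

# Matches headers of the form  name[count]{field1,field2,...}:
HEADER = re.compile(r"(\w+)\[(\d+)\]\{([^}]*)\}:")


def infer_converter(cell):
    """Pick a converter for a column by looking at one sample cell."""
    if cell in ("true", "false"):
        return lambda s: s == "true"
    for candidate in (int, float):
        try:
            candidate(cell)
            return candidate
        except ValueError:
            continue
    return str


def parse_table(lines):
    """Parse a header line followed by its rows into (name, list of dicts)."""
    match = HEADER.fullmatch(lines[0].strip())
    if match is None:
        raise ValueError("not a tabular header: " + lines[0])
    name = match.group(1)
    count = int(match.group(2))
    fields = match.group(3).split(",")
    # csv.reader handles quoted cells that contain commas.
    rows = list(csv.reader(line.strip() for line in lines[1:1 + count]))
    # Column types are decided once, from the first row only.
    converters = [infer_converter(cell) for cell in rows[0]] if rows else []
    records = [
        {field: convert(cell) for field, convert, cell in zip(fields, converters, row)}
        for row in rows
    ]
    return name, records


name, users = parse_table([
    "users[2]{id,name,is_active}:",
    "  1,Alice,true",
    "  2,Bob,false",
])
print(name, users[0])
```

Because types are fixed by the first row, a later row that does not fit (for example 2.5 in a column inferred as int) would raise ValueError in this sketch; a production decoder would need a fallback for that case.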

Performance Notes

  • Encoder: Recursively checks lists for uniformity. If a list contains mixed types, it falls back to a standard bulleted list.
  • Decoder: Uses a pushback iterator to parse line by line without loading the entire string into memory.
  • Adaptive Parsing: When decoding tables, toon_serializer inspects the first row to generate a specialized converter function (e.g., "Column 1 is int, Column 2 is string"), which speeds up parsing of the remaining rows.
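
The pushback iterator mentioned above is a general technique: the parser reads one line ahead, and if that line belongs to the next entry, it puts the line back for the caller. The sketch below shows the idea; the class and function names (PushbackIterator, read_block) are hypothetical and not taken from toon_serializer's source.

```python
import io


class PushbackIterator:
    """Wrap an iterator so that items can be 'un-read' and seen again."""

    def __init__(self, iterable):
        self._source = iter(iterable)
        self._pushed = []

    def __iter__(self):
        return self

    def __next__(self):
        # Serve pushed-back items first, most recent first.
        if self._pushed:
            return self._pushed.pop()
        return next(self._source)

    def push_back(self, item):
        self._pushed.append(item)


def read_block(lines):
    """Collect the indented lines that follow a header.

    Stops at the first dedented line and pushes it back so the caller
    can parse it as the next top-level entry.
    """
    block = []
    for line in lines:
        if not line.startswith("  "):
            lines.push_back(line)
            break
        block.append(line.strip())
    return block


text = "users[2]{id,name}:\n  1,Alice\n  2,Bob\ntags[2]: a, b\n"
# Iterating a StringIO yields one line at a time instead of a full list.
lines = PushbackIterator(line.rstrip("\n") for line in io.StringIO(text))

header = next(lines)
rows = read_block(lines)
following = next(lines)
print(header, rows, following)
```

The one-line lookahead is what lets an indentation-based format be parsed in a single forward pass without indexing into a list of lines.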

Contributing

  1. Fork the repository.
  2. Create a feature branch.
  3. Add tests.
  4. Submit a Pull Request.
