Deep Token-Oriented Object Notation - Efficient JSON compression for LLM applications

Deep-TOON: Deep Token-Oriented Object Notation

Deep-TOON is a token-optimized JSON representation format designed for LLMs and AI applications. It compresses nested JSON structures substantially while staying readable to LLMs and, in the project's tests, round-tripping most inputs without data loss.

📊 Performance Overview

Test Data: dummyjson.com/users (3 users)

Original JSON:    1,675 tokens
Deep-TOON:       1,065 tokens (36.4% reduction)
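
The reduction figure follows directly from the two token counts. A quick check (the helper name reduction_pct is ours, not part of the library):

```python
def reduction_pct(original_tokens: int, encoded_tokens: int) -> float:
    """Percentage of tokens saved relative to the original count."""
    return (original_tokens - encoded_tokens) / original_tokens * 100


# Token counts taken from the dummyjson.com/users measurement above
print(f"{reduction_pct(1675, 1065):.1f}%")  # 36.4%
```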

Comprehensive Test Results:

  • Average reduction: 28.7% across diverse data types
  • Best case: 61.0% reduction on large structured datasets
  • Round-trip success rate: 92.9% of test cases decode back to data identical to the original

🏗️ Format Specification

Basic Structure

[N,delimiter]{schema}:
  value1,value2,value3
  value4,value5,value6

Hierarchical Tuples

Deep-TOON uses explicit hierarchical notation to group related fields:

# Nested objects become tuples
address{street,city,coordinates{lat,lng}}

# Results in data like:
("626 Main Street", "Phoenix", (-77.16, -92.08))
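
The mapping from a nested object to a nested tuple can be sketched in a few lines of plain Python. This is an illustration of the idea only, not the library's implementation; to_tuple is a hypothetical helper, and it relies on dict insertion order matching the schema order.

```python
def to_tuple(obj):
    """Recursively replace each dict with a tuple of its values, in key order."""
    if isinstance(obj, dict):
        return tuple(to_tuple(value) for value in obj.values())
    return obj  # primitives (and anything else) pass through unchanged


address = {
    "street": "626 Main Street",
    "city": "Phoenix",
    "coordinates": {"lat": -77.16, "lng": -92.08},
}
print(to_tuple(address))
```

The keys disappear from the data because the schema already carries them once, which is where the token savings come from.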

Complete Example

Original JSON:

{
  "users": [
    {
      "id": 1,
      "firstName": "Emily", 
      "lastName": "Johnson",
      "age": 28,
      "address": {
        "address": "626 Main Street",
        "city": "Phoenix", 
        "state": "Mississippi",
        "coordinates": {"lat": -77.16213, "lng": -92.084824}
      },
      "bank": {
        "cardNumber": "9289760655481815",
        "cardType": "Elo"
      }
    }
  ],
  "total": 208,
  "skip": 0,
  "limit": 3
}

Deep-TOON Format:

users[1,]{id,firstName,lastName,age,address{address,city,state,coordinates{lat,lng}},bank{cardNumber,cardType}}:
  1,Emily,Johnson,28,("626 Main Street",Phoenix,Mississippi,(-77.16213,-92.084824)),("9289760655481815",Elo)
total: 208
skip: 0  
limit: 3

🔧 Usage Examples

Installation

pip install deep-toon

Basic Usage

import deep_toon

# Your JSON data
data = {
    "users": [
        {
            "id": 1,
            "name": "Alice",
            "address": {
                "street": "123 Main St",
                "city": "NYC",
                "coordinates": {"lat": 40.7, "lng": -74.0}
            }
        },
        {
            "id": 2,
            "name": "Bob", 
            "address": {
                "street": "456 Oak Ave",
                "city": "LA", 
                "coordinates": {"lat": 34.0, "lng": -118.2}
            }
        }
    ]
}

# Compress to Deep-TOON format
compressed = deep_toon.encode(data)
print("Compressed:", compressed)

# Decompress back to original
original = deep_toon.decode(compressed)
print("Original data restored:", data == original)

Output:

users[2,]{id,name,address{street,city,coordinates{lat,lng}}}:
  1,Alice,("123 Main St",NYC,(40.7,-74.0))
  2,Bob,("456 Oak Ave",LA,(34.0,-118.2))

Advanced Usage

# Use the classes directly for more control
from deep_toon import DeepToonEncoder, DeepToonDecoder

# Custom delimiter for data that contains commas
encoder = DeepToonEncoder(delimiter=';')
decoder = DeepToonDecoder()

compressed = encoder.encode(data)  # `data` as defined in Basic Usage
restored = decoder.decode(compressed)

Smart Encoding (Save-Safe)

Use smart_encode to automatically fall back to minified JSON if Deep-TOON doesn't achieve a specified savings threshold (default 10%).

from deep_toon import smart_encode

# Only use Deep-TOON if it saves > 10% tokens
# Otherwise returns minified JSON
encoded = smart_encode(data, threshold=0.1)

# You can also pass a custom token counter; the default counts characters,
# which is what passing len does here
encoded = smart_encode(data, token_counter=len)
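
The fallback decision can be pictured roughly as follows. This is a sketch under stated assumptions, not the library's code: smart_encode_sketch and its encode_fn parameter are hypothetical, and any callable that returns a string can stand in for the real encoder.

```python
import json
from typing import Any, Callable


def smart_encode_sketch(
    data: Any,
    encode_fn: Callable[[Any], str],
    threshold: float = 0.1,
    token_counter: Callable[[str], int] = len,
) -> str:
    """Return encode_fn(data) only if it beats minified JSON by more than threshold."""
    minified = json.dumps(data, separators=(",", ":"))
    candidate = encode_fn(data)
    baseline = token_counter(minified)
    if baseline == 0:  # guard against a custom counter that returns 0
        return minified
    savings = (baseline - token_counter(candidate)) / baseline
    return candidate if savings > threshold else minified


data = {"a": [1, 2, 3]}
# A stand-in "encoder" that is longer than minified JSON, so the fallback wins.
print(smart_encode_sketch(data, encode_fn=lambda d: json.dumps(d, indent=2)))
```

Swapping token_counter for a real tokenizer's counting function gives a decision based on actual tokens rather than characters.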

🎨 Format Features

Schema Declaration

The schema explicitly declares the structure:

{field1,field2,nested{subfield1,subfield2},deep{level1{level2}}}
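
To make the notation concrete, here is a minimal recursive parser for schema strings of this shape. It is an illustrative sketch (parse_schema is a hypothetical name, not the library's API), and it does not validate unbalanced braces.

```python
def parse_schema(text: str) -> dict:
    """Parse '{a,b{c,d}}' into {'a': None, 'b': {'c': None, 'd': None}}."""
    if not text.startswith("{"):
        raise ValueError("schema must start with '{'")
    pos = 1  # skip the opening brace

    def parse_fields() -> dict:
        nonlocal pos
        fields, name = {}, ""
        while pos < len(text):
            ch = text[pos]
            pos += 1
            if ch == "{":  # the name just read owns a nested group
                fields[name.strip()] = parse_fields()
                name = ""
            elif ch == "}":  # end of the current group
                break
            elif ch == ",":
                if name.strip():
                    fields[name.strip()] = None
                name = ""
            else:
                name += ch
        if name.strip():  # flush the last plain field of the group
            fields[name.strip()] = None
        return fields

    return parse_fields()


print(parse_schema("{field1,field2,nested{subfield1,subfield2},deep{level1{level2}}}"))
```

Plain fields map to None and nested groups map to their own dict, so the result mirrors the shape of the decoded object.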

Tuple Nesting

Related fields are grouped into tuples:

# Person with address
person{name,age,address{street,city}}
# Results in: ("Alice", 30, ("123 Main", "NYC"))

Null Handling

Missing or null values are handled gracefully:

# With missing city
("123 Main", null, (40.7, -74.0))
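
One way to picture this: fields are read in schema order, and any key that is absent is emitted as null. The sketch below is illustrative only; render_value and row_for are hypothetical helpers, strings are always quoted for simplicity, and the true/false spelling for booleans is an assumption borrowed from JSON.

```python
import json


def render_value(value) -> str:
    """Render a Python value as tuple-style text."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "true" if value else "false"  # assumed JSON-style booleans
    if isinstance(value, tuple):
        return "(" + ",".join(render_value(v) for v in value) + ")"
    if isinstance(value, str):
        return json.dumps(value)  # always quoted in this sketch
    return str(value)


def row_for(record: dict, fields: list) -> tuple:
    """Pick fields in schema order; keys that are absent become None."""
    return tuple(record.get(name) for name in fields)


record = {"street": "123 Main", "coordinates": (40.7, -74.0)}  # no "city"
print(render_value(row_for(record, ["street", "city", "coordinates"])))
# ("123 Main",null,(40.7,-74.0))
```

In this sketch a missing key and an explicit null are indistinguishable once encoded.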

Quoting Rules

Strings are quoted only when necessary:

# No quotes needed
Simple,Text,123

# Quotes for special characters  
"Text with, comma","Multi word text","123-abc"

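A rule of thumb consistent with the examples above is sketched below. The exact rules the library applies are not spelled out here, so treat this as an assumption: a string stays bare only if it looks like a simple identifier and is not a reserved word; everything else, including digit-only strings that would otherwise read as numbers, is quoted. quote_if_needed is a hypothetical helper.

```python
import json
import re

# Assumption: a bare token is a letter or underscore followed by word characters.
_BARE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_RESERVED = {"null", "true", "false"}


def quote_if_needed(text: str) -> str:
    """Return the string bare when unambiguous, otherwise JSON-quoted."""
    if _BARE.fullmatch(text) and text not in _RESERVED:
        return text
    return json.dumps(text, ensure_ascii=False)


for sample in ["Simple", "Text with, comma", "Multi word text", "123-abc"]:
    print(quote_if_needed(sample))
```

This applies to string values only; an actual number such as 123 is written without quotes, which is how a decoder can tell it apart from the string "123".
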
🎨 Deep-TOON Design Philosophy

Deep-TOON uses hierarchical tuples to represent nested structures efficiently:

// Original JSON
{"user": {"profile": {"name": "Alice", "age": 30}}}

// Deep-TOON representation
[1,]{user{profile{name,age}}}:
  (("Alice",30))

Key Benefits:

  1. Compact schemas - Structure declared once, no repetition
  2. Explicit hierarchy - Clear nesting with {...} notation
  3. Tuple efficiency - Related data grouped logically
  4. LLM optimized - Easy to read and parse

🚀 Performance Characteristics

When Deep-TOON Excels

  • Nested objects (addresses, preferences, metadata)
  • Repeated structures (arrays of complex objects)
  • Deep hierarchies (API responses, config files)
  • Mixed data types (numbers, strings, booleans together)

Token Savings by Data Type

Data Type          Typical Reduction
Flat objects       10-30%
1-level nesting    25-45%
2+ level nesting   30-60%
Array of objects   35-50%

🔧 Advanced Usage

Custom Delimiters

from deep_toon import DeepToonEncoder

# Use semicolon delimiter for data containing commas
encoder = DeepToonEncoder(delimiter=";")

Handling Large Arrays

# Deep-TOON automatically detects when arrays are worth compressing
# Arrays with <2 items or inconsistent schemas fall back to JSON
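
A check along these lines could look like the sketch below. It is an approximation under assumptions, not the library's detection logic: is_tabular and shape_of are hypothetical names, and "consistent schema" is taken to mean that every item is an object with the same keys, in the same order, at every nesting level.

```python
def shape_of(value):
    """Nested key structure of a value; non-dict values have no shape."""
    if isinstance(value, dict):
        return tuple((key, shape_of(child)) for key, child in value.items())
    return None


def is_tabular(items) -> bool:
    """True when a list has 2+ dict items that all share one nested key structure."""
    if not isinstance(items, list) or len(items) < 2:
        return False
    if not all(isinstance(item, dict) for item in items):
        return False
    first = shape_of(items[0])
    return all(shape_of(item) == first for item in items[1:])


users = [
    {"id": 1, "address": {"city": "NYC"}},
    {"id": 2, "address": {"city": "LA"}},
]
print(is_tabular(users))                  # True
print(is_tabular(users[:1]))              # False: fewer than 2 items
print(is_tabular(users + [{"id": 3}]))    # False: inconsistent schema
```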

Error Handling

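# Assumption: DeepToonDecodeError is importable from the package top level
from deep_toon import DeepToonDecoder, DeepToonDecodeError

decoder = DeepToonDecoder()

try:
    decoded = decoder.decode(deep_toon_string)  # deep_toon_string: text to decode
except DeepToonDecodeError as e:
    print(f"Decode error: {e}")
    # Handle malformed Deep-TOON data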

📈 Use Cases

  • LLM Training Data - Reduce token costs for large datasets
  • API Response Compression - Faster transmission and processing
  • Configuration Files - More readable than JSON for complex configs
  • Data Interchange - Efficient format for AI-to-AI communication
  • Prompt Engineering - Include more context in limited token budgets

🔬 Technical Details

Schema Detection Algorithm

  1. Field Analysis - Identify primitive vs nested fields
  2. Structure Grouping - Group related fields into tuples
  3. Optimization - Choose best compression strategy per field group
  4. Schema Generation - Create hierarchical schema notation
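
Steps 1 and 4 can be sketched for a single record as follows. This is a simplified illustration, not the library's algorithm: split_fields and build_schema are hypothetical names, only dict values count as nested, and the grouping and optimization steps (2 and 3) are left out.

```python
def split_fields(record: dict):
    """Step 1: separate primitive fields from nested (dict-valued) ones."""
    primitive = [k for k, v in record.items() if not isinstance(v, dict)]
    nested = [k for k, v in record.items() if isinstance(v, dict)]
    return primitive, nested


def build_schema(record: dict) -> str:
    """Step 4: emit 'a,b,c{d,e}' notation, recursing into nested dicts."""
    parts = []
    for key, value in record.items():
        if isinstance(value, dict):
            parts.append(f"{key}{{{build_schema(value)}}}")
        else:
            parts.append(key)
    return ",".join(parts)


user = {
    "id": 1,
    "name": "Alice",
    "address": {
        "street": "123 Main St",
        "city": "NYC",
        "coordinates": {"lat": 40.7, "lng": -74.0},
    },
}
print(split_fields(user))   # (['id', 'name'], ['address'])
print(build_schema(user))   # id,name,address{street,city,coordinates{lat,lng}}
```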

Parsing Strategy

  1. Pattern Matching - Detect Deep-TOON tabular format
  2. Schema Parsing - Build nested structure from schema
  3. Smart Splitting - Handle quoted strings and nested tuples
  4. Type Inference - Convert strings back to appropriate types
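
Steps 3 and 4 are sketched below for one data row. The code is illustrative and rests on assumptions: split_row and infer_type are hypothetical names, true/false are assumed boolean spellings, and nested tuples are returned as unparsed text (a full decoder would strip the parentheses and recurse).

```python
import json
import re

_INT = re.compile(r"-?\d+")
_FLOAT = re.compile(r"-?\d+\.\d+(?:[eE][+-]?\d+)?")


def split_row(row: str, delimiter: str = ",") -> list:
    """Split on the delimiter, ignoring delimiters inside quotes or parentheses."""
    parts, current, depth, in_quotes = [], [], 0, False
    i = 0
    while i < len(row):
        ch = row[i]
        if in_quotes:
            current.append(ch)
            if ch == "\\" and i + 1 < len(row):
                current.append(row[i + 1])  # keep the escaped character verbatim
                i += 1
            elif ch == '"':
                in_quotes = False
        elif ch == '"':
            in_quotes = True
            current.append(ch)
        elif ch in "()":
            depth += 1 if ch == "(" else -1
            current.append(ch)
        elif ch == delimiter and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current).strip())
    return parts


def infer_type(token: str):
    """Convert one bare or quoted token back to a Python value."""
    if token == "null":
        return None
    if token in ("true", "false"):  # assumed boolean spelling
        return token == "true"
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return json.loads(token)  # quoted tokens are always strings
    if _INT.fullmatch(token):
        return int(token)
    if _FLOAT.fullmatch(token):
        return float(token)
    return token  # unquoted text stays a string


print(split_row('1,Alice,("123 Main St",NYC,(40.7,-74.0))'))
print([infer_type(t) for t in split_row('2,Bob,true,null,"123-abc"')])
```

Quoting is what keeps a value such as "9289760655481815" a string instead of turning it into an integer during type inference.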

🤝 Contributing

Deep-TOON is designed to be extended and improved. Key areas for contribution:

  • Performance optimization for very large datasets
  • Additional encoding strategies for specific data patterns
  • Language bindings for other programming languages
  • Integration tools for popular APIs and frameworks

📄 License

Apache 2.0 License - Free for commercial and personal use!


Deep-TOON - Efficient JSON representation for LLM applications. 🚀✨
