Zero Overhead Notation v8.0 (ClearText) - Human-readable data format with 30%+ compression over JSON

ZON Format v8.0 (ClearText)

Zero-Overhead Notation - A human-readable, LLM-optimized data format that achieves 30%+ compression over JSON while remaining visually clean and intuitive.

Why ZON?

ZON v8.0 "ClearText" combines the readability of YAML with better compression than TOON, producing output that looks like a structured document rather than an escaped protocol.

Performance

  • 31.9% smaller than JSON on average
  • 25.6% better than TOON across benchmarks
  • Zero protocol overhead - no pipes, markers, or complex headers
  • LLM-friendly - readable without knowing the format

Quick Example

Input (JSON):

{
  "context": "Hiking Trip",
  "friends": ["ana", "luis", "sam"],
  "hikes": [
    {"id": 1, "name": "Blue Lake Trail", "sunny": true},
    {"id": 2, "name": "Ridge Overlook", "sunny": false}
  ]
}

Output (ZON v8.0):

context:Hiking Trip
friends:[ana,luis,sam]

@hikes(2):id,name,sunny
1,Blue Lake Trail,T
_,Ridge Overlook,F

Size: JSON: 201 bytes → ZON: 106 bytes (47% smaller)
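
The percentage follows directly from the two byte counts quoted above. A minimal sketch of the arithmetic, where the helper name percent_smaller is illustrative and not part of the zon package:

```python
def percent_smaller(baseline_bytes: int, encoded_bytes: int) -> float:
    """Return how much smaller the encoded payload is, as a percentage of the baseline."""
    if baseline_bytes <= 0:
        raise ValueError("baseline_bytes must be positive")
    return (baseline_bytes - encoded_bytes) / baseline_bytes * 100


# Byte counts quoted in the Quick Example (201 bytes of JSON, 106 bytes of ZON).
saving = percent_smaller(201, 106)
print(f"{saving:.0f}% smaller")  # prints "47% smaller"
```

The same helper applies to any pair of sizes, for example the output of len(text.encode("utf-8")) on two encodings of the same data.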

Installation

pip install zon-format

Usage

import zon

# Encode
data = {"users": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]}
encoded = zon.encode(data)
print(encoded)
# Output:
# @users(2):id,name
# 1,Alice
# _,Bob

# Decode
decoded = zon.decode(encoded)
assert decoded == data  # Perfect roundtrip

Format Reference

Metadata (YAML-like)

key:value
nested.key:value
list:[item1,item2,item3]
  • No spaces after : for compactness
  • Dot notation for nested objects
  • Minimal quoting (only when necessary)
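
To make the dot notation concrete, here is a small sketch of how a nested object could be collapsed into dotted keys. The flatten helper is hypothetical and written only for illustration; it is not the zon package's implementation, and it assumes keys do not themselves contain dots.

```python
from typing import Any


def flatten(obj: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Collapse nested dicts into a single level keyed by dotted paths."""
    flat: dict[str, Any] = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse, carrying the path built so far as the new prefix.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


record = {"name": {"first": "Sophia", "last": "Wilson"}, "gender": "female"}
print(flatten(record))
# {'name.first': 'Sophia', 'name.last': 'Wilson', 'gender': 'female'}
```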

Tables (@table syntax)

@tablename(count):col1,col2,col3
val1,val2,val3
val1,val2,val3
  • @ marks table start
  • (count) shows row count
  • Columns separated by commas (no spaces)
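
The table layout can be sketched in a few lines of Python. The render_table function below is a hypothetical illustration of the syntax, not the library's encoder: it assumes every row is a flat dict with the same keys, and it skips quoting and the T/F boolean tokens.

```python
def render_table(name: str, rows: list[dict]) -> str:
    """Render a list of flat, uniformly keyed dicts as an @table block."""
    if not rows:
        raise ValueError("this sketch expects at least one row")
    columns = list(rows[0].keys())
    # Header line: @ marker, table name, row count, then the column names.
    lines = [f"@{name}({len(rows)}):{','.join(columns)}"]
    for row in rows:
        lines.append(",".join(str(row[col]) for col in columns))
    return "\n".join(lines)


users = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
print(render_table("users", users))
# @users(2):id,name
# 1,Alice
# 2,Bob
```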

Compression Tokens

Token   Meaning         Example
T       Boolean true    T instead of true
F       Boolean false   F instead of false

Note: ZON v1.0.1 prioritizes explicit data. Compression tokens like ^ (repeat) and _ (auto-increment) are disabled to ensure every row contains its full, actual data.

Smart Quoting

Quotes are only added when necessary:

Value          Encoded        Reason
ana            ana            No special chars
Blue Lake      Blue Lake      Spaces OK
a,b            "a,b"          Contains comma (delimiter)
Hello: World   Hello: World   Colons OK
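
The quoting decision in the table above can be expressed as a single predicate. This is a sketch under stated assumptions: the function name needs_quotes is made up for illustration, the control-character rule is taken from the "How It Works" section, and how the real encoder escapes embedded quote characters is not covered here.

```python
def needs_quotes(value: str) -> bool:
    """Return True if the value contains a comma or an ASCII control character."""
    has_control = any(ord(ch) < 32 for ch in value)
    return "," in value or has_control


for sample in ["ana", "Blue Lake", "a,b", "Hello: World"]:
    encoded = f'"{sample}"' if needs_quotes(sample) else sample
    print(encoded)
# ana
# Blue Lake
# "a,b"
# Hello: World
```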

Format Comparison

Random Users API (10 records)

JSON (15,026 bytes):

[
  {
    "gender": "female",
    "name": {"title": "Ms", "first": "Sophia", "last": "Wilson"},
    "location": {"city": "Austin", "state": "Texas"},
    ...
  }
]

TOON (10,626 bytes):

results[50]{gender,name{title,first,last},location{city,state},...}
female,Ms,Sophia,Wilson,Austin,Texas,...

ZON v8.0 (6,767 bytes - 55% smaller than JSON):

@data(10):gender,location.city,location.state,name.first,name.last,name.title
female,Austin,Texas,Sophia,Wilson,Ms
^,^,^,Emma,Johnson,Mrs
male,Portland,Oregon,Liam,Brown,Mr
...

Benchmarks

Run the comprehensive benchmark suite:

python benchmarks/generate_datasets.py  # Generate test data
python test_comprehensive.py            # Run benchmarks

Results (318 records across 6 datasets)

Dataset                    Records   vs JSON   vs TOON
Random Users API           50        -42.4%    +40.4%
StackOverflow Q&A          50        -43.1%    +41.1%
JSONPlaceholder Posts      100       -13.4%    -0.1%
JSONPlaceholder Comments   100       -15.4%    +0.0%
JSONPlaceholder Users      10        -40.3%    +36.3%
GitHub Repos               8         -37.1%    +36.0%
AVERAGE                              -31.9%    +25.6%
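
The AVERAGE row appears to be the unweighted mean of the six per-dataset percentages, rather than a mean weighted by record count. The check below recomputes it from the figures in the table; the variable names are illustrative only.

```python
# (vs JSON, vs TOON) percentages copied from the results table.
results = {
    "Random Users API": (-42.4, 40.4),
    "StackOverflow Q&A": (-43.1, 41.1),
    "JSONPlaceholder Posts": (-13.4, -0.1),
    "JSONPlaceholder Comments": (-15.4, 0.0),
    "JSONPlaceholder Users": (-40.3, 36.3),
    "GitHub Repos": (-37.1, 36.0),
}

vs_json = [pair[0] for pair in results.values()]
vs_toon = [pair[1] for pair in results.values()]

# Unweighted means: each dataset counts once, regardless of its record count.
mean_vs_json = sum(vs_json) / len(vs_json)
mean_vs_toon = sum(vs_toon) / len(vs_toon)

print(f"vs JSON: {mean_vs_json:.2f}%")   # vs JSON: -31.95%
print(f"vs TOON: {mean_vs_toon:+.2f}%")  # vs TOON: +25.62%
```

Both values agree with the reported -31.9% and +25.6% to within rounding.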

View Encoded Samples

Compare formats side-by-side:

python benchmarks/generate_samples.py
# Generates .json, .zon, and .toon files in benchmarks/encoded_samples/

Open any .zon file to see the clean, readable output!

How It Works

1. Root Promotion

ZON automatically separates metadata (context) from data (tables):

Input (JSON):

{"context": "Trip", "hikes": [{...}, {...}]}

Output (ZON):

context:Trip

@hikes(2):...
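
One way to picture root promotion is as a pass over the top-level keys. The sketch below assumes that any non-empty list of dicts is treated as a table and everything else as metadata; that rule and the split_root name are assumptions made for illustration, since the exact criteria used by the library are not spelled out here.

```python
from typing import Any


def split_root(doc: dict[str, Any]) -> tuple[dict[str, Any], dict[str, list[dict]]]:
    """Separate top-level keys into metadata and table candidates."""
    metadata: dict[str, Any] = {}
    tables: dict[str, list[dict]] = {}
    for key, value in doc.items():
        # Assumed rule: a non-empty list whose items are all dicts is a table.
        is_table = (
            isinstance(value, list)
            and len(value) > 0
            and all(isinstance(item, dict) for item in value)
        )
        if is_table:
            tables[key] = value
        else:
            metadata[key] = value
    return metadata, tables


doc = {
    "context": "Hiking Trip",
    "friends": ["ana", "luis", "sam"],
    "hikes": [{"id": 1, "name": "Blue Lake Trail"}, {"id": 2, "name": "Ridge Overlook"}],
}
metadata, tables = split_root(doc)
print(sorted(metadata))  # ['context', 'friends']
print(sorted(tables))    # ['hikes']
```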

3. Intelligent Compression

  • Sequential IDs: 1,_,_ (auto-increment)
  • Repetitive values: Uses ^ token
  • Booleans: T/F (1 byte vs 4-5 bytes)
  • No quotes: Unless value contains , or control chars
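
To illustrate what the _ and ^ tokens stand for, here is a sketch of expanding them back into full rows. It assumes _ means "previous row's value in this column plus one" and ^ means "same value as the previous row in this column", which matches the examples in this document. The expand_rows function is hypothetical, and the note under Compression Tokens states that these tokens are disabled in v1.0.1.

```python
def expand_rows(rows: list[list[str]]) -> list[list[str]]:
    """Replace "_" and "^" cells with the values they stand for."""
    expanded: list[list[str]] = []
    for row in rows:
        out = []
        for col, cell in enumerate(row):
            if cell in ("_", "^"):
                if not expanded:
                    raise ValueError("first row cannot reference a previous row")
                previous = expanded[-1][col]
                # "_" increments the previous integer; "^" repeats the previous value.
                cell = str(int(previous) + 1) if cell == "_" else previous
            out.append(cell)
        expanded.append(out)
    return expanded


rows = [["1", "female", "Sophia"], ["_", "^", "Emma"], ["_", "male", "Liam"]]
for row in expand_rows(rows):
    print(",".join(row))
# 1,female,Sophia
# 2,female,Emma
# 3,male,Liam
```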

Using with LLMs

ZON is token-efficient and integrates with modern LLM tooling. This repository keeps concise examples for the most common integrations.

LangChain

Compress structured payloads with zon.encode() before sending them through LangChain prompts. See BENCHMARKS_ALL.md for sample usage and token impact.

LangGraph

Attach ZON-encoded payloads as node metadata to reduce token footprint when traversing or querying graphs.

DSPy

Use zon.decode() to convert ZON strings back to Python objects and stream into dataframes or telemetry pipelines for analysis.

CLI Tool

# Encode
zon encode input.json output.zon

# Decode
zon decode input.zon output.json

# Benchmark
zon benchmark data.json

Development

# Install in development mode
pip install -e .

# Run tests
python -m pytest tests/

# Run benchmarks
python test_comprehensive.py

Version History

v1.0.1 (2025-11-24) - "ClearText"

  • ✅ Removed protocol overhead (no more #Z:, pipes, or markers)
  • ✅ YAML-like metadata syntax (key:value)
  • ✅ Clean @table syntax
  • ✅ Aggressive quote removal (spaces no longer trigger quoting)
  • ✅ Compact array syntax: [item1,item2,item3]
  • ✅ Optimized nested data: {key:val} syntax (no more JSON strings)
  • ✅ 31.9% compression vs JSON, 25.6% better than TOON

v1.0.0 (2025-11-23)

  • Initial release with pipe-based protocol syntax

License

Apache License 2.0 - see LICENSE file

Contributing

Contributions welcome! Please open an issue or PR on GitHub.


Made with ❤️ for efficient data transmission and LLM optimization
