Zero Overhead Notation v1.0.3 - Human-readable data format with 30%+ compression over JSON
Zero Overhead Notation (ZON) Format
Zero Overhead Notation - A compact, human-readable way to encode JSON for LLMs.
File Extension: .zonf | Media Type: text/zon | Encoding: UTF-8
ZON is a token-efficient serialization format designed for LLM workflows. It achieves 35-50% token reduction vs JSON through tabular encoding, single-character primitives, and intelligent compression while maintaining 100% data fidelity.
[!TIP] The ZON format is stable, but it's also an evolving concept. There's no finalization yet, so your input is valuable. Contribute to the spec or share your feedback to help shape its future.
Table of Contents
- Why ZON?
- Key Features
- Installation & Quick Start
- Format Overview
- API Reference
- Security & Data Types
- Benchmarks
Why ZON?
AI is becoming cheaper and more accessible, but larger context windows also invite larger data inputs. LLM tokens still cost money, and standard JSON is verbose and token-expensive:
```json
{
  "context": {
    "task": "Our favorite hikes together",
    "location": "Boulder",
    "season": "spring_2025"
  },
  "friends": ["ana", "luis", "sam"],
  "hikes": [
    {
      "id": 1,
      "name": "Blue Lake Trail",
      "distanceKm": 7.5,
      "elevationGain": 320,
      "companion": "ana",
      "wasSunny": true
    },
    ...
  ]
}
```
ZON conveys the same information with fewer tokens, using a compact table format with explicit headers:
```
context:"{task:Our favorite hikes together,location:Boulder,season:spring_2025}"
friends:"[ana,luis,sam]"
hikes:@(3):companion,distanceKm,elevationGain,id,name,wasSunny
ana,7.5,320,1,Blue Lake Trail,T
luis,9.2,540,2,Ridge Overlook,F
sam,5.1,180,3,Wildflower Loop,T
```
Key Features
- 🎯 100% LLM Accuracy: Achieves perfect retrieval with self-explanatory structure
- 💾 Most Token-Efficient: 15-35% fewer tokens than JSON across all tokenizers
- 🎯 JSON Data Model: Encodes the same objects, arrays, and primitives as JSON with deterministic, lossless round-trips
- 📐 Minimal Syntax: Explicit headers (`@(N)` for count, column list) eliminate ambiguity for LLMs
- 🧺 Tabular Arrays: Uniform arrays collapse into tables that declare fields once and stream row values
- 🔢 Canonical Numbers: No scientific notation (`1000000`, not `1e6`), `NaN`/`Infinity` → `null`
- 🌳 Deep Nesting: Handles complex nested structures efficiently
- 🔒 Security Limits: Automatic DoS prevention (100 MB documents, 1M-item arrays, 100K object keys)
- ✅ Production Ready: 93/93 tests pass, all datasets verified, zero data loss
Installation & Quick Start
From PyPI (Recommended)
```bash
pip install zon-format
```
Basic Usage
```python
import zon

# Your data
data = {
    "users": [
        {"id": 1, "name": "Alice", "role": "admin", "active": True},
        {"id": 2, "name": "Bob", "role": "user", "active": True},
    ]
}

# Encode to ZON
encoded = zon.encode(data)
print(encoded)
# users:@(2):active,id,name,role
# T,1,Alice,admin
# T,2,Bob,user

# Decode back
decoded = zon.decode(encoded)
assert decoded == data  # ✅ Lossless!
```
Decode Options
```python
import zon

zon_string = zon.encode({"users": [{"id": 1, "name": "Alice"}]})

# Strict mode (default) - validates table structure
data = zon.decode(zon_string)

# Non-strict mode - allows row/field count mismatches
data = zon.decode(zon_string, strict=False)
```
Error Handling
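```python
from zon import decode, ZonDecodeError

# Illustrative input: the header declares 3 rows, but only 2 follow
invalid_zon = "users:@(3):id,name\n1,Alice\n2,Bob"

try:
    data = decode(invalid_zon)
except ZonDecodeError as e:
    print(e.code)     # "E001" (row count) or "E002" (field count)
    print(e.message)  # Detailed error message
    print(e.context)  # Context information
```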
Format Overview
ZON auto-selects the optimal representation for your data.
Tabular Arrays
Best for arrays of objects with consistent structure:
```
users:@(3):active,id,name,role
T,1,Alice,Admin
T,2,Bob,User
F,3,Carol,Guest
```
- `@(3)` = row count
- Column names listed once
- Data rows follow
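To make the table layout concrete, here is a minimal standard-library sketch of how a uniform list of objects maps onto the header-plus-rows form. It is not the `zon` package's implementation: `encode_table` and `format_cell` are hypothetical names, columns are sorted alphabetically only because the examples above appear that way, and quoting of values that contain commas or newlines is not modeled.

```python
from __future__ import annotations

from typing import Any


def format_cell(value: Any) -> str:
    """Render one primitive: T/F for booleans, null for None, str() otherwise."""
    if value is True:
        return "T"
    if value is False:
        return "F"
    if value is None:
        return "null"
    return str(value)


def encode_table(name: str, rows: list[dict[str, Any]]) -> str:
    """Encode a uniform list of dicts as one header line plus one line per row.

    Assumes every row has the same keys and that no value contains a comma
    or newline (the real library's quoting rules are not modeled).
    """
    columns = sorted(rows[0])  # the examples above list columns alphabetically
    header = f"{name}:@({len(rows)}):{','.join(columns)}"
    body = [",".join(format_cell(row[col]) for col in columns) for row in rows]
    return "\n".join([header, *body])


users = [
    {"id": 1, "name": "Alice", "role": "Admin", "active": True},
    {"id": 2, "name": "Bob", "role": "User", "active": True},
    {"id": 3, "name": "Carol", "role": "Guest", "active": False},
]
print(encode_table("users", users))
```

Running it prints the same four lines as the `users` table above.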
Nested Objects
Best for configuration and nested structures:
```
config:"{database:{host:db.example.com,port:5432},features:{darkMode:T}}"
```
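The inline object notation can be sketched the same way. This hypothetical `encode_inline` helper (standard library only, not the package's code) keeps keys in insertion order, as in the `context` example near the top, and `encode_field` wraps the result in double quotes after the field name. It assumes string values contain no commas, colons, braces, or brackets, since escaping is not modeled.

```python
from __future__ import annotations

from typing import Any


def encode_inline(value: Any) -> str:
    """Render nested dicts/lists in the brace/bracket style shown above."""
    if isinstance(value, dict):
        # Keys keep insertion order; strings are emitted bare (no escaping).
        return "{" + ",".join(f"{k}:{encode_inline(v)}" for k, v in value.items()) + "}"
    if isinstance(value, list):
        return "[" + ",".join(encode_inline(item) for item in value) + "]"
    if value is True:
        return "T"
    if value is False:
        return "F"
    if value is None:
        return "null"
    return str(value)


def encode_field(key: str, value: Any) -> str:
    """Top-level field: key, colon, then the inline form in double quotes."""
    return f'{key}:"{encode_inline(value)}"'


config = {
    "database": {"host": "db.example.com", "port": 5432},
    "features": {"darkMode": True},
}
print(encode_field("config", config))
print(encode_field("friends", ["ana", "luis", "sam"]))
```

It prints `config:"{database:{host:db.example.com,port:5432},features:{darkMode:T}}"` followed by `friends:"[ana,luis,sam]"`, matching the examples in this README.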
Compression Tokens
| Token | Meaning | JSON Equivalent |
|---|---|---|
| `T` | Boolean true | `true` |
| `F` | Boolean false | `false` |
| `null` | Null value | `null` |
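Going the other way, a decoder needs only string comparisons and `int()`/`float()` to turn a cell back into a value, with no `eval()` involved. The sketch below is hypothetical (`parse_cell` is not part of the `zon` API) and accepts only canonical numbers without an exponent; how the real decoder tells a literal string `"T"` apart from the boolean (for example through quoting) is not modeled here.

```python
from __future__ import annotations

import re

# Canonical numbers only: optional minus sign, ASCII digits, optional fraction, no exponent.
INT_RE = re.compile(r"-?[0-9]+")
FLOAT_RE = re.compile(r"-?[0-9]+\.[0-9]+")


def parse_cell(token: str) -> bool | int | float | str | None:
    """Map one unquoted ZON cell back to a Python value without eval()."""
    if token == "T":
        return True
    if token == "F":
        return False
    if token == "null":
        return None
    if INT_RE.fullmatch(token):
        return int(token)
    if FLOAT_RE.fullmatch(token):
        return float(token)
    return token  # anything else stays a string, e.g. "1e6" or "Alice"


print([parse_cell(t) for t in ["T", "F", "null", "42", "7.5", "Alice"]])
```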
API Reference
zon.encode(data)
Encodes a Python object to ZON format.
Parameters:
- `data` (Any): The input data to encode. Must be JSON-serializable.
Returns:
str: The ZON-encoded string.
Raises:
ZonEncodeError: If a circular reference is detected.
Example:
```python
import zon

data = {"id": 1, "name": "Alice"}
zon_str = zon.encode(data)
print(zon_str)
```
zon.decode(zon_str, strict=True)
Decodes a ZON-formatted string back to a Python object.
Parameters:
- `zon_str` (str): The ZON-encoded string to decode.
- `strict` (bool): If True (default), validates table structure.
Returns:
Any: The decoded Python object (dict or list).
Raises:
ZonDecodeError: On validation errors or security limit violations.
Error Codes:
- `E001`: Row count mismatch (table has fewer/more rows than declared)
- `E002`: Field count mismatch (row has fewer fields than columns)
- `E301`: Document size exceeds 100 MB
- `E302`: Line length exceeds 1 MB
- `E303`: Array length exceeds 1M items
- `E304`: Object key count exceeds 100K
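Strict mode's `E001` and `E002` checks amount to comparing the declared row count and column list against what actually follows the header. Below is a hypothetical sketch, assuming the `name:@(N):columns` header shape from the examples and splitting rows naively on commas; `validate_table` and `TableError` are illustrative names, not the library's `ZonDecodeError` machinery. Unlike the `E002` description above, it also rejects rows with extra fields.

```python
from __future__ import annotations

import re

# Header shape taken from the examples: name:@(count):col1,col2,...
HEADER_RE = re.compile(r"(?P<name>[^:]+):@\((?P<count>[0-9]+)\):(?P<cols>.+)")


class TableError(ValueError):
    """Hypothetical error type carrying an error code alongside the message."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


def validate_table(block: str) -> list[list[str]]:
    """Split a table block into rows, enforcing strict-mode row/field counts."""
    header, *rows = block.splitlines()
    match = HEADER_RE.fullmatch(header)
    if match is None:
        raise ValueError(f"not a table header: {header!r}")
    declared = int(match["count"])
    columns = match["cols"].split(",")
    if len(rows) != declared:
        raise TableError("E001", f"header declares {declared} rows, found {len(rows)}")
    cells = [row.split(",") for row in rows]  # naive split: no quoted commas
    for number, fields in enumerate(cells, start=1):
        if len(fields) != len(columns):
            raise TableError(
                "E002", f"row {number} has {len(fields)} fields, expected {len(columns)}"
            )
    return cells


try:
    validate_table("users:@(3):id,name\n1,Alice\n2,Bob")
except TableError as err:
    print(err.code, err)
```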
Security & Data Types
Eval-Safe Design
ZON is immune to code injection attacks:
- ✅ No `eval()` - Pure data format, zero code execution
- ✅ No object constructors - Unlike YAML, where unsafe loaders can instantiate arbitrary objects
- ✅ No prototype pollution - Dangerous keys blocked (`__proto__`, `constructor`)
- ✅ Type-safe parsing - Numbers parsed safely, not via `eval()`
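Blocking dangerous keys is a check at the point where decoded key/value pairs become an object. In this hypothetical sketch, `build_object` raises on the two keys named above; whether the library raises or silently skips them is an open assumption here. The guard mainly matters when decoded data is later handed to JavaScript, where assigning to `__proto__` can change the prototype shared by other objects.

```python
from __future__ import annotations

from collections.abc import Iterable
from typing import Any

# Keys named in the list above; treating them as errors is this sketch's choice.
BLOCKED_KEYS = frozenset({"__proto__", "constructor"})


def build_object(pairs: Iterable[tuple[str, Any]]) -> dict[str, Any]:
    """Assemble decoded key/value pairs into a dict, refusing blocked keys."""
    result: dict[str, Any] = {}
    for key, value in pairs:
        if key in BLOCKED_KEYS:
            raise ValueError(f"refusing dangerous key {key!r}")
        result[key] = value
    return result


print(build_object([("id", 1), ("name", "Alice")]))
```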
Data Type Preservation
- ✅ Integers: `42` stays an integer
- ✅ Floats: `3.14` preserves its decimal
- ✅ Booleans: Explicit `T`/`F` (not strings `"true"`/`"false"`)
- ✅ Null: Explicit `null` (not omitted)
- ✅ No scientific notation: `1000000`, not `1e6`
- ✅ Special values normalized: `NaN`/`Infinity` → `null`
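One way to get canonical numbers in Python is to let `repr()` pick the shortest digits that round-trip and then spell them out positionally with `Decimal`. This `format_number` helper is a sketch, not the library's code. It renders a float such as `1e6` as `1000000.0`, so the value decodes back as a float; the real encoder may treat integral floats differently.

```python
import math
from decimal import Decimal


def format_number(value: float) -> str:
    """Render an int or float without exponent notation; NaN/Infinity become null."""
    if isinstance(value, bool):
        raise TypeError("booleans are encoded as T/F, not as numbers")
    if isinstance(value, int):
        return str(value)
    if not math.isfinite(value):
        return "null"
    # repr() gives the shortest string that round-trips; Decimal's "f" format
    # then writes it positionally, e.g. 1.5e-07 -> 0.00000015.
    return format(Decimal(repr(value)), "f")


for sample in [1000000, 3.14, 1.5e-07, float("nan"), float("inf")]:
    print(format_number(sample))
```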
Security Limits (DoS Prevention)
| Limit | Maximum | Error Code |
|---|---|---|
| Document size | 100 MB | E301 |
| Line length | 1 MB | E302 |
| Array length | 1M items | E303 |
| Object keys | 100K keys | E304 |
| Nesting depth | 100 levels | - |
Protection is automatic - no configuration required.
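The limits in the table translate into a handful of cheap checks. This is a hypothetical sketch: `check_text`, `check_value`, and `LimitError` are illustrative names, "MB" is assumed to mean 1024 * 1024 bytes, and the depth check raises a plain `ValueError` because no error code is listed for it.

```python
from __future__ import annotations

from typing import Any

# Limits from the table above. Reading "MB" as 1024 * 1024 bytes is an assumption.
MAX_DOCUMENT_BYTES = 100 * 1024 * 1024  # E301
MAX_LINE_BYTES = 1024 * 1024  # E302
MAX_ARRAY_ITEMS = 1_000_000  # E303
MAX_OBJECT_KEYS = 100_000  # E304
MAX_DEPTH = 100  # no error code listed


class LimitError(ValueError):
    """Hypothetical error type carrying one of the codes above."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


def check_text(text: str) -> None:
    """Size checks that can run before parsing starts."""
    if len(text.encode("utf-8")) > MAX_DOCUMENT_BYTES:
        raise LimitError("E301", "document exceeds 100 MB")
    for number, line in enumerate(text.splitlines(), start=1):
        if len(line.encode("utf-8")) > MAX_LINE_BYTES:
            raise LimitError("E302", f"line {number} exceeds 1 MB")


def check_value(value: Any, depth: int = 1) -> None:
    """Walk decoded data, enforcing container sizes and nesting depth."""
    if not isinstance(value, (list, dict)):
        return
    if depth > MAX_DEPTH:
        raise ValueError(f"nesting deeper than {MAX_DEPTH} levels")
    if isinstance(value, list):
        if len(value) > MAX_ARRAY_ITEMS:
            raise LimitError("E303", f"array has {len(value)} items")
        children = value
    else:
        if len(value) > MAX_OBJECT_KEYS:
            raise LimitError("E304", f"object has {len(value)} keys")
        children = list(value.values())
    for child in children:
        check_value(child, depth + 1)


check_text("users:@(1):id\n1")
check_value({"users": [{"id": 1}]})
```

This version walks data after it has been decoded; a production decoder would enforce the same limits while parsing, so an oversized input is rejected before it is fully built in memory.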
Benchmarks
Retrieval Accuracy
Benchmarks test LLM comprehension using 24 data retrieval questions on gpt-5-nano (Azure OpenAI).
| Format | Accuracy | Tokens | Efficiency Score |
|---|---|---|---|
| ZON | 100.0% | 19,995 | 123.2 acc%/10K 🏆 |
| TOON | 100.0% | 20,988 | 118.0 acc%/10K |
| CSV | 100.0% | ~20,500 | ~117 acc%/10K |
| JSON compact | 91.7% | 27,300 | 82.1 acc%/10K |
| JSON | 91.7% | 28,042 | 78.5 acc%/10K |
ZON achieves 100% accuracy (vs JSON's 91.7%) while using 29% fewer tokens.
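The "fewer tokens" figure is a reduction relative to the baseline format's count, (1 - ZON/JSON) * 100. A quick check against the retrieval table above (the helper name is illustrative):

```python
def percent_fewer(tokens: int, baseline: int) -> float:
    """Share of the baseline's tokens that a format saves, as a percentage."""
    return (1 - tokens / baseline) * 100


# ZON vs. JSON token counts from the retrieval table above
print(f"{percent_fewer(19_995, 28_042):.1f}% fewer tokens")
```

It prints `28.7% fewer tokens`, which rounds to the 29% quoted above.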
Token Efficiency Benchmark
Tokenizers: GPT-4o (o200k), Claude 3.5 (Anthropic), Llama 3 (Meta)
Unified Dataset
GPT-4o (o200k):
| Format | Tokens | vs. best |
|---|---|---|
| ZON | 522 | 🏆 |
| CSV | 534 | +2.3% |
| JSON (compact) | 589 | +11.4% |
| TOON | 614 | +17.6% |
| YAML | 728 | +39.5% |
| JSON format | 939 | +44.4% |
| XML | 1,093 | +109.4% |
Claude 3.5 (Anthropic):
| Format | Tokens | vs. best |
|---|---|---|
| CSV | 544 | 🏆 |
| ZON | 545 | +0.2% |
| TOON | 570 | +4.6% |
| JSON (compact) | 596 | +8.6% |
| YAML | 641 | +17.6% |
Llama 3 (Meta):
| Format | Tokens | vs. best |
|---|---|---|
| ZON | 701 | 🏆 |
| CSV | 728 | +3.9% |
| JSON (compact) | 760 | +7.8% |
| TOON | 784 | +11.8% |
| YAML | 894 | +27.5% |
Large Complex Nested Dataset
GPT-4o (o200k):
| Format | Tokens | vs. best |
|---|---|---|
| ZON | 147,267 | 🏆 |
| CSV | 165,647 | +12.5% |
| JSON (compact) | 189,193 | +28.4% |
| TOON | 225,510 | +53.1% |
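In the token results, the percentages are overhead relative to the best format for each tokenizer, (tokens / best - 1) * 100, so the denominator is the winner's count rather than a fixed baseline. A quick check against the GPT-4o unified-dataset numbers above (helper name illustrative):

```python
def percent_more(tokens: int, best: int) -> float:
    """Extra tokens relative to the best format, as a percentage."""
    return (tokens / best - 1) * 100


# GPT-4o (o200k) counts for the unified dataset, from the results above
gpt4o = {"ZON": 522, "CSV": 534, "TOON": 614, "YAML": 728, "XML": 1_093}
best = min(gpt4o.values())
for name, tokens in gpt4o.items():
    print(f"{name}: {tokens} tokens (+{percent_more(tokens, best):.1f}%)")
```

This prints +0.0% for ZON, then +2.3%, +17.6%, +39.5%, and +109.4%. Because the denominators differ, overhead and savings are not interchangeable: a format needing 40% more tokens than another means the leaner one uses about 28.6% fewer (1 - 1/1.4).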
Overall Summary
| Tokenizer | ZON vs TOON | ZON vs JSON |
|---|---|---|
| GPT-4o | 34.7% fewer tokens | 22.2% fewer tokens |
| Claude 3.5 | 24.4% fewer tokens | 19.6% fewer tokens |
| Llama 3 | 25.7% fewer tokens | 15.3% fewer tokens |
Key Insight: ZON is the only format that wins or nearly wins across all models & datasets.
LLM Framework Integration
OpenAI
```python
import zon
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
users = [{"id": i, "name": f"User{i}", "active": True} for i in range(100)]

# Compress with ZON (saves tokens = saves money!)
zon_data = zon.encode(users)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You will receive data in ZON format."},
        {"role": "user", "content": f"Analyze this user data:\n\n{zon_data}"},
    ],
)
print(response.choices[0].message.content)
```
LangChain
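```python
import zon
from langchain_core.prompts import PromptTemplate

products = [{"name": "Laptop", "price": 999, "rating": 4.5}]  # ...more products
zon_products = zon.encode(products)

# Use in your LangChain prompts with fewer tokens!
prompt = PromptTemplate.from_template("Product data (ZON format):\n\n{data}\n\nWhich product is rated highest?")
text = prompt.format(data=zon_products)
```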
Documentation
Comprehensive guides and references are available in the docs/ directory:
📖 Syntax Cheatsheet
Quick reference for ZON format syntax with practical examples.
- Basic types and primitives (strings, numbers, booleans, null)
- Objects and nested structures
- Arrays (tabular, inline, mixed)
- Quoting rules and escape sequences
- Complete examples with JSON comparisons
🔧 API Reference
Complete API documentation for zon-format v1.0.3.
- `encode()` function: detailed parameters and examples
- `decode()` function: strict mode options and error handling
- Python type definitions
- Error codes and security limits
📋 Complete Specification
Comprehensive formal specification including:
- Data model and encoding rules
- Security model (DoS prevention, no eval)
- Data type system and preservation guarantees
- Conformance checklists
- Media type specification (`.zonf`, `text/zon`)
🤖 LLM Best Practices
Guide for maximizing ZON's effectiveness in LLM applications.
- Prompting strategies for LLMs
- Common use cases (data retrieval, aggregation, filtering)
- Optimization tips for token usage
- Model-specific tips (GPT-4, Claude, Llama)
- Complete real-world examples
Quality & Testing
Test Coverage
- Unit tests: 93/93 passed (security, conformance, validation)
- Roundtrip tests: 13/13 datasets verified
- No data loss or corruption
Validation (Strict Mode)
Enabled by default - validates table structure:
```python
# Strict mode (default)
data = zon.decode(zon_string)

# Non-strict mode
data = zon.decode(zon_string, strict=False)
```
Contributing
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new features
- Submit a pull request
License
MIT License
Copyright (c) 2025 ZON-FORMAT (Roni Bhakta)
See LICENSE for details.
Made with ❤️ for the LLM community
ZON v1.0.3 - Token efficiency that scales with complexity
Download files
File details
Details for the file zon_format-1.0.3.tar.gz.
File metadata
- Download URL: zon_format-1.0.3.tar.gz
- Upload date:
- Size: 45.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `5996d02998ffed0cf5047bb6a60b096a1a29d49918b848c7d8e8cd2d66790952` |
| MD5 | `9877f4200db9993208e38d987d88da85` |
| BLAKE2b-256 | `48c900dc2e715ee6ce62e9267dfdef0e0194cb81eb7f04777e4f43a30ea386f1` |
File details
Details for the file zon_format-1.0.3-py3-none-any.whl.
File metadata
- Download URL: zon_format-1.0.3-py3-none-any.whl
- Upload date:
- Size: 21.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `e6832be40814f81b38ad665b070954d65da46df00a5f51fd84ad5e25a4024015` |
| MD5 | `8d6fd58ed6d821393e1d96165cc893b2` |
| BLAKE2b-256 | `472d00386021bc0cbe81c2c602b95c06dd45f1b313f84f81b129f107d0e6f20e` |