Zero Overhead Notation - Human-readable data serialization optimized for LLM token efficiency
ZON v1.0 (Entropy Engine)
Zero Overhead Notation - A human-readable data serialization format optimized for LLM token efficiency.
24-40% better compression than TOON | 30-42% compression vs JSON | 100% Human Readable
Table of Contents
- What is ZON?
- Quick Start
- Installation
- Beginner Tutorial
- Advanced Usage
- LLM Framework Integration
- Benchmark Results
- API Reference
- Best Practices
- Limitations
What is ZON?
ZON is a text-based compression format designed for sending structured data to Large Language Models. Unlike traditional compression, which produces binary output, ZON stays human-readable while reducing token usage (30-42% versus JSON in the benchmarks below).
Why ZON?
| Problem | Solution |
|---|---|
| High LLM costs from verbose JSON | ZON reduces tokens by 30-42% |
| Binary formats aren't debuggable | ZON is plain text - you can read it! |
| One-size-fits-all compression | ZON auto-selects optimal strategy per column |
| Data corruption risks | ZON has safety checkpoints every 50 rows |
Key Features
- Entropy Tournament: Auto-selects the best compression strategy per column
- 8 Compression Strategies: ENUM, VALUE, DELTA, GAS_INT, GAS_PAT, GAS_MULT, LIQUID, SOLID
- Human Readable: Output is plain text that can be inspected and diffed
- Lossless for flat records: Decoding reconstructs the input exactly (see Limitations for nested arrays, dotted keys, and empty lists)
- Zero Configuration: Works out of the box
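As a rough illustration of the per-column "tournament" idea, the sketch below tries three candidate encodings for a column and keeps the shortest one. It is not ZON's implementation: the function names (`encode_raw`, `encode_range`, `encode_enum`, `pick_strategy`) and the exact output syntax are assumptions made for this example, loosely modeled on the `R(...)` and `E(...)` notation used elsewhere in this README.

```python
from typing import Callable, Dict, List, Optional


def encode_raw(values: List) -> str:
    # Baseline: write every value out literally.
    return ",".join(str(v) for v in values)


def encode_range(values: List) -> Optional[str]:
    # Applies only to integer columns with a constant step, e.g. 1, 2, 3.
    if len(values) < 2 or not all(type(v) is int for v in values):
        return None
    step = values[1] - values[0]
    if any(b - a != step for a, b in zip(values, values[1:])):
        return None
    return f"R({values[0]},{step})"


def encode_enum(values: List) -> Optional[str]:
    # Applies only to string columns: store each distinct value once,
    # then refer to it by index.
    if not all(isinstance(v, str) for v in values):
        return None
    labels = list(dict.fromkeys(values))  # distinct values, first-seen order
    indices = ",".join(str(labels.index(v)) for v in values)
    return f"E({','.join(labels)})|{indices}"


def pick_strategy(values: List) -> str:
    """Return the name of the candidate with the shortest encoding."""
    candidates: Dict[str, Callable[[List], Optional[str]]] = {
        "RAW": encode_raw,
        "RANGE": encode_range,
        "ENUM": encode_enum,
    }
    encoded = {name: fn(values) for name, fn in candidates.items()}
    valid = {name: text for name, text in encoded.items() if text is not None}
    return min(valid, key=lambda name: len(valid[name]))


ids = [1, 2, 3, 4, 5, 6]
roles = ["Admin", "User", "User", "User", "User", "User"]
names = ["Alice", "Bob", "Charlie"]

winners = {
    "id": pick_strategy(ids),
    "role": pick_strategy(roles),
    "name": pick_strategy(names),
}
print(winners)
```

In this toy version the sequential ids collapse to a range, the repetitive role column wins as an enum, and the all-distinct names stay raw because an enum table would only add overhead.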
Quick Start
import zon
# Your data
users = [
    {"id": 1, "name": "Alice", "role": "Admin", "active": True},
    {"id": 2, "name": "Bob", "role": "User", "active": True},
    {"id": 3, "name": "Charlie", "role": "User", "active": False}
]
# Encode (compress)
compressed = zon.encode(users)
print(compressed)
# Output:
# #Z:1.0|D=User|rows[3]{active:E(T,F),id:R(1,1),name:S,role:E(Admin,%0)}|A=50
# $1:0,1,Alice,0
# 1,2,Bob,1
# 0,3,Charlie,1
# Decode (decompress)
original = zon.decode(compressed)
assert original == users  # Perfect reconstruction!
Compression achieved: ~60% smaller than JSON!
Installation
From PyPI (Recommended)
pip install zon-format
From Source
git clone https://github.com/yourusername/zon-format.git
cd zon-format
pip install -e .
Verify Installation
import zon
print("ZON installed successfully!")
Beginner Tutorial
Step 1: Understanding Your Data
ZON works best with lists of similar objects (like database rows or API responses):
# Good - structured data
users = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25}
]
# Less effective - single object
single_user = {"name": "Alice", "age": 30}
# Still works - will use "inline mode"
small_list = [{"id": 1}]
Step 2: Basic Encoding
import json
import zon
data = [
    {"product": "Laptop", "price": 999.99, "stock": 15},
    {"product": "Mouse", "price": 29.99, "stock": 50},
    {"product": "Keyboard", "price": 79.99, "stock": 30}
]
# Compress
zon_string = zon.encode(data)
# Save to file
with open('products.zon', 'w') as f:
    f.write(zon_string)
# Compare against real JSON text, not Python's repr of the list
json_string = json.dumps(data)
print(f"Original JSON: {len(json_string)} chars")
print(f"ZON format: {len(zon_string)} chars")
print(f"Reduction: {(1 - len(zon_string)/len(json_string)) * 100:.1f}%")
Step 3: Basic Decoding
# Read from file
with open('products.zon', 'r') as f:
    zon_string = f.read()
# Decompress
data = zon.decode(zon_string)
# Use your data
for product in data:
    print(f"{product['product']}: ${product['price']}")
Step 4: Understanding the Format
Let's break down a ZON string:
#Z:1.0|rows[3]{id:R(1,1),status:E(active,inactive)}|A=50
$1:1,0
2,1
3,0
- #Z:1.0: Version header
- rows[3]: 3 rows of data
- id:R(1,1): ID column uses the Range strategy (start=1, step=1)
- status:E(active,inactive): Status uses ENUM (0=active, 1=inactive)
- A=50: Anchors every 50 rows
- $1:: Explicit anchor marking row 1
- 1,0: Row 1 data (id=1, status=active)
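To make the header layout concrete, here is a minimal sketch that pulls the version, row count, column specs, and anchor interval out of the example header. It is an illustration only, not ZON's decoder: `split_top_level` and `parse_header` are hypothetical helpers, and the sketch assumes that `|` never appears inside a column spec.

```python
import re
from typing import Dict, List


def split_top_level(text: str, sep: str = ",") -> List[str]:
    """Split on `sep`, ignoring separators nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_header(header: str) -> Dict:
    """Parse a header such as '#Z:1.0|rows[3]{id:R(1,1)}|A=50' (illustrative only)."""
    info: Dict = {"version": None, "rows": None, "columns": {}, "anchor_every": None}
    for segment in header.split("|"):
        if segment.startswith("#Z:"):
            info["version"] = segment[3:]
        elif segment.startswith("A="):
            info["anchor_every"] = int(segment[2:])
        else:
            match = re.fullmatch(r"rows\[(\d+)\]\{(.*)\}", segment)
            if match:
                info["rows"] = int(match.group(1))
                for column in split_top_level(match.group(2)):
                    name, _, spec = column.partition(":")
                    info["columns"][name] = spec
            # Any other segment is ignored by this sketch.
    return info


parsed = parse_header("#Z:1.0|rows[3]{id:R(1,1),status:E(active,inactive)}|A=50")
print(parsed)
```

The parenthesis-aware split matters because column specs such as `R(1,1)` contain commas of their own; a plain `str.split(",")` would cut them in half.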
Advanced Usage
Custom Anchor Intervals
Control safety vs compression trade-off:
# More safety (anchor every 25 rows)
encoded = zon.encode(data, anchor_every=25)
# More compression (anchor every 100 rows)
encoded = zon.encode(data, anchor_every=100)
# Default is 50 (balanced)
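To get a feel for the trade-off, the sketch below counts how many anchors a dataset would carry at different intervals. It assumes anchors fall on row 1 and then on every `anchor_every`-th row after it, which is inferred from the `$1:` marker in the format example; the encoder's real placement may differ, and `anchor_rows` is a hypothetical helper.

```python
from typing import List


def anchor_rows(total_rows: int, anchor_every: int = 50) -> List[int]:
    """1-based row numbers that would carry an explicit anchor (assumed layout)."""
    if anchor_every < 1:
        raise ValueError("anchor_every must be a positive integer")
    return list(range(1, total_rows + 1, anchor_every))


for interval in (25, 50, 100):
    rows = anchor_rows(200, interval)
    print(f"anchor_every={interval}: {len(rows)} anchors at rows {rows}")
```

Under this assumption, halving the interval doubles the number of anchor markers, so the extra safety costs a small, predictable number of characters per block of rows.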
Handling Different Data Types
data = [
    {
        "id": 1,
        "name": "Product",
        "price": 99.99,              # Float
        "in_stock": True,            # Boolean
        "description": None,         # Null
        "tags": ["sale", "new"],     # Array (stringified)
        "metadata": {"key": "val"}   # Nested object (flattened)
    }
]
encoded = zon.encode(data)
decoded = zon.decode(encoded)
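Because arrays come back as strings, a post-processing step may be needed after decoding. The sketch below is one possible approach, assuming the stringified form is a Python list literal such as `"['sale', 'new']"` (the form shown under Limitations). `restore_arrays` is a hypothetical helper, not part of the `zon` API, and it will also convert any genuine string value that happens to look like a list literal.

```python
import ast
from typing import Any, Dict, List


def restore_arrays(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Convert values that look like stringified lists back into lists."""
    restored = []
    for row in rows:
        fixed = {}
        for key, value in row.items():
            if isinstance(value, str) and value.startswith("[") and value.endswith("]"):
                try:
                    # literal_eval only evaluates literals, never arbitrary code.
                    parsed = ast.literal_eval(value)
                except (ValueError, SyntaxError):
                    parsed = value  # not a valid literal; leave untouched
                fixed[key] = parsed if isinstance(parsed, list) else value
            else:
                fixed[key] = value
        restored.append(fixed)
    return restored


decoded_like = [{"id": 1, "tags": "['sale', 'new']", "note": "[draft"}]
print(restore_arrays(decoded_like))
```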
Working with Large Datasets
import json
import zon
# Read large JSON file
with open('large_dataset.json', 'r') as f:
    data = json.load(f)
# Encode in chunks if needed
chunk_size = 1000
for i in range(0, len(data), chunk_size):
    chunk = data[i:i+chunk_size]
    zon_chunk = zon.encode(chunk)
    with open(f'chunk_{i//chunk_size}.zon', 'w') as f:
        f.write(zon_chunk)
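The slicing logic in the loop above can be factored into a small generator, which keeps the chunk index and the chunk together and makes the boundary behaviour easy to check. This is a sketch using only the standard library; `iter_chunks` is a hypothetical helper name.

```python
from typing import Iterator, List, Sequence, Tuple


def iter_chunks(records: Sequence[dict], chunk_size: int) -> Iterator[Tuple[int, List[dict]]]:
    """Yield (chunk_index, chunk) pairs covering `records` in order."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    for start in range(0, len(records), chunk_size):
        yield start // chunk_size, list(records[start:start + chunk_size])


records = [{"id": i} for i in range(2500)]
sizes = [(index, len(chunk)) for index, chunk in iter_chunks(records, 1000)]
print(sizes)  # [(0, 1000), (1, 1000), (2, 500)]
```

Each chunk can then be encoded and written to its own file; to rebuild the full dataset, decode the files in index order and concatenate the resulting lists.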
Streaming Data
import zon
# Process data as it arrives
def process_api_response(response_json):
    # Encode for LLM
    zon_data = zon.encode(response_json)
    # Send to LLM (uses fewer tokens!)
    # send_to_llm is a placeholder for your own LLM client call
    llm_response = send_to_llm(zon_data)
    return llm_response
LLM Framework Integration
OpenAI Integration
import zon
from openai import OpenAI
# The client reads the API key from the OPENAI_API_KEY environment variable
client = OpenAI()
# Prepare your data
users = [{"id": i, "name": f"User{i}", "active": True} for i in range(100)]
# Compress with ZON (saves tokens = saves money!)
zon_data = zon.encode(users)
# Use in prompt (openai>=1.0 client interface)
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You will receive data in ZON format. Decode mentally and analyze."},
        {"role": "user", "content": f"Analyze this user data:\n\n{zon_data}\n\nHow many active users?"}
    ]
)
print(response.choices[0].message.content)
Cost Savings: ~30-40% fewer tokens vs JSON!
LangChain Integration
# The prompt | llm pipeline syntax requires a recent LangChain release,
# where these classes live in langchain_openai and langchain_core
from langchain_openai import OpenAI
from langchain_core.prompts import PromptTemplate
import zon
# Prepare data
products = [
    {"name": "Laptop", "price": 999, "rating": 4.5},
    {"name": "Mouse", "price": 29, "rating": 4.2},
    # ... 100 more products
]
# Compress
zon_products = zon.encode(products)
# Create prompt template
template = """
You have access to product data in ZON format (a compressed JSON format).
Product Data:
{zon_data}
Question: {question}
Please analyze the data and answer.
"""
prompt = PromptTemplate(
    input_variables=["zon_data", "question"],
    template=template
)
# Use with LangChain
llm = OpenAI(temperature=0)
chain = prompt | llm
result = chain.invoke({
    "zon_data": zon_products,
    "question": "What's the average price of products with rating > 4?"
})
print(result)
LlamaIndex Integration
# Note: GPTSimpleVectorIndex is a legacy llama_index class; newer releases
# expose VectorStoreIndex from llama_index.core and query through a query engine
from llama_index import GPTSimpleVectorIndex, Document
import zon
# Prepare documents with ZON compression
docs_data = [
    {"title": "Doc1", "content": "...", "metadata": {...}},
    {"title": "Doc2", "content": "...", "metadata": {...}},
    # ... many more
]
# Compress metadata with ZON
zon_metadata = zon.encode([d["metadata"] for d in docs_data])
# Create documents
documents = [
    Document(
        text=doc["content"],
        extra_info={"compressed_meta": zon_metadata}
    )
    for doc in docs_data
]
# Build index
index = GPTSimpleVectorIndex.from_documents(documents)
# Query (the compressed metadata uses fewer tokens!)
response = index.query("Find documents about topic X")
Anthropic Claude Integration
import anthropic
import zon
client = anthropic.Anthropic(api_key="your-key")
# Large dataset
analytics_data = [...]  # 1000 rows
# Compress with ZON
zon_data = zon.encode(analytics_data)
message = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": f"""
I'm providing analytics data in ZON format (compressed JSON).
Data:
{zon_data}
Please analyze trends and provide insights.
"""
        }
    ]
)
# message.content is a list of content blocks; print the text of the first one
print(message.content[0].text)
Hugging Face Transformers
from transformers import pipeline
import zon
# Sentiment analysis on compressed data
classifier = pipeline("sentiment-analysis")
reviews = [
    {"text": "Great product!", "rating": 5},
    {"text": "Not bad", "rating": 3},
    # ... 100 more
]
# Compress for context
zon_reviews = zon.encode(reviews)
# Include in prompt
prompt = f"""
Review Data (ZON format):
{zon_reviews}
Analyze overall sentiment.
"""
result = classifier(prompt)
print(result)
Benchmark Results
Standard Datasets
| Dataset | Records | JSON Size | ZON Size | Compression | vs TOON |
|---|---|---|---|---|---|
| employees.json | 20 | 15,381 | 5,668 | 63.1% | +9.7% |
| orders.json | 50 | 22,704 | 15,816 | 30.3% | +2.7% |
| complex_nested.json | 1000 | 429,492 | 103,166 | 76.0% | +76.6% |
Real-World API Data
| Dataset | Records | Compression | vs TOON |
|---|---|---|---|
| Random Users API | 50 | 42.4% | +40.4% |
| StackOverflow Q&A | 50 | 42.4% | +40.4% |
| GitHub Repos | 8 | 33.9% | +32.8% |
Average Performance: 30.5% compression, +24.1% better than TOON
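The Compression column in the Standard Datasets table appears to be the size reduction relative to JSON, that is, 1 - (ZON size / JSON size). The short check below recomputes it from the sizes listed in that table; the formula is an inference from the published numbers rather than a documented definition, and `compression_percent` is a helper written for this example.

```python
def compression_percent(json_size: int, zon_size: int) -> float:
    """Size reduction relative to JSON, as a percentage."""
    return (1 - zon_size / json_size) * 100


# (JSON size, ZON size) pairs copied from the Standard Datasets table.
standard = {
    "employees.json": (15_381, 5_668),
    "orders.json": (22_704, 15_816),
    "complex_nested.json": (429_492, 103_166),
}

for name, (json_size, zon_size) in standard.items():
    print(f"{name}: {compression_percent(json_size, zon_size):.1f}%")
```

The results round to 63.1%, 30.3%, and 76.0%, matching the table.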
API Reference
zon.encode(data, anchor_every=50)
Encode a list of dictionaries into ZON format.
Parameters:
- data (List[Dict]): List of dictionaries to encode
- anchor_every (int, optional): Rows between safety anchors. Default: 50
Returns:
str: ZON-formatted string
Example:
zon_str = zon.encode([{"id": 1, "name": "Alice"}])
Raises:
TypeError: If data is not a list
zon.decode(zon_str)
Decode a ZON-formatted string back to original data.
Parameters:
- zon_str (str): ZON-formatted string
Returns:
List[Dict]: Original data structure
Example:
data = zon.decode("#Z:1.0|rows[1]{id:R(1,1)}|A=50\n$1:1")
Raises:
ZonDecodeError: If string is malformed
Best Practices
DO:
- Use for structured data
  # Perfect use case
  db_results = [{"id": 1, "name": "..."}, ...]
  zon.encode(db_results)
- Batch similar data
  # Good - all objects have same structure
  users = [{"name": "Alice", "age": 30}, ...]
- Use appropriate anchor intervals
  # For critical data: more anchors
  zon.encode(data, anchor_every=25)
  # For non-critical: fewer anchors (more compression)
  zon.encode(data, anchor_every=100)
- Profile your data
  import json
  json_size = len(json.dumps(data))
  zon_size = len(zon.encode(data))
  print(f"Reduction: {(1 - zon_size/json_size) * 100:.1f}%")
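When profiling, the choice of JSON baseline changes the headline number: pretty-printed JSON is much larger than compact JSON, so a reduction measured against it looks better than it is. The sketch below reports both baselines for any encoder passed in as a callable. `profile` and `csv_like` are hypothetical names; `csv_like` is only a stand-in so the example runs without `zon` installed, and the counts are characters, not tokens.

```python
import json
from typing import Callable, List


def profile(data: List[dict], encoder: Callable[[List[dict]], str]) -> dict:
    """Compare an encoder's output size against two JSON baselines."""
    pretty = json.dumps(data, indent=2)
    compact = json.dumps(data, separators=(",", ":"))
    encoded = encoder(data)
    return {
        "pretty_json_chars": len(pretty),
        "compact_json_chars": len(compact),
        "encoded_chars": len(encoded),
        "reduction_vs_compact_pct": round((1 - len(encoded) / len(compact)) * 100, 1),
    }


def csv_like(rows: List[dict]) -> str:
    # Stand-in encoder for the demo; swap in zon.encode when it is installed.
    header = ",".join(rows[0].keys())
    body = "\n".join(",".join(str(v) for v in row.values()) for row in rows)
    return f"{header}\n{body}"


data = [{"id": i, "name": f"User{i}", "active": True} for i in range(100)]
report = profile(data, csv_like)
print(report)
```

For cost estimates, the same comparison is more meaningful when run with the tokenizer of the target model instead of `len()`.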
DON'T:
- Don't use for binary data
  # Bad - use appropriate binary format
  image_bytes = b'\x89PNG...'
- Don't use for highly random data
  # Bad - no patterns to compress
  random_data = [{"val": random.random()} for _ in range(100)]
- Don't modify ZON strings manually
  # Bad - will break decoding
  zon_str = zon_str.replace("1", "2")
Limitations
Known Limitations
- Arrays in Objects: Arrays within objects are stringified
  # Input: [{"tags": ["a", "b"]}]
  # Output: [{"tags": "['a', 'b']"}]  # String, not array
- Keys with Dots: Conflicts with flattening
  # Avoid: {"user.name": "Alice"}
  # Use instead: {"user_name": "Alice"}
- Empty Lists: Returns [{}] instead of []
  # Workaround: Check before encoding
  if not data:
      return "[]"
See EDGE_CASES.md for full details.
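The dotted-key limitation is easier to see with a small example. The sketch below assumes nested objects are flattened into dot-joined paths, which is inferred from the "Conflicts with flattening" note and is not confirmed by the library; `flatten` and `sanitize_keys` are hypothetical helpers written for this illustration.

```python
from typing import Any, Dict


def flatten(obj: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into a single level using dot-joined keys."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def sanitize_keys(obj: Dict[str, Any]) -> Dict[str, Any]:
    """Replace dots in keys with underscores, recursing into nested dicts."""
    return {
        key.replace(".", "_"): sanitize_keys(value) if isinstance(value, dict) else value
        for key, value in obj.items()
    }


nested = {"user": {"name": "Alice"}}
dotted = {"user.name": "Alice"}

# Both inputs flatten to the same key, so the original shape cannot be recovered.
print(flatten(nested) == flatten(dotted))  # True
print(flatten(sanitize_keys(dotted)))      # {'user_name': 'Alice'}
```

Renaming dotted keys before encoding, as the limitation suggests, removes the ambiguity because the flattened key can then only have come from a top-level field.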
Contributing
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new features
- Submit a pull request
License
Proprietary License - Free for Production Use
You CAN:
- Use ZON in production (commercial or non-commercial)
- Integrate into your applications and services
- Deploy at any scale
You CANNOT:
- Redistribute or sell the source code
- Modify and redistribute
- Create competing products
Copyright (c) 2025 Roni Bhakta. All Rights Reserved.
See LICENSE for full terms. For custom licensing: ronibhakta1@gmail.com
Acknowledgments
- Inspired by TOON format for LLM token efficiency
- Benchmark datasets from JSONPlaceholder, GitHub API, Random User Generator, StackExchange API
- Community feedback and testing
Support
- Documentation: Full Docs
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Made with ❤️ for the LLM community
ZON v1.0 - Compression that scales with complexity