Advanced data serialization achieving 50-60% token reduction with Compression Modes, SQL-like Query Language, and Streaming Support for LLMs
Project description
ATON V2.0 - Adaptive Token-Oriented Notation
Production-grade data serialization achieving 50-60% token reduction for Large Language Models with advanced compression, SQL-like query language, and streaming support.
What's New in V2.0
Advanced Compression System
- 4 Compression Modes: Fast, Balanced, Ultra, and Adaptive
- Intelligent Algorithms: Dictionary, Delta encoding, Pattern recognition
- AI-Driven Selection: Adaptive mode analyzes data and chooses optimal strategy
SQL-like Query Language
- Full AST Parser: Not simplified regex, real production parser
- Complete Operators:
=,!=,<,>,<=,>=,IN,LIKE,BETWEEN - Complex Logic:
AND,OR,NOTwith nested conditions - Advanced Features:
SELECT,ORDER BY,LIMIT,OFFSET
Streaming Support
- Memory Efficient: Process millions of records with constant memory
- Schema Caching: First chunk includes schema, subsequent chunks only data
- Progress Tracking: Monitor encoding progress
- Configurable Chunks: Adjust chunk size for your use case
Production-Grade Quality
- Custom Exceptions: Complete error hierarchy
- Full Validation: Input validation, type checking, boundary conditions
- Type Safety: 100% type hints coverage
- Zero Compromises: Professional code quality throughout
Performance
Token Reduction vs JSON
| Dataset | JSON Tokens | ATON Tokens | Reduction |
|---|---|---|---|
| Employee Records (1K) | 12,450 | 5,280 | 57.6% |
| Product Catalog (10K) | 145,200 | 64,800 | 55.4% |
| Transaction Log (100K) | 1,856,000 | 815,000 | 56.1% |
| Real-time Agents | 42,000 | 18,500 | 56.0% |
Compression Speed
| Mode | Records/sec | Latency | Best For |
|---|---|---|---|
| FAST | ~50,000 | <1ms | Real-time applications |
| BALANCED | ~35,000 | <2ms | General purpose (recommended) |
| ULTRA | ~20,000 | <5ms | Batch processing, storage |
| ADAPTIVE | ~30,000 | <3ms | Mixed workloads |
Real-World Cost Savings
Use Cases
| Application | Records/Hour | Annual API Savings |
|---|---|---|
| Multi-Agent Systems | 1,000 updates | $609,120 |
| E-commerce Platform | 100K products | $182,500 |
| Analytics Dashboard | Millions of events | $1,460,000 |
| Customer Support (RAG) | Thousands of chats | $584,930 |
| Financial Trading | Real-time ticks | $2,920,000 |
| Healthcare Records | Patient data | $730,000 |
Installation
Python
pip install aton-format
JavaScript
npm install aton-format
# or
yarn add aton-format
From Source
git clone https://github.com/dagoSte/aton-format.git
cd aton-format
pip install -e .
Quick Start
Basic Usage (Python)
from aton_format import ATONEncoder, ATONDecoder, CompressionMode
# Initialize encoder with compression
encoder = ATONEncoder(
compression=CompressionMode.BALANCED, # or FAST, ULTRA, ADAPTIVE
optimize=True
)
# Your data
data = {
"employees": [
{"id": 1, "name": "Alice", "salary": 95000, "active": True},
{"id": 2, "name": "Bob", "salary": 92000, "active": True},
{"id": 3, "name": "Carol", "salary": 110000, "active": False}
]
}
# Encode to ATON (50-60% fewer tokens!)
aton_text = encoder.encode(data)
print(aton_text)
# Output:
# @schema[id:int, name:str, salary:int, active:bool]
# @defaults[active=true]
#
# employees(3):
# 1, "Alice", 95000, null
# 2, "Bob", 92000, null
# 3, "Carol", 110000, false
# Decode back to original
decoder = ATONDecoder()
original = decoder.decode(aton_text)
assert data == original # Perfect round-trip!
Basic Usage (JavaScript)
const ATON = require('aton-format');
// Initialize encoder
const encoder = new ATON.Encoder({
compression: ATON.CompressionMode.BALANCED,
optimize: true
});
// Your data
const data = {
employees: [
{id: 1, name: "Alice", salary: 95000, active: true},
{id: 2, name: "Bob", salary: 92000, active: true},
{id: 3, name: "Carol", salary: 110000, active: false}
]
};
// Encode to ATON
const atonText = encoder.encode(data);
console.log(atonText);
// Decode back
const decoder = new ATON.Decoder();
const original = decoder.decode(atonText);
Advanced Features
1. Query Language
Filter and transform data before encoding:
from aton_format import ATONEncoder, CompressionMode
encoder = ATONEncoder(
compression=CompressionMode.ADAPTIVE,
queryable=True
)
# Complex query with SQL-like syntax
result = encoder.encode_with_query(
data,
"""
employees WHERE
(salary > 100000 AND role = 'Engineer')
OR department = 'Executive'
ORDER BY salary DESC
LIMIT 20
"""
)
# Only relevant records encoded!
# Massive token savings for large datasets
2. Streaming for Large Datasets
Process millions of records with constant memory:
from aton_format import ATONStreamEncoder, CompressionMode
# Initialize stream encoder
stream_encoder = ATONStreamEncoder(
chunk_size=1000,
compression=CompressionMode.ULTRA
)
# Large dataset
data = {"transactions": large_transaction_list}
# Stream encode in chunks
for chunk in stream_encoder.stream_encode(data):
print(f"Chunk {chunk['chunk_id']}/{chunk['total_chunks']}")
print(f"Records: {chunk['metadata']['records_in_chunk']}")
# Process chunk (send to LLM, store, etc.)
process_chunk(chunk['data'])
# Memory stays constant!
3. Compression Modes
Choose the right strategy for your use case:
from aton_format import ATONEncoder, CompressionMode
# Fast mode - optimized for speed
encoder = ATONEncoder(compression=CompressionMode.FAST)
# Balanced mode - optimal mix (RECOMMENDED)
encoder = ATONEncoder(compression=CompressionMode.BALANCED)
# Ultra mode - maximum compression
encoder = ATONEncoder(compression=CompressionMode.ULTRA)
# Adaptive mode - AI-driven selection
encoder = ATONEncoder(compression=CompressionMode.ADAPTIVE)
4. Query Operators
Full SQL-like query support:
# Comparison operators
"products WHERE price > 100"
"employees WHERE age < 30"
"orders WHERE total >= 1000"
# Special operators
"users WHERE status IN ('active', 'pending')"
"products WHERE name LIKE '%Premium%'"
"orders WHERE total BETWEEN 100 AND 500"
# Complex logic
"employees WHERE (role = 'Engineer' OR role = 'Manager') AND salary > 80000"
# Field selection
"SELECT name, email FROM users WHERE active = true"
# Sorting and pagination
"products ORDER BY rating DESC LIMIT 20 OFFSET 40"
Documentation
- Complete Documentation: https://www.atonformat.com/documentation.html
- Technical Whitepaper: https://www.atonformat.com/whitepaper.html
- API Reference: Full API docs with examples
- GitHub Repository: https://github.com/dagoSte/aton-format
- PyPI Package: https://pypi.org/project/aton-format/
Use Cases
1. Multi-Agent Systems
# Real-time agent orchestration
encoder = ATONEncoder(compression=CompressionMode.FAST)
# Encode agent states (56% fewer tokens)
agent_state = encoder.encode({"agents": get_all_agents()})
# Query specific agents
active_agents = encoder.encode_with_query(
{"agents": agents},
"agents WHERE status = 'active' AND priority = 'high'"
)
2. E-commerce Product Search
# Large product catalog
products = load_product_catalog() # 100K products
# Filter and send only relevant products
results = encoder.encode_with_query(
{"products": products},
"""
products WHERE
category IN ('Electronics', 'Computers')
AND price BETWEEN 100 AND 500
AND name LIKE '%Premium%'
ORDER BY rating DESC
LIMIT 50
"""
)
# Only 50 relevant products sent to LLM!
3. Analytics Dashboard
# Stream large datasets
stream_encoder = ATONStreamEncoder(chunk_size=5000)
for chunk in stream_encoder.stream_encode(analytics_data):
# Process each chunk independently
# Memory stays constant even for millions of events
analyze_chunk(chunk['data'])
4. Customer Support (RAG)
# Conversation history with filtering
recent_chats = encoder.encode_with_query(
{"conversations": all_conversations},
"""
conversations WHERE
created_date > '2024-11-01'
AND sentiment != 'negative'
ORDER BY created_date DESC
LIMIT 100
"""
)
Error Handling
from aton_format import (
ATONEncoder,
ATONEncodingError,
ATONDecodingError,
ATONQueryError
)
try:
encoder = ATONEncoder(validate=True)
result = encoder.encode(data)
except ATONEncodingError as e:
print(f"Encoding failed: {e}")
# Handle encoding errors
except ATONQueryError as e:
print(f"Query failed: {e}")
# Handle query errors
except Exception as e:
print(f"Unexpected error: {e}")
# Handle unexpected errors
Best Practices
1. Choose the Right Compression Mode
# Real-time applications: Use FAST
encoder = ATONEncoder(compression=CompressionMode.FAST)
# General purpose: Use BALANCED (recommended)
encoder = ATONEncoder(compression=CompressionMode.BALANCED)
# Batch processing: Use ULTRA
encoder = ATONEncoder(compression=CompressionMode.ULTRA)
# Mixed workloads: Use ADAPTIVE
encoder = ATONEncoder(compression=CompressionMode.ADAPTIVE)
2. Use Streaming for Large Datasets
# DON'T: Load everything in memory
data = load_million_records()
encoded = encoder.encode(data) # Memory spike!
# DO: Use streaming
stream_encoder = ATONStreamEncoder(chunk_size=1000)
for chunk in stream_encoder.stream_encode(data):
process_chunk(chunk) # Constant memory!
3. Leverage Query Language
# DON'T: Send all data then filter
all_data = encoder.encode(large_dataset)
# LLM has to process everything...
# DO: Filter before encoding
filtered = encoder.encode_with_query(
large_dataset,
"employees WHERE salary > 100000 LIMIT 100"
)
# Only 100 relevant records sent!
4. Always Validate
# Enable validation in production
encoder = ATONEncoder(validate=True)
decoder = ATONDecoder(validate=True)
# Test round-trip
encoded = encoder.encode(data)
decoded = decoder.decode(encoded)
assert data == decoded
Benchmarks
Run benchmarks on your data:
import time
from aton_format import ATONEncoder, CompressionMode
modes = [
CompressionMode.FAST,
CompressionMode.BALANCED,
CompressionMode.ULTRA,
CompressionMode.ADAPTIVE
]
for mode in modes:
encoder = ATONEncoder(compression=mode)
start = time.time()
result = encoder.encode(data)
duration = time.time() - start
print(f"{mode.value}:")
print(f" Time: {duration:.3f}s")
print(f" Size: {len(result)} chars")
print()
Migration from V1
ATON V2 is fully backward compatible with V1:
# V1 code (still works!)
encoder = ATONEncoder()
result = encoder.encode(data)
# V2 enhanced
encoder = ATONEncoder(
compression=CompressionMode.ADAPTIVE, # NEW
queryable=True, # NEW
validate=True
)
# Use new features
result = encoder.encode_with_query( # NEW
data,
"employees WHERE salary > 100000"
)
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
Development Setup
# Clone repository
git clone https://github.com/dagoSte/aton-format.git
cd aton-format
# Install in development mode
pip install -e .[dev]
# Run tests
pytest tests/
# Run linting
flake8 src/
mypy src/
License
ATON Format is released under the MIT License.
MIT License
Copyright (c) 2025 Stefano D'Agostino
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Acknowledgments
- Inspired by the need for efficient LLM data serialization
- Built with focus on production-grade quality
- Community feedback helped shape V2 features
Contact
- Author: Stefano D'Agostino
- GitHub: @dagoSte
- Email: info@atonformat.com
- Website: https://www.atonformat.com
Star History
If you find ATON useful, please consider giving it a star on GitHub!
Made with love by Stefano D'Agostino
ATON V2.0 - Production-Grade Data Serialization for LLMs
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file aton_format-2.0.1.tar.gz.
File metadata
- Download URL: aton_format-2.0.1.tar.gz
- Upload date:
- Size: 63.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
45d21b00da91c70e0a4151d18f45d2dbb8c5a5fba08f99aa6f28b2fd62b5e73e
|
|
| MD5 |
0adb7c0f24a768816c352cef528945bf
|
|
| BLAKE2b-256 |
64882d3e75a91cca2145ef9fd1293bdcfd6b87765d1cf3fc2c26a83b8ac89613
|
File details
Details for the file aton_format-2.0.1-py3-none-any.whl.
File metadata
- Download URL: aton_format-2.0.1-py3-none-any.whl
- Upload date:
- Size: 24.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
07b884640efbf2b40ffd835a03ae6c66e34024f9b005f3fde407a1da8ebae30c
|
|
| MD5 |
0aedc96f256c7acd315044128bbaf072
|
|
| BLAKE2b-256 |
86a170e724e7664c80b78e5f7f2383397bb826c2f54e01d235a9ee7d3ffcc3e5
|