Advanced data serialization achieving 50-60% token reduction with Compression Modes, SQL-like Query Language, and Streaming Support for LLMs

These details have not been verified by PyPI

Project links

Project description

ATON V2.0 - Adaptive Token-Oriented Notation

Production-grade data serialization achieving 50-60% token reduction for Large Language Models with advanced compression, SQL-like query language, and streaming support.

What's New in V2.0

Advanced Compression System

4 Compression Modes: Fast, Balanced, Ultra, and Adaptive
Intelligent Algorithms: Dictionary, Delta encoding, Pattern recognition
AI-Driven Selection: Adaptive mode analyzes data and chooses optimal strategy

SQL-like Query Language

Full AST Parser: Not simplified regex, real production parser
Complete Operators: =, !=, <, >, <=, >=, IN, LIKE, BETWEEN
Complex Logic: AND, OR, NOT with nested conditions
Advanced Features: SELECT, ORDER BY, LIMIT, OFFSET

Streaming Support

Memory Efficient: Process millions of records with constant memory
Schema Caching: First chunk includes schema, subsequent chunks only data
Progress Tracking: Monitor encoding progress
Configurable Chunks: Adjust chunk size for your use case

Production-Grade Quality

Custom Exceptions: Complete error hierarchy
Full Validation: Input validation, type checking, boundary conditions
Type Safety: 100% type hints coverage
Zero Compromises: Professional code quality throughout

Performance

Token Reduction vs JSON

Dataset	JSON Tokens	ATON Tokens	Reduction
Employee Records (1K)	12,450	5,280	57.6%
Product Catalog (10K)	145,200	64,800	55.4%
Transaction Log (100K)	1,856,000	815,000	56.1%
Real-time Agents	42,000	18,500	56.0%

Compression Speed

Mode	Records/sec	Latency	Best For
FAST	~50,000	<1ms	Real-time applications
BALANCED	~35,000	<2ms	General purpose (recommended)
ULTRA	~20,000	<5ms	Batch processing, storage
ADAPTIVE	~30,000	<3ms	Mixed workloads

Real-World Cost Savings

Use Cases

Application	Records/Hour	Annual API Savings
Multi-Agent Systems	1,000 updates	$609,120
E-commerce Platform	100K products	$182,500
Analytics Dashboard	Millions of events	$1,460,000
Customer Support (RAG)	Thousands of chats	$584,930
Financial Trading	Real-time ticks	$2,920,000
Healthcare Records	Patient data	$730,000

Installation

Python

pip install aton-format

JavaScript

npm install aton-format
# or
yarn add aton-format

From Source

git clone https://github.com/dagoSte/aton-format.git
cd aton-format
pip install -e .

Quick Start

Basic Usage (Python)

from aton_format import ATONEncoder, ATONDecoder, CompressionMode

# Initialize encoder with compression
encoder = ATONEncoder(
    compression=CompressionMode.BALANCED,  # or FAST, ULTRA, ADAPTIVE
    optimize=True
)

# Your data
data = {
    "employees": [
        {"id": 1, "name": "Alice", "salary": 95000, "active": True},
        {"id": 2, "name": "Bob", "salary": 92000, "active": True},
        {"id": 3, "name": "Carol", "salary": 110000, "active": False}
    ]
}

# Encode to ATON (50-60% fewer tokens!)
aton_text = encoder.encode(data)
print(aton_text)

# Output:
# @schema[id:int, name:str, salary:int, active:bool]
# @defaults[active=true]
#
# employees(3):
#   1, "Alice", 95000, null
#   2, "Bob", 92000, null
#   3, "Carol", 110000, false

# Decode back to original
decoder = ATONDecoder()
original = decoder.decode(aton_text)

assert data == original  # Perfect round-trip!

Basic Usage (JavaScript)

const ATON = require('aton-format');

// Initialize encoder
const encoder = new ATON.Encoder({
    compression: ATON.CompressionMode.BALANCED,
    optimize: true
});

// Your data
const data = {
    employees: [
        {id: 1, name: "Alice", salary: 95000, active: true},
        {id: 2, name: "Bob", salary: 92000, active: true},
        {id: 3, name: "Carol", salary: 110000, active: false}
    ]
};

// Encode to ATON
const atonText = encoder.encode(data);
console.log(atonText);

// Decode back
const decoder = new ATON.Decoder();
const original = decoder.decode(atonText);

Advanced Features

1. Query Language

Filter and transform data before encoding:

from aton_format import ATONEncoder, CompressionMode

encoder = ATONEncoder(
    compression=CompressionMode.ADAPTIVE,
    queryable=True
)

# Complex query with SQL-like syntax
result = encoder.encode_with_query(
    data,
    """
    employees WHERE
        (salary > 100000 AND role = 'Engineer')
        OR department = 'Executive'
    ORDER BY salary DESC
    LIMIT 20
    """
)

# Only relevant records encoded!
# Massive token savings for large datasets

2. Streaming for Large Datasets

Process millions of records with constant memory:

from aton_format import ATONStreamEncoder, CompressionMode

# Initialize stream encoder
stream_encoder = ATONStreamEncoder(
    chunk_size=1000,
    compression=CompressionMode.ULTRA
)

# Large dataset
data = {"transactions": large_transaction_list}

# Stream encode in chunks
for chunk in stream_encoder.stream_encode(data):
    print(f"Chunk {chunk['chunk_id']}/{chunk['total_chunks']}")
    print(f"Records: {chunk['metadata']['records_in_chunk']}")

    # Process chunk (send to LLM, store, etc.)
    process_chunk(chunk['data'])

    # Memory stays constant!

3. Compression Modes

Choose the right strategy for your use case:

from aton_format import ATONEncoder, CompressionMode

# Fast mode - optimized for speed
encoder = ATONEncoder(compression=CompressionMode.FAST)

# Balanced mode - optimal mix (RECOMMENDED)
encoder = ATONEncoder(compression=CompressionMode.BALANCED)

# Ultra mode - maximum compression
encoder = ATONEncoder(compression=CompressionMode.ULTRA)

# Adaptive mode - AI-driven selection
encoder = ATONEncoder(compression=CompressionMode.ADAPTIVE)

4. Query Operators

Full SQL-like query support:

# Comparison operators
"products WHERE price > 100"
"employees WHERE age < 30"
"orders WHERE total >= 1000"

# Special operators
"users WHERE status IN ('active', 'pending')"
"products WHERE name LIKE '%Premium%'"
"orders WHERE total BETWEEN 100 AND 500"

# Complex logic
"employees WHERE (role = 'Engineer' OR role = 'Manager') AND salary > 80000"

# Field selection
"SELECT name, email FROM users WHERE active = true"

# Sorting and pagination
"products ORDER BY rating DESC LIMIT 20 OFFSET 40"

Documentation

Complete Documentation: https://www.atonformat.com/documentation.html
Technical Whitepaper: https://www.atonformat.com/whitepaper.html
API Reference: Full API docs with examples
GitHub Repository: https://github.com/dagoSte/aton-format
PyPI Package: https://pypi.org/project/aton-format/

Use Cases

1. Multi-Agent Systems

# Real-time agent orchestration
encoder = ATONEncoder(compression=CompressionMode.FAST)

# Encode agent states (56% fewer tokens)
agent_state = encoder.encode({"agents": get_all_agents()})

# Query specific agents
active_agents = encoder.encode_with_query(
    {"agents": agents},
    "agents WHERE status = 'active' AND priority = 'high'"
)

2. E-commerce Product Search

# Large product catalog
products = load_product_catalog()  # 100K products

# Filter and send only relevant products
results = encoder.encode_with_query(
    {"products": products},
    """
    products WHERE
        category IN ('Electronics', 'Computers')
        AND price BETWEEN 100 AND 500
        AND name LIKE '%Premium%'
    ORDER BY rating DESC
    LIMIT 50
    """
)
# Only 50 relevant products sent to LLM!

3. Analytics Dashboard

# Stream large datasets
stream_encoder = ATONStreamEncoder(chunk_size=5000)

for chunk in stream_encoder.stream_encode(analytics_data):
    # Process each chunk independently
    # Memory stays constant even for millions of events
    analyze_chunk(chunk['data'])

4. Customer Support (RAG)

# Conversation history with filtering
recent_chats = encoder.encode_with_query(
    {"conversations": all_conversations},
    """
    conversations WHERE
        created_date > '2024-11-01'
        AND sentiment != 'negative'
    ORDER BY created_date DESC
    LIMIT 100
    """
)

Error Handling

from aton_format import (
    ATONEncoder,
    ATONEncodingError,
    ATONDecodingError,
    ATONQueryError
)

try:
    encoder = ATONEncoder(validate=True)
    result = encoder.encode(data)

except ATONEncodingError as e:
    print(f"Encoding failed: {e}")
    # Handle encoding errors

except ATONQueryError as e:
    print(f"Query failed: {e}")
    # Handle query errors

except Exception as e:
    print(f"Unexpected error: {e}")
    # Handle unexpected errors

Best Practices

1. Choose the Right Compression Mode

# Real-time applications: Use FAST
encoder = ATONEncoder(compression=CompressionMode.FAST)

# General purpose: Use BALANCED (recommended)
encoder = ATONEncoder(compression=CompressionMode.BALANCED)

# Batch processing: Use ULTRA
encoder = ATONEncoder(compression=CompressionMode.ULTRA)

# Mixed workloads: Use ADAPTIVE
encoder = ATONEncoder(compression=CompressionMode.ADAPTIVE)

2. Use Streaming for Large Datasets

# DON'T: Load everything in memory
data = load_million_records()
encoded = encoder.encode(data)  # Memory spike!

# DO: Use streaming
stream_encoder = ATONStreamEncoder(chunk_size=1000)
for chunk in stream_encoder.stream_encode(data):
    process_chunk(chunk)  # Constant memory!

3. Leverage Query Language

# DON'T: Send all data then filter
all_data = encoder.encode(large_dataset)
# LLM has to process everything...

# DO: Filter before encoding
filtered = encoder.encode_with_query(
    large_dataset,
    "employees WHERE salary > 100000 LIMIT 100"
)
# Only 100 relevant records sent!

4. Always Validate

# Enable validation in production
encoder = ATONEncoder(validate=True)
decoder = ATONDecoder(validate=True)

# Test round-trip
encoded = encoder.encode(data)
decoded = decoder.decode(encoded)
assert data == decoded

Benchmarks

Run benchmarks on your data:

import time
from aton_format import ATONEncoder, CompressionMode

modes = [
    CompressionMode.FAST,
    CompressionMode.BALANCED,
    CompressionMode.ULTRA,
    CompressionMode.ADAPTIVE
]

for mode in modes:
    encoder = ATONEncoder(compression=mode)

    start = time.time()
    result = encoder.encode(data)
    duration = time.time() - start

    print(f"{mode.value}:")
    print(f"  Time: {duration:.3f}s")
    print(f"  Size: {len(result)} chars")
    print()

Migration from V1

ATON V2 is fully backward compatible with V1:

# V1 code (still works!)
encoder = ATONEncoder()
result = encoder.encode(data)

# V2 enhanced
encoder = ATONEncoder(
    compression=CompressionMode.ADAPTIVE,  # NEW
    queryable=True,                        # NEW
    validate=True
)

# Use new features
result = encoder.encode_with_query(      # NEW
    data,
    "employees WHERE salary > 100000"
)

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

Development Setup

# Clone repository
git clone https://github.com/dagoSte/aton-format.git
cd aton-format

# Install in development mode
pip install -e .[dev]

# Run tests
pytest tests/

# Run linting
flake8 src/
mypy src/

License

ATON Format is released under the MIT License.

MIT License

Copyright (c) 2025 Stefano D'Agostino

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Acknowledgments

Inspired by the need for efficient LLM data serialization
Built with focus on production-grade quality
Community feedback helped shape V2 features

Contact

Author: Stefano D'Agostino
GitHub: @dagoSte
Email: info@atonformat.com
Website: https://www.atonformat.com

Star History

If you find ATON useful, please consider giving it a star on GitHub!

Made with love by Stefano D'Agostino

ATON V2.0 - Production-Grade Data Serialization for LLMs

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

2.0.1

Nov 22, 2025

1.0.2

Nov 18, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

aton_format-2.0.1.tar.gz (63.3 kB view details)

Uploaded Nov 22, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

aton_format-2.0.1-py3-none-any.whl (24.8 kB view details)

Uploaded Nov 22, 2025 Python 3

File details

Details for the file aton_format-2.0.1.tar.gz.

File metadata

Download URL: aton_format-2.0.1.tar.gz
Upload date: Nov 22, 2025
Size: 63.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.16

File hashes

Hashes for aton_format-2.0.1.tar.gz
Algorithm	Hash digest
SHA256	`45d21b00da91c70e0a4151d18f45d2dbb8c5a5fba08f99aa6f28b2fd62b5e73e`
MD5	`0adb7c0f24a768816c352cef528945bf`
BLAKE2b-256	`64882d3e75a91cca2145ef9fd1293bdcfd6b87765d1cf3fc2c26a83b8ac89613`

See more details on using hashes here.

File details

Details for the file aton_format-2.0.1-py3-none-any.whl.

File metadata

Download URL: aton_format-2.0.1-py3-none-any.whl
Upload date: Nov 22, 2025
Size: 24.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.16

File hashes

Hashes for aton_format-2.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`07b884640efbf2b40ffd835a03ae6c66e34024f9b005f3fde407a1da8ebae30c`
MD5	`0aedc96f256c7acd315044128bbaf072`
BLAKE2b-256	`86a170e724e7664c80b78e5f7f2383397bb826c2f54e01d235a9ee7d3ffcc3e5`

See more details on using hashes here.

aton-format 2.0.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

ATON V2.0 - Adaptive Token-Oriented Notation

What's New in V2.0

Advanced Compression System

SQL-like Query Language

Streaming Support

Production-Grade Quality

Performance

Token Reduction vs JSON

Compression Speed

Real-World Cost Savings

Use Cases

Installation

Python

JavaScript

From Source

Quick Start

Basic Usage (Python)

Basic Usage (JavaScript)

Advanced Features

1. Query Language

2. Streaming for Large Datasets

3. Compression Modes

4. Query Operators

Documentation

Use Cases

1. Multi-Agent Systems

2. E-commerce Product Search

3. Analytics Dashboard

4. Customer Support (RAG)

Error Handling

Best Practices

1. Choose the Right Compression Mode

2. Use Streaming for Large Datasets

3. Leverage Query Language

4. Always Validate

Benchmarks

Migration from V1

Contributing

Development Setup

License

Acknowledgments

Contact

Star History

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes