60x compression efficiency for AI communication through emergent language translation
Emergent Language Translator
High-performance binary encoding for AI agent communication
Emergent Language Translator compresses structured AI messages into a compact binary format using learned codebooks, common-key dictionaries, and zlib. Batch encoding amortizes header overhead across messages — the more you batch, the better the ratio.
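The amortization claim is easy to check independently of this library with nothing but the standard library: compressing messages together lets the compressor share context across them, so the per-message cost drops. A toy demonstration on hypothetical data:

```python
import json
import zlib

messages = [
    {"role": "user", "content": f"analyze market trends batch {i}", "priority": "high"}
    for i in range(50)
]

# Compress each message on its own: every payload pays the full header cost.
individual = sum(len(zlib.compress(json.dumps(m).encode())) for m in messages)

# Compress the whole batch at once: shared context amortizes that cost.
batched = len(zlib.compress(json.dumps(messages).encode()))

print(f"individually: {individual} bytes total, batched: {batched} bytes total")
```

The batched total comes out far smaller because the repeated keys and values compress against each other.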
Quick Start
pip install emergent-translator
from emergent_translator import BatchEncoder
encoder = BatchEncoder()
# Encode a batch of agent messages
messages = [
{"role": "user", "content": "analyze market trends", "priority": "high"},
{"role": "assistant", "content": "Starting analysis", "status": "active"},
{"role": "system", "content": "Agent coordinator online", "version": "1.0"},
]
result = encoder.encode_batch(messages)
import json
json_bytes = len(json.dumps(messages).encode())
print(f"{len(messages)} messages: {json_bytes} bytes JSON -> {len(result.payload)} bytes binary")
# 3 messages: 226 bytes JSON -> 141 bytes binary (38% reduction)
# Perfect round-trip reconstruction
decoded = encoder.decode_batch(result.payload)
assert decoded == messages
Compression Results
The batch encoder uses a binary wire format with common-key/value dictionaries and zlib compression. Efficiency improves with batch size:
| Workload | JSON Size | Encoded Size | Reduction |
|---|---|---|---|
| 3 agent messages | 226 bytes | 141 bytes | 38% |
| 10 agent messages | 750 bytes | 112 bytes | 85% |
| 50 agent messages | 4,880 bytes | 286 bytes | 94% |
Encoding speed is sub-millisecond (typically ~0.2 ms per batch).
Binary Format
All payloads start with magic bytes \xE7\xB0 followed by a version byte:
v1/v2: MAGIC(2) + VERSION(1) + COUNT(2) + FLAGS(1) + PAYLOAD + CRC32(4)
v3: MAGIC(2) + VERSION(1) + COUNT(2) + FLAGS(1) + CB_VERSION(2) + CB_LEN(2) + [CODEBOOK] + PAYLOAD + CRC32(4)
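Assuming the layout above and big-endian integers (an assumption; the byte order is not stated here), the fixed v1/v2 header can be unpacked and CRC-checked with `struct` and `zlib` alone. A sketch, not the package's parser:

```python
import struct
import zlib

MAGIC = b"\xE7\xB0"

def parse_header(payload: bytes):
    """Split a v1/v2 payload into (version, count, flags, body), verifying
    the magic bytes and the trailing CRC32. Big-endian is assumed."""
    if payload[:2] != MAGIC:
        raise ValueError("bad magic bytes")
    version, count, flags = struct.unpack(">BHB", payload[2:6])
    body = payload[6:-4]
    (crc,) = struct.unpack(">I", payload[-4:])
    if zlib.crc32(payload[:-4]) != crc:
        raise ValueError("CRC mismatch")
    return version, count, flags, body

# Build a matching payload by hand to exercise the parser.
head = MAGIC + struct.pack(">BHB", 1, 3, 0)
body = b"example-body"
payload = head + body + struct.pack(">I", zlib.crc32(head + body))
print(parse_header(payload))  # (1, 3, 0, b'example-body')
```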
Common keys (role, content, action, status, priority, ...) and values (user, assistant, system, high, low, ...) are encoded as single-byte tokens. Remaining data is zlib-compressed.
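The core idea (replacing frequent strings with one-byte tokens before zlib sees the data) can be sketched in a few lines. This is a toy illustration of the principle, not the library's actual wire format; the token table and escape scheme here are invented:

```python
import zlib

# Toy single-byte token table; the real codebooks are larger and versioned.
TOKENS = {"role": 0x01, "content": 0x02, "user": 0x03, "assistant": 0x04}
ESC = 0x00  # escape byte introducing a length-prefixed literal string

def encode(words):
    out = bytearray()
    for word in words:
        if word in TOKENS:
            out.append(TOKENS[word])        # frequent string -> one byte
        else:
            lit = word.encode()
            out += bytes([ESC, len(lit)]) + lit  # rare string -> literal
    return zlib.compress(bytes(out))

def decode(blob):
    data, words, i = zlib.decompress(blob), [], 0
    rev = {v: k for k, v in TOKENS.items()}
    while i < len(data):
        if data[i] == ESC:
            n = data[i + 1]
            words.append(data[i + 2 : i + 2 + n].decode())
            i += 2 + n
        else:
            words.append(rev[data[i]])
            i += 1
    return words

msgs = ["role", "user", "content", "analyze market trends"]
assert decode(encode(msgs)) == msgs
```

Tokenization and compression are complementary: the token pass shrinks the literal text, and zlib then squeezes whatever repetition remains.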
Adaptive Codebooks
The static dictionaries cover common AI communication patterns. For domain-specific traffic, train a codebook that learns your most frequent keys and values:
from emergent_translator import AdaptiveCodebook, BatchEncoder
# Train on observed traffic
codebook = AdaptiveCodebook()
for msg in training_messages:
codebook.observe(msg)
codebook.rebuild(min_freq=5)
# Encode with learned codebook (v3 format, codebook embedded in payload)
encoder = BatchEncoder()
result = encoder.encode_batch(messages, codebook=codebook.active)
decoded = encoder.decode_batch(result.payload) # codebook auto-extracted
Train a codebook from synthetic data:
python scripts/train_codebook.py --messages 50000 --benchmark
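Conceptually, a learned codebook is a frequency count over observed keys and values, with tokens assigned to the most common strings. A minimal sketch with `collections.Counter` (illustrative only; `ToyCodebook` and its methods are invented here, not the package's `AdaptiveCodebook`):

```python
from collections import Counter

class ToyCodebook:
    """Assign small integer tokens to the most frequent strings in traffic."""

    def __init__(self):
        self.counts = Counter()
        self.table = {}

    def observe(self, msg: dict):
        for k, v in msg.items():
            self.counts[k] += 1
            if isinstance(v, str):
                self.counts[v] += 1

    def rebuild(self, min_freq=5, max_entries=200):
        frequent = [s for s, n in self.counts.most_common(max_entries) if n >= min_freq]
        self.table = {s: i for i, s in enumerate(frequent)}

cb = ToyCodebook()
for _ in range(10):
    cb.observe({"role": "user", "content": "ping"})
cb.rebuild(min_freq=5)
print(cb.table)  # most frequent strings mapped to small integer tokens
```

The `min_freq` cutoff keeps one-off strings out of the table, since a codebook entry only pays for itself when the string recurs.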
Multi-Format Support
Parse and serialize 13+ formats, then compress through the batch encoder:
from emergent_translator import detect_format, get_handler, BatchEncoder
fmt = detect_format("data.csv")  # "csv"
parse_fn, serialize_fn = get_handler(fmt)
with open("data.csv") as f:
    records = parse_fn(f.read())
encoder = BatchEncoder()
result = encoder.encode_batch(records)
Supported: JSON, CSV, JSONL, YAML, TOML, INI, XML, MessagePack, Protobuf, Parquet, Arrow, BSON, CBOR, XLSX.
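One simple way such detection can work is by file extension. The mapping below is an assumption for illustration (the package may also sniff content), and `detect_format_by_extension` is a hypothetical helper, not the library's `detect_format`:

```python
from pathlib import Path

# Extension-based detection only; invented mapping for illustration.
EXT_TO_FORMAT = {
    ".json": "json", ".csv": "csv", ".jsonl": "jsonl", ".yaml": "yaml",
    ".yml": "yaml", ".toml": "toml", ".ini": "ini", ".xml": "xml",
    ".parquet": "parquet", ".bson": "bson", ".cbor": "cbor", ".xlsx": "xlsx",
}

def detect_format_by_extension(path: str) -> str:
    ext = Path(path).suffix.lower()
    if ext not in EXT_TO_FORMAT:
        raise ValueError(f"unrecognized extension: {ext}")
    return EXT_TO_FORMAT[ext]

print(detect_format_by_extension("data.csv"))  # csv
```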
GPU Batch Encoder
For higher throughput, use the GPU-accelerated encoder (falls back to CPU with ThreadPoolExecutor if CuPy is unavailable):
from emergent_translator import GPUBatchEncoder
gpu_encoder = GPUBatchEncoder(num_workers=8)
result = gpu_encoder.encode_batch(messages)
decoded = gpu_encoder.decode_batch(result.payload)
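The CPU fallback path can be approximated with the standard library alone: shard the batch across workers and compress each shard in a thread (zlib releases the GIL during compression, so threads genuinely overlap). A sketch under those assumptions, not the package's implementation:

```python
import json
import zlib
from concurrent.futures import ThreadPoolExecutor

def encode_shard(shard):
    return zlib.compress(json.dumps(shard).encode())

def parallel_encode(messages, num_workers=8):
    """Split messages into num_workers shards and compress them concurrently."""
    shards = [messages[i::num_workers] for i in range(num_workers)]
    shards = [s for s in shards if s]  # drop empty shards for tiny batches
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(encode_shard, shards))

msgs = [{"id": i, "content": "status update"} for i in range(1000)]
payloads = parallel_encode(msgs)
total = sum(len(json.loads(zlib.decompress(p))) for p in payloads)
assert total == len(msgs)
```

Sharding trades a little compression ratio (each shard compresses independently) for parallelism, which is the same trade a GPU path makes.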
LLM Token Savings
Two complementary modules for reducing token usage with LLMs like Claude:
Code Skeletonization
Strip Python files to signatures + docstrings. Feed Claude the structure without paying for implementation lines:
from emergent_translator import skeletonize_file
result = skeletonize_file("my_module.py", focal=["important_func"])
print(f"{result.original_tokens} -> {result.skeleton_tokens} tokens "
f"({result.token_reduction_pct:.0f}% reduction)")
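The same effect can be approximated with the standard `ast` module: keep each function's signature and docstring, replace the body with `...`. This is a sketch of the idea, not the package's skeletonizer (which also handles focal functions, classes, and token counting):

```python
import ast

def skeletonize(source: str) -> str:
    """Replace every function body with its docstring (if any) plus `...`."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            body = []
            if doc is not None:
                body.append(ast.Expr(ast.Constant(doc)))
            body.append(ast.Expr(ast.Constant(...)))
            node.body = body
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)

src = '''
def add(a, b):
    """Return a + b."""
    total = a + b
    return total
'''
print(skeletonize(src))  # signature + docstring + "...", body dropped
```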
Claude Text Compression
Compress keys and values in structured data flowing through Claude API conversations:
from emergent_translator import ClaudeCompressor
compressor = ClaudeCompressor()
system = compressor.system_prompt_prefix() + "\n\nYour prompt..."
compressed_msgs = compressor.compress_messages(messages)
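A key-abbreviation scheme of this kind can be sketched as a reversible dict rewrite. The mapping below is invented for illustration; the real `ClaudeCompressor` pairs its mapping with the system-prompt prefix shown above so the model knows how to read abbreviated keys:

```python
# Toy key-abbreviation table; invented for illustration.
KEY_MAP = {"role": "r", "content": "c", "priority": "p", "status": "s"}

def compress_message(msg: dict) -> dict:
    """Shorten known keys; leave unknown keys untouched."""
    return {KEY_MAP.get(k, k): v for k, v in msg.items()}

def expand_message(msg: dict) -> dict:
    """Invert the mapping to recover the original keys."""
    rev = {v: k for k, v in KEY_MAP.items()}
    return {rev.get(k, k): v for k, v in msg.items()}

m = {"role": "user", "content": "analyze market trends", "priority": "high"}
assert expand_message(compress_message(m)) == m
```

Because every compressed message is recoverable, the scheme saves tokens on the wire without losing information in logs or replays.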
Project Structure
src/emergent_translator/ # pip-installable package
batch_encoder.py # v1 batch encoder (encode/decode)
gpu_batch_encoder.py # v2 GPU-accelerated encoder
adaptive_codebook.py # v3 learned codebooks
format_handlers.py # 13+ format parsers
emergent_symbols.py # symbol encoder
api_server.py # FastAPI server
cli.py # CLI tool
scripts/ # benchmarks, stress tests, workers
tests/ # 535 tests
Development
git clone https://github.com/maco144/emergent-language
cd emergent-language
pip install -e ".[dev,formats]"
python -m pytest tests/ -v
Docker
docker build -t emergent-translator .
docker run -p 8000:8000 emergent-translator
License
GPL-3.0-or-later. See LICENSE for details.
Commercial licensing available — see COMMERCIAL_LICENSE.md.