
60x compression efficiency for AI communication through emergent language translation

Project description

Emergent Language Translator

High-performance binary encoding for AI agent communication


Emergent Language Translator compresses structured AI messages into a compact binary format using learned codebooks, common-key dictionaries, and zlib. Batch encoding amortizes header overhead across messages — the more you batch, the better the ratio.

Quick Start

pip install emergent-translator              # Core encoding (zero deps)
pip install emergent-translator[sdk]         # + SDK/CLI (adds httpx, websockets)
pip install emergent-translator[server]      # + API server (adds fastapi, uvicorn, ...)
pip install emergent-translator[caas]        # + CaaS with LLM eval (adds openai)
pip install emergent-translator[all]         # Everything

The base package has zero dependencies — just the stdlib. Heavy packages (fastapi, uvicorn, httpx, openai) are only installed when you need them.

The package works in two modes: local encoding (no server needed) and API mode (via hosted service or self-hosted). Nothing spawns in the background — all components are opt-in.

For Integration Authors

The core encoding modules (BatchEncoder, GPUBatchEncoder, AdaptiveCodebook, LocalPipelineSync) use only the Python standard library. Adding emergent-translator as a dependency to your NATS codec, AutoGen transform, or LiteLLM hook will not pull in FastAPI, uvicorn, or any other heavy packages.

Local Encoding (no server required)

Encode and decode directly in your process. No network calls, no API keys, no setup:

from emergent_translator import BatchEncoder

encoder = BatchEncoder()

# Encode a batch of agent messages
messages = [
    {"role": "user", "content": "analyze market trends", "priority": "high"},
    {"role": "assistant", "content": "Starting analysis", "status": "active"},
    {"role": "system", "content": "Agent coordinator online", "version": "1.0"},
]

import json

result = encoder.encode_batch(messages)
json_size = len(json.dumps(messages).encode())
print(f"{len(messages)} messages: {json_size} bytes JSON -> {len(result.payload)} bytes binary")
# 3 messages: 226 bytes JSON -> 141 bytes (38% reduction)

# Decode back to original dicts
decoded = encoder.decode_batch(result.payload)
assert decoded == messages

Pipeline Mode (auto-batching)

LocalPipelineSync wraps the encoder, adaptive codebook, and a background collector into a single context manager:

from emergent_translator import LocalPipelineSync

with LocalPipelineSync(adaptive=True, codebook_path="codebook.json") as p:
    result = p.encode(messages)           # direct batch encode
    p.add(messages[0]); p.drain()         # auto-batched collector
    for batch in p.encode_stream(iter(messages), batch_size=50):
        print(batch.compressed_bytes)     # streaming
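The auto-batching idea behind `add()`/`drain()` can be sketched with a small stdlib-only collector. This is a toy illustration of the pattern, not the library's `LocalLazyCollector` implementation; the `flush` callback and `batch_size` threshold are assumptions:

```python
# Toy auto-batching collector: buffer messages, flush a full batch to a
# callback, and drain any partial batch at the end. Stdlib only.
class ToyCollector:
    def __init__(self, batch_size, flush):
        self.batch_size = batch_size   # flush threshold
        self.flush = flush             # callback that receives a full batch
        self.buffer = []

    def add(self, msg):
        self.buffer.append(msg)
        if len(self.buffer) >= self.batch_size:
            self.drain()

    def drain(self):
        if self.buffer:
            batch, self.buffer = self.buffer, []
            self.flush(batch)

batches = []
c = ToyCollector(batch_size=2, flush=batches.append)
for i in range(5):
    c.add({"seq": i})
c.drain()  # flush the final partial batch
print([len(b) for b in batches])  # -> [2, 2, 1]
```

A background-flushing version (as in `LocalPipelineSync`) would add a timer thread that calls `drain()` periodically.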

API Mode (hosted service)

A public API is available for server-side compression with additional features (oracle explanations, validation, statistics):

from emergent_translator import TranslatorSDK

sdk = TranslatorSDK(api_url="http://149.28.33.118:8001")

# Compress
compressed = sdk.compress({"role": "user", "content": "hello world"})

# Decompress
original = sdk.decompress(compressed, format="json")

# Health check
print(sdk.get_health())

Compression Results

The batch encoder uses a binary wire format with common-key/value dictionaries and zlib compression. Efficiency improves with batch size:

| Workload | JSON size | Encoded size | Reduction |
|---|---|---|---|
| 3 agent messages | 226 bytes | 141 bytes | 38% |
| 10 agent messages | 750 bytes | 112 bytes | 85% |
| 50 agent messages | 4,880 bytes | 286 bytes | 94% |

Encoding speed: sub-millisecond (0.2ms typical).
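The batching effect itself is easy to demonstrate with just the stdlib: compressing similar messages as one stream beats compressing them one at a time, because per-stream overhead is paid once and repeated structure compresses away. This sketch uses plain zlib on JSON, so the numbers will differ from the library's binary format above:

```python
# Why batching helps: zlib pays fixed per-stream overhead, and similar
# messages share structure that a single stream can exploit.
import json
import zlib

messages = [
    {"role": "user", "content": f"task {i}", "priority": "high"}
    for i in range(50)
]

per_message = sum(len(zlib.compress(json.dumps(m).encode())) for m in messages)
batched = len(zlib.compress(json.dumps(messages).encode()))

print(f"50 separate streams: {per_message} bytes")
print(f"1 batched stream:    {batched} bytes")
assert batched < per_message
```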

Encoding and Decoding

Every encoder includes a matching decoder. All encoding is fully reversible:

from emergent_translator import BatchEncoder

encoder = BatchEncoder()

# Single message encode/decode
messages = [{"role": "user", "content": "hello"}]
encoded = encoder.encode_batch(messages)
decoded = encoder.decode_batch(encoded.payload)
assert decoded == messages

# The payload is self-describing — version, flags, and CRC are embedded
# No external state needed to decode

The GPU encoder works the same way:

from emergent_translator import GPUBatchEncoder

gpu_encoder = GPUBatchEncoder(num_workers=8)
result = gpu_encoder.encode_batch(messages)
decoded = gpu_encoder.decode_batch(result.payload)

v3 adaptive codebook payloads embed the codebook in the binary, so decoding requires no extra arguments:

result = encoder.encode_batch(messages, codebook=codebook.active)
decoded = encoder.decode_batch(result.payload)  # codebook auto-extracted

CLI

The package installs an emergent-translator command:

# Check the public API
emergent-translator health

# Compress a JSON file (sends to API, returns binary)
emergent-translator compress data.json

# Compress from stdin
echo '{"message": "hello"}' | emergent-translator compress -

# Decompress a .compressed file back to JSON
emergent-translator decompress data.compressed

# Decompress to a specific format
emergent-translator decompress data.compressed -o output.yaml -f yaml

# Run a compression benchmark
emergent-translator benchmark --size 1000 --iterations 10

# Get API statistics
emergent-translator stats

# Verbose output (shows timing, sizes, efficiency)
emergent-translator compress data.json -v

CLI defaults:

  • API URL: http://149.28.33.118:8001 (public hosted instance)
  • API key: eudaimonia-translator-demo (demo token, no signup needed)
  • Override with --api-url and --api-key

# Point CLI at a local server instead
emergent-translator --api-url http://localhost:8000 health

# Use a custom API key
emergent-translator --api-key my-token compress data.json

Python SDK

The TranslatorSDK provides a sync interface for API-based compression with oracle features. The EmergentTranslatorClient provides an async interface.

TranslatorSDK (sync)

from emergent_translator import TranslatorSDK

# Connect to the public API (demo token, no signup)
sdk = TranslatorSDK(api_url="http://149.28.33.118:8001")

# Or connect to a local server
sdk = TranslatorSDK(api_url="http://localhost:8000", auth_token="my-token")

# Compress / decompress
compressed = sdk.compress({"task": "analyze", "priority": "high"})
original = sdk.decompress(compressed, format="json")

# Health and stats
sdk.is_healthy()              # True/False
sdk.get_health()              # Full health dict
sdk.get_stats()               # Compression statistics

# Oracle features (requires API)
explanation = sdk.explain(compressed)                        # Human-readable explanation
score = sdk.validate(original_data, compressed)              # Accuracy score (0.0-1.0)
families = sdk.get_symbol_info(compressed)                   # Symbol family breakdown
compressed, explanation = sdk.compress_with_explanation(data) # Both in one call

EmergentTranslatorClient (async)

from emergent_translator import EmergentTranslatorClient, TranslationConfig

config = TranslationConfig(
    api_base_url="http://149.28.33.118:8001",
    auth_token="eudaimonia-translator-demo",  # optional, demo token
    timeout=30.0,
    max_retries=3,
)

async with EmergentTranslatorClient(config) as client:
    result = await client.translate_json({"task": "analyze data"})
    print(f"Compression: {result.efficiency_gain:.1f}%")

Convenience Functions

from emergent_translator import compress_json, decompress_to_json

compressed = compress_json({"key": "value"}, api_url="http://149.28.33.118:8001")
original = decompress_to_json(compressed, api_url="http://149.28.33.118:8001")

Local vs API — When to Use Which

| | Local (BatchEncoder) | Pipeline (LocalPipelineSync) | API (TranslatorSDK / CLI) |
|---|---|---|---|
| Setup | pip install only | pip install only | Needs running server (public or self-hosted) |
| Speed | Sub-millisecond | Sub-millisecond | Network round-trip |
| Dependencies | Zero (stdlib only) | Zero (stdlib only) | pip install emergent-translator[sdk] |
| Decoding | Built-in | Built-in | Built-in |
| Auto-batching | No | Yes (background flush) | No |
| Adaptive codebook | Manual | Built-in (auto-rebuild) | No |
| Codebook persistence | Manual | Built-in (codebook_path) | No |
| Streaming | No | encode_stream | WebSocket |
| Distributed processing | No | Opt-in (workers=[...]) | No |
| Oracle explanations | No | No | Yes |
| Validation | No | No | Yes |
| Statistics/metrics | No | get_stats() | Yes |
| Best for | Simple encode/decode | Production pipelines, high throughput | Full-featured service, debugging, monitoring |

Binary Format

All payloads start with magic bytes \xE7\xB0 followed by a version byte:

v1/v2: MAGIC(2) + VERSION(1) + COUNT(2) + FLAGS(1) + PAYLOAD + CRC32(4)
v3:    MAGIC(2) + VERSION(1) + COUNT(2) + FLAGS(1) + CB_VERSION(2) + CB_LEN(2) + [CODEBOOK] + PAYLOAD + CRC32(4)

Common keys (role, content, action, status, priority, ...) and values (user, assistant, system, high, low, ...) are encoded as single-byte tokens. Remaining data is zlib-compressed.
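As a sketch, the documented v1 layout can be framed and verified with `struct` and `zlib`. Field order and widths come from the diagram above; big-endian byte order, the flag semantics, and JSON-in-zlib as the payload are assumptions for illustration, not the library's actual encoding:

```python
# Toy framing of the documented v1 layout:
# MAGIC(2) + VERSION(1) + COUNT(2) + FLAGS(1) + PAYLOAD + CRC32(4)
import json
import struct
import zlib

MAGIC = b"\xE7\xB0"

def frame_v1(messages, flags=0):
    payload = zlib.compress(json.dumps(messages).encode())
    header = MAGIC + struct.pack(">BHB", 1, len(messages), flags)
    body = header + payload
    return body + struct.pack(">I", zlib.crc32(body))  # trailing CRC32

def unframe_v1(blob):
    body, (crc,) = blob[:-4], struct.unpack(">I", blob[-4:])
    assert zlib.crc32(body) == crc, "corrupt payload"
    assert body[:2] == MAGIC and body[2] == 1, "bad magic/version"
    count = struct.unpack(">H", body[3:5])[0]
    messages = json.loads(zlib.decompress(body[6:]))
    assert len(messages) == count
    return messages

msgs = [{"role": "user", "content": "hello"}]
assert unframe_v1(frame_v1(msgs)) == msgs
```

This is also why the payloads are self-describing: everything needed to decode (version, count, flags, checksum) travels in the frame.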

Adaptive Codebooks

The static dictionaries cover common AI communication patterns. For domain-specific traffic, train a codebook that learns your most frequent keys and values:

from emergent_translator import AdaptiveCodebook, BatchEncoder

# Train on observed traffic
codebook = AdaptiveCodebook()
for msg in training_messages:
    codebook.observe(msg)
codebook.rebuild(min_freq=5)

# Encode with learned codebook (v3 format, codebook embedded in payload)
encoder = BatchEncoder()
result = encoder.encode_batch(messages, codebook=codebook.active)
decoded = encoder.decode_batch(result.payload)  # codebook auto-extracted

Train a codebook from synthetic data:

python scripts/train_codebook.py --messages 50000 --benchmark
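The observe/rebuild loop amounts to frequency counting: strings seen often enough earn short tokens. A stdlib-only toy version of that idea (the real `AdaptiveCodebook` and v3 wire format are more involved; the token range here is an arbitrary choice):

```python
# Toy codebook training: count key and string-value occurrences, then
# assign single-byte tokens to strings seen at least min_freq times,
# mirroring observe() + rebuild(min_freq=5) above.
from collections import Counter

training_messages = [
    {"role": "user", "content": "ping", "priority": "high"} for _ in range(10)
]

counts = Counter()
for msg in training_messages:
    for k, v in msg.items():
        counts[k] += 1
        if isinstance(v, str):
            counts[v] += 1

min_freq = 5
codebook = {s: bytes([0x80 + i])              # single-byte tokens
            for i, (s, n) in enumerate(counts.most_common(127))
            if n >= min_freq}
print(sorted(codebook))
# -> ['content', 'high', 'ping', 'priority', 'role', 'user']
```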

Multi-Format Support

Parse and serialize 13+ formats, then compress through the batch encoder:

from emergent_translator import detect_format, get_handler, BatchEncoder

fmt = detect_format("data.csv")          # "csv"
parse_fn, serialize_fn = get_handler(fmt)
with open("data.csv") as f:
    records = parse_fn(f.read())

encoder = BatchEncoder()
result = encoder.encode_batch(records)

Supported: JSON, CSV, JSONL, YAML, TOML, INI, XML, MessagePack, Protobuf, Parquet, Arrow, BSON, CBOR, XLSX.
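A minimal sketch of the `(parse_fn, serialize_fn)` handler shape, using extension-based detection and stdlib parsers. `toy_detect_format` and the `HANDLERS` table are hypothetical stand-ins, not the library's `detect_format`/`get_handler`:

```python
# Toy format registry: each entry maps a format name to a
# (parse, serialize) pair, as in the example above. Stdlib only.
import csv
import io
import json

def _csv_dump(rows):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

HANDLERS = {
    "csv": (lambda text: list(csv.DictReader(io.StringIO(text))), _csv_dump),
    "json": (json.loads, json.dumps),
}

def toy_detect_format(path):
    return path.rsplit(".", 1)[-1].lower()

fmt = toy_detect_format("data.csv")                 # "csv"
parse_fn, serialize_fn = HANDLERS[fmt]
records = parse_fn("role,content\nuser,hello\n")
print(records)  # [{'role': 'user', 'content': 'hello'}]
```

Once parsed to a list of dicts, any format can flow into `BatchEncoder.encode_batch` unchanged.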

GPU Batch Encoder

For higher throughput, use the GPU-accelerated encoder (falls back to CPU with ThreadPoolExecutor if CuPy is unavailable):

from emergent_translator import GPUBatchEncoder

gpu_encoder = GPUBatchEncoder(num_workers=8)
result = gpu_encoder.encode_batch(messages)
decoded = gpu_encoder.decode_batch(result.payload)
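The CPU-fallback pattern is worth a sketch in its own right: split the batch into chunks and compress them in a thread pool. zlib releases the GIL during compression, so threads deliver real parallelism here. This is a stdlib illustration of the pattern, not the internals of `GPUBatchEncoder`:

```python
# Chunked parallel compression with ThreadPoolExecutor, the fallback
# path when CuPy is unavailable. Stdlib only.
import json
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_chunk(chunk):
    return zlib.compress(json.dumps(chunk).encode())

def parallel_encode(messages, num_workers=8):
    size = max(1, len(messages) // num_workers)
    chunks = [messages[i:i + size] for i in range(0, len(messages), size)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(compress_chunk, chunks))  # order preserved

msgs = [{"seq": i, "status": "active"} for i in range(1000)]
payloads = parallel_encode(msgs)

# Round-trip: decompress every chunk and re-flatten
roundtrip = [m for p in payloads for m in json.loads(zlib.decompress(p))]
assert roundtrip == msgs
```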

LLM Token Savings

Two complementary modules for reducing token usage with LLMs like Claude:

Code Skeletonization

Strip Python files to signatures + docstrings. Feed Claude the structure without paying for implementation lines:

from emergent_translator import skeletonize_file

result = skeletonize_file("my_module.py", focal=["important_func"])
print(f"{result.original_tokens} -> {result.skeleton_tokens} tokens "
      f"({result.token_reduction_pct:.0f}% reduction)")
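The core skeletonization idea can be sketched with the stdlib `ast` module: keep each function's signature and docstring, replace the body with `...`. This toy version (Python 3.9+ for `ast.unparse`) is not the library's `skeletonize_file` implementation and ignores the `focal` option:

```python
# Toy skeletonizer: strip function bodies, keep signatures + docstrings.
import ast

def toy_skeletonize(source: str) -> str:
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            new_body = []
            if ast.get_docstring(node) is not None:
                new_body.append(node.body[0])         # keep docstring stmt
            new_body.append(ast.Expr(ast.Constant(...)))  # "..." placeholder
            node.body = new_body
    return ast.unparse(tree)

src = '''
def add(a, b):
    """Return a + b."""
    result = a + b
    return result
'''
print(toy_skeletonize(src))
# def add(a, b):
#     """Return a + b."""
#     ...
```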

Claude Text Compression

Compress keys and values in structured data flowing through Claude API conversations:

from emergent_translator import ClaudeCompressor

compressor = ClaudeCompressor()
system = compressor.system_prompt_prefix() + "\n\nYour prompt..."
compressed_msgs = compressor.compress_messages(messages)

Self-Hosting the API

Run your own instance instead of using the public API:

# From source
pip install -r requirements.txt
uvicorn src.emergent_translator.api_server:app --port 8000

# Docker
docker build -t emergent-translator .
docker run -p 8000:8000 emergent-translator

Then point the SDK or CLI at it:

sdk = TranslatorSDK(api_url="http://localhost:8000")
emergent-translator --api-url http://localhost:8000 health

API Endpoints

| Endpoint | Purpose |
|---|---|
| POST /translate | Compress JSON/text to emergent symbols |
| POST /translate/batch | Batch compression |
| POST /oracle/explain | Human-readable explanation of compressed data |
| POST /oracle/validate | Validate translation accuracy |
| WebSocket /ws/translate | Real-time streaming compression |
| GET /health | Health check |
| GET /metrics | Prometheus metrics |

Environment Variables

| Variable | Default | Purpose |
|---|---|---|
| API_AUTH_TOKEN | eudaimonia-translator-demo | Bearer token for authentication |
| CACHE_MAX_SIZE | 10000 | Translation cache entries |
| CACHE_TTL_SECONDS | 300 | Cache TTL in seconds |
| ENVIRONMENT | development | Set to production for structured logging |
| PIPELINE_CODEBOOK_PATH | (none) | File path for adaptive codebook persistence in the /translate/batch pipeline; when set, enables the adaptive codebook and saves/loads learned patterns across restarts |

Project Structure

src/emergent_translator/    # pip-installable package
  batch_encoder.py          # v1 batch encoder (encode/decode)
  gpu_batch_encoder.py      # v2 GPU-accelerated encoder
  adaptive_codebook.py      # v3 learned codebooks
  format_handlers.py        # 13+ format parsers
  emergent_symbols.py       # symbol encoder
  local_pipeline.py         # LocalPipeline / LocalPipelineSync
  lazy_collector.py         # LocalLazyCollector (auto-batching)
  api_server.py             # FastAPI server
  client_sdk.py             # Python SDK (sync + async clients)
  cli.py                    # CLI tool
scripts/                    # benchmarks, stress tests, workers
tests/                      # 535 tests

Development

git clone https://github.com/maco144/emergent-language
cd emergent-language
pip install -e ".[all,dev]"
python -m pytest tests/ -v

License

GPL-3.0-or-later. See LICENSE for details.

Commercial licensing available — see COMMERCIAL_LICENSE.md.

Project details


Download files

Download the file for your platform.

Source Distribution

emergent_translator-1.3.0.tar.gz (193.9 kB)

Uploaded Source

Built Distribution


emergent_translator-1.3.0-py3-none-any.whl (178.1 kB)

Uploaded Python 3

File details

Details for the file emergent_translator-1.3.0.tar.gz.

File metadata

  • Download URL: emergent_translator-1.3.0.tar.gz
  • Upload date:
  • Size: 193.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for emergent_translator-1.3.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | f5717b8989f5ddb14be900fcc85fdf0607d11ac47410972004bd715bd89edcba |
| MD5 | 9f7712ce7cd23b3a5cec975b2735f76d |
| BLAKE2b-256 | 2b888df09df6e509d5b2af42608b1c323c1164e9643fd75a8a4858628b7e5165 |


File details

Details for the file emergent_translator-1.3.0-py3-none-any.whl.

File metadata

File hashes

Hashes for emergent_translator-1.3.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5e4efca41fbee06e16d912635253502dd1d7543a2a3374e09d8a8de8326a57f3 |
| MD5 | 7e325a122b008d5d642015cfd9c69ad6 |
| BLAKE2b-256 | 2706266374566823849641c417373e36385839c34ee96b75833707fed2d8eacb |

