60x compression efficiency for AI communication through emergent language translation
Emergent Language Translator
High-performance binary encoding for AI agent communication
Emergent Language Translator compresses structured AI messages into a compact binary format using learned codebooks, common-key dictionaries, and zlib. Batch encoding amortizes header overhead across messages — the more you batch, the better the ratio.
Quick Start
pip install emergent-translator # Core encoding (zero deps)
pip install emergent-translator[sdk] # + SDK/CLI (adds httpx, websockets)
pip install emergent-translator[server] # + API server (adds fastapi, uvicorn, ...)
pip install emergent-translator[caas] # + CaaS with LLM eval (adds openai)
pip install emergent-translator[all] # Everything
The base package has zero dependencies — just the stdlib. Heavy packages (fastapi, uvicorn, httpx, openai) are only installed when you need them.
The package works in two modes: local encoding (no server needed) and API mode (via hosted service or self-hosted). Nothing spawns in the background — all components are opt-in.
For Integration Authors
The core encoding modules (BatchEncoder, GPUBatchEncoder, AdaptiveCodebook, LocalPipelineSync) use only the Python standard library. Adding emergent-translator as a dependency to your NATS codec, AutoGen transform, or LiteLLM hook will not pull in FastAPI, uvicorn, or any other heavy packages.
Local Encoding (no server required)
Encode and decode directly in your process. No network calls, no API keys, no setup:
from emergent_translator import BatchEncoder
encoder = BatchEncoder()
# Encode a batch of agent messages
messages = [
{"role": "user", "content": "analyze market trends", "priority": "high"},
{"role": "assistant", "content": "Starting analysis", "status": "active"},
{"role": "system", "content": "Agent coordinator online", "version": "1.0"},
]
result = encoder.encode_batch(messages)
print(f"{len(messages)} messages -> {len(result.payload)} bytes binary")
# 3 messages: 226 bytes JSON -> 141 bytes binary (38% reduction)
# Decode back to original dicts
decoded = encoder.decode_batch(result.payload)
assert decoded == messages
Pipeline Mode (auto-batching)
LocalPipelineSync wraps the encoder, adaptive codebook, and a background collector into a single context manager:
from emergent_translator import LocalPipelineSync
with LocalPipelineSync(adaptive=True, codebook_path="codebook.json") as p:
result = p.encode(messages) # direct batch encode
    p.add(messages[0]); p.drain()  # auto-batched collector
for batch in p.encode_stream(iter(messages), batch_size=50):
print(batch.compressed_bytes) # streaming
API Mode (hosted service)
A public API is available for server-side compression with additional features (oracle explanations, validation, statistics):
from emergent_translator import TranslatorSDK
sdk = TranslatorSDK(api_url="http://149.28.33.118:8001")
# Compress
compressed = sdk.compress({"role": "user", "content": "hello world"})
# Decompress
original = sdk.decompress(compressed, format="json")
# Health check
print(sdk.get_health())
Compression Results
The batch encoder uses a binary wire format with common-key/value dictionaries and zlib compression. Efficiency improves with batch size:
| Workload | JSON Size | Encoded Size | Reduction |
|---|---|---|---|
| 3 agent messages | 226 bytes | 141 bytes | 38% |
| 10 agent messages | 750 bytes | 112 bytes | 85% |
| 50 agent messages | 4,880 bytes | 286 bytes | 94% |
Encoding speed: sub-millisecond (0.2ms typical).
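The batching effect is easy to reproduce with the stdlib alone. The sketch below is illustrative, not the package's encoder: it compresses batches of similar messages with plain zlib to show how repeated keys and values make larger batches compress disproportionately better.

```python
# Illustrative stdlib-only demo of batch amortization: zlib exploits
# repetition across messages, so the per-message cost shrinks as the
# batch grows (the package's encoder adds dictionaries on top of this).
import json
import zlib

def make_messages(n):
    return [
        {"role": "user", "content": f"task {i}", "priority": "high"}
        for i in range(n)
    ]

for n in (3, 10, 50):
    raw = json.dumps(make_messages(n)).encode()
    packed = zlib.compress(raw, 9)
    print(f"{n:>2} messages: {len(raw)} -> {len(packed)} bytes "
          f"({100 * (1 - len(packed) / len(raw)):.0f}% reduction)")
```

The exact numbers differ from the table above (the real encoder also tokenizes common keys and values), but the trend is the same: the ratio improves with batch size.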
Encoding and Decoding
Every encoder includes a matching decoder. All encoding is fully reversible:
from emergent_translator import BatchEncoder
encoder = BatchEncoder()
# Single message encode/decode
messages = [{"role": "user", "content": "hello"}]
encoded = encoder.encode_batch(messages)
decoded = encoder.decode_batch(encoded.payload)
assert decoded == messages
# The payload is self-describing — version, flags, and CRC are embedded
# No external state needed to decode
The GPU encoder works the same way:
from emergent_translator import GPUBatchEncoder
gpu_encoder = GPUBatchEncoder(num_workers=8)
result = gpu_encoder.encode_batch(messages)
decoded = gpu_encoder.decode_batch(result.payload)
v3 adaptive codebook payloads embed the codebook in the binary, so decoding requires no extra arguments:
result = encoder.encode_batch(messages, codebook=codebook.active)
decoded = encoder.decode_batch(result.payload) # codebook auto-extracted
CLI
The package installs an emergent-translator command:
# Check the public API
emergent-translator health
# Compress a JSON file (sends to API, returns binary)
emergent-translator compress data.json
# Compress from stdin
echo '{"message": "hello"}' | emergent-translator compress -
# Decompress a .compressed file back to JSON
emergent-translator decompress data.compressed
# Decompress to a specific format
emergent-translator decompress data.compressed -o output.yaml -f yaml
# Run a compression benchmark
emergent-translator benchmark --size 1000 --iterations 10
# Get API statistics
emergent-translator stats
# Verbose output (shows timing, sizes, efficiency)
emergent-translator compress data.json -v
CLI defaults:
- API URL:
http://149.28.33.118:8001(public hosted instance) - API key:
eudaimonia-translator-demo(demo token, no signup needed) - Override with
--api-urland--api-key
# Point CLI at a local server instead
emergent-translator --api-url http://localhost:8000 health
# Use a custom API key
emergent-translator --api-key my-token compress data.json
Python SDK
The TranslatorSDK provides a sync interface for API-based compression with oracle features. The EmergentTranslatorClient provides an async interface.
TranslatorSDK (sync)
from emergent_translator import TranslatorSDK
# Connect to the public API (demo token, no signup)
sdk = TranslatorSDK(api_url="http://149.28.33.118:8001")
# Or connect to a local server
sdk = TranslatorSDK(api_url="http://localhost:8000", auth_token="my-token")
# Compress / decompress
compressed = sdk.compress({"task": "analyze", "priority": "high"})
original = sdk.decompress(compressed, format="json")
# Health and stats
sdk.is_healthy() # True/False
sdk.get_health() # Full health dict
sdk.get_stats() # Compression statistics
# Oracle features (requires API)
explanation = sdk.explain(compressed) # Human-readable explanation
score = sdk.validate(original_data, compressed) # Accuracy score (0.0-1.0)
families = sdk.get_symbol_info(compressed) # Symbol family breakdown
compressed, explanation = sdk.compress_with_explanation(data) # Both in one call
EmergentTranslatorClient (async)
from emergent_translator import EmergentTranslatorClient, TranslationConfig
config = TranslationConfig(
api_base_url="http://149.28.33.118:8001",
auth_token="eudaimonia-translator-demo", # optional, demo token
timeout=30.0,
max_retries=3,
)
async with EmergentTranslatorClient(config) as client:
result = await client.translate_json({"task": "analyze data"})
print(f"Compression: {result.efficiency_gain:.1f}%")
Convenience Functions
from emergent_translator import compress_json, decompress_to_json
compressed = compress_json({"key": "value"}, api_url="http://149.28.33.118:8001")
original = decompress_to_json(compressed, api_url="http://149.28.33.118:8001")
Local vs API — When to Use Which
| | Local (BatchEncoder) | Pipeline (LocalPipelineSync) | API (TranslatorSDK / CLI) |
|---|---|---|---|
| Setup | pip install only | pip install only | Needs running server (public or self-hosted) |
| Speed | Sub-millisecond | Sub-millisecond | Network round-trip |
| Dependencies | Zero (stdlib only) | Zero (stdlib only) | pip install emergent-translator[sdk] |
| Decoding | Built-in | Built-in | Built-in |
| Auto-batching | No | Yes (background flush) | No |
| Adaptive codebook | Manual | Built-in (auto-rebuild) | No |
| Codebook persistence | Manual | Built-in (codebook_path) | No |
| Streaming | No | encode_stream | WebSocket |
| Distributed processing | No | Opt-in (workers=[...]) | No |
| Oracle explanations | No | No | Yes |
| Validation | No | No | Yes |
| Statistics/metrics | No | get_stats() | Yes |
| Best for | Simple encode/decode | Production pipelines, high throughput | Full-featured service, debugging, monitoring |
Binary Format
All payloads start with magic bytes \xE7\xB0 followed by a version byte:
v1/v2: MAGIC(2) + VERSION(1) + COUNT(2) + FLAGS(1) + PAYLOAD + CRC32(4)
v3: MAGIC(2) + VERSION(1) + COUNT(2) + FLAGS(1) + CB_VERSION(2) + CB_LEN(2) + [CODEBOOK] + PAYLOAD + CRC32(4)
Common keys (role, content, action, status, priority, ...) and values (user, assistant, system, high, low, ...) are encoded as single-byte tokens. Remaining data is zlib-compressed.
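The header layout above can be exercised with a small stdlib round-trip. This is an illustrative sketch, not the package's framing code: the field widths follow the v1 layout shown, but the big-endian byte order is an assumption not stated in the spec.

```python
# Illustrative v1 frame round-trip: MAGIC(2) + VERSION(1) + COUNT(2) +
# FLAGS(1) + PAYLOAD + CRC32(4). Byte order (big-endian) is assumed.
import struct
import zlib

MAGIC = b"\xE7\xB0"

def frame_v1(payload: bytes, count: int, flags: int = 0) -> bytes:
    header = MAGIC + struct.pack(">BHB", 1, count, flags)
    body = header + payload
    return body + struct.pack(">I", zlib.crc32(body) & 0xFFFFFFFF)

def parse_v1(blob: bytes):
    assert blob[:2] == MAGIC, "bad magic bytes"
    version, count, flags = struct.unpack(">BHB", blob[2:6])
    payload = blob[6:-4]
    (crc,) = struct.unpack(">I", blob[-4:])
    assert zlib.crc32(blob[:-4]) & 0xFFFFFFFF == crc, "CRC mismatch"
    return version, count, flags, payload

framed = frame_v1(b"compressed-bytes", count=3)
print(parse_v1(framed))  # (1, 3, 0, b'compressed-bytes')
```

The trailing CRC32 is what makes payloads self-describing and tamper-evident: decoding fails loudly rather than returning corrupted messages.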
Adaptive Codebooks
The static dictionaries cover common AI communication patterns. For domain-specific traffic, train a codebook that learns your most frequent keys and values:
from emergent_translator import AdaptiveCodebook, BatchEncoder
# Train on observed traffic
codebook = AdaptiveCodebook()
for msg in training_messages:
codebook.observe(msg)
codebook.rebuild(min_freq=5)
# Encode with learned codebook (v3 format, codebook embedded in payload)
encoder = BatchEncoder()
result = encoder.encode_batch(messages, codebook=codebook.active)
decoded = encoder.decode_batch(result.payload) # codebook auto-extracted
Train a codebook from synthetic data:
python scripts/train_codebook.py --messages 50000 --benchmark
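The observe/rebuild cycle can be sketched in a few lines of stdlib Python. This toy class is an assumption-laden illustration of the idea (count frequent strings, then assign them short token ids), not the actual AdaptiveCodebook internals.

```python
# Toy codebook: count keys and string values seen in traffic, then map
# the frequent ones to small integer tokens (illustrative only).
from collections import Counter

class ToyCodebook:
    def __init__(self):
        self.freq = Counter()
        self.tokens = {}

    def observe(self, msg: dict):
        for k, v in msg.items():
            self.freq[k] += 1
            if isinstance(v, str):
                self.freq[v] += 1

    def rebuild(self, min_freq: int = 5, max_entries: int = 256):
        frequent = [s for s, n in self.freq.most_common(max_entries)
                    if n >= min_freq]
        self.tokens = {s: i for i, s in enumerate(frequent)}

book = ToyCodebook()
for i in range(10):
    book.observe({"role": "user", "content": f"msg {i}", "status": "active"})
book.rebuild(min_freq=5)
print(sorted(book.tokens))  # ['active', 'content', 'role', 'status', 'user']
```

Note how the unique `msg {i}` values fall below `min_freq` and stay out of the codebook, while the repeated keys and values earn single tokens, which is exactly the property that makes learned codebooks pay off on domain-specific traffic.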
Multi-Format Support
Parse and serialize 13+ formats, then compress through the batch encoder:
from emergent_translator import detect_format, get_handler, BatchEncoder
fmt = detect_format("data.csv") # "csv"
parse_fn, serialize_fn = get_handler(fmt)
records = parse_fn(open("data.csv").read())
encoder = BatchEncoder()
result = encoder.encode_batch(records)
Supported: JSON, CSV, JSONL, YAML, TOML, INI, XML, MessagePack, Protobuf, Parquet, Arrow, BSON, CBOR, XLSX.
GPU Batch Encoder
For higher throughput, use the GPU-accelerated encoder (falls back to CPU with ThreadPoolExecutor if CuPy is unavailable):
from emergent_translator import GPUBatchEncoder
gpu_encoder = GPUBatchEncoder(num_workers=8)
result = gpu_encoder.encode_batch(messages)
decoded = gpu_encoder.decode_batch(result.payload)
LLM Token Savings
Two complementary modules for reducing token usage with LLMs like Claude:
Code Skeletonization
Strip Python files to signatures + docstrings. Feed Claude the structure without paying for implementation lines:
from emergent_translator import skeletonize_file
result = skeletonize_file("my_module.py", focal=["important_func"])
print(f"{result.original_tokens} -> {result.skeleton_tokens} tokens "
f"({result.token_reduction_pct:.0f}% reduction)")
Claude Text Compression
Compress keys and values in structured data flowing through Claude API conversations:
from emergent_translator import ClaudeCompressor
compressor = ClaudeCompressor()
system = compressor.system_prompt_prefix() + "\n\nYour prompt..."
compressed_msgs = compressor.compress_messages(messages)
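The shape of such a scheme can be sketched with a simple bidirectional key map. The mapping below is an assumption for illustration; the real `ClaudeCompressor` defines its scheme via the system prompt prefix so the model can read the shortened form.

```python
# Illustrative key-shortening sketch: map frequent keys to one-letter
# aliases and back (the mapping here is hypothetical, not the package's).
SHORT = {"role": "r", "content": "c", "priority": "p", "status": "s"}
LONG = {v: k for k, v in SHORT.items()}

def shrink(msg: dict) -> dict:
    return {SHORT.get(k, k): v for k, v in msg.items()}

def expand(msg: dict) -> dict:
    return {LONG.get(k, k): v for k, v in msg.items()}

m = {"role": "user", "content": "hello", "priority": "high"}
assert expand(shrink(m)) == m
print(shrink(m))  # {'r': 'user', 'c': 'hello', 'p': 'high'}
```

Unknown keys pass through unchanged, so the transform stays lossless on messages the mapping has never seen.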
Self-Hosting the API
Run your own instance instead of using the public API:
# From source
pip install -r requirements.txt
uvicorn src.emergent_translator.api_server:app --port 8000
# Docker
docker build -t emergent-translator .
docker run -p 8000:8000 emergent-translator
Then point the SDK or CLI at it:
sdk = TranslatorSDK(api_url="http://localhost:8000")
emergent-translator --api-url http://localhost:8000 health
API Endpoints
| Endpoint | Purpose |
|---|---|
| POST /translate | Compress JSON/text to emergent symbols |
| POST /translate/batch | Batch compression |
| POST /oracle/explain | Human-readable explanation of compressed data |
| POST /oracle/validate | Validate translation accuracy |
| WebSocket /ws/translate | Real-time streaming compression |
| GET /health | Health check |
| GET /metrics | Prometheus metrics |
Environment Variables
| Variable | Default | Purpose |
|---|---|---|
| API_AUTH_TOKEN | eudaimonia-translator-demo | Bearer token for authentication |
| CACHE_MAX_SIZE | 10000 | Translation cache entries |
| CACHE_TTL_SECONDS | 300 | Cache TTL in seconds |
| ENVIRONMENT | development | Set to production for structured logging |
| PIPELINE_CODEBOOK_PATH | (none) | File path for adaptive codebook persistence in the /translate/batch pipeline. When set, enables the adaptive codebook and saves/loads learned patterns across restarts. |
Project Structure
src/emergent_translator/ # pip-installable package
batch_encoder.py # v1 batch encoder (encode/decode)
gpu_batch_encoder.py # v2 GPU-accelerated encoder
adaptive_codebook.py # v3 learned codebooks
format_handlers.py # 13+ format parsers
emergent_symbols.py # symbol encoder
local_pipeline.py # LocalPipeline / LocalPipelineSync
lazy_collector.py # LocalLazyCollector (auto-batching)
api_server.py # FastAPI server
client_sdk.py # Python SDK (sync + async clients)
cli.py # CLI tool
scripts/ # benchmarks, stress tests, workers
tests/ # 535 tests
Development
git clone https://github.com/maco144/emergent-language
cd emergent-language
pip install -e ".[all,dev]"
python -m pytest tests/ -v
License
GPL-3.0-or-later. See LICENSE for details.
Commercial licensing available — see COMMERCIAL_LICENSE.md.