60x compression efficiency for AI communication through emergent language translation
Emergent Language Translator
High-performance binary encoding for AI agent communication
Emergent Language Translator compresses structured AI messages into a compact binary format using learned codebooks, common-key dictionaries, and zlib. Batch encoding amortizes header overhead across messages — the more you batch, the better the ratio.
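The amortization claim is easy to check independently of this library with nothing but the standard library: compressing messages together lets the compressor share context across them, so the per-message cost drops. A toy demonstration on hypothetical data:

```python
import json
import zlib

messages = [
    {"role": "user", "content": f"analyze market trends batch {i}", "priority": "high"}
    for i in range(50)
]

# Compress each message on its own: every payload pays the full header cost.
individual = sum(len(zlib.compress(json.dumps(m).encode())) for m in messages)

# Compress the whole batch at once: shared context amortizes that cost.
batched = len(zlib.compress(json.dumps(messages).encode()))

print(f"individually: {individual} bytes total, batched: {batched} bytes total")
```

The batched total comes out far smaller because the repeated keys and values compress against each other.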
Quick Start
pip install emergent-translator
from emergent_translator import BatchEncoder
encoder = BatchEncoder()
# Encode a batch of agent messages
messages = [
{"role": "user", "content": "analyze market trends", "priority": "high"},
{"role": "assistant", "content": "Starting analysis", "status": "active"},
{"role": "system", "content": "Agent coordinator online", "version": "1.0"},
]
result = encoder.encode_batch(messages)
import json
json_bytes = len(json.dumps(messages).encode())
print(f"{len(messages)} messages: {json_bytes} bytes JSON -> {len(result.payload)} bytes binary")
# 3 messages: 226 bytes JSON -> 141 bytes binary (38% reduction)
# Perfect round-trip reconstruction
decoded = encoder.decode_batch(result.payload)
assert decoded == messages
Compression Results
The batch encoder uses a binary wire format with common-key/value dictionaries and zlib compression. Efficiency improves with batch size:
| Workload | JSON Size | Encoded Size | Reduction |
|---|---|---|---|
| 3 agent messages | 226 bytes | 141 bytes | 38% |
| 10 agent messages | 750 bytes | 112 bytes | 85% |
| 50 agent messages | 4,880 bytes | 286 bytes | 94% |
Encoding speed is sub-millisecond (typically ~0.2 ms per batch).
Binary Format
All payloads start with magic bytes \xE7\xB0 followed by a version byte:
v1/v2: MAGIC(2) + VERSION(1) + COUNT(2) + FLAGS(1) + PAYLOAD + CRC32(4)
v3: MAGIC(2) + VERSION(1) + COUNT(2) + FLAGS(1) + CB_VERSION(2) + CB_LEN(2) + [CODEBOOK] + PAYLOAD + CRC32(4)
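Assuming the layout above and big-endian integers (an assumption; the byte order is not stated here), the fixed v1/v2 header can be unpacked and CRC-checked with `struct` and `zlib` alone. A sketch, not the package's parser:

```python
import struct
import zlib

MAGIC = b"\xE7\xB0"

def parse_header(payload: bytes):
    """Split a v1/v2 payload into (version, count, flags, body), verifying
    the magic bytes and the trailing CRC32. Big-endian is assumed."""
    if payload[:2] != MAGIC:
        raise ValueError("bad magic bytes")
    version, count, flags = struct.unpack(">BHB", payload[2:6])
    body = payload[6:-4]
    (crc,) = struct.unpack(">I", payload[-4:])
    if zlib.crc32(payload[:-4]) != crc:
        raise ValueError("CRC mismatch")
    return version, count, flags, body

# Build a matching payload by hand to exercise the parser.
head = MAGIC + struct.pack(">BHB", 1, 3, 0)
body = b"example-body"
payload = head + body + struct.pack(">I", zlib.crc32(head + body))
print(parse_header(payload))  # (1, 3, 0, b'example-body')
```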
Common keys (role, content, action, status, priority, ...) and values (user, assistant, system, high, low, ...) are encoded as single-byte tokens. Remaining data is zlib-compressed.
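The core idea (replacing frequent strings with one-byte tokens before zlib sees the data) can be sketched in a few lines. This is a toy illustration of the principle, not the library's actual wire format; the token table and escape scheme here are invented:

```python
import zlib

# Toy single-byte token table; the real codebooks are larger and versioned.
TOKENS = {"role": 0x01, "content": 0x02, "user": 0x03, "assistant": 0x04}
ESC = 0x00  # escape byte introducing a length-prefixed literal string

def encode(words):
    out = bytearray()
    for word in words:
        if word in TOKENS:
            out.append(TOKENS[word])        # frequent string -> one byte
        else:
            lit = word.encode()
            out += bytes([ESC, len(lit)]) + lit  # rare string -> literal
    return zlib.compress(bytes(out))

def decode(blob):
    data, words, i = zlib.decompress(blob), [], 0
    rev = {v: k for k, v in TOKENS.items()}
    while i < len(data):
        if data[i] == ESC:
            n = data[i + 1]
            words.append(data[i + 2 : i + 2 + n].decode())
            i += 2 + n
        else:
            words.append(rev[data[i]])
            i += 1
    return words

msgs = ["role", "user", "content", "analyze market trends"]
assert decode(encode(msgs)) == msgs
```

Tokenization and compression are complementary: the token pass shrinks the literal text, and zlib then squeezes whatever repetition remains.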
Adaptive Codebooks
The static dictionaries cover common AI communication patterns. For domain-specific traffic, train a codebook that learns your most frequent keys and values:
from emergent_translator import AdaptiveCodebook, BatchEncoder
# Train on observed traffic
codebook = AdaptiveCodebook()
for msg in training_messages:
codebook.observe(msg)
codebook.rebuild(min_freq=5)
# Encode with learned codebook (v3 format, codebook embedded in payload)
encoder = BatchEncoder()
result = encoder.encode_batch(messages, codebook=codebook.active)
decoded = encoder.decode_batch(result.payload) # codebook auto-extracted
Train a codebook from synthetic data:
python scripts/train_codebook.py --messages 50000 --benchmark
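Conceptually, a learned codebook is a frequency count over observed keys and values, with tokens assigned to the most common strings. A minimal sketch with `collections.Counter` (illustrative only; `ToyCodebook` and its methods are invented here, not the package's `AdaptiveCodebook`):

```python
from collections import Counter

class ToyCodebook:
    """Assign small integer tokens to the most frequent strings in traffic."""

    def __init__(self):
        self.counts = Counter()
        self.table = {}

    def observe(self, msg: dict):
        for k, v in msg.items():
            self.counts[k] += 1
            if isinstance(v, str):
                self.counts[v] += 1

    def rebuild(self, min_freq=5, max_entries=200):
        frequent = [s for s, n in self.counts.most_common(max_entries) if n >= min_freq]
        self.table = {s: i for i, s in enumerate(frequent)}

cb = ToyCodebook()
for _ in range(10):
    cb.observe({"role": "user", "content": "ping"})
cb.rebuild(min_freq=5)
print(cb.table)  # most frequent strings mapped to small integer tokens
```

The `min_freq` cutoff keeps one-off strings out of the table, since a codebook entry only pays for itself when the string recurs.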
Multi-Format Support
Parse and serialize 13+ formats, then compress through the batch encoder:
from emergent_translator import detect_format, get_handler, BatchEncoder
fmt = detect_format("data.csv")  # "csv"
parse_fn, serialize_fn = get_handler(fmt)
with open("data.csv") as f:
    records = parse_fn(f.read())
encoder = BatchEncoder()
result = encoder.encode_batch(records)
Supported: JSON, CSV, JSONL, YAML, TOML, INI, XML, MessagePack, Protobuf, Parquet, Arrow, BSON, CBOR, XLSX.
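One simple way such detection can work is by file extension. The mapping below is an assumption for illustration (the package may also sniff content), and `detect_format_by_extension` is a hypothetical helper, not the library's `detect_format`:

```python
from pathlib import Path

# Extension-based detection only; invented mapping for illustration.
EXT_TO_FORMAT = {
    ".json": "json", ".csv": "csv", ".jsonl": "jsonl", ".yaml": "yaml",
    ".yml": "yaml", ".toml": "toml", ".ini": "ini", ".xml": "xml",
    ".parquet": "parquet", ".bson": "bson", ".cbor": "cbor", ".xlsx": "xlsx",
}

def detect_format_by_extension(path: str) -> str:
    ext = Path(path).suffix.lower()
    if ext not in EXT_TO_FORMAT:
        raise ValueError(f"unrecognized extension: {ext}")
    return EXT_TO_FORMAT[ext]

print(detect_format_by_extension("data.csv"))  # csv
```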
GPU Batch Encoder
For higher throughput, use the GPU-accelerated encoder (falls back to CPU with ThreadPoolExecutor if CuPy is unavailable):
from emergent_translator import GPUBatchEncoder
gpu_encoder = GPUBatchEncoder(num_workers=8)
result = gpu_encoder.encode_batch(messages)
decoded = gpu_encoder.decode_batch(result.payload)
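The CPU fallback path can be approximated with the standard library alone: shard the batch across workers and compress each shard in a thread (zlib releases the GIL during compression, so threads genuinely overlap). A sketch under those assumptions, not the package's implementation:

```python
import json
import zlib
from concurrent.futures import ThreadPoolExecutor

def encode_shard(shard):
    return zlib.compress(json.dumps(shard).encode())

def parallel_encode(messages, num_workers=8):
    """Split messages into num_workers shards and compress them concurrently."""
    shards = [messages[i::num_workers] for i in range(num_workers)]
    shards = [s for s in shards if s]  # drop empty shards for tiny batches
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(encode_shard, shards))

msgs = [{"id": i, "content": "status update"} for i in range(1000)]
payloads = parallel_encode(msgs)
total = sum(len(json.loads(zlib.decompress(p))) for p in payloads)
assert total == len(msgs)
```

Sharding trades a little compression ratio (each shard compresses independently) for parallelism, which is the same trade a GPU path makes.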
LLM Token Savings
Two complementary modules for reducing token usage with LLMs like Claude:
Code Skeletonization
Strip Python files to signatures + docstrings. Feed Claude the structure without paying for implementation lines:
from emergent_translator import skeletonize_file
result = skeletonize_file("my_module.py", focal=["important_func"])
print(f"{result.original_tokens} -> {result.skeleton_tokens} tokens "
f"({result.token_reduction_pct:.0f}% reduction)")
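The same effect can be approximated with the standard `ast` module: keep each function's signature and docstring, replace the body with `...`. This is a sketch of the idea, not the package's skeletonizer (which also handles focal functions, classes, and token counting):

```python
import ast

def skeletonize(source: str) -> str:
    """Replace every function body with its docstring (if any) plus `...`."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            body = []
            if doc is not None:
                body.append(ast.Expr(ast.Constant(doc)))
            body.append(ast.Expr(ast.Constant(...)))
            node.body = body
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)

src = '''
def add(a, b):
    """Return a + b."""
    total = a + b
    return total
'''
print(skeletonize(src))  # signature + docstring + "...", body dropped
```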
Claude Text Compression
Compress keys and values in structured data flowing through Claude API conversations:
from emergent_translator import ClaudeCompressor
compressor = ClaudeCompressor()
system = compressor.system_prompt_prefix() + "\n\nYour prompt..."
compressed_msgs = compressor.compress_messages(messages)
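A key-abbreviation scheme of this kind can be sketched as a reversible dict rewrite. The mapping below is invented for illustration; the real `ClaudeCompressor` pairs its mapping with the system-prompt prefix shown above so the model knows how to read abbreviated keys:

```python
# Toy key-abbreviation table; invented for illustration.
KEY_MAP = {"role": "r", "content": "c", "priority": "p", "status": "s"}

def compress_message(msg: dict) -> dict:
    """Shorten known keys; leave unknown keys untouched."""
    return {KEY_MAP.get(k, k): v for k, v in msg.items()}

def expand_message(msg: dict) -> dict:
    """Invert the mapping to recover the original keys."""
    rev = {v: k for k, v in KEY_MAP.items()}
    return {rev.get(k, k): v for k, v in msg.items()}

m = {"role": "user", "content": "analyze market trends", "priority": "high"}
assert expand_message(compress_message(m)) == m
```

Because every compressed message is recoverable, the scheme saves tokens on the wire without losing information in logs or replays.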
Project Structure
src/emergent_translator/ # pip-installable package
batch_encoder.py # v1 batch encoder (encode/decode)
gpu_batch_encoder.py # v2 GPU-accelerated encoder
adaptive_codebook.py # v3 learned codebooks
format_handlers.py # 13+ format parsers
emergent_symbols.py # symbol encoder
api_server.py # FastAPI server
cli.py # CLI tool
scripts/ # benchmarks, stress tests, workers
tests/ # 535 tests
Development
git clone https://github.com/maco144/emergent-language
cd emergent-language
pip install -e ".[dev,formats]"
python -m pytest tests/ -v
Docker
docker build -t emergent-translator .
docker run -p 8000:8000 emergent-translator
License
GPL-3.0-or-later. See LICENSE for details.
Commercial licensing available — see COMMERCIAL_LICENSE.md.