
codecai

Python client for the Codec binary transport protocol.

The Python twin of @codecai/web (browser/Node) and Codec.Net (.NET). Decodes streaming token IDs from Codec-compliant servers (vLLM, SGLang) and encodes text into IDs for the bidirectional path. Pure Python, no native dependencies beyond msgspec and httpx.

Why this exists

Real measurements from Codec/packages/bench (live Ollama qwen2.5):

| Configuration | B/token | vs JSON-SSE |
| --- | --- | --- |
| JSON-SSE (live Ollama) | 186.4 | 1.0× |
| Codec msgpack | 16.0 | 9.6× |
| Codec protobuf | 10.9 | 14.2× |
| Codec msgpack + `Content-Encoding: br` | 2.79 | 55.2× |

Agent-to-agent handoffs: 3.6× faster end-to-end at 1024 tokens, because the wire payload shrinks and the detokenize-then-retokenize round trip between agents is eliminated.
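For intuition on the JSON-SSE baseline, here is a rough stdlib-only illustration. The chunk below is a generic OpenAI-style example shape, not taken from the bench suite:

```python
import json

# One OpenAI-style SSE chunk carrying a single streamed token (illustrative shape).
chunk = "data: " + json.dumps({
    "id": "cmpl-abc123",
    "object": "text_completion",
    "created": 1700000000,
    "model": "Qwen/Qwen2.5-7B-Instruct",
    "choices": [{"index": 0, "text": " entropy", "finish_reason": None}],
}) + "\n\n"

print(len(chunk.encode()))  # well over 100 bytes of envelope per token

# By contrast, a token ID on the Codec wire is just a small integer:
# msgpack encodes IDs < 128 in 1 byte and IDs < 65536 in at most 3 bytes.
```

Most of each SSE chunk is repeated envelope, which is exactly what a binary frame of raw IDs avoids.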

Install

pip install codecai

Requires Python 3.9+.

Quick start — decode a stream

import asyncio
import httpx

from codecai import Detokenizer, decode_msgpack_stream, load_map


async def main() -> None:
    # 1. Load and pin the dialect map by sha256.
    m = await load_map(
        url="https://cdn.jsdelivr.net/gh/wdunn001/codec-maps/maps/qwen/qwen2.json",
        hash="sha256:c73972f7a580…",
    )

    # 2. Stream from a Codec-compliant server.
    async with httpx.AsyncClient() as client:
        async with client.stream(
            "POST",
            "http://localhost:8000/v1/completions",
            json={
                "model": "Qwen/Qwen2.5-7B-Instruct",
                "prompt": "Explain entropy.",
                "stream_format": "msgpack",
                "max_tokens": 256,
            },
            timeout=None,
        ) as resp:
            # 3. Detokenize lazily — only when rendering for a human.
            detok = Detokenizer(m)
            async for frame in decode_msgpack_stream(resp.aiter_raw()):
                print(
                    detok.render(frame.ids, partial=not frame.done),
                    end="",
                    flush=True,
                )


asyncio.run(main())

Quick start — encode text (bidirectional path)

When you want zero text on the wire in either direction — agent A's output IDs feeding straight into agent B's input — encode text to IDs locally before sending:

import httpx

from codecai import BPETokenizer

tok = BPETokenizer(m)  # m: the TokenizerMap loaded via load_map above
prompt_ids = tok.encode("Explain entropy.")  # pure-Python BPE, exact


# Send IDs as a normal OpenAI prompt: list[int] (no special endpoint needed).
async def stream_from_ids() -> None:
    async with httpx.AsyncClient() as client:
        async with client.stream(
            "POST",
            "http://localhost:8000/v1/completions",
            json={
                "prompt": prompt_ids,
                "stream_format": "msgpack",
                "max_tokens": 256,
            },
        ) as resp:
            ...

For huge prompts (>50K tokens, e.g. RAG with long context), /v1/completions/codec accepts a binary msgpack request body with the same effect. See PROTOCOL.md for both paths.

API

| Symbol | Purpose |
| --- | --- |
| `load_map(url=..., hash=...)` | Fetch + sha256-verify + cache a dialect map (async) |
| `MemoryMapCache` | Default in-memory `MapCache`. Subclass for Redis / disk |
| `TokenizerMap.from_json(...)` | Parse + schema check |
| `Detokenizer` | Stateful detokenizer: byte_level + metaspace + byte fallback + partial UTF-8 |
| `detokenize(map, ids)` | One-shot for non-streaming use |
| `BPETokenizer` | Pure-Python BPE: byte_level + metaspace |
| `LongestMatchTokenizer` | Vocab-only fallback for canonical-IR maps |
| `pick_tokenizer(map)` | Build the right tokenizer for the loaded map |
| `tokenize(map, text)` | One-shot helper |
| `decode_msgpack_stream(body)` | `AsyncIterable[bytes]` → `AsyncIterator[CodecFrame]` |
| `decode_protobuf_stream(body)` | Same for length-prefixed protobuf |
| `decode_protobuf_frame(payload)` | One-shot frame decoder (no length prefix) |
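To make the stream decoders' shape concrete, here is a minimal framing sketch. It assumes a varint length prefix (the convention protobuf's delimited format uses); the real wire layout is defined in PROTOCOL.md, so treat this as illustration only:

```python
from typing import Iterator


def split_frames(buf: bytes) -> Iterator[bytes]:
    """Yield payloads from a buffer of varint-length-prefixed frames."""
    i = 0
    while i < len(buf):
        length, shift = 0, 0
        while True:  # decode one unsigned LEB128 varint
            b = buf[i]
            i += 1
            length |= (b & 0x7F) << shift
            if not (b & 0x80):
                break
            shift += 7
        yield buf[i:i + length]
        i += length


# Two frames: b"hi" (length 2) and b"tokens" (length 6).
data = b"\x02hi\x06tokens"
print(list(split_frames(data)))  # [b'hi', b'tokens']
```

The async decoders do the same splitting incrementally over `AsyncIterable[bytes]`, carrying partial prefixes across network reads.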

Correctness

  • Byte-level decode: every vocab token is a sequence of GPT-2-encoded bytes. The Detokenizer reverses the byte→unicode table and accumulates bytes across tokens until a complete UTF-8 sequence forms. Tested with 3-byte and 4-byte (🚀) UTF-8 sequences.
  • Metaspace decode: the metaspace marker ▁ (U+2581) becomes a space; SentencePiece byte-fallback IDs (<0x00> through <0xFF>) are decoded through the same UTF-8 buffer.
  • Partial sequences across frames: Detokenizer is stateful. Call render(ids, partial=True) while frames stream, then partial=False (the default) on the last frame so the buffer flushes. Call reset() between conversations.
  • BPE merge ordering: greedy by priority, not left-to-right. Matches HuggingFace tokenizers reference behavior. Test fixture verifies this explicitly.
  • HuggingFace round-trip: real Qwen-2 (152K vocab, byte_level) round-trips ASCII, code, emoji, multi-script CJK / Latin diacritics. Bit-identical with HF's Rust tokenizers library (verified by tests/test_bpe.py::test_qwen_matches_hf_reference).
  • Hash verification uses hashlib.sha256. Mismatch raises TokenizerMapHashMismatchError.
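The cross-frame UTF-8 buffering described above can be mimicked with the stdlib's incremental decoder. This is a behavioral sketch, not codecai's implementation:

```python
import codecs

dec = codecs.getincrementaldecoder("utf-8")()

rocket = "🚀".encode("utf-8")      # 4 bytes
first = dec.decode(rocket[:2])     # frame 1: incomplete sequence -> ""
second = dec.decode(rocket[2:])    # frame 2 completes the sequence

print(repr(first), repr(second))   # '' '🚀'

# Analogous to render(ids, partial=True) mid-stream: nothing is emitted
# until the trailing bytes arrive and the buffered sequence flushes.
```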
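The merge-ordering bullet is worth a toy example. This sketch (with a hypothetical `ranks` table, not codecai's `BPETokenizer`) always applies the lowest-rank merge anywhere in the sequence before any later-ranked one:

```python
def bpe(tokens, ranks):
    """Greedy-by-priority BPE: repeatedly merge the best-ranked adjacent pair."""
    tokens = list(tokens)
    while True:
        best_rank, best_i = None, None
        for i in range(len(tokens) - 1):
            r = ranks.get((tokens[i], tokens[i + 1]))
            if r is not None and (best_rank is None or r < best_rank):
                best_rank, best_i = r, i
        if best_rank is None:
            return tokens
        tokens[best_i:best_i + 2] = [tokens[best_i] + tokens[best_i + 1]]


# ("b", "c") has the higher priority (lower rank), so it merges first.
ranks = {("a", "b"): 1, ("b", "c"): 0}
print(bpe(["a", "b", "c"], ranks))  # ['a', 'bc']
```

A naive left-to-right pass would merge ("a", "b") first and produce ["ab", "c"], which is why merge priority matters for parity with the HuggingFace reference.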

Map sources

load_map accepts any URL — the sha256 hash is what matters. Curated maps:

https://cdn.jsdelivr.net/gh/wdunn001/codec-maps/maps/<family>.json

14 families covering 70+ aliases — see codec-maps for the index.
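Since the hash, not the URL, is the trust anchor, pinning reduces to a plain hashlib check. A minimal sketch of the verification step (`verify_map` is a hypothetical helper; codecai raises TokenizerMapHashMismatchError rather than ValueError):

```python
import hashlib


def verify_map(payload: bytes, pinned: str) -> bytes:
    """Check a fetched map body against a 'sha256:<hex>' pin."""
    algo, _, digest = pinned.partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algo}")
    actual = hashlib.sha256(payload).hexdigest()
    if actual != digest:
        raise ValueError(f"map hash mismatch: {actual} != {digest}")
    return payload


payload = b'{"version": 1, "vocab": {}}'
pinned = "sha256:" + hashlib.sha256(payload).hexdigest()
verify_map(payload, pinned)  # passes; a tampered body would raise
```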

To generate a map from a HuggingFace tokenizer.json:

npx @codecai/maps-cli build my-org/my-model --id=my-org/my-model

Compression

load_map uses httpx, which transparently decompresses gzip and brotli Content-Encoding. jsDelivr serves brotli automatically (3.4× smaller transfers). For Codec streaming responses, the server negotiates Content-Encoding based on the request's Accept-Encoding.
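As a rough stdlib analogue of that transfer saving (gzip rather than brotli, on a synthetic repetitive vocab; real map bodies compress similarly well because vocab JSON is highly redundant):

```python
import gzip
import json

# Synthetic map body: thousands of structurally similar vocab entries.
vocab = {f"token_{i}": i for i in range(5000)}
raw = json.dumps({"vocab": vocab}).encode()
packed = gzip.compress(raw)

print(len(raw), len(packed), round(len(raw) / len(packed), 1))
```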

License

MIT. See LICENSE.
