Tokenizer + detokenizer for the Codec binary transport protocol — decode streaming token IDs from vLLM/SGLang, encode text into IDs for the bidirectional path. The Python twin of @codecai/web and Codec.Net.

codecai

Python client for the Codec binary transport protocol.

The Python twin of @codecai/web (browser/Node) and Codec.Net (.NET). Decodes streaming token IDs from Codec-compliant servers (vLLM, SGLang) and encodes text into IDs for the bidirectional path. Pure Python, no native dependencies beyond msgspec and httpx.

Why this exists

Real measurements from Codec/packages/bench (live Ollama qwen2.5):

| Configuration | Bytes/token | vs JSON-SSE |
|---|---|---|
| JSON-SSE (live Ollama) | 186.4 | 1.0× |
| Codec msgpack | 16.0 | 9.6× |
| Codec protobuf | 10.9 | 14.2× |
| Codec msgpack + `Content-Encoding: br` | 2.79 | 55.2× |

Agent-to-agent handoffs are 3.6× faster end-to-end at 1024 tokens, because the wire payload shrinks and the detokenize-then-retokenize round trip is eliminated.

Install

pip install codecai

Requires Python 3.9+.

Quick start — decode a stream

import asyncio
import httpx

from codecai import Detokenizer, decode_msgpack_stream, load_map


async def main() -> None:
    # 1. Load and pin the dialect map by sha256.
    m = await load_map(
        url="https://cdn.jsdelivr.net/gh/wdunn001/codec-maps/maps/qwen/qwen2.json",
        hash="sha256:c73972f7a580…",
    )

    # 2. Stream from a Codec-compliant server.
    async with httpx.AsyncClient() as client:
        async with client.stream(
            "POST",
            "http://localhost:8000/v1/completions",
            json={
                "model": "Qwen/Qwen2.5-7B-Instruct",
                "prompt": "Explain entropy.",
                "stream_format": "msgpack",
                "max_tokens": 256,
            },
            timeout=None,
        ) as resp:
            # 3. Detokenize lazily — only when rendering for a human.
            detok = Detokenizer(m)
            async for frame in decode_msgpack_stream(resp.aiter_raw()):
                print(
                    detok.render(frame.ids, partial=not frame.done),
                    end="",
                    flush=True,
                )


asyncio.run(main())

Quick start — encode text (bidirectional path)

When you want zero text on the wire in either direction — agent A's output IDs feeding straight into agent B's input — encode text to IDs locally before sending:

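import asyncio
import httpx

from codecai import BPETokenizer, load_map


async def main() -> None:
    # Load the same pinned dialect map used for decoding.
    m = await load_map(
        url="https://cdn.jsdelivr.net/gh/wdunn001/codec-maps/maps/qwen/qwen2.json",
        hash="sha256:c73972f7a580…",
    )

    tok = BPETokenizer(m)
    prompt_ids = tok.encode("Explain entropy.")  # pure-Python BPE, exact

    # Send IDs as a normal OpenAI prompt: list[int] (no special endpoint needed).
    async with httpx.AsyncClient() as client:
        async with client.stream(
            "POST",
            "http://localhost:8000/v1/completions",
            json={
                "prompt": prompt_ids,
                "stream_format": "msgpack",
                "max_tokens": 256,
            },
        ) as resp:
            ...  # decode frames from resp as in the previous example


asyncio.run(main())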

For huge prompts (>50K tokens, e.g. RAG with long context), /v1/completions/codec accepts a binary msgpack request body with the same effect. See PROTOCOL.md for both paths.
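
For intuition about the local encode step, the following is a minimal sketch of priority-ordered BPE merging on a toy merge table. It is not the codecai implementation: `bpe_merge` and the `ranks` table are hypothetical names used only for illustration, and a real tokenizer would also apply pre-tokenization and map the merged symbols to IDs.

```python
from typing import Dict, List, Tuple


def bpe_merge(symbols: List[str], ranks: Dict[Tuple[str, str], int]) -> List[str]:
    """Repeatedly merge the adjacent pair with the lowest rank (highest priority)."""
    symbols = list(symbols)
    while len(symbols) > 1:
        # Rank every adjacent pair; pairs absent from the merge table cannot merge.
        candidates = [
            (ranks[pair], i)
            for i, pair in enumerate(zip(symbols, symbols[1:]))
            if pair in ranks
        ]
        if not candidates:
            break
        # Lowest rank wins; ties resolve to the leftmost occurrence.
        _, i = min(candidates)
        symbols[i : i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


# Toy merge table (hypothetical): "b"+"c" outranks "a"+"b".
ranks = {("b", "c"): 0, ("a", "b"): 1}
print(bpe_merge(list("abc"), ranks))
```

A left-to-right merge of the same input would produce `['ab', 'c']`; selecting by rank produces `['a', 'bc']` instead, which is the distinction the priority ordering is meant to capture.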

API

| Symbol | Purpose |
|---|---|
| `load_map(url=..., hash=...)` | Fetch + sha256-verify + cache a dialect map (async) |
| `MemoryMapCache` | Default in-memory `MapCache`. Subclass for Redis / disk |
| `TokenizerMap.from_json(...)` | Parse + schema check |
| `Detokenizer` | Stateful detokenizer: byte_level + metaspace + byte fallback + partial UTF-8 |
| `detokenize(map, ids)` | One-shot for non-streaming use |
| `BPETokenizer` | Pure-Python BPE: byte_level + metaspace |
| `LongestMatchTokenizer` | Vocab-only fallback for canonical-IR maps |
| `pick_tokenizer(map)` | Build the right tokenizer for the loaded map |
| `tokenize(map, text)` | One-shot helper |
| `decode_msgpack_stream(body)` | `AsyncIterable[bytes]` → `AsyncIterator[CodecFrame]` |
| `decode_protobuf_stream(body)` | Same for length-prefixed protobuf |
| `decode_protobuf_frame(payload)` | One-shot frame decoder (no length prefix) |
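
To illustrate what a length-prefixed stream decoder has to handle, here is a rough sketch of reassembling frames whose boundaries do not line up with network chunks. It assumes a base-128 varint length prefix (the common protobuf delimited convention); the actual Codec prefix format is defined in PROTOCOL.md and may differ. `read_varint` and `split_frames` are hypothetical helpers, not part of the codecai API.

```python
import asyncio
from typing import AsyncIterable, AsyncIterator, Optional, Tuple


def read_varint(buf: bytes, pos: int) -> Optional[Tuple[int, int]]:
    """Decode a base-128 varint at pos; return (value, next_pos) or None if incomplete."""
    value = 0
    shift = 0
    while pos < len(buf):
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7
    return None  # the prefix itself is split across chunks; wait for more bytes


async def split_frames(body: AsyncIterable[bytes]) -> AsyncIterator[bytes]:
    """Yield one payload per length-prefixed frame, regardless of chunk boundaries."""
    buf = b""
    async for chunk in body:
        buf += chunk
        while True:
            header = read_varint(buf, 0)
            if header is None:
                break
            length, start = header
            if len(buf) - start < length:
                break  # payload not fully received yet
            yield buf[start : start + length]
            buf = buf[start + length :]


async def demo() -> list:
    async def chunks():
        # Two frames, b"hi" and b"codec", split at awkward boundaries.
        yield b"\x02h"
        yield b"i\x05cod"
        yield b"ec"

    return [payload async for payload in split_frames(chunks())]


print(asyncio.run(demo()))
```

Each yielded payload would then be handed to a one-shot frame decoder; keeping the framing separate from the decoding is what lets the same buffer logic serve any payload format.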

Correctness

Byte-level decode: every vocab token is a sequence of GPT-2-encoded bytes. The Detokenizer reverses the byte→unicode table and accumulates bytes across tokens until a complete UTF-8 sequence forms. Tested with 3-byte (€) and 4-byte (🚀) sequences.
Metaspace decode: ▁ becomes a space; SentencePiece byte-fallback IDs (<0x00>–<0xFF>) are decoded through the same UTF-8 buffer.
Partial sequences across frames: Detokenizer is stateful. Call render(ids, partial=True) while frames stream, then render with partial=False (the default) on the last frame so the buffer flushes. Call reset() between conversations.
BPE merge ordering: greedy by priority, not left-to-right. This matches the HuggingFace tokenizers reference behavior, and a test fixture verifies it explicitly.
HuggingFace round-trip: real Qwen-2 (152K vocab, byte_level) round-trips ASCII, code, emoji, multi-script CJK / Latin diacritics. Bit-identical with HF's Rust tokenizers library (verified by tests/test_bpe.py::test_qwen_matches_hf_reference).
Hash verification uses hashlib.sha256. Mismatch raises TokenizerMapHashMismatchError.
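
The byte-level and partial-sequence points above can be sketched with the standard library alone. This is an illustrative stand-in, not the codecai `Detokenizer`: `StreamingByteDecoder` is a hypothetical class that works on token strings rather than IDs, and it uses the byte-to-character table from GPT-2 style byte-level BPE together with Python's incremental UTF-8 decoder.

```python
import codecs
from typing import Dict, Iterable


def gpt2_byte_table() -> Dict[int, str]:
    """Byte -> printable-character table used by GPT-2 style byte-level BPE."""
    printable = (
        list(range(0x21, 0x7F))     # '!' .. '~'
        + list(range(0xA1, 0xAD))   # '¡' .. '¬'
        + list(range(0xAE, 0x100))  # '®' .. 'ÿ'
    )
    table = {b: chr(b) for b in printable}
    next_code = 256
    for b in range(256):
        if b not in table:
            # Remaining bytes are shifted to code points from 256 upward.
            table[b] = chr(next_code)
            next_code += 1
    return table


class StreamingByteDecoder:
    """Hypothetical stand-in for a stateful detokenizer; not the codecai class."""

    def __init__(self) -> None:
        self._char_to_byte = {ch: b for b, ch in gpt2_byte_table().items()}
        self._utf8 = codecs.getincrementaldecoder("utf-8")(errors="replace")

    def render(self, tokens: Iterable[str], partial: bool = False) -> str:
        # Reverse the byte -> unicode table to recover the raw bytes.
        raw = bytes(self._char_to_byte[ch] for tok in tokens for ch in tok)
        # With partial=True, an incomplete trailing UTF-8 sequence stays buffered.
        return self._utf8.decode(raw, final=not partial)

    def reset(self) -> None:
        self._utf8.reset()


table = gpt2_byte_table()
euro = "\u20ac".encode("utf-8")  # the euro sign, three bytes: E2 82 AC
first = "".join(table[b] for b in euro[:2])   # token carrying the first two bytes
second = "".join(table[b] for b in euro[2:])  # token carrying the last byte

dec = StreamingByteDecoder()
print(repr(dec.render([first], partial=True)))  # incomplete sequence: nothing emitted yet
print(repr(dec.render([second])))               # sequence completes and is flushed
```

Holding back incomplete bytes avoids emitting replacement characters mid-stream when a multi-byte character straddles two tokens or two frames.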

Map sources

load_map accepts any URL — the sha256 hash is what matters. Curated maps:

https://cdn.jsdelivr.net/gh/wdunn001/codec-maps/maps/<family>.json

14 families covering 70+ aliases — see codec-maps for the index.
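
Because the pin is what establishes trust, any mirror can serve a map as long as the digest matches. A minimal sketch of that check follows, assuming the digest is computed over the raw response bytes and the pin has the form `sha256:<hex>`. `verify_pinned` and `HashMismatchError` are hypothetical names standing in for the library's own check and its `TokenizerMapHashMismatchError`.

```python
import hashlib


class HashMismatchError(ValueError):
    """Hypothetical stand-in for the library's TokenizerMapHashMismatchError."""


def verify_pinned(body: bytes, pin: str) -> bytes:
    """Return body unchanged if its sha256 digest matches a 'sha256:<hex>' pin."""
    algo, _, expected = pin.partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algo!r}")
    actual = hashlib.sha256(body).hexdigest()
    if actual != expected.lower():
        raise HashMismatchError(f"expected {expected}, got {actual}")
    return body


payload = b'{"id": "demo/map"}'  # placeholder bytes, not a real dialect map
pin = "sha256:" + hashlib.sha256(payload).hexdigest()

verify_pinned(payload, pin)  # matching digest: passes silently
try:
    verify_pinned(payload + b" ", pin)  # any byte change is rejected
except HashMismatchError as exc:
    print("rejected:", exc)
```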

To generate a map from a HuggingFace tokenizer.json:

npx @codecai/maps-cli build my-org/my-model --id=my-org/my-model

Compression

load_map uses httpx, which transparently decompresses gzip and brotli Content-Encoding. jsDelivr serves brotli automatically (3.4× smaller transfers). For Codec streaming responses, the server negotiates Content-Encoding based on the request's Accept-Encoding.

License

MIT. See LICENSE.

This version

0.1.0

May 6, 2026

Download files

Source Distribution

codecai-0.1.0.tar.gz (18.4 kB)

Uploaded May 6, 2026 Source

Built Distribution

codecai-0.1.0-py3-none-any.whl (18.0 kB)

Uploaded May 6, 2026 Python 3

File details

Details for the file codecai-0.1.0.tar.gz.

File metadata

Download URL: codecai-0.1.0.tar.gz
Upload date: May 6, 2026
Size: 18.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for codecai-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | `2ba51ed5c4d0fbeffdd79944615c990095739e33437ff6c44017ce1d18c14e94` |
| MD5 | `a7fd5e90420c3ef124777cb02d548f13` |
| BLAKE2b-256 | `c447355cac34f968fbd63e9f7aab9e28e331408a21e5be9e341bbb3b8ee348e3` |

File details

Details for the file codecai-0.1.0-py3-none-any.whl.

File metadata

Download URL: codecai-0.1.0-py3-none-any.whl
Upload date: May 6, 2026
Size: 18.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for codecai-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | `37638339413742835138566d22133b87194029713db8e9ff72e674bd5ad0f268` |
| MD5 | `34736979fe403a340f5dae8903069035` |
| BLAKE2b-256 | `18c43e2a715ee67e381f112f1dedf83bcbe0dfb2df9e5e5b964b7a3790edf0ac` |
