# codecai
Python client for the Codec binary transport protocol.
The Python twin of @codecai/web (browser/Node) and Codec.Net (.NET). Decodes streaming token IDs from Codec-compliant servers (vLLM, SGLang) and encodes text into IDs for the bidirectional path. Pure Python, no native dependencies beyond msgspec and httpx.
## Why this exists
Real measurements from `Codec/packages/bench` (live Ollama, qwen2.5):

| Configuration | B/token | vs JSON-SSE |
|---|---|---|
| JSON-SSE (live Ollama) | 186.4 | 1.0× |
| Codec msgpack | 16.0 | 9.6× |
| Codec protobuf | 10.9 | 14.2× |
| Codec msgpack + `Content-Encoding: br` | 2.79 | 55.2× |
Agent-to-agent handoffs are 3.6× faster end-to-end at 1024 tokens: the wire shrinks, and the intermediate detokenize + retokenize step is eliminated.
## Install

```
pip install codecai
```

Requires Python 3.9+.
## Quick start — decode a stream

```python
import asyncio

import httpx

from codecai import Detokenizer, decode_msgpack_stream, load_map


async def main() -> None:
    # 1. Load and pin the dialect map by sha256.
    m = await load_map(
        url="https://cdn.jsdelivr.net/gh/wdunn001/codec-maps/maps/qwen/qwen2.json",
        hash="sha256:c73972f7a580…",
    )

    # 2. Stream from a Codec-compliant server.
    async with httpx.AsyncClient() as client:
        async with client.stream(
            "POST",
            "http://localhost:8000/v1/completions",
            json={
                "model": "Qwen/Qwen2.5-7B-Instruct",
                "prompt": "Explain entropy.",
                "stream_format": "msgpack",
                "max_tokens": 256,
            },
            timeout=None,
        ) as resp:
            # 3. Detokenize lazily — only when rendering for a human.
            detok = Detokenizer(m)
            async for frame in decode_msgpack_stream(resp.aiter_raw()):
                print(
                    detok.render(frame.ids, partial=not frame.done),
                    end="",
                    flush=True,
                )


asyncio.run(main())
```
## Quick start — encode text (bidirectional path)

When you want zero text on the wire in either direction — agent A's output IDs feeding straight into agent B's input — encode text to IDs locally before sending:

```python
from codecai import BPETokenizer

tok = BPETokenizer(m)
prompt_ids = tok.encode("Explain entropy.")  # pure-Python BPE, exact

# Send IDs as a normal OpenAI prompt: list[int] (no special endpoint needed).
async with httpx.AsyncClient() as client:
    async with client.stream(
        "POST",
        "http://localhost:8000/v1/completions",
        json={
            "prompt": prompt_ids,
            "stream_format": "msgpack",
            "max_tokens": 256,
        },
    ) as resp:
        ...
For huge prompts (>50K tokens, e.g. RAG with long context), /v1/completions/codec accepts a binary msgpack request body with the same effect. See PROTOCOL.md for both paths.
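For a sense of why a binary body pays off at that scale, here is a stdlib-only sketch comparing JSON against a minimal msgpack encoding of an ID array. The byte layout follows the public msgpack spec, but this is an illustration only — the actual request layout for the binary path is defined in PROTOCOL.md, not here:

```python
import json
import struct


def msgpack_uint(n: int) -> bytes:
    # Minimal msgpack encoding for non-negative integers (illustration only).
    if n < 0x80:
        return bytes([n])                      # positive fixint
    if n <= 0xFF:
        return b"\xcc" + bytes([n])            # uint8
    if n <= 0xFFFF:
        return b"\xcd" + struct.pack(">H", n)  # uint16
    return b"\xce" + struct.pack(">I", n)      # uint32


def msgpack_int_array(ids: list[int]) -> bytes:
    # Array header: fixarray (<16 items), array16, or array32.
    n = len(ids)
    if n < 16:
        head = bytes([0x90 | n])
    elif n <= 0xFFFF:
        head = b"\xdc" + struct.pack(">H", n)
    else:
        head = b"\xdd" + struct.pack(">I", n)
    return head + b"".join(msgpack_uint(i) for i in ids)


ids = list(range(60_000))  # stand-in for a huge RAG prompt
json_size = len(json.dumps(ids).encode())
mp_size = len(msgpack_int_array(ids))
print(json_size, mp_size)  # msgpack is a small fraction of the JSON size
```

Most token IDs fit in three msgpack bytes, while their JSON form costs five to seven characters plus separators — which is where the headroom for huge prompts comes from.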
## API

| Symbol | Purpose |
|---|---|
| `load_map(url=..., hash=...)` | Fetch + sha256-verify + cache a dialect map (async) |
| `MemoryMapCache` | Default in-memory `MapCache`; subclass for Redis / disk |
| `TokenizerMap.from_json(...)` | Parse + schema check |
| `Detokenizer` | Stateful detokenizer: byte_level + metaspace + byte fallback + partial UTF-8 |
| `detokenize(map, ids)` | One-shot for non-streaming use |
| `BPETokenizer` | Pure-Python BPE: byte_level + metaspace |
| `LongestMatchTokenizer` | Vocab-only fallback for canonical-IR maps |
| `pick_tokenizer(map)` | Build the right tokenizer for the loaded map |
| `tokenize(map, text)` | One-shot helper |
| `decode_msgpack_stream(body)` | `AsyncIterable[bytes]` → `AsyncIterator[CodecFrame]` |
| `decode_protobuf_stream(body)` | Same for length-prefixed protobuf |
| `decode_protobuf_frame(payload)` | One-shot frame decoder (no length prefix) |
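To illustrate the idea behind a vocab-only fallback like `LongestMatchTokenizer` — greedy longest-prefix match over the vocabulary — here is a toy sketch. The vocab, function name, and behavior on unmatched input are all illustrative, not the library's implementation:

```python
def longest_match_encode(text: str, vocab: dict[str, int]) -> list[int]:
    """Greedily take the longest vocab entry that prefixes the remaining text."""
    ids = []
    i = 0
    max_len = max(len(t) for t in vocab)
    while i < len(text):
        # Try the longest candidate first, shrinking until a vocab hit.
        for end in range(min(len(text), i + max_len), i, -1):
            piece = text[i:end]
            if piece in vocab:
                ids.append(vocab[piece])
                i = end
                break
        else:
            raise ValueError(f"no vocab entry covers {text[i]!r}")
    return ids


vocab = {"ent": 0, "ro": 1, "py": 2, "entro": 3, "p": 4, "y": 5}
print(longest_match_encode("entropy", vocab))  # → [3, 2]: "entro" wins over "ent"
```

Longest-match is not BPE — it ignores merge priorities — which is why it is only a fallback for maps that carry a vocab but no merge list.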
## Correctness

- **Byte-level decode:** every vocab token is a sequence of GPT-2-encoded bytes. The `Detokenizer` reverses the byte→unicode table and accumulates bytes across tokens until a complete UTF-8 sequence forms. Tested with 3-byte (`€`) and 4-byte (`🚀`) sequences.
- **Metaspace decode:** `▁` becomes a space; SentencePiece byte-fallback IDs (`<0x00>`–`<0xFF>`) are decoded through the same UTF-8 buffer.
- **Partial sequences across frames:** `Detokenizer` is stateful — call `render(ids, partial=True)` while frames stream, then `partial=False` (the default) on the last frame so the buffer flushes. Call `reset()` between conversations.
- **BPE merge ordering:** greedy by merge priority, not left-to-right. Matches the HuggingFace `tokenizers` reference behavior; a test fixture verifies this explicitly.
- **HuggingFace round-trip:** real Qwen-2 (152K vocab, byte_level) round-trips ASCII, code, emoji, and multi-script CJK / Latin-diacritic text. Bit-identical with HF's Rust `tokenizers` library (verified by `tests/test_bpe.py::test_qwen_matches_hf_reference`).
- **Hash verification:** uses `hashlib.sha256`. A mismatch raises `TokenizerMapHashMismatchError`.
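The partial-sequence buffering described above can be mimicked with the stdlib's incremental UTF-8 decoder — a sketch of the behavior, not the `Detokenizer` internals:

```python
import codecs

# A stateful decoder must buffer when a multi-byte UTF-8 sequence
# is split across streaming frames.
dec = codecs.getincrementaldecoder("utf-8")()

rocket = "🚀".encode()                     # 4 bytes: f0 9f 9a 80
first = dec.decode(rocket[:2])             # incomplete → buffered, yields ""
last = dec.decode(rocket[2:], final=True)  # sequence completes → "🚀"
print(repr(first), repr(last))             # '' '🚀'
```

A naive per-frame `bytes.decode()` would raise `UnicodeDecodeError` (or emit replacement characters) at exactly these frame boundaries.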
## Map sources

`load_map` accepts any URL — the sha256 hash is what matters. Curated maps:

```
https://cdn.jsdelivr.net/gh/wdunn001/codec-maps/maps/<family>.json
```

14 families covering 70+ aliases — see codec-maps for the index.

To generate a map from a HuggingFace `tokenizer.json`:

```
npx @codecai/maps-cli build my-org/my-model --id=my-org/my-model
```
## Compression

`load_map` uses httpx, which transparently decompresses gzip and brotli `Content-Encoding`. jsDelivr serves brotli automatically (3.4× smaller transfers). For Codec streaming responses, the server negotiates `Content-Encoding` based on the request's `Accept-Encoding`.
## License

MIT. See LICENSE.