Tokenizer + detokenizer for the Codec binary transport protocol — decode streaming token IDs from vLLM/SGLang, encode text into IDs for the bidirectional path. The Python twin of @codecai/web and Codec.Net.
codecai
Python client for the Codec binary transport protocol.
The Python twin of @codecai/web (browser/Node) and Codec.Net (.NET). Decodes streaming token IDs from Codec-compliant servers (vLLM, SGLang) and encodes text into IDs for the bidirectional path. Pure Python, no native dependencies beyond msgspec and httpx.
Why this exists
Real measurements from Codec/packages/bench (live Ollama qwen2.5):
| Configuration | B/token | vs JSON-SSE |
|---|---|---|
| JSON-SSE (live Ollama) | 186.4 | 1.0× |
| Codec msgpack | 16.0 | 9.6× |
| Codec protobuf | 10.9 | 14.2× |
| Codec msgpack + Content-Encoding: br | 2.79 | 55.2× |
Agent-to-agent handoffs: 3.6× faster end-to-end at 1024 tokens, because the wire payload shrinks and the detokenize + tokenize round trip between agents is eliminated.
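To give a feel for where the per-token bytes go, here is a rough sketch that compares one OpenAI-style SSE chunk carrying a single token as text against the same information as a bare integer. It is illustrative only: the chunk fields are a typical shape rather than a capture from the benchmark, and the packed layout is a plain uint32 array, not the Codec frame format (see PROTOCOL.md for that).

```python
import json
import struct


def sse_chunk(token_text: str) -> bytes:
    """One OpenAI-style streaming chunk carrying a single token as text."""
    payload = {
        "id": "cmpl-0",
        "object": "text_completion",
        "created": 0,
        "model": "example-model",  # hypothetical model name
        "choices": [{"index": 0, "text": token_text, "finish_reason": None}],
    }
    return b"data: " + json.dumps(payload).encode("utf-8") + b"\n\n"


def packed_ids(ids: list[int]) -> bytes:
    """Token IDs as raw little-endian uint32 values, with no envelope."""
    return struct.pack(f"<{len(ids)}I", *ids)


if __name__ == "__main__":
    print(len(sse_chunk(" entropy")), "bytes as a JSON-SSE chunk")
    print(len(packed_ids([47502])), "bytes as a packed ID")  # arbitrary example ID
```

Most of the JSON-SSE cost is the repeated envelope, which is why binary framing and compression both help.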
Install
pip install codecai
Requires Python 3.9+.
Quick start — decode a stream
import asyncio

import httpx

from codecai import Detokenizer, decode_msgpack_stream, load_map


async def main() -> None:
    # 1. Load and pin the dialect map by sha256.
    m = await load_map(
        url="https://cdn.jsdelivr.net/gh/wdunn001/codec-maps/maps/qwen/qwen2.json",
        hash="sha256:c73972f7a580…",
    )
    # 2. Stream from a Codec-compliant server.
    async with httpx.AsyncClient() as client:
        async with client.stream(
            "POST",
            "http://localhost:8000/v1/completions",
            json={
                "model": "Qwen/Qwen2.5-7B-Instruct",
                "prompt": "Explain entropy.",
                "stream_format": "msgpack",
                "max_tokens": 256,
            },
            timeout=None,
        ) as resp:
            # 3. Detokenize lazily: only when rendering for a human.
            detok = Detokenizer(m)
            async for frame in decode_msgpack_stream(resp.aiter_raw()):
                print(
                    detok.render(frame.ids, partial=not frame.done),
                    end="",
                    flush=True,
                )


asyncio.run(main())
Quick start — encode text (bidirectional path)
When you want zero text on the wire in either direction — agent A's output IDs feeding straight into agent B's input — encode text to IDs locally before sending:
import httpx

from codecai import BPETokenizer

# `m` is the dialect map loaded with load_map() in the previous example.
tok = BPETokenizer(m)
prompt_ids = tok.encode("Explain entropy.")  # pure-Python BPE, exact

# Send IDs as a normal OpenAI prompt: list[int] (no special endpoint needed).
# Run this inside a coroutine, as in main() above.
async with httpx.AsyncClient() as client:
    async with client.stream(
        "POST",
        "http://localhost:8000/v1/completions",
        json={
            "prompt": prompt_ids,
            "stream_format": "msgpack",
            "max_tokens": 256,
        },
    ) as resp:
        ...
For huge prompts (>50K tokens, e.g. RAG with long context), /v1/completions/codec accepts a binary msgpack request body with the same effect. See PROTOCOL.md for both paths.
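The encode step above is described as pure-Python BPE. As a minimal sketch of what merging by priority means (this is not the codecai implementation, and the merge table in the usage note is invented for illustration), the loop below always applies the best-ranked adjacent pair anywhere in the sequence instead of scanning left to right:

```python
from typing import Dict, List, Sequence, Tuple


def bpe_merge(symbols: Sequence[str], merge_rank: Dict[Tuple[str, str], int]) -> List[str]:
    """Repeatedly merge the adjacent pair with the lowest rank (highest priority)."""
    out = list(symbols)
    while len(out) > 1:
        # Find the best-ranked adjacent pair anywhere in the sequence.
        best = None
        for i in range(len(out) - 1):
            rank = merge_rank.get((out[i], out[i + 1]))
            if rank is not None and (best is None or rank < best[0]):
                best = (rank, i)
        if best is None:
            break  # no learned merge applies any more
        _, i = best
        out[i : i + 2] = [out[i] + out[i + 1]]
    return out
```

With a hypothetical table `{("b", "c"): 0, ("a", "b"): 1, ("ab", "c"): 2}`, the input `"abc"` merges `b` + `c` first and ends as `["a", "bc"]`; a left-to-right scan would have produced `["abc"]` instead.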
API
| Symbol | Purpose |
|---|---|
| load_map(url=..., hash=...) | Fetch + sha256-verify + cache a dialect map (async) |
| discover_map(origin=..., id=...) | Resolve a map via the .well-known/codec/ convention (async) |
| discover_index(origin=...) | Fetch .well-known/codec/index.json (optional directory, async) |
| MemoryMapCache | Default in-memory MapCache. Subclass for Redis / disk |
| TokenizerMap.from_json(...) | Parse + schema check |
| Detokenizer | Stateful detokenizer: byte_level + metaspace + byte fallback + partial UTF-8 |
| detokenize(map, ids) | One-shot for non-streaming use |
| BPETokenizer | Pure-Python BPE: byte_level + metaspace |
| LongestMatchTokenizer | Vocab-only fallback for canonical-IR maps |
| pick_tokenizer(map) | Build the right tokenizer for the loaded map |
| tokenize(map, text) | One-shot helper |
| decode_msgpack_stream(body) | AsyncIterable[bytes] → AsyncIterator[CodecFrame] |
| decode_protobuf_stream(body) | Same for length-prefixed protobuf |
| decode_protobuf_frame(payload) | One-shot frame decoder (no length prefix) |
| ToolWatcher | Detect delimited regions (tool calls, reasoning blocks, vision spans) without decoding |
| Translator, translate(...), static_translation_table(...) | Cross-vocab agent handoff: ids_A → text → ids_B with streaming-safe word-boundary buffering |
| SafetyPolicyDescriptor, validate_safety_policy, hash_safety_policy (v0.4) | Sanitized publishable safety-policy descriptor: load and verify a server's advertised policy without operator-internal fields ever crossing the wire |
| load_safety_policy(url=..., hash=...), discover_safety_policy(origin=..., id=..., hash=...) (v0.4) | Async loader + .well-known/codec/policies/ discovery; cross-stack canonical-bytes hash matches the TS / Rust / .NET / Java / supervisor implementations bit-for-bit |
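The table describes decode_protobuf_stream as reading length-prefixed frames. The sketch below shows one way such frames can be reassembled when network chunks do not line up with frame boundaries. It assumes a base-128 varint length prefix (the common protobuf delimited convention), which this README does not confirm; `split_length_prefixed` is a hypothetical helper, not part of codecai, and PROTOCOL.md remains the authority on the real framing.

```python
import asyncio
from typing import AsyncIterator


async def split_length_prefixed(chunks: AsyncIterator[bytes]) -> AsyncIterator[bytes]:
    """Yield complete frame payloads from arbitrarily chunked bytes.

    Assumption: each frame is a varint length followed by that many bytes.
    """
    buf = bytearray()
    async for chunk in chunks:
        buf.extend(chunk)
        while True:
            # Try to read a varint length from the front of the buffer.
            length, shift, pos = 0, 0, 0
            complete = False
            while pos < len(buf):
                byte = buf[pos]
                pos += 1
                length |= (byte & 0x7F) << shift
                shift += 7
                if not byte & 0x80:  # high bit clear: last varint byte
                    complete = True
                    break
            if not complete or len(buf) - pos < length:
                break  # wait for more bytes
            yield bytes(buf[pos : pos + length])
            del buf[: pos + length]
```

Each yielded payload could then go to a one-shot decoder such as decode_protobuf_frame, which the table lists as taking a frame without its length prefix.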
Detect tool calls without decoding
Most chat-tuned models delimit tool calls with single-token specials (Qwen <tool_call> / </tool_call>, Llama 3.1+ <|python_tag|> / <|eom_id|>, DeepSeek-R1 <think> / </think>, etc.). Detecting one is a uint32 compare in the hot loop — no detokenize, no string allocation:
import json

from codecai import ToolWatcher

# `m` is the loaded dialect map; `detok` is a Detokenizer built from it.
# forward_codec_frame, dispatch_tool, and next_agent are placeholders for your own code.
watcher = ToolWatcher(m, "<tool_call>", "</tool_call>")
async for frame in decode_msgpack_stream(resp.aiter_raw()):
    for ev in watcher.feed(frame.ids):
        if ev.kind == "passthrough":
            forward_codec_frame(next_agent, ev.ids)  # no decode
        else:  # "region"
            json_args = json.loads(detok.render(ev.ids))
            dispatch_tool(json_args)
Stateful — regions split between network frames buffer until the end marker arrives. The same primitive covers reasoning blocks, multimodal spans, code-interpreter regions — anything delimited by a (start, end) special pair.
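To make the buffering behaviour concrete, here is a small stand-alone sketch of a delimiter watcher over raw token IDs. It is not the ToolWatcher implementation: `RegionWatcher` and `Event` are hypothetical names, the start and end IDs are passed in directly rather than looked up in a map, and excluding the delimiters from the region is an assumption.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Event:
    kind: str  # "passthrough" or "region"
    ids: List[int]


class RegionWatcher:
    """Splits a token-ID stream into passthrough runs and delimited regions."""

    def __init__(self, start_id: int, end_id: int) -> None:
        self.start_id = start_id
        self.end_id = end_id
        self._inside = False
        self._region: List[int] = []

    def feed(self, ids: List[int]) -> Iterator[Event]:
        passthrough: List[int] = []
        for tid in ids:
            if not self._inside:
                if tid == self.start_id:  # integer compare, no decoding
                    if passthrough:
                        yield Event("passthrough", passthrough)
                        passthrough = []
                    self._inside = True
                else:
                    passthrough.append(tid)
            elif tid == self.end_id:
                yield Event("region", self._region)
                self._region = []
                self._inside = False
            else:
                self._region.append(tid)  # kept across feed() calls
        if passthrough:
            yield Event("passthrough", passthrough)
```

Because the region buffer lives on the instance, a region that opens in one frame and closes in the next is emitted once, as a single event, when the end ID arrives.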
Cross-vocab agent handoff
When agent A's output feeds agent B as a prompt and the two models have different vocabs, decode-then-reencode through text — without ever putting text on the wire:
from codecai import Translator

tr = Translator(qwen_map, llama_map)
async for frame in decode_msgpack_stream(resp.aiter_raw()):
    llama_ids = tr.translate(frame.ids, partial=not frame.done)
    forward_codec_frame(llama_agent, llama_ids)
# tr.finish() drains the trailing partial-word buffer.
Translator is stateful: pre-tokenizers split at whitespace, so it buffers partial words until a safe boundary arrives. For analysis-only use, static_translation_table(A, B) gives a context-free id_A → ids_B lookup.
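The word-boundary buffering can be pictured with a few lines of plain Python. This is a simplified sketch, not the Translator internals: `WordBoundaryBuffer` is a hypothetical name, and treating only spaces and newlines as boundaries is an assumption that real pre-tokenizers refine.

```python
class WordBoundaryBuffer:
    """Holds back the trailing partial word until more text arrives."""

    def __init__(self) -> None:
        self._pending = ""

    def push(self, text: str) -> str:
        """Return the prefix that is safe to re-tokenize; keep the rest."""
        self._pending += text
        # Cut just before the last whitespace, so the whitespace stays
        # attached to the (possibly incomplete) word that follows it.
        cut = max(self._pending.rfind(" "), self._pending.rfind("\n"))
        if cut <= 0:
            return ""
        ready, self._pending = self._pending[:cut], self._pending[cut:]
        return ready

    def finish(self) -> str:
        """Flush whatever is left at end of stream."""
        ready, self._pending = self._pending, ""
        return ready
```

Feeding `"Hello wor"` and then `"ld, bye"` releases `"Hello"`, then `" world,"`, and `finish()` returns `" bye"`, so the target tokenizer never sees the fragment `wor` on its own.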
Correctness
- Byte-level decode: every vocab token is a sequence of GPT-2-encoded bytes. The Detokenizer reverses the byte→unicode table and accumulates bytes across tokens until a complete UTF-8 sequence forms. Tested with 3-byte (€) and 4-byte (🚀) sequences.
- Metaspace decode: ▁ becomes space; SentencePiece byte-fallback IDs (<0x00>–<0xFF>) are decoded through the same UTF-8 buffer.
- Partial sequences across frames: Detokenizer is stateful. Call render(ids, partial=True) while frames stream, then partial=False (default) on the last frame so the buffer flushes. Call reset() between conversations.
- BPE merge ordering: greedy by priority, not left-to-right. Matches HuggingFace tokenizers reference behavior. Test fixture verifies this explicitly.
- HuggingFace round-trip: real Qwen-2 (152K vocab, byte_level) round-trips ASCII, code, emoji, multi-script CJK / Latin diacritics. Bit-identical with HF's Rust tokenizers library (verified by tests/test_bpe.py::test_qwen_matches_hf_reference).
- Hash verification uses hashlib.sha256. Mismatch raises TokenizerMapHashMismatchError.
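The byte-level and partial-sequence points above can be illustrated with the standard library alone. The sketch below rebuilds the well-known GPT-2 byte→unicode table, reverses it, and feeds the recovered bytes through an incremental UTF-8 decoder. `StreamingByteLevelDecoder` is a hypothetical stand-in that takes token strings directly; it is not the Detokenizer class and skips metaspace and byte-fallback handling.

```python
import codecs
from typing import Dict, List


def bytes_to_unicode() -> Dict[int, str]:
    """GPT-2's reversible byte -> printable-character table."""
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Non-printable bytes are shifted to code points 256 and up.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


BYTE_TO_CHAR = bytes_to_unicode()
CHAR_TO_BYTE = {c: b for b, c in BYTE_TO_CHAR.items()}


class StreamingByteLevelDecoder:
    """Turns byte-level token strings into text, buffering split UTF-8."""

    def __init__(self) -> None:
        self._utf8 = codecs.getincrementaldecoder("utf-8")(errors="replace")

    def render(self, tokens: List[str], partial: bool = False) -> str:
        raw = bytes(CHAR_TO_BYTE[ch] for tok in tokens for ch in tok)
        # With partial=True an incomplete trailing sequence stays buffered.
        return self._utf8.decode(raw, final=not partial)
```

If the three bytes of `€` arrive split across two tokens, the first `render(..., partial=True)` call returns an empty string and the second call returns the complete character.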
Map sources
load_map accepts any URL — the sha256 hash is what matters. Curated maps:
https://cdn.jsdelivr.net/gh/wdunn001/codec-maps/maps/<family>.json
14 families covering 70+ aliases — see codec-maps for the index.
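Since the hash is what anchors trust, the check itself is worth seeing. This is a minimal sketch of pin verification under the `sha256:<hex>` format used in the quick start; `verify_pinned` is a hypothetical helper and raises a plain ValueError rather than the library's TokenizerMapHashMismatchError.

```python
import hashlib
import hmac


def verify_pinned(data: bytes, pin: str) -> None:
    """Raise ValueError unless `data` hashes to the pinned 'sha256:<hex>' value."""
    algo, _, expected = pin.partition(":")
    if algo != "sha256" or len(expected) != 64:
        raise ValueError(f"unsupported or truncated pin: {pin!r}")
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(actual, expected.lower()):
        raise ValueError(f"hash mismatch: expected {expected}, got {actual}")
```

The check runs on the exact bytes fetched, before any JSON parsing, so a mirror or CDN can serve the file without being trusted.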
To generate a map from a HuggingFace tokenizer.json:
npx @codecai/maps-cli build my-org/my-model --id=my-org/my-model
Self-hosted discovery via .well-known/codec/
If the model maintainer publishes their map at the standard .well-known/codec/ location on a domain they control, clients only need the origin and map ID:
from codecai import discover_map
m = await discover_map(origin="https://qwen.io", id="qwen/qwen2")
This fetches https://qwen.io/.well-known/codec/maps/qwen/qwen2.json — either a small { id, url, hash } pointer (recommended) or the full map served inline. Hash verification still anchors the bytes. Spec: WELL_KNOWN_DISCOVERY.md. Maintainers can generate the publishing tree with codecai-maps well-known --map=... --url=....
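The URL construction described above is simple enough to sketch. The helper names below are hypothetical, and the pointer test (a document with exactly the keys id, url, and hash) is an assumption about how a client might tell the two payload shapes apart; WELL_KNOWN_DISCOVERY.md is the authority.

```python
from urllib.parse import quote, urlsplit


def well_known_map_url(origin: str, map_id: str) -> str:
    """Build the .well-known/codec/ URL for a map ID such as 'qwen/qwen2'."""
    parts = urlsplit(origin)
    if not parts.scheme or not parts.netloc:
        raise ValueError("origin must look like https://host")
    # Keep "/" unescaped so family/name IDs map onto nested paths.
    path = quote(map_id, safe="/")
    return f"{parts.scheme}://{parts.netloc}/.well-known/codec/maps/{path}.json"


def is_pointer(doc: dict) -> bool:
    """Heuristic: a pointer document carries only id, url, and hash."""
    return set(doc) == {"id", "url", "hash"}
```

For the example in the text, `well_known_map_url("https://qwen.io", "qwen/qwen2")` yields `https://qwen.io/.well-known/codec/maps/qwen/qwen2.json`.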
Compression
load_map uses httpx, which transparently decompresses gzip Content-Encoding and, when a brotli package is installed (for example via pip install httpx[brotli]), brotli as well. jsDelivr serves brotli automatically (3.4× smaller transfers). For Codec streaming responses, the server negotiates Content-Encoding based on the request's Accept-Encoding.
License
MIT. See LICENSE.