vLLM plugin: out-of-tree registration of canon-layer architectures (e.g. LlamaCanonForCausalLM from PhysicsLM4)

vllm-canon

An out-of-tree vLLM plugin that adds support for the LlamaCanonForCausalLM architecture — the "canon layer" variant of Llama introduced in Zeyuan Allen-Zhu's PhysicsLM4 / Canon Layers work.

A canon layer is a depthwise causal short convolution (kernel=4 by default) inserted at up to four positions in each decoder block:

  • canonA — on the residual stream after input_layernorm, before attention
  • canonB — on the fused qkv stream before RoPE
  • canonC — on the residual stream after post_attention_layernorm, before MLP
  • canonD — on the fused gate_up stream before the gated activation (silu(gate) * up)
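The conv itself is simple: a per-channel (depthwise) causal convolution whose output at position t only sees positions t-3..t. A minimal pure-PyTorch sketch (the class name `CanonConv` and the bias-free init are illustrative, not the plugin's actual API):

```python
import torch
import torch.nn.functional as F


class CanonConv(torch.nn.Module):
    """Depthwise causal short convolution, kernel=4 by default (sketch)."""

    def __init__(self, dim: int, kernel: int = 4):
        super().__init__()
        self.kernel = kernel
        # one length-`kernel` filter per channel (depthwise), no bias here
        self.weight = torch.nn.Parameter(torch.randn(dim, 1, kernel))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); left-pad by kernel-1 so position t
        # never sees positions > t (causal)
        x = x.transpose(1, 2)                       # (batch, dim, seq_len)
        x = F.pad(x, (self.kernel - 1, 0))
        y = F.conv1d(x, self.weight, groups=x.shape[1])  # depthwise
        return y.transpose(1, 2)                    # (batch, seq_len, dim)


conv = CanonConv(16)
x = torch.randn(1, 8, 16)
y = conv(x)
print(y.shape)  # torch.Size([1, 8, 16]) — same shape in, same shape out
```

Because the conv is depthwise and short, it adds only dim × kernel parameters per insertion point.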

Install

pip install vllm-canon

After install, vLLM auto-discovers the plugin via its vllm.general_plugins entry point.
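For reference, an out-of-tree plugin advertises itself to vLLM through a `vllm.general_plugins` entry point in its packaging metadata. A hypothetical pyproject.toml excerpt (the module and function names are illustrative, not necessarily what this package uses):

```toml
# Hypothetical pyproject.toml excerpt: vLLM scans this entry-point group
# at startup and calls the referenced function, which registers the
# architecture with ModelRegistry.
[project.entry-points."vllm.general_plugins"]
vllm_canon = "vllm_canon:register"
```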

Use

Pass trust_remote_code=True so the Transformers library auto-loads the custom LlamaCanonConfig from your model directory:

from vllm import LLM, SamplingParams

llm = LLM(
    model="/path/to/your/canon-model",
    trust_remote_code=True,
    tensor_parallel_size=1,
    dtype="bfloat16",
    enforce_eager=True,
)
print(llm.generate(["hello"], SamplingParams(temperature=0, max_tokens=32))[0].outputs[0].text)

Or start a vLLM server:

vllm serve /path/to/your/canon-model \
  --trust-remote-code --tensor-parallel-size 1 --dtype bfloat16 \
  --enforce-eager --port 8000 --served-model-name canon

What the plugin does

  • Registers LlamaCanonForCausalLM in ModelRegistry via the vllm.general_plugins entry point — no edits to the vLLM source tree.
  • Rebuilds the Llama block with vLLM primitives (QKVParallelLinear, MergedColumnParallelLinear, paged attention, partial RoPE via partial_rotary_factor) and inserts the four canon convolutions at the HF reference positions.
  • Each canon conv is a MambaBase with mamba_type="short_conv" so vLLM's V1 engine allocates a per-request (kernel-1, dim) rolling state alongside the KV cache. The model declares HasInnerState and IsHybrid so the engine plumbs that state correctly.
  • The conv forward is written in pure PyTorch (F.conv1d for prefill, shift-append + dot for decode). The Triton kernels (causal_conv1d_fn / causal_conv1d_update) produced state-update results that diverged from the reference in this setting; since the canon kernel width is tiny, the pure-torch path costs little.
  • The HF checkpoint loads via vLLM's standard stacked-weight mapping: q_proj/k_proj/v_proj → qkv_proj, gate_proj/up_proj → gate_up_proj, canon weights by name. lm_head is tied to embed_tokens when tie_word_embeddings=True.
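The two conv paths above can be sketched and cross-checked in pure PyTorch: F.conv1d over the whole prompt at prefill, and a shift-append + per-channel dot against a (kernel-1, dim) rolling state at decode. Variable names here are illustrative, not the plugin's internals:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
dim, kernel, seq = 4, 4, 6
w = torch.randn(dim, 1, kernel)            # depthwise filters
x = torch.randn(seq, dim)                  # one sequence, no batch dim

# Prefill path: causal depthwise conv over the full sequence at once
xp = F.pad(x.t().unsqueeze(0), (kernel - 1, 0))        # (1, dim, seq+3)
prefill = F.conv1d(xp, w, groups=dim).squeeze(0).t()   # (seq, dim)

# Decode path: a (kernel-1, dim) rolling state, updated one token at a time
state = torch.zeros(kernel - 1, dim)
decode = []
for t in range(seq):
    window = torch.cat([state, x[t:t + 1]], dim=0)     # (kernel, dim)
    decode.append((window.t() * w.squeeze(1)).sum(-1)) # per-channel dot
    state = window[1:]                                 # shift-append
decode = torch.stack(decode)

# Both paths compute the same causal conv, so outputs should match
print(torch.allclose(prefill, decode, atol=1e-5))
```

This equivalence is exactly what a per-request (kernel-1, dim) state buys: decode never re-reads the prompt, only the last kernel-1 inputs.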

Limitations

  • tensor_parallel_size=1 only. Canon B and canon D operate on fused QKV / gate_up streams; per-shard weight layouts under TP>1 need separate work.
  • rope_version='huggingface' only. Lingua-style interleaved RoPE is not supported.
  • enforce_eager=True recommended. The model class is not decorated with @support_torch_compile; adding it would require an explicit dynamic_arg_dims.

Compatibility

  • vLLM >=0.15,<0.17 (tested on 0.15.1)
  • transformers >= 4.57
  • PyTorch >= 2.5
  • Python >= 3.10

Parity

Verified against HuggingFace .generate() on the qwen1.5-0.5b-newtok-canon PhysicsLM4 checkpoint: 16/16 greedy tokens match for both a 1-token prompt (exercises the decode-path conv state update) and a 12-token prompt (exercises the prefill conv).

License

Apache 2.0. See LICENSE.
