vLLM out-of-tree model: Jina Embeddings v4 multi-vector (token_embed) on Qwen2.5-VL
jina-v4-vllm-plugin
vLLM out-of-tree model plugin that lets a stock vLLM OpenAI-compatible server serve
Jina Embeddings v4 multi-vector embeddings (128 dims per token, ColBERT-style late interaction) for
multimodal (text + image) inputs. With the plugin installed, the server's /pooling endpoint returns final,
L2-normalized [n, 128] per-token multi-vectors directly: no proxy and no client-side projection.
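Because each text or image comes back as a matrix of unit-length token vectors, relevance is usually scored with ColBERT-style late interaction (MaxSim): for every query token, take its best dot product against the document's tokens, then sum. A minimal pure-Python sketch, assuming you already hold the two [n, 128] lists returned by /pooling; `maxsim_score` is an illustrative helper, not part of the plugin.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product; equals cosine similarity when both vectors are unit-length."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query: Sequence[Vector], doc: Sequence[Vector]) -> float:
    """Late interaction: sum over query tokens of the max similarity to any doc token."""
    if not query or not doc:
        raise ValueError("query and document need at least one token vector each")
    return sum(max(dot(q, d) for d in doc) for q in query)


if __name__ == "__main__":
    # Toy 2-dim unit vectors stand in for the real 128-dim per-token outputs.
    query = [[1.0, 0.0], [0.0, 1.0]]
    doc_a = [[1.0, 0.0], [0.6, 0.8]]
    doc_b = [[0.0, 1.0]]
    print(maxsim_score(query, doc_a))  # 1.0 + 0.8 = 1.8
    print(maxsim_score(query, doc_b))  # 0.0 + 1.0 = 1.0
```

Since the server already L2-normalizes, no extra normalization is needed on the client before scoring.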
It registers a JinaV4MultiVector architecture (a Qwen2.5-VL backbone plus Jina's multi_vector_projector
applied in-engine, mirroring vLLM's in-tree ColQwen3/ColPali pattern) through a vllm.general_plugins
entry point, so it is loaded in every vLLM process, including the v1 EngineCore worker.
Install
pip install jina-v4-vllm-plugin # from PyPI
# into an image that already provides vLLM (e.g. vllm/vllm-openai), skip re-resolving vLLM/torch:
pip install --no-deps jina-v4-vllm-plugin
--no-deps keeps pip from re-resolving vLLM/torch inside the official image. Pin the host's vLLM to the
version the plugin was validated against; see research/docs/COMPAT.md.
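A quick way to check that the installed vLLM matches your pin before starting the server. This is a sketch only: `EXPECTED_VLLM` is a placeholder, not the validated version; take the real value from research/docs/COMPAT.md.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Placeholder: replace with the vLLM version listed in research/docs/COMPAT.md.
EXPECTED_VLLM = "0.0.0"


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pin(package: str, expected: str) -> bool:
    """Print a warning and return False if `package` is missing or not at `expected`."""
    found = installed_version(package)
    if found is None:
        print(f"{package} is not installed")
        return False
    if found != expected:
        print(f"{package} {found} is installed, but the plugin was validated against {expected}")
        return False
    return True


if __name__ == "__main__":
    check_pin("vllm", EXPECTED_VLLM)
```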
Use
vllm serve <jina-v4-checkpoint> \
--runner pooling --pooler-config.task token_embed \
--hf-overrides '{"architectures":["JinaV4MultiVector"]}' \
--chat-template "$(python -c 'import jina_v4_vllm_plugin as p; print(p.chat_template_path())')"
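Once the server is up, a client can POST to /pooling and read back one [n, 128] matrix per input. A minimal standard-library sketch for text inputs, assuming `vllm serve`'s default port 8000, a request body of the form `{"model": ..., "input": [...]}`, and a response shaped `{"data": [{"index": i, "data": [[...], ...]}]}`; verify these against the /pooling schema of your vLLM version. Image inputs use a different request shape and are not shown.

```python
import json
import urllib.request

# Assumption: default host/port of `vllm serve`.
BASE_URL = "http://localhost:8000"


def build_pooling_request(model: str, texts: list[str]) -> dict:
    """Request body for the /pooling endpoint (text inputs only)."""
    return {"model": model, "input": texts}


def parse_multivectors(response: dict) -> list[list[list[float]]]:
    """Extract one [n_tokens, 128] matrix per input, ordered by input index."""
    items = sorted(response["data"], key=lambda item: item.get("index", 0))
    return [item["data"] for item in items]


def pool(model: str, texts: list[str], timeout: float = 60.0) -> list[list[list[float]]]:
    """POST texts to a running server and return their multi-vectors."""
    body = json.dumps(build_pooling_request(model, texts)).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/pooling",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_multivectors(json.load(resp))
```

Usage against a running server: `pool("<jina-v4-checkpoint>", ["What is late interaction?"])`, where the model name is whatever you passed to `vllm serve`.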
The projector weights (a 128×2048 matrix plus a bias) are not included in the vLLM checkpoint; at startup
the plugin loads them from the path in JINA_MV_PROJECTOR (default /artifacts/projector/retrieval.npz),
or from the checkpoint itself if they have been baked in. A ready-made, baked, drop-in checkpoint is
published at Mazyod/jina-embeddings-v4-vllm-mv.
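What the projector computes can be sketched in NumPy: a linear map from the 2048-dim backbone hidden states to 128 dims, then per-token L2 normalization. The .npz key names (`weight`, `bias`) and the helpers `load_projector` / `project` are hypothetical, chosen for illustration; the plugin performs this step in-engine, so clients never need to run it.

```python
import os
from typing import Optional

import numpy as np

DEFAULT_PROJECTOR = "/artifacts/projector/retrieval.npz"


def load_projector(path: Optional[str] = None) -> tuple[np.ndarray, np.ndarray]:
    """Load (weight [128, 2048], bias [128]); the .npz key names are hypothetical."""
    path = path or os.environ.get("JINA_MV_PROJECTOR", DEFAULT_PROJECTOR)
    with np.load(path) as npz:
        weight, bias = npz["weight"], npz["bias"]
    if weight.shape[0] != bias.shape[0]:
        raise ValueError(f"weight {weight.shape} and bias {bias.shape} disagree")
    return weight, bias


def project(hidden: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Map [n, 2048] hidden states to L2-normalized [n, 128] multi-vectors."""
    out = hidden @ weight.T + bias  # weight is (out_features, in_features), as in torch Linear
    norms = np.linalg.norm(out, axis=-1, keepdims=True)
    return out / np.clip(norms, 1e-12, None)  # guard against all-zero token vectors
```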
Build & validation tooling
The Modal build/validate/bake/deploy harness that produced and verified the artifacts lives under
research/ (a separate uv project): projector extraction, checkpoint baking, HF-vs-vLLM parity checks,
the deploy runbook, and the vLLM-version compatibility matrix
(research/docs/COMPAT.md, research/deploy/DEPLOY.md).
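An HF-vs-vLLM parity check comes down to comparing the two [n, 128] outputs for the same input, token by token. A sketch of one such comparison, assuming both outputs are already computed and aligned to the same tokens; `parity_report`, `is_parity` and the 0.999 threshold are illustrative, not what research/ actually uses.

```python
import numpy as np


def parity_report(hf: np.ndarray, vllm: np.ndarray) -> dict[str, float]:
    """Compare two [n, 128] multi-vector outputs produced for the same input."""
    if hf.shape != vllm.shape:
        raise ValueError(f"shape mismatch: HF {hf.shape} vs vLLM {vllm.shape}")
    hf = hf.astype(np.float64)
    vllm = vllm.astype(np.float64)
    # Per-token cosine similarity tolerates small norm drift from low-precision kernels.
    cos = np.sum(hf * vllm, axis=-1) / (
        np.linalg.norm(hf, axis=-1) * np.linalg.norm(vllm, axis=-1)
    )
    return {
        "min_token_cosine": float(cos.min()),
        "mean_token_cosine": float(cos.mean()),
        "max_abs_diff": float(np.abs(hf - vllm).max()),
    }


def is_parity(report: dict[str, float], min_cosine: float = 0.999) -> bool:
    """Hypothetical acceptance rule: every token must agree above `min_cosine`."""
    return report["min_token_cosine"] >= min_cosine
```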
Develop
make install # uv sync
make test # packaging contract tests (no GPU/vLLM)
make build # sdist + wheel into dist/
Releases are published to PyPI via GitHub Actions Trusted Publishing (OIDC): run the "Publish to PyPI"
workflow (workflow_dispatch) and choose patch, minor, or major.
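The patch/minor/major choice maps onto ordinary semantic-versioning bump rules. A minimal sketch of those rules using the current 0.1.1 release as input; `bump` is illustrative, and the workflow's own bump logic is not shown here.

```python
def bump(version: str, part: str) -> str:
    """Bump a MAJOR.MINOR.PATCH version; lower components reset to 0."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"part must be patch, minor or major, not {part!r}")


if __name__ == "__main__":
    for part in ("patch", "minor", "major"):
        print(part, bump("0.1.1", part))
    # patch 0.1.2
    # minor 0.2.0
    # major 1.0.0
```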
File details
Details for the file jina_v4_vllm_plugin-0.1.1.tar.gz.
File metadata
- Download URL: jina_v4_vllm_plugin-0.1.1.tar.gz
- Upload date:
- Size: 316.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2a9c45ed6555563155141ae2b9161097e5c5ba453cf8fed35ac5cefe789e712b |
| MD5 | 2f2d0aaab73a9026e0a5904b13332161 |
| BLAKE2b-256 | 75c062a41361f004aaa77bbf30fff3ac0fbed28eea62a6db0fb0a47ca621a936 |
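To check a downloaded file against the digests listed above, hash it with the standard library; PyPI's BLAKE2b-256 is BLAKE2b with a 32-byte digest. A minimal sketch (the helper names are illustrative):

```python
import hashlib
from pathlib import Path

# SHA256 of jina_v4_vllm_plugin-0.1.1.tar.gz, copied from the table above.
EXPECTED_SHA256 = "2a9c45ed6555563155141ae2b9161097e5c5ba453cf8fed35ac5cefe789e712b"


def file_digests(path: Path, chunk_size: int = 1 << 20) -> dict[str, str]:
    """Compute SHA256, MD5 and BLAKE2b-256 hex digests in a single pass over the file."""
    hashers = {
        "sha256": hashlib.sha256(),
        "md5": hashlib.md5(),
        "blake2b_256": hashlib.blake2b(digest_size=32),
    }
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify_sha256(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches the published digest."""
    return file_digests(path)["sha256"] == expected.lower()
```

Usage: `verify_sha256(Path("jina_v4_vllm_plugin-0.1.1.tar.gz"))`.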
Provenance
The following attestation bundles were made for jina_v4_vllm_plugin-0.1.1.tar.gz:
Publisher: publish.yml on Mazyod/jina-embeddings-v4-vllm-plugin
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: jina_v4_vllm_plugin-0.1.1.tar.gz
- Subject digest: 2a9c45ed6555563155141ae2b9161097e5c5ba453cf8fed35ac5cefe789e712b
- Sigstore transparency entry: 1787832124
- Sigstore integration time:
- Permalink: Mazyod/jina-embeddings-v4-vllm-plugin@443ada7665953d93b577535005ab68261f293c70
- Branch / Tag: refs/heads/main
- Owner: https://github.com/Mazyod
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@443ada7665953d93b577535005ab68261f293c70
- Trigger Event: workflow_dispatch
File details
Details for the file jina_v4_vllm_plugin-0.1.1-py3-none-any.whl.
File metadata
- Download URL: jina_v4_vllm_plugin-0.1.1-py3-none-any.whl
- Upload date:
- Size: 6.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 98be0623426d3e6d7e8438fcbc14bff0e028973658c912015770edbc3bf3b7a5 |
| MD5 | 2a0412ef60edaf1a5a7cf5d8bcd20296 |
| BLAKE2b-256 | 2442bb040fc413d1c180f7385317d02692042d83acef448fb67a0de270e78f37 |
Provenance
The following attestation bundles were made for jina_v4_vllm_plugin-0.1.1-py3-none-any.whl:
Publisher: publish.yml on Mazyod/jina-embeddings-v4-vllm-plugin
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: jina_v4_vllm_plugin-0.1.1-py3-none-any.whl
- Subject digest: 98be0623426d3e6d7e8438fcbc14bff0e028973658c912015770edbc3bf3b7a5
- Sigstore transparency entry: 1787832217
- Sigstore integration time:
- Permalink: Mazyod/jina-embeddings-v4-vllm-plugin@443ada7665953d93b577535005ab68261f293c70
- Branch / Tag: refs/heads/main
- Owner: https://github.com/Mazyod
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@443ada7665953d93b577535005ab68261f293c70
- Trigger Event: workflow_dispatch