
vLLM out-of-tree model: Jina Embeddings v4 multi-vector (token_embed) on Qwen2.5-VL


jina-v4-vllm-plugin

A vLLM out-of-tree model plugin that lets a stock vLLM OpenAI-compatible server serve Jina Embeddings v4 multi-vector embeddings: multimodal (text + image), 128 dimensions per token, ColBERT-style late interaction. With the plugin installed, the server's /pooling endpoint returns the final L2-normalized [n, 128] per-token multivectors directly, with no proxy and no client-side projection.
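For readers new to late interaction, the following is a minimal sketch of how two sets of per-token multivectors are typically compared (ColBERT-style MaxSim). It is not part of the plugin; the function name and the toy 2-dimensional vectors are illustrative assumptions, and real vectors would be the 128-dimensional rows returned by /pooling.

```python
from typing import Sequence

Vector = Sequence[float]


def maxsim_score(query: Sequence[Vector], doc: Sequence[Vector]) -> float:
    """ColBERT-style late interaction: for each query token, take the best
    dot product against any document token, then sum over query tokens.

    Assumes every vector is already L2-normalized, so a dot product equals
    cosine similarity.
    """
    if not query or not doc:
        return 0.0
    total = 0.0
    for q in query:
        total += max(sum(a * b for a, b in zip(q, d)) for d in doc)
    return total


# Toy 2-dimensional unit vectors standing in for [n, 128] multivectors.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.6, 0.8]]
doc_b = [[0.0, -1.0]]

if __name__ == "__main__":
    # doc_a should rank above doc_b for this query.
    print(maxsim_score(query, doc_a), maxsim_score(query, doc_b))
```

Because the server already returns normalized vectors, a client would only need a scoring step like this one and no projection of its own.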

The plugin registers a JinaV4MultiVector architecture through a vllm.general_plugins entry point, so it loads in every vLLM process, including the v1 EngineCore worker. The architecture is a Qwen2.5-VL backbone with Jina's multi_vector_projector applied in-engine, mirroring vLLM's in-tree ColQwen3/ColPali pattern.
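As a rough illustration of the entry-point mechanism, a vllm.general_plugins hook is a callable that vLLM invokes in each process and that registers the architecture with ModelRegistry. The sketch below follows the pattern from vLLM's plugin documentation; the module path in MODEL_PATH and the optional registry parameter are hypothetical and exist only so the function can be exercised without vLLM installed. The plugin's actual code may differ.

```python
ARCH_NAME = "JinaV4MultiVector"
# Hypothetical "module:Class" location; the real module layout is not
# stated in this README.
MODEL_PATH = "jina_v4_vllm_plugin.model:JinaV4MultiVector"


def register(registry=None) -> None:
    """Register the architecture once per process.

    A package would expose this function under the "vllm.general_plugins"
    entry-point group in its packaging metadata.
    """
    if registry is None:
        # Imported lazily so the module can be imported without vLLM.
        from vllm import ModelRegistry as registry
    if ARCH_NAME not in registry.get_supported_archs():
        # Passing a string defers importing the model class until needed.
        registry.register_model(ARCH_NAME, MODEL_PATH)
```

Registering by string rather than by class keeps the hook cheap, which matters because it runs in every vLLM process.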

Install

pip install jina-v4-vllm-plugin        # from PyPI
# into an image that already provides vLLM (e.g. vllm/vllm-openai), skip re-resolving vLLM/torch:
pip install --no-deps jina-v4-vllm-plugin

--no-deps keeps pip from re-resolving vLLM/torch inside the official image. Pin the host vLLM version to one the plugin was validated against; see research/docs/COMPAT.md.

Use

vllm serve <jina-v4-checkpoint> \
  --runner pooling --pooler-config.task token_embed \
  --hf-overrides '{"architectures":["JinaV4MultiVector"]}' \
  --chat-template "$(python -c 'import jina_v4_vllm_plugin as p; print(p.chat_template_path())')"
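Once a server is running with a command like the one above, a client could fetch multivectors over HTTP. The sketch below uses only the standard library. The request body (model, input) and the response path data[0].data reflect my understanding of vLLM's pooling API and should be treated as assumptions to check against the pinned vLLM version; the helper names and the base URL are illustrative.

```python
import json
import urllib.request


def build_pooling_request(model: str, text: str) -> dict:
    """Build a text-only /pooling request body (assumed schema)."""
    return {"model": model, "input": text}


def parse_multivectors(response: dict) -> list[list[float]]:
    """Extract the [n, 128] per-token vectors of the first input (assumed schema)."""
    return response["data"][0]["data"]


def fetch_multivectors(base_url: str, model: str, text: str) -> list[list[float]]:
    """POST to {base_url}/pooling and return the per-token vectors."""
    body = json.dumps(build_pooling_request(model, text)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/pooling",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return parse_multivectors(json.load(resp))


if __name__ == "__main__":
    # Assumes a local server on vLLM's default port; adjust as needed.
    vectors = fetch_multivectors("http://localhost:8000", "<jina-v4-checkpoint>", "hello")
    print(len(vectors), len(vectors[0]))
```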

The projector weights (a 128×2048 matrix plus a bias) are not part of a stock vLLM checkpoint. At startup the plugin loads them from JINA_MV_PROJECTOR (default /artifacts/projector/retrieval.npz), or from the checkpoint itself if they have been baked in. A ready-made baked, drop-in checkpoint is published at Mazyod/jina-embeddings-v4-vllm-mv.
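Conceptually, the projector maps each token's 2048-dimensional hidden state to 128 dimensions and the result is L2-normalized. The following pure-Python sketch shows that arithmetic on toy sizes. It assumes JINA_MV_PROJECTOR is an environment variable and that the projection is a plain affine map followed by normalization; the function names are hypothetical, and the plugin's in-engine implementation will look different.

```python
import math
import os

DEFAULT_PROJECTOR_PATH = "/artifacts/projector/retrieval.npz"


def projector_path(environ=os.environ) -> str:
    """Resolve the projector file path (assumes an environment variable)."""
    return environ.get("JINA_MV_PROJECTOR", DEFAULT_PROJECTOR_PATH)


def project_token(hidden, weight, bias):
    """Apply y = W @ hidden + bias, then L2-normalize y.

    weight has one row per output dimension (128 rows of 2048 values in the
    real model; tiny shapes are used in the demo below).
    """
    out = [sum(w * h for w, h in zip(row, hidden)) + b
           for row, b in zip(weight, bias)]
    norm = math.sqrt(sum(v * v for v in out))
    return [v / norm for v in out] if norm > 0 else out


if __name__ == "__main__":
    # Toy 3 -> 2 projection standing in for 2048 -> 128.
    print(projector_path())
    print(project_token([3.0, 4.0, 0.0], [[1, 0, 0], [0, 1, 0]], [0.0, 0.0]))
```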

Build & validation tooling

The Modal build/validate/bake/deploy harness that produced and verified the artifacts lives under research/ (its own uv project). It covers projector extraction, checkpoint baking, HF-vs-vLLM parity checks, the deploy runbook (research/deploy/DEPLOY.md), and the vLLM-version compatibility matrix (research/docs/COMPAT.md).
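To give a sense of what an HF-vs-vLLM parity check might measure, the sketch below compares two sets of per-token vectors for the same input and reports the worst per-token cosine similarity. This is an illustrative assumption about one possible metric, not the harness under research/; the function names are hypothetical.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def min_token_cosine(reference, candidate) -> float:
    """Worst per-token cosine between reference and candidate multivectors."""
    if not reference or len(reference) != len(candidate):
        raise ValueError("both inputs need the same non-zero number of tokens")
    return min(cosine(r, c) for r, c in zip(reference, candidate))
```

A check of this kind would pass when the minimum stays above a chosen tolerance; the tolerance itself depends on dtype and kernels and is not specified here.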

Develop

make install   # uv sync
make test      # packaging contract tests (no GPU/vLLM)
make build     # sdist + wheel into dist/

Releases publish to PyPI via GitHub Actions Trusted Publishing (OIDC): run the Publish to PyPI workflow (workflow_dispatch) and choose patch, minor, or major.


