
Haystack integration for vLLM


vllm-haystack



Contributing

Refer to the general Contribution Guidelines.

To run integration tests locally, you need three vLLM servers running in parallel: one for the chat generator on port 8000, one for the embedders on port 8001, and one for the ranker on port 8002. Refer to the workflow file for more details.

For example, on macOS, you can install vLLM-metal and start the chat generator server with:

# chat generator server (port 8000)
source ~/.venv-vllm-metal/bin/activate && vllm serve Qwen/Qwen3-0.6B --reasoning-parser qwen3 --max-model-len 1024 --enforce-eager --enable-auto-tool-choice --tool-call-parser hermes

vLLM-metal does not support embedding models. On macOS, you can run the embedding server with the CPU Docker image instead:

# embedders server (port 8001)
docker run --rm -p 8001:8000 -e VLLM_CPU_OMP_THREADS_BIND=0-3 vllm/vllm-openai-cpu:latest \
    --model sentence-transformers/all-MiniLM-L6-v2 --enforce-eager

To run the ranker server, use the same CPU Docker image:

# ranker server (port 8002)
docker run --rm -p 8002:8000 -e VLLM_CPU_OMP_THREADS_BIND=0-3 vllm/vllm-openai-cpu:latest \
    --model BAAI/bge-reranker-base --enforce-eager
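
Model loading can take a while, so it may help to confirm that all three servers respond before starting the tests. The following is a minimal sketch, not part of this package: it assumes each server exposes the OpenAI-compatible `/v1/models` endpoint on the ports used above, and the names `SERVERS`, `models_url`, `is_ready`, and `wait_for_servers` are hypothetical helpers introduced only for illustration.

```python
"""Poll local vLLM servers until their OpenAI-compatible API answers."""
import time
import urllib.request

# Ports taken from the commands above; the role names are illustrative labels.
SERVERS = {"chat": 8000, "embedders": 8001, "ranker": 8002}


def models_url(port: int, host: str = "localhost") -> str:
    """Build the URL of the model-listing endpoint for one server."""
    return f"http://{host}:{port}/v1/models"


def is_ready(url: str, timeout: float = 2.0) -> bool:
    """Return True if the server answers the request with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error responses.
        return False


def wait_for_servers(
    servers: dict[str, int], attempts: int = 30, delay: float = 2.0
) -> dict[str, bool]:
    """Poll each server until it is ready or the attempts run out."""
    status = {name: False for name in servers}
    for _ in range(attempts):
        for name, port in servers.items():
            if not status[name]:
                status[name] = is_ready(models_url(port))
        if all(status.values()):
            break
        time.sleep(delay)
    return status


if __name__ == "__main__":
    for name, ready in wait_for_servers(SERVERS).items():
        print(f"{name}: {'ready' if ready else 'not reachable'}")
```

Run the script in a separate terminal once the servers have been started. The attempt count and delay are arbitrary defaults and may need to be raised on slower machines.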


