
A simple adapter to use vLLM in your Haystack pipelines.

Project description

vLLM-haystack-adapter


Simply use vLLM in your Haystack pipeline to utilize fast, self-hosted LLMs.


Installation

Install the wrapper via pip:

pip install vllm-haystack

Usage

This integration provides two invocation layers:

  • vLLMInvocationLayer: To use models hosted on a vLLM server (or any other OpenAI-compatible server)
  • vLLMLocalInvocationLayer: To use locally hosted vLLM models

Use a Model Hosted on a vLLM Server

To use a model hosted on a vLLM server, the vLLMInvocationLayer has to be used.

Here is a simple example of how a PromptNode can be created with the wrapper.

from haystack.nodes import PromptNode, PromptModel
from vllm_haystack import vLLMInvocationLayer

API = "http://localhost:8000/v1"  # replace with the base URL of your vLLM server

model = PromptModel(
    model_name_or_path="",  # left empty on purpose: the model name is inferred from the server
    invocation_layer_class=vLLMInvocationLayer,
    max_length=256,
    api_key="EMPTY",
    model_kwargs={
        "api_base": API,
        "maximum_context_length": 2048,
    },
)

prompt_node = PromptNode(model_name_or_path=model, top_k=1, max_length=256)

The model name will be inferred from the model served on the vLLM server. For more configuration examples, take a look at the unit tests.
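Once created, the prompt node can be queried directly like any other PromptNode. A minimal sketch (the question is just an illustrative placeholder):

# Calling the node returns a list of generated strings.
result = prompt_node("What is the capital of Germany?")
print(result[0])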

Hosting a vLLM Server

To create an OpenAI-compatible server via vLLM, you can follow the steps in the Quickstart section of their documentation.
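As a rough sketch (check the vLLM documentation for the current command and flags; the model name is just an example):

python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m

By default the server listens on http://localhost:8000, so the api_base in the example above would be http://localhost:8000/v1.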

Use a Model Hosted Locally

⚠️ To run vLLM locally, you need to have vllm installed (pip install vllm) and a supported GPU.

If you don't want to use an API server, this wrapper also provides a vLLMLocalInvocationLayer, which runs vLLM on the same machine that Haystack is running on.

Here is a simple example of how a PromptNode can be created with the vLLMLocalInvocationLayer.

from haystack.nodes import PromptNode, PromptModel
from vllm_haystack import vLLMLocalInvocationLayer

MODEL = "facebook/opt-125m"  # replace with any model supported by vLLM

model = PromptModel(
    model_name_or_path=MODEL,
    invocation_layer_class=vLLMLocalInvocationLayer,
    max_length=256,
    model_kwargs={
        "maximum_context_length": 2048,
    },
)

prompt_node = PromptNode(model_name_or_path=model, top_k=1, max_length=256)
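
To put the node to work in a pipeline, as the project name suggests, it can be added like any other Haystack node. A minimal sketch using Haystack v1's Pipeline API:

from haystack import Pipeline

# Wire the prompt node into a pipeline and run a query through it.
pipe = Pipeline()
pipe.add_node(component=prompt_node, name="prompt_node", inputs=["Query"])

output = pipe.run(query="What is the capital of Germany?")
print(output["results"])  # the generated completions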



Download files

Download the file for your platform.

Source Distribution

vllm_haystack-0.1.1.tar.gz (10.4 kB)

Uploaded Source

Built Distribution


vllm_haystack-0.1.1-py3-none-any.whl (7.8 kB)

Uploaded Python 3

File details

Details for the file vllm_haystack-0.1.1.tar.gz.

File metadata

  • Download URL: vllm_haystack-0.1.1.tar.gz
  • Upload date:
  • Size: 10.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

Hashes for vllm_haystack-0.1.1.tar.gz

  • SHA256: 87126bb7e3159562e0698a05fe8ec115d4e2c6629b562a158fc7b1495c303bfd
  • MD5: 41f93478b2f61668e9396fd4a5ed83de
  • BLAKE2b-256: fcc8a730f050f37bb3791e6bbea10d8b60d272e7355419b6ecea806db8e29688


File details

Details for the file vllm_haystack-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: vllm_haystack-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 7.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

Hashes for vllm_haystack-0.1.1-py3-none-any.whl

  • SHA256: 82dfb91ae239240ddfa8d83f69454178559429c2290ef613979c3991b67a967c
  • MD5: e341789e082aaef7e9c2ce7c1165caa6
  • BLAKE2b-256: d3ae687de09b9b3ae97094c3d7c06f86c5200d97d279e0c495a69c6469748030

