
A simple adapter to use vLLM in your Haystack pipelines.

Project description

vLLM-haystack-adapter


Simply use vLLM in your Haystack pipeline to utilize fast, self-hosted LLMs.


Installation

Install the wrapper via pip: pip install vllm-haystack

Usage

This integration provides two invocation layers:

  • vLLMInvocationLayer: To use models hosted on a vLLM server (or any other OpenAI-compatible server)
  • vLLMLocalInvocationLayer: To use locally hosted vLLM models

Use a Model Hosted on a vLLM Server

To use a model hosted on a vLLM server, the vLLMInvocationLayer has to be used.

Here is a simple example of how a PromptNode can be created with the wrapper.

from haystack.nodes import PromptNode, PromptModel
from vllm_haystack import vLLMInvocationLayer


model = PromptModel(
    model_name_or_path="",  # can stay empty; the served model is inferred from the server
    invocation_layer_class=vLLMInvocationLayer,
    max_length=256,
    api_key="EMPTY",
    model_kwargs={
        "api_base": API,  # Replace this with your API URL
        "maximum_context_length": 2048,
    },
)

prompt_node = PromptNode(model_name_or_path=model, top_k=1, max_length=256)

The model is inferred from the model served on the vLLM server, which is why model_name_or_path can be left empty. For more configuration examples, take a look at the unit tests.
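
Once the node is set up, it can be queried directly. The snippet below is a minimal sketch: it assumes a reachable vLLM server at the api_base configured above, and the prompt string is just an illustrative example.

result = prompt_node("What is the capital of France?")  # returns a list of generated strings
print(result[0])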

Hosting a vLLM Server

To create an OpenAI-compatible server via vLLM, you can follow the steps in the Quickstart section of their documentation.
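
For illustration, such a server can typically be started with vLLM's OpenAI-compatible entrypoint (exact flags may vary between vLLM versions; the model name below is a placeholder):

python -m vllm.entrypoints.openai.api_server --model <organization/model-name> --port 8000

The address the server listens on is what you pass as api_base in the example above (typically including the /v1 suffix expected by OpenAI-compatible clients).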

Use a Model Hosted Locally

⚠️ To run vLLM locally, you need to have vllm installed and a supported GPU.

If you don't want to use an API server, this wrapper also provides a vLLMLocalInvocationLayer, which runs vLLM on the same node Haystack is running on.

Here is a simple example of how a PromptNode can be created with the vLLMLocalInvocationLayer.

from haystack.nodes import PromptNode, PromptModel
from vllm_haystack import vLLMLocalInvocationLayer

model = PromptModel(
    model_name_or_path=MODEL,  # name or path of the model to load locally
    invocation_layer_class=vLLMLocalInvocationLayer,
    max_length=256,
    model_kwargs={
        "maximum_context_length": 2048,
    },
)

prompt_node = PromptNode(model_name_or_path=model, top_k=1, max_length=256)
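
The resulting node can be used like any other PromptNode, for example inside a Haystack pipeline. This is a minimal sketch assuming Haystack v1's Pipeline API; with no PromptTemplate set, the query is used as the prompt:

from haystack import Pipeline

pipe = Pipeline()
pipe.add_node(component=prompt_node, name="PromptNode", inputs=["Query"])

result = pipe.run(query="Summarize what vLLM is in one sentence.")
print(result["results"])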

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vllm_haystack-0.1.2.tar.gz (10.5 kB)

Uploaded Source

Built Distribution

vllm_haystack-0.1.2-py3-none-any.whl (7.9 kB)

Uploaded Python 3

File details

Details for the file vllm_haystack-0.1.2.tar.gz.

File metadata

  • Download URL: vllm_haystack-0.1.2.tar.gz
  • Size: 10.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

Hashes for vllm_haystack-0.1.2.tar.gz:

  • SHA256: 530015c6518a60114ce02a534addd09dfa035f69dc88d63e1fafd7c43364a692
  • MD5: 86357bb0ef46c7699ea85df9d106332f
  • BLAKE2b-256: 1635df875e43b92273f0b9157f029bf6747f674d51da39bb64e65af7f392a7b9


File details

Details for the file vllm_haystack-0.1.2-py3-none-any.whl.

File hashes

Hashes for vllm_haystack-0.1.2-py3-none-any.whl:

  • SHA256: 93993735949ab201a70802e31d9d2aad6a98c2c3618bca990e00d829c0868950
  • MD5: d34f78e43441b7647c5b0c8da81dde66
  • BLAKE2b-256: 4a24b561ec26e5b4b7aeb54c851157f624b2aa4c739948f6d41a9c2734bdfc96

