SuperLaser

⚠️ Not yet ready for primetime ⚠️

An MLOps library for LLM deployment with the vLLM engine on RunPod's infrastructure.
SuperLaser provides a suite of tools and scripts for deploying LLMs onto RunPod's pod and serverless infrastructure. At runtime, the deployment runs a containerized vLLM engine for memory-efficient, high-performance inference.
Features
- Scalable Deployment: Easily scale your LLM inference tasks with vLLM and RunPod serverless capabilities.
- Cost-Effective: Optimize hardware usage with tensor parallelism and efficient use of GPU resources.
- OpenAI-Compatible API: Use the SuperLaser client with chat, non-chat, and streaming options.
Install
pip install superlaser
Before you begin, ensure you have:
- A RunPod account.
RunPod Config
The first step is to obtain an API key from RunPod: in your account's console, under the Settings section, click API Keys.
After obtaining a key, set it as an environment variable:
export RUNPOD_API_KEY=<YOUR-API-KEY>
Configure Template
Before spinning up a serverless endpoint, let's first configure a template that we'll pass into the endpoint during staging. The template allows you to select a serverless or pod asset, your docker image name, and the container's and volume's disk space.
Configure your template with the following attributes:
import os
from superlaser import RunpodHandler as runpod

api_key = os.environ.get("RUNPOD_API_KEY")

template_data = runpod.set_template(
    serverless="true",
    template_name="superlaser-inf",                         # Give a name to your template
    container_image="runpod/worker-vllm:0.3.1-cuda12.1.0",  # Docker image stub
    model_name="mistralai/Mistral-7B-v0.1",                 # Hugging Face model stub
    max_model_length=340,                                   # Maximum number of tokens the engine handles per request
    container_disk=15,
    volume_disk=15,
)
Create Template on RunPod
template = runpod(api_key, data=template_data)
print(template().text)
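The creation call returns JSON that includes the new template's ID. A minimal sketch of pulling it out — the payload below is a hypothetical stand-in, so inspect the actual response for the real field names:

```python
import json

# Hypothetical response body; the real payload may use different field names.
raw = '{"id": "template-abc123", "name": "superlaser-inf"}'
resp = json.loads(raw)
template_id = resp["id"]
print(template_id)
```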
Configure Endpoint
After your template is created, the call returns a data dictionary that includes your template ID. We will pass this template ID when configuring the serverless endpoint in the section below:
endpoint_data = runpod.set_endpoint(
    gpu_ids="AMPERE_24",        # Options: "AMPERE_16,AMPERE_24,AMPERE_48,AMPERE_80,ADA_24"
    idle_timeout=5,
    name="vllm_endpoint",
    scaler_type="QUEUE_DELAY",
    scaler_value=1,
    template_id="template-id",  # Template ID returned in the previous step
    workers_max=1,
    workers_min=0,
)
Start Endpoint on RunPod
endpoint = runpod(api_key, data=endpoint_data)
print(endpoint().text)
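The endpoint ID returned here is what the OpenAI-compatible base URL is built from; a quick sanity check of that construction (the ID below is a placeholder):

```python
endpoint_id = "your-endpoint-id"  # substitute the ID returned by the call above
base_url = f"https://api.runpod.ai/v2/{endpoint_id}/openai/v1"
print(base_url)
```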
Call Endpoint
After your endpoint is staged, it will return a dictionary with your endpoint ID. Pass this endpoint ID to the OpenAI
client and start making API requests!
from openai import OpenAI

endpoint_id = "your-endpoint-id"

client = OpenAI(
    api_key=api_key,
    base_url=f"https://api.runpod.ai/v2/{endpoint_id}/openai/v1",
)
Chat w/ Streaming
stream = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=[{"role": "user", "content": "To be or not to be"}],
    temperature=0,
    max_tokens=100,
    stream=True,
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
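Each streamed chunk carries only a delta; to keep the full reply, accumulate the pieces as they arrive. A sketch with stand-in strings (in the real loop they come from `chunk.choices[0].delta.content`):

```python
# Stand-ins for the delta contents yielded by the stream.
deltas = ["To be", ", or not", " to be"]

# `or ""` guards against None deltas, which the API can emit.
full_text = "".join(d or "" for d in deltas)
print(full_text)
```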
Completion w/ Streaming
stream = client.completions.create(
    model="meta-llama/Llama-2-7b-hf",
    prompt="To be or not to be",
    temperature=0,
    max_tokens=100,
    stream=True,
)

for response in stream:
    print(response.choices[0].text or "", end="", flush=True)