
An MLOps library for deploying LLMs with the vLLM engine on RunPod's infrastructure.


SuperLaser

⚠️ Not yet ready for primetime ⚠️

SuperLaser provides a comprehensive suite of tools and scripts for deploying LLMs onto RunPod's pod and serverless infrastructure. Each deployment runs a containerized vLLM engine, providing memory-efficient, high-performance inference.

Features

  • Scalable Deployment: Easily scale your LLM inference tasks with vLLM and RunPod serverless capabilities.
  • Cost-Effective: Optimize resource and hardware usage with tensor parallelism and right-sized GPU selection.
  • Uses OpenAI's API: Call your endpoint through OpenAI's client, with chat, completion, and streaming options.

Install

pip install superlaser

Before you begin, ensure you have:

  • A RunPod account.

RunPod Config

The first step is to obtain an API key from RunPod. In your account's console, open the Settings section and click on API Keys.

After obtaining a key, set it as an environment variable:

export RUNPOD_API_KEY=<YOUR-API-KEY>

Configure Template

Before spinning up a serverless endpoint, let's first configure a template to pass to the endpoint during staging. The template lets you choose a serverless or pod asset, your Docker image, and the container's and volume's disk space.

Configure your template with the following attributes:

import os
from superlaser import RunpodHandler as runpod

api_key = os.environ.get("RUNPOD_API_KEY")

template_data = runpod.set_template(
    serverless="true",                                      # Deploy on serverless (vs. pod) infrastructure
    template_name="superlaser-inf",                         # Give a name to your template
    container_image="runpod/worker-vllm:0.3.1-cuda12.1.0",  # Docker image stub
    model_name="mistralai/Mistral-7B-v0.1",                 # Hugging Face model stub
    max_model_length=340,                                   # Maximum number of tokens the engine handles per request
    container_disk=15,                                      # Container disk size (GB)
    volume_disk=15,                                         # Volume disk size (GB)
)

Create Template on RunPod

template = runpod(api_key, data=template_data)
template_response = template()
print(template_response.text)
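
The response body is JSON and includes the new template's ID, which the endpoint configuration below needs. A minimal sketch of parsing it, assuming the ID comes back under an id field (inspect the actual response to confirm):

import json

# Assumption: the template ID is returned under "id"; adjust if the schema differs.
template_id = json.loads(template_response.text).get("id")
print(template_id)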

Configure Endpoint

After your template is created, the call returns a data dictionary that includes your template ID. We'll pass this template ID when configuring the serverless endpoint in the section below:

endpoint_data = runpod.set_endpoint(
    gpu_ids="AMPERE_24",        # Options: "AMPERE_16,AMPERE_24,AMPERE_48,AMPERE_80,ADA_24"
    idle_timeout=5,             # Seconds a worker stays alive without traffic
    name="vllm_endpoint",
    scaler_type="QUEUE_DELAY",  # Scale workers based on request queue delay
    scaler_value=1,
    template_id="template-id",  # Template ID returned in the previous step
    workers_max=1,
    workers_min=0,              # Scale down to zero when idle
)

Start Endpoint on RunPod

endpoint = runpod(api_key, data=endpoint_data)
endpoint_response = endpoint()
print(endpoint_response.text)
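
As with the template, you can parse the endpoint ID out of the JSON response (again assuming an id field; inspect the response to confirm):

import json

# Assumption: the endpoint ID is returned under "id"; adjust if the schema differs.
endpoint_id = json.loads(endpoint_response.text).get("id")
print(endpoint_id)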

Call Endpoint

After your endpoint is staged, it will return a dictionary with your endpoint ID. Pass this endpoint ID to the OpenAI client and start making API requests!

from openai import OpenAI

endpoint_id = "your-endpoint-id"

client = OpenAI(
    api_key=api_key,
    base_url=f"https://api.runpod.ai/v2/{endpoint_id}/openai/v1",
)
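
As a quick sanity check that the endpoint is reachable, you can list the models it serves; this assumes the worker exposes the OpenAI-compatible /v1/models route:

# List the models served by the endpoint (assumes /v1/models is implemented).
for model in client.models.list().data:
    print(model.id)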

Chat w/ Streaming

stream = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=[{"role": "user", "content": "To be or not to be"}],
    temperature=0,
    max_tokens=100,
    stream=True,
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
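
For a non-streaming chat request, drop stream=True and read the full message from the response:

response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=[{"role": "user", "content": "To be or not to be"}],
    temperature=0,
    max_tokens=100,
)
print(response.choices[0].message.content)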

Completion w/ Streaming

stream = client.completions.create(
    model="meta-llama/Llama-2-7b-hf",
    prompt="To be or not to be",
    temperature=0,
    max_tokens=100,
    stream=True,
)

for response in stream:
    print(response.choices[0].text or "", end="", flush=True)
