
An MLOps library for LLM deployment with the vLLM engine on RunPod's infrastructure.


SuperLaser

⚠️ Not yet ready for primetime ⚠️

SuperLaser provides a comprehensive suite of tools and scripts for deploying Large Language Models (LLMs) on RunPod's pod and serverless infrastructure. Deployments use the vLLM engine as the runtime backend, providing memory-efficient, high-performance inference.

While most tutorials focus on configuring deployments through RunPod's console, this repository goes further: it lets you create templates, configure pods or serverless endpoints, and execute API requests programmatically from Python.

Features

  • Scalable Deployment: Easily scale your LLM inference tasks with RunPod's serverless capabilities.
  • Cost-Effective: Optimize resource usage and reduce costs with a serverless architecture.
  • Easy Integration: Integrates seamlessly with existing LLM workflows.
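
Installation

SuperLaser is published on PyPI, so a standard pip install should fetch the latest release:

pip install superlaser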

RunPod Config

The first step is to obtain an API key from RunPod. In your account's console, go to the Settings section and click API Keys.

After obtaining a key, set it as an environment variable:

export RUNPOD_API_KEY=<YOUR-API-KEY>

Configure Template

Before spinning up a serverless endpoint, let's first configure a template that we'll pass to the endpoint during staging. A template lets you choose a serverless or pod asset, specify your Docker image, and set the container and volume disk sizes.

Configure your template with the following attributes:

import os
from superlaser import RunpodHandler as runpod

api_key = os.environ.get("RUNPOD_API_KEY")

template_data = runpod.set_template(
    serverless="true",                                      # "false" spins up a pod instead
    template_name="superlaser-inf",                         # Name for your template
    container_image="runpod/worker-vllm:0.3.1-cuda12.1.0",  # Docker image to deploy
    model_name="mistralai/Mistral-7B-v0.1",                 # Hugging Face model ID
    max_model_length=340,                                   # Max tokens the engine handles per request
    container_disk=15,                                      # Container disk size (GB)
    volume_disk=15,                                         # Volume disk size (GB)
)

Push template to your RunPod account:

template = runpod(api_key=api_key, data=template_data)
print(template().text)

Configure Endpoint

After your template is created, the response returns a data dictionary that includes your template ID. We'll pass this template ID when configuring the serverless endpoint.
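
For example, you can read the ID out of the JSON response. The key path below is an assumption about the response shape, so verify it against the printed payload:

import json

# Hypothetical key path; verify against the actual response payload
template_id = json.loads(template().text)["data"]["saveTemplate"]["id"]

Then configure your endpoint with the following attributes: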

endpoint_data = runpod.create_serverless_endpoint(
    gpu_ids="AMPERE_24",        # Options: "AMPERE_16", "AMPERE_24", "AMPERE_48", "AMPERE_80", "ADA_24"
    idle_timeout=5,             # Seconds an idle worker stays alive before scaling down
    name="vllm_endpoint",
    scaler_type="QUEUE_DELAY",  # Scale workers based on request queue delay
    scaler_value=1,
    template_id="template-id",  # Template ID returned in the previous step
    workers_max=1,
    workers_min=0,
)

Boot up your endpoint on RunPod:

endpoint = runpod(api_key=api_key, data=endpoint_data)
print(endpoint().text)

Call Endpoint

After your endpoint is staged, the response includes your endpoint ID. Pass this endpoint ID to the SuperLaser client and start making API requests!
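
As with the template, you can read the ID from the JSON response (again, the key path is an assumption; verify it against the printed payload):

import json

# Hypothetical key path; verify against the actual response payload
endpoint_id = json.loads(endpoint().text)["data"]["saveEndpoint"]["id"]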

from superlaser import SuperLaser

superlaser = SuperLaser(endpoint_id="endpoint-id", model_name="mistralai/Mistral-7B-v0.1")
superlaser("Why is SuperLaser awesome?")
