
FastServe

Machine Learning Serving focused on GenAI & LLMs with simplicity as the top priority.

YouTube: How to serve your own GPT-like LLM in 1 minute with FastServe

Installation

Stable:

pip install FastServeAI

Latest:

pip install git+https://github.com/aniketmaurya/fastserve.git@main

Run locally

python -m fastserve

Usage/Examples

Serve LLMs with Llama-cpp

from fastserve.models import ServeLlamaCpp

model_path = "openhermes-2-mistral-7b.Q5_K_M.gguf"
serve = ServeLlamaCpp(model_path=model_path)
serve.run_server()

or, run python -m fastserve.models --model llama-cpp --model_path openhermes-2-mistral-7b.Q5_K_M.gguf from the terminal.
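
Once the server is up it is a regular FastAPI app, so you can inspect the exact request schema at http://localhost:8000/docs. As a minimal sketch, you could call it with requests, assuming the default port and a /endpoint route that takes a JSON body with a prompt field (both are assumptions; verify against the generated docs):

import requests

# The route path and payload field below are assumptions; check
# http://localhost:8000/docs for the schema your FastServe version exposes.
payload = {"prompt": "Write a haiku about model serving."}
response = requests.post("http://localhost:8000/endpoint", json=payload)
print(response.json())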

Serve vLLM

from fastserve.models import ServeVLLM

app = ServeVLLM("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
app.run_server()

You can use the FastServe client, which will automatically apply the chat template for you:

from fastserve.client import vLLMClient
from rich import print

client = vLLMClient("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
response = client.chat("Write a python function to resize image to 224x224", keep_context=True)
# print(client.context)
print(response["outputs"][0]["text"])
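
With keep_context=True, the client keeps the conversation so far in client.context (see the commented print above), so follow-up chat calls can build on earlier turns.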

Serve SDXL Turbo

from fastserve.models import ServeSDXLTurbo

serve = ServeSDXLTurbo(device="cuda", batch_size=2, timeout=1)
serve.run_server()

or, run python -m fastserve.models --model sdxl-turbo --batch_size 2 --timeout 1 from the terminal.

This application comes with a UI. You can access it at http://localhost:8000/ui.

Face Detection

from fastserve.models import FaceDetection

serve = FaceDetection(batch_size=2, timeout=1)
serve.run_server()

or, run python -m fastserve.models --model face-detection --batch_size 2 --timeout 1 from the terminal.

Image Classification

from fastserve.models import ServeImageClassification

app = ServeImageClassification("resnet18", timeout=1, batch_size=4)
app.run_server()

or, run python -m fastserve.models --model image-classification --model_name resnet18 --batch_size 4 --timeout 1 from the terminal.
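
In the examples above, batch_size and timeout control dynamic batching: incoming requests are queued until batch_size requests have accumulated or timeout seconds have passed, and the queue is then processed as a single batch. The handle method in the custom-model example below shows what such a batch looks like.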

Serve Custom Model

To serve a custom model, you will have to implement the handle method for FastServe, which processes a batch of inputs and returns the response as a list.

from typing import List

from fastserve import FastServe
from fastserve.core import BaseRequest  # assumed import path; adjust if your version differs


class MyModelServing(FastServe):
    def __init__(self):
        super().__init__(batch_size=2, timeout=0.1)
        self.model = create_model(...)  # placeholder: build or load your real model here

    def handle(self, batch: List[BaseRequest]) -> List[float]:
        # Unpack the raw inputs from the batched requests, run the model once
        # over the whole batch, and return one output per input.
        inputs = [b.request for b in batch]
        response = self.model(inputs)
        return response


app = MyModelServing()
app.run_server()

You can run the above script in a terminal, and it will launch a FastAPI server for your custom model.
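
To smoke-test the server you can send it a single request. A minimal sketch with requests, assuming the default port and a /endpoint route whose JSON body mirrors BaseRequest's request field (assumptions; confirm the route and schema at http://localhost:8000/docs):

import requests

# "/endpoint" and the {"request": ...} body are assumptions derived from
# BaseRequest above; the FastAPI docs page shows the exact schema.
response = requests.post(
    "http://localhost:8000/endpoint",
    json={"request": 1.0},  # hypothetical input for your custom model
)
print(response.json())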

Deploy

Lightning AI Studio ⚡️

python -m fastserve.deploy.lightning --filename main.py \
    --user LIGHTNING_USERNAME \
    --teamspace LIGHTNING_TEAMSPACE \
    --machine "CPU"  # T4, A10G or A10G_X_4

Contribute

Install in editable mode:

git clone https://github.com/aniketmaurya/fastserve.git
cd fastserve
pip install -e .

Create a new branch:

git checkout -b <new-branch>

Make your changes, commit and create a PR.
