Project description

FastServe

Machine Learning Serving focused on GenAI & LLMs with simplicity as the top priority.

YouTube: How to serve your own GPT like LLM in 1 minute with FastServe

Installation

Stable:

pip install FastServeAI

Latest:

pip install git+https://github.com/aniketmaurya/fastserve.git@main

Run locally

python -m fastserve
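
This launches a local FastAPI server. A minimal client sketch, assuming the server listens on the default http://localhost:8000 and exposes a POST route named /endpoint (an assumption; the interactive docs at http://localhost:8000/docs list the actual routes and payload schema):

import requests

# "/endpoint" and the {"request": ...} payload shape are assumptions;
# check http://localhost:8000/docs for the actual route and schema.
response = requests.post("http://localhost:8000/endpoint", json={"request": "Hello!"})
response.raise_for_status()
print(response.json())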

Usage/Examples

Serve LLMs with Llama-cpp

from fastserve.models import ServeLlamaCpp

model_path = "openhermes-2-mistral-7b.Q5_K_M.gguf"
serve = ServeLlamaCpp(model_path=model_path)
serve.run_server()

Or run python -m fastserve.models --model llama-cpp --model_path openhermes-2-mistral-7b.Q5_K_M.gguf from the terminal.

Serve vLLM

from fastserve.models import ServeVLLM

app = ServeVLLM("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
app.run_server()

You can use the FastServe client, which automatically applies the chat template for you:

from fastserve.client import vLLMClient
from rich import print

client = vLLMClient("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
response = client.chat("Write a python function to resize image to 224x224", keep_context=True)
# print(client.context)
print(response["outputs"][0]["text"])

Serve SDXL Turbo

from fastserve.models import ServeSDXLTurbo

serve = ServeSDXLTurbo(device="cuda", batch_size=2, timeout=1)
serve.run_server()

Or run python -m fastserve.models --model sdxl-turbo --batch_size 2 --timeout 1 from the terminal.

This application comes with a UI, which you can access at http://localhost:8000/ui.

Face Detection

from fastserve.models import FaceDetection

serve = FaceDetection(batch_size=2, timeout=1)
serve.run_server()

Or run python -m fastserve.models --model face-detection --batch_size 2 --timeout 1 from the terminal.

Image Classification

from fastserve.models import ServeImageClassification

app = ServeImageClassification("resnet18", timeout=1, batch_size=4)
app.run_server()

Or run python -m fastserve.models --model image-classification --model_name resnet18 --batch_size 4 --timeout 1 from the terminal.

Serve Custom Model

To serve a custom model, implement the handle method of FastServe, which processes a batch of inputs and returns the response as a list. FastServe collects incoming requests into batches of up to batch_size items, waiting at most timeout seconds before dispatching a batch to handle.

from typing import List

from fastserve import FastServe
from fastserve.core import BaseRequest  # import path assumed; BaseRequest wraps each incoming payload


class MyModelServing(FastServe):
    def __init__(self):
        super().__init__(batch_size=2, timeout=0.1)
        self.model = create_model(...)  # placeholder: build or load your model here

    def handle(self, batch: List[BaseRequest]) -> List[float]:
        # Unwrap the raw payloads from the batched requests.
        inputs = [b.request for b in batch]
        # Run the model on the whole batch: one result per request.
        response = self.model(inputs)
        return response


app = MyModelServing()
app.run_server()

You can run the above script from the terminal, and it will launch a FastAPI server for your custom model.
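
Once the server is up, you can query it over HTTP. A minimal sketch, assuming the default port 8000 and a POST route named /endpoint; the payload carries a "request" field to match the BaseRequest objects that handle receives:

import requests

# The route name is an assumption; check http://localhost:8000/docs for the exact path.
# handle() above reads b.request, so each payload carries a "request" field.
r = requests.post("http://localhost:8000/endpoint", json={"request": 4.2})
r.raise_for_status()
print(r.json())  # a float, matching the List[float] return type of handle()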

Deploy

Lightning AI Studio ⚡️

python -m fastserve.deploy.lightning --filename main.py \
    --user LIGHTNING_USERNAME \
    --teamspace LIGHTNING_TEAMSPACE \
    --machine "CPU"  # T4, A10G or A10G_X_4

Contribute

Install in editable mode:

git clone https://github.com/aniketmaurya/fastserve.git
cd fastserve
pip install -e .

Create a new branch:

git checkout -b <new-branch>

Make your changes, commit, and create a PR.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

FastServeAI-0.0.3.tar.gz (1.5 MB)

Built Distribution

FastServeAI-0.0.3-py3-none-any.whl (247.0 kB)

File details

Details for the file FastServeAI-0.0.3.tar.gz.

File metadata

  • Download URL: FastServeAI-0.0.3.tar.gz
  • Upload date:
  • Size: 1.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.13

File hashes

Hashes for FastServeAI-0.0.3.tar.gz:

  • SHA256: ffb237530f4d45b54216d56927d871975e1b4620208cb352b7c39b34a6411516
  • MD5: 347b70cca13ed5aa8a2109fa1487d472
  • BLAKE2b-256: 58572004d3807d412a71bd95764dffe32ddd64d82aa2782a38a4904006ca7507

See more details on using hashes here.
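
If you download the file manually, you can check it against the SHA256 digest above before installing; a minimal sketch using Python's standard library:

import hashlib

EXPECTED_SHA256 = "ffb237530f4d45b54216d56927d871975e1b4620208cb352b7c39b34a6411516"

# Stream the archive in chunks so it doesn't need to fit in memory at once.
digest = hashlib.sha256()
with open("FastServeAI-0.0.3.tar.gz", "rb") as f:
    for chunk in iter(lambda: f.read(8192), b""):
        digest.update(chunk)

assert digest.hexdigest() == EXPECTED_SHA256, "hash mismatch: file may be corrupted"
print("SHA256 verified")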

File details

Details for the file FastServeAI-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: FastServeAI-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 247.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.13

File hashes

Hashes for FastServeAI-0.0.3-py3-none-any.whl:

  • SHA256: cf5be4bda7d9a41425140dcd31a97635ff2f37251fb6a8d9b5f45f9f3f41a8ac
  • MD5: f3244f3f3b819198379019710cd1e93b
  • BLAKE2b-256: 3d2c0e5a2f8b9c65c33a6c8061003c9e848458a183d9eb5207b728c11bd9dc90

See more details on using hashes here.
