
FastServe

Machine Learning Serving focused on GenAI & LLMs with simplicity as the top priority.


YouTube: How to serve your own GPT-like LLM in 1 minute with FastServe

Installation

pip install git+https://github.com/aniketmaurya/fastserve.git@main

Run locally

python -m fastserve

Usage/Examples

Serve Mistral-7B with Llama-cpp

from fastserve.models import ServeLlamaCpp

model_path = "openhermes-2-mistral-7b.Q5_K_M.gguf"
serve = ServeLlamaCpp(model_path=model_path)
serve.run_server()

Or run python -m fastserve.models --model llama-cpp --model_path openhermes-2-mistral-7b.Q5_K_M.gguf from the terminal.
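Once the server is up, you can query it over HTTP. The route and payload field below (/endpoint and prompt) are assumptions for illustration — check the interactive docs that FastAPI generates at http://localhost:8000/docs for the actual schema of your fastserve version:

```python
import json
from urllib import request

# Assumed route and payload field -- verify against the interactive docs
# that FastAPI generates at http://localhost:8000/docs.
URL = "http://localhost:8000/endpoint"


def build_payload(prompt: str) -> bytes:
    """Encode a single prompt as the JSON body of the request."""
    return json.dumps({"prompt": prompt}).encode("utf-8")


def generate(prompt: str, timeout: float = 60.0) -> str:
    """POST one prompt to the running FastServe instance and return the body."""
    req = request.Request(
        URL,
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Only the standard library is used here; swap in your preferred HTTP client if you already depend on one.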

Serve SDXL Turbo

from fastserve.models import ServeSDXLTurbo

serve = ServeSDXLTurbo(device="cuda", batch_size=2, timeout=1)
serve.run_server()

Or run python -m fastserve.models --model sdxl-turbo --batch_size 2 --timeout 1 from the terminal.

The application comes with a UI, available at http://localhost:8000/ui.

Face Detection

from fastserve.models import FaceDetection

serve = FaceDetection(batch_size=2, timeout=1)
serve.run_server()

Or run python -m fastserve.models --model face-detection --batch_size 2 --timeout 1 from the terminal.

Image Classification

from fastserve.models import ServeImageClassification

app = ServeImageClassification("resnet18", timeout=1, batch_size=4)
app.run_server()

Or run python -m fastserve.models --model image-classification --model_name resnet18 --batch_size 4 --timeout 1 from the terminal.
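All of the servers above share the same batch_size/timeout semantics: incoming requests are queued, and a batch is dispatched as soon as either batch_size items have arrived or timeout seconds have elapsed. A simplified sketch of that policy in plain Python (not fastserve's actual implementation):

```python
import queue
import time
from typing import List


def collect_batch(q: "queue.Queue[str]", batch_size: int, timeout: float) -> List[str]:
    """Drain up to batch_size items, waiting at most `timeout` seconds overall."""
    batch: List[str] = []
    deadline = time.monotonic() + timeout
    while len(batch) < batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # timeout hit: run whatever we have so far
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break  # no more requests arrived within the window
    return batch


q: "queue.Queue[str]" = queue.Queue()
for item in ("a", "b", "c"):
    q.put(item)

# With batch_size=2, the first call returns two items and leaves one queued.
print(collect_batch(q, batch_size=2, timeout=0.1))  # ['a', 'b']
```

Larger batch_size trades per-request latency for GPU throughput; timeout bounds how long an early request can wait for the batch to fill.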

Serve Custom Model

To serve a custom model, implement the handle method of FastServe, which processes a batch of inputs and returns the responses as a list.

from __future__ import annotations  # defer annotation evaluation

from typing import List

from fastserve import FastServe


class MyModelServing(FastServe):
    def __init__(self):
        # Batch up to 2 requests, waiting at most 0.1 s for a batch to fill.
        super().__init__(batch_size=2, timeout=0.1)
        self.model = create_model(...)  # build or load your model here

    def handle(self, batch: List[BaseRequest]) -> List[float]:
        # BaseRequest is fastserve's request wrapper; each item carries the
        # raw client input on its .request attribute.
        inputs = [b.request for b in batch]
        response = self.model(inputs)  # one forward pass over the whole batch
        return response


app = MyModelServing()
app.run_server()

Running this script from the terminal launches a FastAPI server for your custom model.
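The contract is batch-in, list-out: handle receives a list of wrapped requests and must return one result per request, in order. A self-contained toy that illustrates the invariant (FakeRequest here is a stand-in for fastserve's request wrapper, and the "model" just doubles its input):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeRequest:
    """Stand-in for fastserve's request wrapper used only for this sketch."""
    request: float


def handle(batch: List[FakeRequest]) -> List[float]:
    """Toy handler: unwrap the batch, run the 'model', return one output each."""
    inputs = [b.request for b in batch]
    return [2.0 * x for x in inputs]


batch = [FakeRequest(1.0), FakeRequest(2.5)]
outputs = handle(batch)
assert len(outputs) == len(batch)  # one result per request, in order
print(outputs)  # [2.0, 5.0]
```

If your model cannot process a whole batch in one call, loop over the inputs inside handle — the length and ordering guarantee is what the server relies on to route results back to the right clients.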

Deploy

Lightning AI Studio ⚡️

python -m fastserve.deploy.lightning --filename main.py \
    --user LIGHTNING_USERNAME \
    --teamspace LIGHTNING_TEAMSPACE \
    --machine "CPU"  # T4, A10G or A10G_X_4

Contribute

Install in editable mode:

git clone https://github.com/aniketmaurya/fastserve.git
cd fastserve
pip install -e .

Create a new branch

git checkout -b <new-branch>

Make your changes, commit them, and open a pull request.
