FastServe

Machine Learning Serving focused on GenAI & LLMs with simplicity as the top priority.

YouTube: How to serve your own GPT like LLM in 1 minute with FastServe

Installation

pip install git+https://github.com/aniketmaurya/fastserve.git@main

Run locally

python -m fastserve

Usage/Examples

Serve Mistral-7B with Llama-cpp

from fastserve.models import ServeLlamaCpp

model_path = "openhermes-2-mistral-7b.Q5_K_M.gguf"
serve = ServeLlamaCpp(model_path=model_path)
serve.run_server()

Or, from the terminal, run: python -m fastserve.models --model llama-cpp --model_path openhermes-2-mistral-7b.Q5_K_M.gguf

Serve SDXL Turbo

from fastserve.models import ServeSDXLTurbo

serve = ServeSDXLTurbo(device="cuda", batch_size=2, timeout=1)
serve.run_server()

Or, from the terminal, run: python -m fastserve.models --model sdxl-turbo --batch_size 2 --timeout 1

This application comes with a UI. You can access it at http://localhost:8000/ui

Face Detection

from fastserve.models import FaceDetection

serve = FaceDetection(batch_size=2, timeout=1)
serve.run_server()

Or, from the terminal, run: python -m fastserve.models --model face-detection --batch_size 2 --timeout 1

Image Classification

from fastserve.models import ServeImageClassification

app = ServeImageClassification("resnet18", timeout=1, batch_size=4)
app.run_server()

Or, from the terminal, run: python -m fastserve.models --model image-classification --model_name resnet18 --batch_size 4 --timeout 1
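Once one of the servers above is running, it can be queried over HTTP. The snippet below is only a sketch of a client using the Python standard library. The route "/endpoint" and the payload key "prompt" are assumptions, not taken from this README; since FastServe launches a FastAPI server, the interactive docs that FastAPI normally serves at /docs should list the actual routes and request schema. The host and port match the UI address mentioned above.

```python
import json
import urllib.request
from typing import Any, Dict


def build_request(
    payload: Dict[str, Any],
    base_url: str = "http://localhost:8000",
    path: str = "/endpoint",  # assumed route; verify against the server's /docs page
) -> urllib.request.Request:
    """Build a JSON POST request for a locally running server."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical payload for a text model; adjust the keys to the real schema.
req = build_request({"prompt": "Write a haiku about servers."})
print(req.get_method(), req.full_url)

# With a server running, the call would be: print(send(req))
```

Building the request separately from sending it makes it easy to inspect the URL and body before any network call is made.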

Serve Custom Model

To serve a custom model, subclass FastServe and implement its handle method, which receives a batch of inputs and returns the responses as a list.

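from typing import List

from fastserve import FastServe
from fastserve.core import BaseRequest  # import path assumed; adjust if it differs in your version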


class MyModelServing(FastServe):
    def __init__(self):
        super().__init__(batch_size=2, timeout=0.1)
        self.model = create_model(...)

    def handle(self, batch: List[BaseRequest]) -> List[float]:
        inputs = [b.request for b in batch]
        response = self.model(inputs)
        return response


app = MyModelServing()
app.run_server()

You can run the above script in terminal, and it will launch a FastAPI server for your custom model.
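To make the batch-in, list-out contract of handle concrete, here is a dependency-free sketch that does not use FastServe at all. It assumes that batch_size caps how many requests are grouped together and that timeout is the maximum time in seconds to wait for a batch to fill; the README does not spell this out. FakeRequest and collect_batch are hypothetical names introduced for illustration, and only the .request attribute is taken from the example above.

```python
import queue
import time
from dataclasses import dataclass
from typing import Any, List


@dataclass
class FakeRequest:
    """Hypothetical stand-in for one batch item; it only carries `.request`."""
    request: Any


def collect_batch(q: queue.Queue, batch_size: int, timeout: float) -> List[FakeRequest]:
    """Gather up to `batch_size` items, waiting at most `timeout` seconds in total."""
    batch: List[FakeRequest] = []
    deadline = time.monotonic() + timeout
    while len(batch) < batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break
    return batch


def handle(batch: List[FakeRequest]) -> List[float]:
    """Toy model that doubles each input; returns one result per request, in order."""
    inputs = [b.request for b in batch]
    return [2.0 * x for x in inputs]


# Three pending requests with batch_size=2: one full batch, then a partial one.
pending: queue.Queue = queue.Queue()
for value in (1.0, 2.0, 3.0):
    pending.put(FakeRequest(request=value))

first = handle(collect_batch(pending, batch_size=2, timeout=0.1))
second = handle(collect_batch(pending, batch_size=2, timeout=0.1))
print(first, second)
```

The point to carry over to a real handle implementation is that it must accept batches of any size up to batch_size and return exactly one result per request, in the same order.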

Deploy

Lightning AI Studio ⚡️

python -m fastserve.deploy.lightning --filename main.py \
    --user LIGHTNING_USERNAME \
    --teamspace LIGHTNING_TEAMSPACE \
    --machine "CPU"  # T4, A10G or A10G_X_4

Contribute

Install in editable mode:

git clone https://github.com/aniketmaurya/fastserve.git
cd fastserve
pip install -e .

Create a new branch:

git checkout -b <new-branch>

Make your changes, commit and create a PR.
