Project description

FastServe

Machine Learning Serving focused on GenAI & LLMs with simplicity as the top priority.

YouTube: How to serve your own GPT like LLM in 1 minute with FastServe

Installation

pip install git+https://github.com/aniketmaurya/fastserve.git@main

Run locally

python -m fastserve

Usage/Examples

Serve Mistral-7B with Llama-cpp

from fastserve.models import ServeLlamaCpp

model_path = "openhermes-2-mistral-7b.Q5_K_M.gguf"
serve = ServeLlamaCpp(model_path=model_path)
serve.run_server()

Or run it from the terminal:

python -m fastserve.models --model llama-cpp --model_path openhermes-2-mistral-7b.Q5_K_M.gguf
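Once the server is running, a client can send it a prompt over HTTP. The sketch below uses only the Python standard library. The README does not document the request format, so the `/endpoint` path, the `prompt` field, and the JSON response shape are all assumptions; check the running server's generated API docs for the actual route and schema. The function names `build_request` and `send` are hypothetical helpers, not part of FastServe.

```python
import json
import urllib.request

# Assumptions (not confirmed by this README): the server accepts POST requests
# at /endpoint and expects a JSON body with a "prompt" field.
DEFAULT_URL = "http://localhost:8000/endpoint"


def build_request(prompt: str, url: str = DEFAULT_URL) -> urllib.request.Request:
    """Build a JSON POST request carrying the prompt (hypothetical payload shape)."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(prompt: str, url: str = DEFAULT_URL, timeout: float = 60.0) -> dict:
    """Send the prompt and decode the response, assuming the server replies with JSON."""
    with urllib.request.urlopen(build_request(prompt, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        print(send("Write a haiku about serving models."))
    except OSError as err:  # urllib's URLError is a subclass of OSError
        print(f"Server not reachable: {err}")
```

Keeping request construction separate from sending makes it easy to inspect the payload without a server running.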

Serve SDXL Turbo

from fastserve.models import ServeSDXLTurbo

serve = ServeSDXLTurbo(device="cuda", batch_size=2, timeout=1)
serve.run_server()

Or run it from the terminal:

python -m fastserve.models --model sdxl-turbo --batch_size 2 --timeout 1

This application comes with a UI. You can access it at http://localhost:8000/ui.

Face Detection

from fastserve.models import FaceDetection

serve = FaceDetection(batch_size=2, timeout=1)
serve.run_server()

Or run it from the terminal:

python -m fastserve.models --model face-detection --batch_size 2 --timeout 1

Image Classification

from fastserve.models import ServeImageClassification

app = ServeImageClassification("resnet18", timeout=1, batch_size=4)
app.run_server()

Or run it from the terminal:

python -m fastserve.models --model image-classification --model_name resnet18 --batch_size 4 --timeout 1

Serve Custom Model

To serve a custom model, subclass FastServe and implement its handle method, which receives a batch of inputs and returns the responses as a list.

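from typing import List

from fastserve import FastServe
from fastserve.core import BaseRequest  # assumed import path; verify against your installed version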


class MyModelServing(FastServe):
    def __init__(self):
        super().__init__(batch_size=2, timeout=0.1)
        self.model = create_model(...)

    def handle(self, batch: List[BaseRequest]) -> List[float]:
        inputs = [b.request for b in batch]
        response = self.model(inputs)
        return response


app = MyModelServing()
app.run_server()

Run the script above from a terminal to launch a FastAPI server for your custom model.
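The batch_size and timeout arguments suggest that incoming requests are grouped before handle is called. The sketch below illustrates that general idea (dynamic batching) with the standard library only. It is not FastServe's implementation: `collect_batch` and the toy `handle` function are hypothetical names, and the exact semantics of timeout in FastServe may differ.

```python
import queue
import time
from typing import Any, List


def collect_batch(q: "queue.Queue[Any]", batch_size: int, timeout: float) -> List[Any]:
    """Wait for one item, then gather more until batch_size is reached or timeout elapses."""
    batch = [q.get()]  # block until at least one request arrives
    deadline = time.monotonic() + timeout
    while len(batch) < batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break  # no more requests arrived in time; send a partial batch
    return batch


def handle(batch: List[float]) -> List[float]:
    """Toy stand-in for a model: returns one output per input, in the same order."""
    return [x * 2 for x in batch]


if __name__ == "__main__":
    pending: "queue.Queue[float]" = queue.Queue()
    for value in (1.0, 2.0, 3.0):
        pending.put(value)
    first = collect_batch(pending, batch_size=2, timeout=0.1)
    second = collect_batch(pending, batch_size=2, timeout=0.1)
    print(handle(first), handle(second))  # [2.0, 4.0] [6.0]
```

A larger batch_size improves throughput when requests arrive quickly, while a smaller timeout bounds how long a lone request waits for company.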

Deploy

Lightning AI Studio ⚡️

python -m fastserve.deploy.lightning --filename main.py \
    --user LIGHTNING_USERNAME \
    --teamspace LIGHTNING_TEAMSPACE \
    --machine "CPU"  # T4, A10G or A10G_X_4

Contribute

Install in editable mode:

git clone https://github.com/aniketmaurya/fastserve.git
cd fastserve
pip install -e .

Create a new branch:

git checkout -b <new-branch>

Make your changes, commit them, and open a pull request.

Project details

Release history

0.0.3

Feb 23, 2024

This version

0.0.3a0 pre-release

Feb 16, 2024

0.0.2

Feb 16, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

FastServeAI-0.0.3a0.tar.gz (1.5 MB)

Uploaded Feb 16, 2024 Source

Built Distribution

FastServeAI-0.0.3a0-py3-none-any.whl (246.8 kB)

Uploaded Feb 16, 2024 Python 3

Hashes for FastServeAI-0.0.3a0.tar.gz
Algorithm	Hash digest
SHA256	`cfec017ac0f429d23d5de76a19ee8774a7a93b66fc51b937880133e5d3bbff18`
MD5	`836ec3af23ca9b284db297a214f370e0`
BLAKE2b-256	`4dd3acb2cf4f500296b859fc3c68ae520dbb641f980165a9e4a27f8cdd8eb0d5`

Hashes for FastServeAI-0.0.3a0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7089051e1f938a2ba7c2f1e18afe927ece6ef1586dbe985ebc06be9202bd0c1b`
MD5	`8d78d4e9062392eacc15bc224986ba98`
BLAKE2b-256	`3af29161891d6d4aebdc16d343dca2e85bc81f7dcf8b5418ef6a40410d0bce3e`