FastServe
Machine Learning Serving focused on GenAI & LLMs with simplicity as the top priority.
YouTube: How to serve your own GPT-like LLM in 1 minute with FastServe
Installation
Stable:
pip install FastServeAI
Latest:
pip install git+https://github.com/aniketmaurya/fastserve.git@main
Run locally
python -m fastserve
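This starts a local FastAPI server (port 8000, per the UI note below). A minimal sketch of a smoke test; the /endpoint route and the payload shape are assumptions, so check the interactive docs at http://localhost:8000/docs for the actual schema:

import requests

# Hypothetical smoke test; route and payload shape are assumptions.
r = requests.post("http://localhost:8000/endpoint", json={"request": "hello"})
print(r.status_code, r.json())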
Usage/Examples
Serve LLMs with Llama-cpp
from fastserve.models import ServeLlamaCpp
model_path = "openhermes-2-mistral-7b.Q5_K_M.gguf"
serve = ServeLlamaCpp(model_path=model_path)
serve.run_server()
or run python -m fastserve.models --model llama-cpp --model_path openhermes-2-mistral-7b.Q5_K_M.gguf from the terminal.
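Once the server is up, you can query it over HTTP. A minimal sketch; the route and the prompt payload are assumptions, so verify them against http://localhost:8000/docs first:

import requests

# Hypothetical completion request; adjust fields to the server's real schema.
payload = {"prompt": "Write a haiku about GPUs", "temperature": 0.7, "max_tokens": 128}
response = requests.post("http://localhost:8000/endpoint", json=payload, timeout=60)
print(response.json())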
Serve vLLM
from fastserve.models import ServeVLLM
app = ServeVLLM("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
app.run_server()
You can use the FastServe client, which will automatically apply the chat template for you:
from fastserve.client import vLLMClient
from rich import print
client = vLLMClient("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
response = client.chat("Write a python function to resize image to 224x224", keep_context=True)
# print(client.context)
print(response["outputs"][0]["text"])
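Because keep_context=True was passed, the client keeps the conversation history, so a follow-up call can build on the earlier turn (the second prompt below is just an illustration):

followup = client.chat("Now add error handling to that function", keep_context=True)
print(followup["outputs"][0]["text"])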
Serve SDXL Turbo
from fastserve.models import ServeSDXLTurbo
serve = ServeSDXLTurbo(device="cuda", batch_size=2, timeout=1)
serve.run_server()
or run python -m fastserve.models --model sdxl-turbo --batch_size 2 --timeout 1 from the terminal.
This application comes with a UI; you can access it at http://localhost:8000/ui.
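You can also drive the server programmatically. A sketch that assumes the server accepts a JSON prompt on /endpoint and returns the generated image as raw bytes; both assumptions should be checked against http://localhost:8000/docs:

import requests

# Hypothetical request/response shape -- verify against the OpenAPI docs.
response = requests.post(
    "http://localhost:8000/endpoint",
    json={"prompt": "a watercolor fox, studio lighting"},
    timeout=120,
)
with open("fox.png", "wb") as f:
    f.write(response.content)  # assumes the body is the image bytes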
Face Detection
from fastserve.models import FaceDetection
serve = FaceDetection(batch_size=2, timeout=1)
serve.run_server()
or run python -m fastserve.models --model face-detection --batch_size 2 --timeout 1 from the terminal.
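To exercise the server, here is a hypothetical client that uploads an image; the multipart field name and the route are assumptions, not documented API:

import requests

# Hypothetical upload; check http://localhost:8000/docs for the real route
# and field names before relying on this.
with open("people.jpg", "rb") as f:
    r = requests.post("http://localhost:8000/endpoint", files={"file": f}, timeout=30)
print(r.json())  # e.g. bounding boxes and confidence scores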
Image Classification
from fastserve.models import ServeImageClassification
app = ServeImageClassification("resnet18", timeout=1, batch_size=4)
app.run_server()
or run python -m fastserve.models --model image-classification --model_name resnet18 --batch_size 4 --timeout 1 from the terminal.
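Judging from the examples above, batch_size and timeout control dynamic batching: requests queue until batch_size items arrive or timeout seconds elapse, whichever comes first. That reading is inferred rather than documented, but it suggests the usual latency/throughput trade-off:

# Throughput-oriented: larger batches, wait up to 1 s to fill them.
bulk_app = ServeImageClassification("resnet18", timeout=1, batch_size=16)

# Latency-oriented: flush after 50 ms, at most 2 requests per batch.
snappy_app = ServeImageClassification("resnet18", timeout=0.05, batch_size=2)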
Serve Custom Model
To serve a custom model, implement the handle method of FastServe: it receives a batch of inputs and must return the responses as a list, one per input.
from typing import List

from fastserve import FastServe
from fastserve.utils import BaseRequest  # NOTE: exact import path may vary by version

class MyModelServing(FastServe):
    def __init__(self):
        super().__init__(batch_size=2, timeout=0.1)
        self.model = create_model(...)  # build or load your model here

    def handle(self, batch: List[BaseRequest]) -> List[float]:
        # Unwrap the batched payloads, run the model once over the whole
        # batch, and return one result per input, in order.
        inputs = [b.request for b in batch]
        response = self.model(inputs)
        return response

app = MyModelServing()
app.run_server()
You can run the above script from the terminal; it will launch a FastAPI server for your custom model.
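Since handle reads b.request for each item, a plausible guess is that the HTTP API wraps each payload in a request field. A hypothetical smoke test, with both the route and the schema unverified:

import requests

# Hypothetical request shape for the custom server above.
r = requests.post("http://localhost:8000/endpoint", json={"request": [1.0, 2.0, 3.0]})
print(r.json())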
Deploy
Lightning AI Studio ⚡️
python -m fastserve.deploy.lightning --filename main.py \
    --user LIGHTNING_USERNAME \
    --teamspace LIGHTNING_TEAMSPACE \
    --machine "CPU"  # or T4, A10G, A10G_X_4
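The --filename flag points at the script that builds and runs your FastServe app. A minimal sketch of such a main.py, reusing the Llama-cpp example from above (the model file is an assumption):

# main.py -- the entry point uploaded by the deploy command
from fastserve.models import ServeLlamaCpp

serve = ServeLlamaCpp(model_path="openhermes-2-mistral-7b.Q5_K_M.gguf")
serve.run_server()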
Contribute
Install in editable mode:
git clone https://github.com/aniketmaurya/fastserve.git
cd fastserve
pip install -e .
Create a new branch:
git checkout -b <new-branch>
Make your changes, commit, and open a PR.
File details
Details for the file FastServeAI-0.0.3.tar.gz.
File metadata
- Download URL: FastServeAI-0.0.3.tar.gz
- Upload date:
- Size: 1.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | ffb237530f4d45b54216d56927d871975e1b4620208cb352b7c39b34a6411516
MD5 | 347b70cca13ed5aa8a2109fa1487d472
BLAKE2b-256 | 58572004d3807d412a71bd95764dffe32ddd64d82aa2782a38a4904006ca7507
File details
Details for the file FastServeAI-0.0.3-py3-none-any.whl.
File metadata
- Download URL: FastServeAI-0.0.3-py3-none-any.whl
- Upload date:
- Size: 247.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | cf5be4bda7d9a41425140dcd31a97635ff2f37251fb6a8d9b5f45f9f3f41a8ac
MD5 | f3244f3f3b819198379019710cd1e93b
BLAKE2b-256 | 3d2c0e5a2f8b9c65c33a6c8061003c9e848458a183d9eb5207b728c11bd9dc90