Project description
Fast Dynamic Batcher
Batching machine learning workloads is the easiest way to achieve significant inference speedups. The Fast Dynamic Batcher library was built to add easy support for dynamic batches to FastAPI. With our dynamic batcher, you can combine the inputs of several requests into a single batch, which can then be run on your GPU. In our example project, we measured speed-ups of up to 2.8x with dynamic batching compared to a baseline without it.
Example Usage
To use dynamic batching in your FastAPI workloads, first create a subclass of the InferenceModel class. Initialize your ML model in its __init__ method and use it in its infer method:
from fast_dynamic_batcher.dyn_batcher import Task
from fast_dynamic_batcher.inference_template import InferenceModel


class DemoModel(InferenceModel):
    def __init__(self):
        super().__init__()
        # Initiate your ML model here

    def infer(self, tasks: list[Task]) -> list[Task]:
        # Process your input tasks
        inputs = [t.content for t in tasks]
        # Run your inputs as a batch for your model
        ml_output = None  # Your inference outputs
        results = [
            Task(id=tasks[i].id, content=ml_output[i]) for i in range(len(tasks))
        ]
        return results
Subsequently, use your InferenceModel subclass to initialize our DynBatcher:
from contextlib import asynccontextmanager

from anyio import CapacityLimiter
from anyio.lowlevel import RunVar
from fastapi import FastAPI

from fast_dynamic_batcher.dyn_batcher import DynBatcher


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Allow up to 16 worker threads for AnyIO's default thread limiter
    RunVar("_default_thread_limiter").set(CapacityLimiter(16))
    global dyn_batcher
    dyn_batcher = DynBatcher(DemoModel, max_batch_size=8, max_delay=0.1)
    yield
    dyn_batcher.stop()


app = FastAPI(lifespan=lifespan)


@app.post("/predict/")
async def predict(input_model: YourInputPydanticModel):
    return await dyn_batcher.process_batched(input_model)
The DynBatcher can be initialized in the FastAPI lifespan as a global variable. It can be further customized with the max_batch_size and max_delay arguments. It can then be used in your FastAPI endpoints by registering your inputs via its process_batched method.
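The endpoint above uses YourInputPydanticModel as a placeholder for your own request schema. As a minimal sketch, assuming your infer method expects each Task's content to carry the request payload, such a model might look like the following (the class and field names here are hypothetical, not part of the library):

from pydantic import BaseModel


class YourInputPydanticModel(BaseModel):
    # Hypothetical request schema; replace with the fields your model actually needs
    text: str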
Our dynamic batching algorithm will then wait until either the number of inputs equals max_batch_size, or max_delay seconds have passed. In the latter case, a batch may contain between 1 and max_batch_size inputs. Once either condition is met, the batch is processed by calling the infer method of your InferenceModel instance.
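To see batching in action, you can issue several requests concurrently so that they arrive within the same max_delay window. The snippet below is a minimal client sketch, assuming the app above is served at http://localhost:8000 and that the request schema has a single text field, as in the hypothetical model shown earlier.

import asyncio

import httpx


async def main():
    async with httpx.AsyncClient(base_url="http://localhost:8000") as client:
        # Send eight concurrent requests; the batcher can group them into one batch
        payloads = [{"text": f"example {i}"} for i in range(8)]
        responses = await asyncio.gather(
            *(client.post("/predict/", json=p) for p in payloads)
        )
        for r in responses:
            print(r.status_code, r.json())


asyncio.run(main())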
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file fast_dynamic_batcher-0.1.1.tar.gz.
File metadata
- Download URL: fast_dynamic_batcher-0.1.1.tar.gz
- Upload date:
- Size: 5.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.4 CPython/3.12.7 Linux/6.5.0-1025-azure
File hashes
Algorithm | Hash digest
---|---
SHA256 | 30d2869d902a3030d4fc15fe90abe0e388f37c5b60ec400ea83188f98b0e925d
MD5 | 68600375c9bf61c6077ec4516d829545
BLAKE2b-256 | aaa064b748553d903862dcafae3d30feade7fca5bd36483b2583712e45241354
File details
Details for the file fast_dynamic_batcher-0.1.1-py3-none-any.whl.
File metadata
- Download URL: fast_dynamic_batcher-0.1.1-py3-none-any.whl
- Upload date:
- Size: 6.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.4 CPython/3.12.7 Linux/6.5.0-1025-azure
File hashes
Algorithm | Hash digest
---|---
SHA256 | 802368c2b31a2d9c8614838a49d4eef2b4aaf67ec6151968a76b6f9d14eb296f
MD5 | 68f3e8052712c87c13a5301eb305fe61
BLAKE2b-256 | 892f08c1ccfa555d9ea22bc92285971cb25fe34c4a6b5c381be8e5b6e1dba97a