Automatic micro-batching for HTTP LLM calls and local PyTorch inference, backed by a Rust core.
llm-autobatch
Production-minded micro-batching for LLM calls and local PyTorch inference, backed by a single Rust core.
- Viral simple: @autobatch turns single calls into efficient batches.
- Adapter-based: swap HTTP or Torch executors without changing the core.
- Rust-fast: thread-safe queues, micro-windows, and backpressure.
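To make the micro-window idea concrete, here is a minimal pure-Python sketch of the technique: callers submit single items, and a worker thread groups them until either `max_batch` items have arrived or `max_wait_ms` has elapsed. The `MicroBatcher` class and its internals are hypothetical and written only for illustration; they are not the Rust core and not the llm-autobatch API.

```python
import queue
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor


class MicroBatcher:
    """Illustrative pure-Python micro-batcher; not the llm-autobatch core."""

    def __init__(self, executor, max_batch=32, max_wait_ms=10):
        self._executor = executor          # callable: list of inputs -> list of outputs
        self._max_batch = max_batch
        self._max_wait_s = max_wait_ms / 1000.0
        self._pending = queue.Queue()      # thread-safe queue of (item, Future) pairs
        threading.Thread(target=self._loop, daemon=True).start()

    def run(self, item):
        """Submit one item and block until its result is available."""
        fut = Future()
        self._pending.put((item, fut))
        return fut.result()

    def _loop(self):
        while True:
            batch = [self._pending.get()]  # block until the first item arrives
            deadline = time.monotonic() + self._max_wait_s
            # Keep collecting until the batch is full or the window closes.
            while len(batch) < self._max_batch:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self._pending.get(timeout=remaining))
                except queue.Empty:
                    break
            self._flush(batch)

    def _flush(self, batch):
        try:
            outputs = self._executor([item for item, _ in batch])
            if len(outputs) != len(batch):
                raise ValueError("executor must return one output per input")
        except Exception as exc:
            # Propagate the failure to every caller waiting on this batch.
            for _, fut in batch:
                fut.set_exception(exc)
            return
        for (_, fut), out in zip(batch, outputs):
            fut.set_result(out)


batch_sizes = []


def upper_all(prompts):
    batch_sizes.append(len(prompts))       # record how many items were grouped
    return [p.upper() for p in prompts]


batcher = MicroBatcher(upper_all, max_batch=8, max_wait_ms=50)
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(batcher.run, ["a", "b", "c", "d"]))

print(results)       # each caller gets the result for its own input
print(batch_sizes)   # batch sizes depend on thread timing
```

Each caller still makes a single blocking call, while the executor only ever sees lists. How many items land in a batch depends on timing, which is why both a size cap and a time cap are needed.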
60-second Quickstart
pip install llm-autobatch
from llm_autobatch import autobatch

@autobatch(max_batch=32, max_wait_ms=10)
def call_llm(prompts: list[str]) -> list[str]:
    # Replace with a real batch call
    return [p.upper() for p in prompts]

# Call with a single item; the decorator handles batching
print(call_llm("hello"))
Object-based API
from llm_autobatch import Batcher
batcher = Batcher(max_batch=32, max_wait_ms=10)
def batch_executor(items: list[str]) -> list[str]:
    return [s + "!" for s in items]
print(batcher.run("hi", executor=batch_executor))
HTTP adapter (OpenAI-style)
from llm_autobatch.http import OpenAIResponsesExecutor
from llm_autobatch import Batcher
executor = OpenAIResponsesExecutor(api_key="...", model="gpt-4o-mini")
batcher = Batcher(max_batch=32, max_wait_ms=10)
out = batcher.run("Explain Rust ownership", executor=executor)
print(out)
Torch adapter
from llm_autobatch.torch import TorchExecutor
from llm_autobatch import Batcher
# model, collate, and x are placeholders for your own module, collate function, and input
executor = TorchExecutor(model=model, collate_fn=collate, device="cuda")
batcher = Batcher(max_batch=64, max_wait_ms=5)
print(batcher.run(x, executor=executor))
Benchmark
Run a local throughput test:
python benches/bench_throughput.py
Sample output (illustrative):
items=10000 max_batch=64 max_wait_ms=5 avg_batch=42.7 p99_ms=11.2
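For readers who want to reproduce figures like `avg_batch` and `p99_ms` from their own measurements, the following sketch shows one common way to compute them. It assumes `avg_batch` is the mean batch size and `p99_ms` is a nearest-rank 99th percentile of per-item latency; `benches/bench_throughput.py` may define them differently, and the `summarize` helper is hypothetical.

```python
def summarize(batch_sizes, latencies_ms):
    """Return mean batch size and nearest-rank p99 latency (assumed definitions)."""
    avg_batch = sum(batch_sizes) / len(batch_sizes)
    ordered = sorted(latencies_ms)
    # Nearest-rank percentile: rank = ceil(0.99 * n), using integer arithmetic.
    rank = (99 * len(ordered) + 99) // 100
    p99_ms = ordered[rank - 1]
    return {"avg_batch": avg_batch, "p99_ms": p99_ms}


# Toy data: two batches and latencies of 1..100 ms.
stats = summarize(batch_sizes=[2, 4], latencies_ms=list(range(1, 101)))
print(stats)
```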
Why Rust?
- Deterministic batching windows without Python GIL bottlenecks
- Low-latency coordination under high concurrency
- Single core reused across HTTP and Torch adapters
- Memory safety while handling multithreaded queues
FAQ
Does this change my model API? No. You keep your executor; the core only handles batching and routing.
Do I need Rust installed?
No. We publish prebuilt wheels for macOS, Linux, and Windows. pip install llm-autobatch should work without Rust.
How do I enable HTTP or Torch adapters? Install extras:
pip install "llm-autobatch[http]"
pip install "llm-autobatch[torch]"
What does backpressure do?
- block: wait for space
- drop: reject when full
- passthrough: execute immediately
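The three policies can be illustrated with a bounded queue from the standard library. This is a sketch of the general technique, not the library's implementation: the `submit` function and `Rejected` exception are hypothetical, and reading "passthrough" as "bypass the queue and run the executor directly when the queue is full" is an assumption.

```python
import queue


class Rejected(Exception):
    """Raised by the hypothetical 'drop' policy when the queue is full."""


def submit(pending, item, policy, executor):
    """Enqueue `item` under a backpressure policy.

    Returns None when the item was queued, or the executor's output when
    the 'passthrough' policy ran it directly.
    """
    if policy not in ("block", "drop", "passthrough"):
        raise ValueError(f"unknown policy: {policy!r}")
    if policy == "block":
        pending.put(item)                 # waits until a slot frees up
        return None
    try:
        pending.put_nowait(item)          # succeeds only if there is space
        return None
    except queue.Full:
        if policy == "drop":
            raise Rejected("queue is full") from None
        return executor([item])[0]        # passthrough: run as a batch of one


def shout(items):
    return [s.upper() for s in items]


pending = queue.Queue(maxsize=1)
submit(pending, "a", "drop", shout)       # fits: queue now holds one item

try:
    submit(pending, "b", "drop", shout)   # queue is full, so this is rejected
    dropped = False
except Rejected:
    dropped = True

direct = submit(pending, "c", "passthrough", shout)  # skips the full queue
print(dropped, direct, pending.qsize())
```

The trade-off is latency versus load shedding: `block` slows callers down, `drop` fails fast, and `passthrough` gives up batching efficiency to keep the call moving.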
Can I use async? Not in v1. Async support is planned for v1.1.
Is ordering preserved? Yes. Your executor must return outputs in the same order as the inputs of each batch.
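An executor is free to reorder work internally as long as it restores the input order before returning. The sketch below processes prompts shortest-first (a common trick to reduce padding) and then puts each output back at its original position. The function name and the `upper()` stand-in for a model call are illustrative assumptions.

```python
def length_sorted_executor(prompts: list[str]) -> list[str]:
    # Indices of the prompts, ordered by prompt length.
    order = sorted(range(len(prompts)), key=lambda i: len(prompts[i]))
    # Stand-in for a real batched model call on the reordered inputs.
    processed = [prompts[i].upper() for i in order]
    # Scatter each result back to the position of its original input.
    outputs = [""] * len(prompts)
    for position, original_index in enumerate(order):
        outputs[original_index] = processed[position]
    return outputs


restored = length_sorted_executor(["ccc", "a", "bb"])
print(restored)
```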
License
Apache-2.0
File details
Details for the file llm_autobatch-0.1.1.tar.gz.
File metadata
- Download URL: llm_autobatch-0.1.1.tar.gz
- Upload date:
- Size: 13.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7284a4d5f0d8c8cec6e6efc07ce9b37213f9f463e3298ef36088b93fb43558e2 |
| MD5 | fc2ab1bb7315a7212de0d2a27c7282f9 |
| BLAKE2b-256 | 4a1dc703d55061ffb707a5cc4bb33a960c38c4ba5e735ed2ad838fa1c61f6609 |
File details
Details for the file llm_autobatch-0.1.1-cp39-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: llm_autobatch-0.1.1-cp39-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 287.9 kB
- Tags: CPython 3.9+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7ded2a9df40d111dac181510a48c485b059b815e4119354fb852afc05003cccf |
| MD5 | a2d586d26693a7cd272d364016cda6bd |
| BLAKE2b-256 | 0a6dbb61173377179122fc54c9747b07336f619e2a15631dd060b32d78d51799 |