Automatic micro-batching for HTTP LLM calls and local PyTorch inference, backed by a Rust core.

llm-autobatch

Production-minded micro-batching for LLM calls and local PyTorch inference, backed by a single Rust core.

  • Simple: @autobatch turns single calls into efficient batches.
  • Adapter-based: swap HTTP or Torch executors without changing the core.
  • Rust-fast: thread-safe queues, micro-windows, and backpressure.

60-second Quickstart

pip install llm-autobatch

from llm_autobatch import autobatch

@autobatch(max_batch=32, max_wait_ms=10)
def call_llm(prompts: list[str]) -> list[str]:
    # Replace with a real batch call
    return [p.upper() for p in prompts]

# Callers pass a single prompt; the decorator groups concurrent calls into a list.
print(call_llm("hello"))
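To make the max_batch and max_wait_ms parameters concrete, here is a minimal pure-Python sketch of a micro-batching window. It is an illustration only: collect_batch is a hypothetical helper, not part of llm-autobatch, and the real Rust core may close its windows differently. The sketch assumes the wait window starts when the first item arrives.

```python
import queue
import time


def collect_batch(q: "queue.Queue[str]", max_batch: int, max_wait_ms: float) -> list[str]:
    """Block for the first item, then gather more until the batch is full
    or the wait window (measured from the first item) has elapsed."""
    batch = [q.get()]  # wait indefinitely for the first item
    deadline = time.monotonic() + max_wait_ms / 1000.0
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # window expired
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break  # nothing else arrived inside the window
    return batch


q: "queue.Queue[str]" = queue.Queue()
for prompt in ["a", "b", "c", "d", "e"]:
    q.put(prompt)

first = collect_batch(q, max_batch=3, max_wait_ms=10)   # closed by size
second = collect_batch(q, max_batch=3, max_wait_ms=10)  # closed by timeout
print(first, second)
```

The first batch closes as soon as it reaches max_batch; the second closes when the window expires with only two items, which is the latency/throughput trade-off the two parameters control.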

Object-based API

from llm_autobatch import Batcher

batcher = Batcher(max_batch=32, max_wait_ms=10)

def batch_executor(items: list[str]) -> list[str]:
    return [s + "!" for s in items]

print(batcher.run("hi", executor=batch_executor))

HTTP adapter (OpenAI-style)

from llm_autobatch.http import OpenAIResponsesExecutor
from llm_autobatch import Batcher

executor = OpenAIResponsesExecutor(api_key="...", model="gpt-4o-mini")
batcher = Batcher(max_batch=32, max_wait_ms=10)

out = batcher.run("Explain Rust ownership", executor=executor)
print(out)
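If your endpoint accepts only one prompt per request, a batch executor could fan the batch out as concurrent requests. The following is a minimal sketch, assuming the executor contract shown in the Object-based API section (a callable that takes a list and returns a list). send_one and fan_out_executor are hypothetical names, and nothing here describes how OpenAIResponsesExecutor works internally.

```python
from concurrent.futures import ThreadPoolExecutor


def send_one(prompt: str) -> str:
    # Hypothetical stand-in for a single HTTP request to a model endpoint.
    return f"echo: {prompt}"


def fan_out_executor(prompts: list[str]) -> list[str]:
    """Batch executor: one output per input, in input order."""
    if not prompts:
        return []
    with ThreadPoolExecutor(max_workers=min(8, len(prompts))) as pool:
        # pool.map yields results in the order of its inputs,
        # regardless of which request finishes first.
        return list(pool.map(send_one, prompts))


replies = fan_out_executor(["hi", "there"])
print(replies)
```

Using pool.map rather than collecting futures as they complete keeps the outputs aligned with the inputs.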

Torch adapter

from llm_autobatch.torch import TorchExecutor
from llm_autobatch import Batcher

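# model, collate, and x are placeholders: supply your own torch.nn.Module,
# a function that collates individual inputs into a batch, and a single input.
executor = TorchExecutor(model=model, collate_fn=collate, device="cuda")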
batcher = Batcher(max_batch=64, max_wait_ms=5)

print(batcher.run(x, executor=executor))

Benchmark

Run a local throughput test:

python benches/bench_throughput.py

Sample output (illustrative):

items=10000 max_batch=64 max_wait_ms=5  avg_batch=42.7  p99_ms=11.2
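For readers unfamiliar with the avg_batch and p99_ms fields, the sketch below shows one common way such figures are computed from recorded batch sizes and per-item latencies. It uses the nearest-rank percentile definition and made-up data; it is an assumption about the metrics' meaning, not a copy of what benches/bench_throughput.py does.

```python
import math


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


batch_sizes = [64, 64, 32, 8]                      # made-up sample data
latencies_ms = [float(i) for i in range(1, 101)]   # 1.0 .. 100.0, made up

avg_batch = sum(batch_sizes) / len(batch_sizes)
p99_ms = percentile(latencies_ms, 99)
print(f"avg_batch={avg_batch:.1f}  p99_ms={p99_ms:.1f}")
```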

Why Rust?

  • Deterministic batching windows without Python GIL bottlenecks
  • Low-latency coordination under high concurrency
  • Single core reused across HTTP and Torch adapters
  • Memory safety while handling multithreaded queues

FAQ

Does this change my model API? No. You keep your executor; the core only handles batching and routing.

Do I need Rust installed? No. We publish prebuilt wheels for macOS, Linux, and Windows. pip install llm-autobatch should work without Rust.

How do I enable HTTP or Torch adapters? Install extras:

pip install "llm-autobatch[http]"
pip install "llm-autobatch[torch]"

What does backpressure do? It sets what happens to a new item when the queue is full:

  • block: wait for space
  • drop: reject when full
  • passthrough: execute immediately
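The three modes can be sketched with a bounded queue from the standard library. This is an illustration under stated assumptions, not the library's implementation: submit, QueueFull, and run_now are hypothetical names, and the sketch assumes passthrough applies only when the queue is full.

```python
import queue
from typing import Callable


class QueueFull(Exception):
    """Raised by the 'drop' policy when the queue has no free slot."""


def submit(q: "queue.Queue[str]", item: str, policy: str,
           run_now: Callable[[str], str]) -> str:
    if policy not in ("block", "drop", "passthrough"):
        raise ValueError(f"unknown policy: {policy}")
    if policy == "block":
        q.put(item)  # waits until a slot frees up
        return "queued"
    try:
        q.put_nowait(item)
        return "queued"
    except queue.Full:
        if policy == "drop":
            raise QueueFull(item) from None
        return run_now(item)  # passthrough: skip batching, run right away


q: "queue.Queue[str]" = queue.Queue(maxsize=1)
results = [submit(q, "a", "drop", str.upper)]              # fits in the queue
results.append(submit(q, "b", "passthrough", str.upper))   # full: runs unbatched
try:
    submit(q, "c", "drop", str.upper)                      # full: rejected
except QueueFull:
    results.append("rejected")
print(results)
```

The block policy is not exercised against the full queue here because, with no consumer draining it, the call would wait forever.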

Can I use async? Not in v1. Async support is planned for v1.1.

Is ordering preserved? Yes. Your executor must return outputs in the same order as the inputs for each batch.
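Because results are matched to callers by position, an executor that drops or reorders items would hand callers the wrong output. A small guard can catch the most common mistake, a length mismatch. This is a minimal sketch; checked is a hypothetical wrapper, not part of llm-autobatch, and it cannot detect outputs that are reordered but have the right count.

```python
from typing import Callable, Sequence


def checked(executor: Callable[[list[str]], Sequence[str]]) -> Callable[[list[str]], list[str]]:
    """Wrap a batch executor and reject results whose length differs from the input."""
    def wrapper(items: list[str]) -> list[str]:
        outputs = list(executor(items))
        if len(outputs) != len(items):
            raise ValueError(
                f"executor returned {len(outputs)} outputs for {len(items)} inputs"
            )
        return outputs
    return wrapper


def good_executor(items: list[str]) -> list[str]:
    return [s + "!" for s in items]


def lossy_executor(items: list[str]) -> list[str]:
    # Bug: silently drops empty inputs, so later outputs shift position.
    return [s + "!" for s in items if s]


print(checked(good_executor)(["a", "", "b"]))
try:
    checked(lossy_executor)(["a", "", "b"])
except ValueError as exc:
    print("rejected:", exc)
```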

License

Apache-2.0
