Fleet-level batch dispatcher for LLM APIs. Pool requests across coroutines, route to provider Batch APIs, save 50% on cost without rewriting your agent loops.
llmfleet
Fleet-level batch dispatcher for LLM APIs. Pool requests from many coroutines, route to provider Batch APIs (50% discount on Anthropic, OpenAI), keep your sync agent loops working.
pip install llmfleet
Why
Anthropic's Batch API halves token costs (input and output), but it's a poor fit for a single agent: one request submitted on its own still spends 90–120s in the poll loop. The right unit of batching isn't one user's turn. It's the turns of a whole fleet of agents, pooled together by a layer the user never sees. llmfleet is that layer.
Quick start
import asyncio

from anthropic import AsyncAnthropic
from llmfleet import FleetDispatcher, RoutingPolicy


async def main():
    client = AsyncAnthropic()
    policy = RoutingPolicy(
        sync_max_latency_ms=5_000,  # interactive paths stay sync
        batch_window_ms=30_000,     # otherwise pool for 30s
        batch_min_size=10,
        batch_max_size=100,
    )
    essays = ["First essay text", "Second essay text"]  # placeholder inputs
    async with FleetDispatcher(client, policy=policy) as fleet:
        # Tight latency → sync
        chat = await fleet.submit(
            latency_budget_ms=2_000,
            model="claude-sonnet-4-20250514",
            max_tokens=200,
            messages=[{"role": "user", "content": "Hi"}],
        )
        # Loose latency → pooled into a batch
        graded = await asyncio.gather(*[
            fleet.submit(
                latency_budget_ms=600_000,
                model="claude-sonnet-4-20250514",
                max_tokens=200,
                messages=[{"role": "user", "content": f"Grade: {essay}"}],
            )
            for essay in essays
        ])


asyncio.run(main())
How it works
FleetDispatcher runs a background flusher coroutine. Each call to submit() is handled in one of two ways:
- If latency_budget_ms <= policy.sync_max_latency_ms, the request runs synchronously.
- Otherwise it is queued. When the queue holds batch_min_size items or batch_window_ms elapses, the flusher submits one Anthropic Message Batch, polls until it completes, and dispatches each result back to the awaiting coroutine through its Future.
Concurrent submit() calls from independent coroutines automatically share batches.
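The queue-and-flush flow described above can be sketched in plain asyncio. This is a minimal, hypothetical illustration, not llmfleet's source: MiniPool, run_batch, fake_run_batch and demo are names invented here, and run_batch stands in for "submit one Message Batch and poll until it ends". Each submit() parks an asyncio Future in a shared list; the flusher wakes when min_size requests are pending or the window times out, sends at most max_size of them as one batch, and resolves each Future with its own result.

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical batch backend: receives request payloads, returns results in the same order.
# In a real dispatcher this would submit one provider batch and poll until it completes.
RunBatch = Callable[[list[Any]], Awaitable[list[Any]]]


class MiniPool:
    """Illustrative sketch: pool submit() calls from many coroutines into batches."""

    def __init__(self, run_batch: RunBatch, window_s: float, min_size: int, max_size: int):
        self._run_batch = run_batch
        self._window_s = window_s
        self._min_size = min_size
        self._max_size = max_size
        self._pending: list[tuple[Any, asyncio.Future]] = []
        self._wake = asyncio.Event()
        self._flusher: asyncio.Task | None = None
        self.batches_submitted = 0

    async def __aenter__(self) -> MiniPool:
        self._flusher = asyncio.create_task(self._flush_loop())
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        # Stop the background flusher and cancel anything still waiting.
        self._flusher.cancel()
        try:
            await self._flusher
        except asyncio.CancelledError:
            pass
        for _, fut in self._pending:
            fut.cancel()
        self._pending.clear()

    async def submit(self, request: Any) -> Any:
        fut = asyncio.get_running_loop().create_future()
        self._pending.append((request, fut))
        if len(self._pending) >= self._min_size:
            self._wake.set()  # enough work queued: flush early
        return await fut

    async def _flush_loop(self) -> None:
        while True:
            try:
                # Wake on min_size, or stop waiting once the window elapses.
                await asyncio.wait_for(self._wake.wait(), timeout=self._window_s)
            except asyncio.TimeoutError:
                pass
            self._wake.clear()
            if self._pending:
                await self._flush_one()

    async def _flush_one(self) -> None:
        # Take at most max_size requests off the front of the queue as one batch.
        chunk = self._pending[: self._max_size]
        del self._pending[: self._max_size]
        self.batches_submitted += 1
        try:
            results = await self._run_batch([req for req, _ in chunk])
        except Exception as exc:
            for _, fut in chunk:
                if not fut.done():
                    fut.set_exception(exc)
        else:
            # Route each result back to the coroutine that submitted it.
            for (_, fut), result in zip(chunk, results):
                if not fut.done():
                    fut.set_result(result)
        if len(self._pending) >= self._min_size:
            self._wake.set()  # another full batch is already waiting


async def fake_run_batch(requests: list[Any]) -> list[Any]:
    await asyncio.sleep(0.01)  # stands in for batch submission + polling
    return [f"echo:{r}" for r in requests]


async def demo() -> tuple[list[Any], int]:
    async with MiniPool(fake_run_batch, window_s=0.05, min_size=3, max_size=3) as pool:
        results = await asyncio.gather(*(pool.submit(i) for i in range(7)))
    return results, pool.batches_submitted
```

With min_size=3 and max_size=3, seven concurrent submissions become three batches (3 + 3 + 1); the straggler goes out only when the window times out. For simplicity the sketch awaits each batch inline, so only one batch is in flight at a time; a production dispatcher would likely let several batches poll concurrently.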
API
FleetDispatcher(client, policy=None, on_batch_submitted=None)

# Lifecycle: use as an async context manager
async with FleetDispatcher(client) as fleet:
    response = await fleet.submit(latency_budget_ms=N, **anthropic_messages_create_kwargs)
    # Force routing decisions
    await fleet.submit_sync(**kwargs)
    await fleet.submit_batch(**kwargs)
    # Introspection
    fleet.stats.sync_calls
    fleet.stats.batched_calls
    fleet.stats.batches_submitted
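The three counters are enough to check whether pooling is paying off. A small sketch, assuming fleet.stats exposes them as plain integers (the README lists the attributes but not their types); summarize is a hypothetical helper, not part of llmfleet, and the savings figure is a rough estimate that assumes every call costs the same and batched calls get the 50% batch discount.

```python
from types import SimpleNamespace


def summarize(stats) -> dict[str, float]:
    """Derive ratios from the three documented counters (assumed to be plain ints)."""
    total = stats.sync_calls + stats.batched_calls
    batched_share = stats.batched_calls / total if total else 0.0
    avg_batch_size = (
        stats.batched_calls / stats.batches_submitted if stats.batches_submitted else 0.0
    )
    # Rough estimate only: equal cost per call, 50% off for batched calls.
    est_savings = 0.5 * batched_share
    return {
        "batched_share": batched_share,
        "avg_batch_size": avg_batch_size,
        "est_savings": est_savings,
    }


# Stand-in object with made-up counter values, in place of a live fleet.stats.
print(summarize(SimpleNamespace(sync_calls=20, batched_calls=180, batches_submitted=4)))
```

A low avg_batch_size relative to batch_max_size suggests the window is flushing mostly partial batches, which costs latency without adding much pooling.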
Configuration
RoutingPolicy(
    sync_max_latency_ms=5_000,  # threshold for sync routing
    batch_window_ms=30_000,     # how long to wait for a batch to fill
    batch_min_size=1,           # flush early once this many requests are queued
    batch_max_size=100,         # hard cap per batch (Anthropic accepts up to 100,000 requests / 256 MB)
    poll_interval_s=2.0,        # batch status poll interval, in seconds
)
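The routing rule and the policy fields can be mirrored in a small dataclass. This is a hypothetical sketch: PolicySketch and its validation checks are inventions for illustration, not llmfleet's RoutingPolicy; only the field names, the defaults, and the "budget at or under sync_max_latency_ms goes sync" rule come from this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicySketch:
    """Hypothetical stand-in mirroring RoutingPolicy's documented fields and defaults."""

    sync_max_latency_ms: int = 5_000
    batch_window_ms: int = 30_000
    batch_min_size: int = 1
    batch_max_size: int = 100
    poll_interval_s: float = 2.0

    def __post_init__(self) -> None:
        # Sanity checks assumed for illustration; llmfleet may validate differently.
        if not 1 <= self.batch_min_size <= self.batch_max_size:
            raise ValueError("expected 1 <= batch_min_size <= batch_max_size")
        if self.batch_window_ms <= 0 or self.poll_interval_s <= 0:
            raise ValueError("batch_window_ms and poll_interval_s must be positive")

    def route(self, latency_budget_ms: int) -> str:
        # Documented rule: budgets at or under the threshold run sync.
        return "sync" if latency_budget_ms <= self.sync_max_latency_ms else "batch"


policy = PolicySketch(sync_max_latency_ms=5_000, batch_window_ms=30_000,
                      batch_min_size=10, batch_max_size=100)
print(policy.route(2_000), policy.route(600_000))
```

Because the documented rule compares the budget only with sync_max_latency_ms, a 10,000 ms budget routes to batch even though the 30 s pooling window alone exceeds it. Budgets between the sync threshold and realistic batch turnaround deserve a deliberate choice of threshold.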
What it doesn't do
- Not a router across providers/models for quality. Use a real router.
- Not cross-process pooling — fleet is process-local. Use a shared queue (Redis / SQS) for cross-process.
- Doesn't try to batch tool-call turns where the tool is on the critical path; pass force_sync=True for those.
Status
v0.1.0: Anthropic only. OpenAI Batch API and Bedrock async-invoke are on the roadmap. Patches welcome.
License
MIT
File details
Details for the file llmfleet-0.1.0.tar.gz.
File metadata
- Download URL: llmfleet-0.1.0.tar.gz
- Upload date:
- Size: 7.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dce0daeaf67b76c549c91644565dc1e760db6564c72c8a64ecc879e72d9777ac |
| MD5 | 876e01acb7b79fb41469c759e2cf8d77 |
| BLAKE2b-256 | fc210d6f913a12a905ebd8846b6a2ffb3cb93d9574c106cc2c61647e105ccc36 |
File details
Details for the file llmfleet-0.1.0-py3-none-any.whl.
File metadata
- Download URL: llmfleet-0.1.0-py3-none-any.whl
- Upload date:
- Size: 7.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1760c4d285caf410af5672543ea7504e8a0555633ab7f6fefb0e0804ac94a395 |
| MD5 | 82b8c5d7bc58ffac115dd930ea8da766 |
| BLAKE2b-256 | c5c8a0b9eafe6dc56c9e083ca7a0cf7d085efefd2c5bc5217a9aeed10241ae5b |