Fleet-level batch dispatcher for LLM APIs. Pool requests across coroutines, route to provider Batch APIs, save 50% on cost without rewriting your agent loops.

llmfleet

Fleet-level batch dispatcher for LLM APIs. Pool requests from many coroutines, route them to provider Batch APIs (Anthropic and OpenAI both discount batched requests by 50%), and keep your existing agent loops working unchanged.

pip install llmfleet

Why

Anthropic's Batch API halves the price of input and output tokens, but it is a poor fit for a single agent: a lone request can spend 90–120s polling for its result. The right unit of batching isn't one user's turn. It is the turns of a whole fleet of agents, pooled together by a layer the user never sees. llmfleet is that layer.

Quick start

import asyncio
from anthropic import AsyncAnthropic
from llmfleet import FleetDispatcher, RoutingPolicy

async def main():
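    client = AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment
    policy = RoutingPolicy(
        sync_max_latency_ms=5_000,    # interactive paths stay sync
        batch_window_ms=30_000,       # otherwise pool for up to 30s
        batch_min_size=10,            # ...or flush early once 10 are queued
        batch_max_size=100,
    )
    essays = ["First essay text", "Second essay text"]  # your own inputs here
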
    async with FleetDispatcher(client, policy=policy) as fleet:
        # Tight latency → sync
        chat = await fleet.submit(
            latency_budget_ms=2_000,
            model="claude-sonnet-4-20250514",
            max_tokens=200,
            messages=[{"role": "user", "content": "Hi"}],
        )

        # Loose latency → pooled into a batch
        graded = await asyncio.gather(*[
            fleet.submit(
                latency_budget_ms=600_000,
                model="claude-sonnet-4-20250514",
                max_tokens=200,
                messages=[{"role": "user", "content": f"Grade: {essay}"}],
            )
            for essay in essays
        ])

asyncio.run(main())

How it works

FleetDispatcher runs a background flusher coroutine. Calls to submit() either:

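Go directly to the regular Messages API (the sync path) if latency_budget_ms <= policy.sync_max_latency_ms, or
Get queued. The flusher acts as soon as the queue holds batch_min_size requests, or once batch_window_ms has elapsed. It submits the queued requests (at most batch_max_size at a time) as one Anthropic Message Batch. It then polls the batch status every poll_interval_s until the batch completes. Finally, it resolves each awaiting coroutine's Future with that coroutine's own result.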

Concurrent submit() calls from independent coroutines automatically share batches.
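The queue-and-flush mechanism can be illustrated with a minimal, self-contained asyncio sketch of the same idea. This is not llmfleet's code. PooledFlusher, fake_batch and demo are hypothetical names. fake_batch stands in for submitting and polling an Anthropic Message Batch, so the sketch runs without network access.

Each submit() parks a Future in a shared list. The background run() loop wakes early once min_size requests are queued, or when the window times out. It then drains the queue in chunks of at most max_size.

```python
import asyncio
import contextlib
from typing import Any, Awaitable, Callable

# Hypothetical stand-in for a provider batch endpoint: takes the queued
# request payloads and returns one result per request, in the same order.
BatchFn = Callable[[list], Awaitable[list]]


class PooledFlusher:
    """Sketch of a pooled flusher (illustrative, not llmfleet's implementation)."""

    def __init__(self, run_batch: BatchFn, window_s: float, min_size: int, max_size: int):
        self._run_batch = run_batch
        self._window_s = window_s      # plays the role of batch_window_ms, in seconds
        self._min_size = min_size      # plays the role of batch_min_size
        self._max_size = max_size      # plays the role of batch_max_size
        self._pending: list = []       # (params, Future) pairs awaiting a batch
        self._wake = asyncio.Event()
        self.batches_submitted = 0

    async def submit(self, **params: Any) -> Any:
        fut = asyncio.get_running_loop().create_future()
        self._pending.append((params, fut))
        if len(self._pending) >= self._min_size:
            self._wake.set()           # enough queued: flush before the window ends
        return await fut               # resolved later by _flush()

    async def run(self) -> None:
        """Background loop: flush on an early wake-up or when the window elapses."""
        while True:
            with contextlib.suppress(asyncio.TimeoutError):
                await asyncio.wait_for(self._wake.wait(), timeout=self._window_s)
            self._wake.clear()
            while self._pending:       # drain in chunks of at most max_size
                chunk = self._pending[: self._max_size]
                del self._pending[: self._max_size]
                await self._flush(chunk)

    async def _flush(self, chunk: list) -> None:
        self.batches_submitted += 1
        try:
            results = await self._run_batch([params for params, _ in chunk])
        except Exception as exc:       # a failed batch fails every caller in it
            for _, fut in chunk:
                if not fut.done():
                    fut.set_exception(exc)
            return
        for (_, fut), result in zip(chunk, results):
            if not fut.done():         # skip callers that were cancelled
                fut.set_result(result)


async def fake_batch(requests: list) -> list:
    await asyncio.sleep(0.01)          # pretend to submit and poll a batch job
    return [r["prompt"].upper() for r in requests]


async def demo(n: int, window_s: float, min_size: int, max_size: int):
    flusher = PooledFlusher(fake_batch, window_s, min_size, max_size)
    loop_task = asyncio.create_task(flusher.run())
    results = await asyncio.gather(
        *(flusher.submit(prompt=f"essay {i}") for i in range(n))
    )
    loop_task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await loop_task
    return results, flusher.batches_submitted
```

Two runs show the two flush triggers:

- asyncio.run(demo(8, 0.05, 4, 4)) hits min_size early. It returns the eight upper-cased prompts in submission order, using two batches.
- asyncio.run(demo(3, 0.05, 10, 100)) never reaches min_size. It flushes a single batch once the 50 ms window expires.

Note that this sketch awaits each batch before flushing the next, so one slow batch delays the ones behind it. A production flusher would more likely run batches concurrently.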

API

FleetDispatcher(client, policy=None, on_batch_submitted=None)

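# Lifecycle: use as an async context manager (inside a coroutine)
async with FleetDispatcher(client) as fleet:
    # latency_budget_ms picks the route; the remaining kwargs are the ones
    # you would pass to client.messages.create()
    response = await fleet.submit(latency_budget_ms=N, **anthropic_messages_create_kwargs)

    # Force a routing decision regardless of latency_budget_ms
    await fleet.submit_sync(**kwargs)
    await fleet.submit_batch(**kwargs)

    # Introspection: running counters
    fleet.stats.sync_calls
    fleet.stats.batched_calls
    fleet.stats.batches_submitted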
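The routing rule behind submit() fits in a small pure function, which makes the three stats counters easy to reason about. The sketch below is hypothetical, and none of its names are llmfleet's API:

- choose_route is an illustrative routing function.
- Its force parameter stands in for submit_sync() and submit_batch().
- FleetStats mirrors the fleet.stats counters.
- tally counts how a list of latency budgets would be routed.

The sketch assumes the <= comparison described under How it works. A budget exactly equal to sync_max_latency_ms therefore stays on the sync path.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FleetStats:
    """Hypothetical mirror of the counters exposed as fleet.stats."""
    sync_calls: int = 0
    batched_calls: int = 0
    batches_submitted: int = 0


def choose_route(latency_budget_ms: int, sync_max_latency_ms: int = 5_000,
                 force: Optional[str] = None) -> str:
    """Return "sync" or "batch" for one submit() call."""
    if force is not None:
        # Stands in for submit_sync() / submit_batch(): skip the budget check.
        if force not in ("sync", "batch"):
            raise ValueError(f"unknown route: {force!r}")
        return force
    # Budgets at or under the threshold go to the regular Messages API.
    return "sync" if latency_budget_ms <= sync_max_latency_ms else "batch"


def tally(budgets_ms: List[int], batch_max_size: int = 100) -> FleetStats:
    """Count how a list of latency budgets would be routed."""
    stats = FleetStats()
    for budget in budgets_ms:
        if choose_route(budget) == "sync":
            stats.sync_calls += 1
        else:
            stats.batched_calls += 1
    # Lower bound: assumes all batched calls land in the same window,
    # split into chunks of at most batch_max_size (ceiling division).
    stats.batches_submitted = -(-stats.batched_calls // batch_max_size)
    return stats
```

With the default threshold, tally([2_000, 5_000] + [600_000] * 151) reports 2 sync calls, 151 batched calls and 2 batches. The batch count is only a lower bound. Real traffic spread over several windows produces more, smaller batches.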

Configuration

RoutingPolicy(
    sync_max_latency_ms=5_000,    # threshold for sync routing
    batch_window_ms=30_000,       # how long to wait for a batch to fill
    batch_min_size=1,             # flush before the window ends once this many are queued
    batch_max_size=100,           # hard cap per batch (Anthropic allows up to 100,000 requests)
    poll_interval_s=2.0,          # batch status poll interval
)
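The way these settings interact with latency can be sketched in code. PolicySketch below is a hypothetical dataclass that mirrors the documented defaults. Its sanity checks are assumptions, since llmfleet's own validation is not documented.

It also computes a rough upper bound on the extra wait that pooling adds to a batched request. That wait has two parts:

- Up to one full batch_window_ms before the batch is submitted.
- Up to one poll_interval_s between the batch finishing and the flusher noticing.

The bound assumes the flusher is idle when the request arrives. It excludes the provider's own processing time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicySketch:
    """Hypothetical stand-in for RoutingPolicy, using the documented defaults."""
    sync_max_latency_ms: int = 5_000
    batch_window_ms: int = 30_000
    batch_min_size: int = 1
    batch_max_size: int = 100
    poll_interval_s: float = 2.0

    def __post_init__(self) -> None:
        # Assumed sanity checks; not taken from llmfleet.
        if self.batch_min_size < 1:
            raise ValueError("batch_min_size must be >= 1")
        if self.batch_max_size < self.batch_min_size:
            raise ValueError("batch_max_size must be >= batch_min_size")
        if self.batch_window_ms <= 0 or self.poll_interval_s <= 0:
            raise ValueError("batch_window_ms and poll_interval_s must be positive")

    def overhead_bound_s(self) -> float:
        """Worst-case pooling overhead in seconds: one full window plus one
        poll interval. Provider processing time is not included."""
        return self.batch_window_ms / 1000 + self.poll_interval_s
```

With the defaults the bound is 32.0 s: a 30 s window plus a 2 s poll interval. Shrinking batch_window_ms lowers this overhead, but fewer requests get pooled into each batch.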

What it doesn't do

Not a quality-based router across providers or models. Use a dedicated router for that.
Not cross-process pooling: the fleet is process-local. To pool across processes, use a shared queue (Redis, SQS).
Doesn't recognize tool-call turns where the tool is on the critical path. Keep those off the batch path yourself by passing force_sync=True (or calling fleet.submit_sync()).

Status

v0.1.0: Anthropic only. OpenAI Batch API and Bedrock async-invoke are on the roadmap. Patches welcome.

License

MIT

Project details

Release history

This version

0.1.0

May 15, 2026

Download files

Source Distribution

llmfleet-0.1.0.tar.gz (7.6 kB)

Uploaded May 15, 2026 Source

Built Distribution

llmfleet-0.1.0-py3-none-any.whl (7.4 kB)

Uploaded May 15, 2026 Python 3

File details

Details for the file llmfleet-0.1.0.tar.gz.

File metadata

Download URL: llmfleet-0.1.0.tar.gz
Upload date: May 15, 2026
Size: 7.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for llmfleet-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`dce0daeaf67b76c549c91644565dc1e760db6564c72c8a64ecc879e72d9777ac`
MD5	`876e01acb7b79fb41469c759e2cf8d77`
BLAKE2b-256	`fc210d6f913a12a905ebd8846b6a2ffb3cb93d9574c106cc2c61647e105ccc36`

File details

Details for the file llmfleet-0.1.0-py3-none-any.whl.

File metadata

Download URL: llmfleet-0.1.0-py3-none-any.whl
Upload date: May 15, 2026
Size: 7.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for llmfleet-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1760c4d285caf410af5672543ea7504e8a0555633ab7f6fefb0e0804ac94a395`
MD5	`82b8c5d7bc58ffac115dd930ea8da766`
BLAKE2b-256	`c5c8a0b9eafe6dc56c9e083ca7a0cf7d085efefd2c5bc5217a9aeed10241ae5b`
