Drop-in AsyncOpenAI replacement that transparently batches requests using the batch API

autobatcher

Drop-in replacement for AsyncOpenAI that transparently batches requests. This library is designed for use with the Doubleword Batch API. Support for OpenAI's batch API or other compatible APIs is best effort. If you experience any issues, please open an issue.

Why?

Batch LLM APIs offer 50% cost savings (and specialist inference providers like Doubleword offer 80%+ savings), but these APIs require you to restructure your code around file uploads and polling. autobatcher lets you keep your existing async code while getting batch pricing automatically.

# Before: regular async calls (full price)
from openai import AsyncOpenAI
client = AsyncOpenAI()

# After: batched calls (50% off)
from autobatcher import BatchOpenAI
client = BatchOpenAI()

# Same interface, same code
response = await client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
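
For comparison, here is a rough sketch of the manual workflow that a batch API typically requires, written against the official openai SDK. The helper names (build_batch_lines, run_batch_manually) and the 30-second poll interval are illustrative assumptions, and this is not a description of how autobatcher is implemented internally.

```python
import json


def build_batch_lines(prompts: list[str], model: str = "gpt-4o") -> list[str]:
    """Serialize prompts into JSONL lines in the OpenAI batch input format."""
    lines = []
    for i, prompt in enumerate(prompts):
        request = {
            "custom_id": f"request-{i}",  # used to match results back to inputs
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        lines.append(json.dumps(request))
    return lines


async def run_batch_manually(prompts: list[str]) -> str:
    """Upload, submit, poll, download. Needs network access and an API key."""
    import asyncio
    from openai import AsyncOpenAI

    client = AsyncOpenAI()
    payload = "\n".join(build_batch_lines(prompts)).encode("utf-8")

    # 1. Upload the JSONL file
    batch_file = await client.files.create(
        file=("batch.jsonl", payload), purpose="batch"
    )
    # 2. Submit the batch
    batch = await client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
    )
    # 3. Poll until the batch reaches a terminal status
    while batch.status not in ("completed", "failed", "expired", "cancelled"):
        await asyncio.sleep(30)  # illustrative interval
        batch = await client.batches.retrieve(batch.id)
    if batch.status != "completed" or batch.output_file_id is None:
        raise RuntimeError(f"batch ended with status {batch.status}")
    # 4. Download the raw JSONL results (still need parsing and matching)
    output = await client.files.content(batch.output_file_id)
    return output.text
```

Only build_batch_lines runs offline; run_batch_manually is shown to make the upload/poll/download steps concrete.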

How it works

  1. Requests are collected over a configurable time window (default: 10 seconds)
  2. When the window closes or batch size is reached, requests are submitted as a batch
  3. Results are polled and returned to waiting callers as they complete
  4. Your code sees normal response objects (ChatCompletion, CreateEmbeddingResponse, Response)

Different request types (chat completions, embeddings, responses) can be mixed in a single batch — each result is parsed with the correct type automatically.
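
The collect-then-flush behaviour described above can be illustrated with a toy batcher built on plain asyncio. This is a minimal sketch under stated assumptions, not autobatcher's actual code: the MicroBatcher class, its handler callback, and the tiny window used in the demo are all hypothetical, and the real library submits to a batch API and polls rather than calling a local function.

```python
import asyncio


class MicroBatcher:
    """Toy batcher: flushes when batch_size items are queued or the window elapses."""

    def __init__(self, handler, batch_size=3, window_seconds=0.05):
        self._handler = handler  # async callable: list of items -> list of results
        self._batch_size = batch_size
        self._window = window_seconds
        self._pending = []  # (item, future) pairs waiting for the next flush
        self._timer = None  # task that flushes when the window closes

    async def submit(self, item):
        future = asyncio.get_running_loop().create_future()
        self._pending.append((item, future))
        if len(self._pending) >= self._batch_size:
            await self._flush()  # size limit reached: flush right away
        elif self._timer is None:
            # First item of a new window: start the countdown
            self._timer = asyncio.create_task(self._flush_after_window())
        return await future

    async def _flush_after_window(self):
        await asyncio.sleep(self._window)
        self._timer = None  # clear first so _flush does not cancel this task
        await self._flush()

    async def _flush(self):
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None
        batch, self._pending = self._pending, []
        if not batch:
            return
        try:
            results = await self._handler([item for item, _ in batch])
            for (_, future), result in zip(batch, results):
                future.set_result(result)
        except Exception as exc:
            # Propagate a batch-level failure to every waiting caller
            for _, future in batch:
                if not future.done():
                    future.set_exception(exc)


async def demo_batcher():
    calls = []

    async def handler(items):
        calls.append(list(items))
        return [x * 2 for x in items]

    batcher = MicroBatcher(handler, batch_size=3, window_seconds=0.05)
    results = await asyncio.gather(*(batcher.submit(i) for i in range(4)))
    return results, calls
```

In the demo, the first three submissions trigger a size-based flush and the fourth is flushed when its window closes, yet each caller simply awaits its own result.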

Installation

pip install autobatcher

Usage

Chat completions

import asyncio
from autobatcher import BatchOpenAI

async def main():
    client = BatchOpenAI(
        api_key="sk-...",  # or set OPENAI_API_KEY env var
    )

    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What is 2+2?"}],
    )
    print(response.choices[0].message.content)

    await client.close()

asyncio.run(main())

Embeddings

from autobatcher import BatchOpenAI

async def embed(client: BatchOpenAI):
    response = await client.embeddings.create(
        model="text-embedding-3-small",
        input="Hello, world!",
    )
    # Print the first five dimensions of the embedding vector
    print(response.data[0].embedding[:5])

Responses API

from autobatcher import BatchOpenAI

async def respond(client: BatchOpenAI):
    response = await client.responses.create(
        model="gpt-4o",
        input="Explain quantum computing in one sentence.",
    )
    # First content part of the first output item
    print(response.output[0].content[0].text)

Parallel requests

The real power comes when you have many requests:

import asyncio
from autobatcher import BatchOpenAI

async def process_many(prompts: list[str]) -> list[str]:
    client = BatchOpenAI(batch_size=500, batch_window_seconds=5.0)

    async def get_response(prompt: str) -> str:
        response = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    try:
        # All requests are batched together automatically
        results = await asyncio.gather(*[get_response(p) for p in prompts])
    finally:
        # Close the client even if one of the requests raised
        await client.close()
    return results
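
When fanning out many requests with asyncio.gather, a single exception aborts the whole gather by default. The sketch below shows one way to collect successes and failures separately using return_exceptions=True. It assumes that a failed request surfaces as an exception raised from the awaited call, which the text above does not confirm for autobatcher. The gather_settled helper and the stub coroutines are hypothetical.

```python
import asyncio


async def gather_settled(coros):
    """Await all coroutines; split outcomes into (index, result) and (index, error)."""
    outcomes = await asyncio.gather(*coros, return_exceptions=True)
    successes, failures = [], []
    for index, outcome in enumerate(outcomes):
        if isinstance(outcome, Exception):
            failures.append((index, outcome))
        else:
            successes.append((index, outcome))
    return successes, failures


async def demo_settled():
    # Stub coroutines standing in for individual client calls
    async def ok(value):
        return value

    async def boom():
        raise ValueError("request failed")

    return await gather_settled([ok("a"), boom(), ok("b")])
```

The indices let you map each failure back to the prompt that produced it and retry only those.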

Mixed batching

Different request types are automatically mixed into the same batch:

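import asyncio
from autobatcher import BatchOpenAI

async def mixed(client: BatchOpenAI):
    # Both calls are queued in the same window and can share one batch
    chat, embedding = await asyncio.gather(
        client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Hello!"}],
        ),
        client.embeddings.create(
            model="text-embedding-3-small",
            input="Hello!",
        ),
    )
    return chat, embedding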

Context manager

async with BatchOpenAI() as client:
    response = await client.chat.completions.create(...)

Configuration

Parameter              Default  Description
api_key                None     OpenAI API key (falls back to OPENAI_API_KEY env var)
base_url               None     API base URL (for proxies or compatible APIs)
batch_size             1000     Submit batch when this many requests are queued
batch_window_seconds   10.0     Submit batch after this many seconds
poll_interval_seconds  5.0      How often to poll for batch completion
completion_window      "24h"    Batch completion window ("24h" or "1h")
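
As an illustration, the parameters in the table can be combined into a single configuration. The values below are arbitrary examples chosen to favour lower latency, and the base_url is a placeholder rather than a real endpoint. The make_client helper is hypothetical and assumes the constructor accepts these names as keyword arguments, as the table suggests.

```python
# Keyword arguments named after the parameters in the table above.
LOW_LATENCY_CONFIG = {
    "base_url": "https://batch.example.invalid/v1",  # hypothetical placeholder URL
    "batch_size": 200,             # flush sooner when traffic is bursty
    "batch_window_seconds": 2.0,   # wait at most 2 s before submitting
    "poll_interval_seconds": 1.0,  # check for results more often
    "completion_window": "1h",     # "24h" or "1h" per the table
}


def make_client():
    """Build a client from the config; requires autobatcher to be installed."""
    from autobatcher import BatchOpenAI

    return BatchOpenAI(**LOW_LATENCY_CONFIG)
```

Smaller windows and batch sizes reduce the time a request spends queued before submission, at the cost of creating more, smaller batches.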

Supported endpoints

Method                            Endpoint          Return type
client.chat.completions.create()  Chat completions  ChatCompletion
client.embeddings.create()        Embeddings        CreateEmbeddingResponse
client.responses.create()         Responses API     Response

Limitations

  • The batch API has a 24-hour completion window by default. A 1-hour SLA is also offered with Doubleword.
  • No escalations when the completion window elapses
  • Not suitable for real-time/interactive use cases
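
Because nothing escalates a request once the completion window elapses, callers that need a bound on waiting time may want to enforce their own deadline. The following is a minimal sketch using asyncio.wait_for. The with_deadline helper is hypothetical, and whether cancelling the awaiting caller also cancels the underlying batch request is not stated here, so treat that as an open question.

```python
import asyncio


async def with_deadline(awaitable, timeout_seconds, fallback=None):
    """Return the awaited result, or fallback if it does not arrive in time."""
    try:
        return await asyncio.wait_for(awaitable, timeout=timeout_seconds)
    except asyncio.TimeoutError:
        return fallback


async def demo_deadline():
    # Stub coroutines standing in for batched client calls
    async def slow():
        await asyncio.sleep(1.0)
        return "done"

    async def fast():
        return "done"

    timed_out = await with_deadline(slow(), 0.01, fallback="timed out")
    finished = await with_deadline(fast(), 1.0)
    return timed_out, finished
```

In real use the awaitable would be a call such as client.chat.completions.create(...), and the timeout would be on the order of the completion window.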

License

MIT

Source Distribution

autobatcher-0.3.1.tar.gz (22.0 kB view details)

Uploaded Source

Built Distribution

autobatcher-0.3.1-py3-none-any.whl (12.9 kB view details)

Uploaded Python 3

File details

Details for the file autobatcher-0.3.1.tar.gz.

File metadata

  • Download URL: autobatcher-0.3.1.tar.gz
  • Upload date:
  • Size: 22.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for autobatcher-0.3.1.tar.gz
Algorithm Hash digest
SHA256 c4ae4dfb931a0896048c6ad99f304dbdfbcfe0b4aa634193fac3a5e8972ee234
MD5 2b6163f8a7295372d3a58b76dae743b4
BLAKE2b-256 6e351c126f66ba61e53929429881f44bf142e371d3d57873754d17329a636573

Provenance

The following attestation bundles were made for autobatcher-0.3.1.tar.gz:

Publisher: publish.yml on doublewordai/autobatcher

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file autobatcher-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: autobatcher-0.3.1-py3-none-any.whl
  • Upload date:
  • Size: 12.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for autobatcher-0.3.1-py3-none-any.whl
Algorithm Hash digest
SHA256 14cd17629b789f36e14237eedff14c1c12e2fb0e50b71a7a5d5f8e332e707970
MD5 cfe0a4344793341610c14a8e2079fbf5
BLAKE2b-256 2cd85cf15de72ead180b609df3080b9e9e4f5049f73f884ff0fc3b9bf127a33d

Provenance

The following attestation bundles were made for autobatcher-0.3.1-py3-none-any.whl:

Publisher: publish.yml on doublewordai/autobatcher

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
