Skip to main content

Create batch jobs for the OpenAI API with ease.

Project description

OpenBatch: Simplify OpenAI Batch Job Creation

OpenBatch is a lightweight Python utility designed to streamline the creation of JSONL files for the OpenAI Batch API. It provides a type-safe and intuitive interface using Pydantic models to construct requests for the /v1/responses, /v1/chat/completions, and /v1/embeddings endpoints.

For a detailed guide on using OpenBatch, please refer to the OpenBatch Documentation.

The library offers two distinct APIs to fit your workflow:

  • BatchCollector: A high-level, fluent API that mimics the official openai client. It's perfect for adding individual, distinct requests to a batch file with minimal setup.
  • BatchJobManager: A lower-level API designed for programmatically generating large batches of requests from templates and lists of inputs. It's ideal for scalable tasks like classification, data extraction, or bulk embeddings.

Installation

pip install openbatch

Quickstart: The BatchCollector API

The BatchCollector provides the simplest way to get started. You instantiate it with a file path, and then use its methods to add requests one by one. This example showcases calls to the Responses and Embeddings APIs.

from pydantic import BaseModel, Field
from typing import List
from openbatch import BatchCollector, ReasoningConfig

# Define a Pydantic model for structured JSON output
class LogicalAnalysis(BaseModel):
    premise: str
    conclusion: str
    is_valid: bool = Field(description="Whether the conclusion logically follows from the premise.")

# 1. Initialize the collector with the desired output file path
BATCH_FILE = "my_api_batch.jsonl"
collector = BatchCollector(batch_file_path=BATCH_FILE)

# 2. Add a standard request to the Responses API
collector.responses.create(
    custom_id="request-1-response",
    model="gpt-4o",
    instructions="You are a historian. Provide a concise summary.",
    input="What were the main causes of the French Revolution?",
    max_output_tokens=200
)

# 3. Add a structured request using a reasoning model.
# Note: Reasoning models may not support 'temperature', and it is omitted here.
collector.responses.parse(
    custom_id="request-2-reasoning",
    model="gpt-5-mini",  # Hypothetical reasoning model
    text_format=LogicalAnalysis,
    instructions="Analyze the logical argument provided by the user.",
    input="Premise: All birds can fly. A penguin is a bird. Conclusion: Therefore, a penguin can fly.",
    reasoning=ReasoningConfig(effort="high") # Configure the reasoning effort
)

# 4. Add an Embedding request to showcase API breadth
collector.embeddings.create(
    custom_id="request-3-embedding",
    model="text-embedding-3-small",
    inp="OpenBatch simplifies creating batch jobs."
)

print(f"Batch file '{BATCH_FILE}' created successfully.")

Advanced Usage: The BatchJobManager API

For more complex or repetitive tasks, the BatchJobManager is the more appropriate tool. It excels at generating thousands of requests from a single template, for any supported API.

Example 1: Batch Job from a Prompt Template (Responses API)

Imagine you want to generate marketing copy for 10,000 new products. Instead of creating each request manually, you can use a template with the Responses API.

from openbatch import (
    BatchJobManager,
    PromptTemplate,
    Message,
    ResponsesRequest,
    PromptTemplateInputInstance
)

# 1. Define a prompt template with placeholders
copywriting_template = PromptTemplate(
    messages=[
        Message(role="system", content="You are a marketing copywriter. Generate a catchy, two-sentence description."),
        Message(role="user", content="Product: {product_name}, Features: {features}")
    ]
)

# 2. Define the common configuration for all requests
common_request_config = ResponsesRequest(
    model="gpt-4o-mini",
    temperature=0.8,
    max_output_tokens=100
)

# 3. Create a list of input instances
product_instances = [
    PromptTemplateInputInstance(
        id="prod_001",
        prompt_value_mapping={"product_name": "AeroGlide Drone", "features": "4K camera, 30-min flight"}
    ),
    PromptTemplateInputInstance(
        id="prod_002",
        prompt_value_mapping={"product_name": "HydroPure Bottle", "features": "Self-cleaning, insulated steel"}
    ),
    # ... add up to 9,998 more products
]

# 4. Use the manager to generate the batch file
manager = BatchJobManager()
manager.add_templated_instances(
    prompt=copywriting_template,
    common_request=common_request_config,
    input_instances=product_instances,
    save_file_path="copywriting_batch.jsonl"
)

Example 2: Batch Embedding Requests

Similarly, you can easily create a batch job for generating embeddings for a large number of documents.

from openbatch import BatchJobManager, EmbeddingsRequest, EmbeddingInputInstance

# 1. Define the common configuration for all embedding requests
common_embedding_config = EmbeddingsRequest(
    model="text-embedding-3-small",
    dimensions=512
)

# 2. Create a list of input instances
documents_to_embed = [
    EmbeddingInputInstance(id="doc_1", input="The sky is blue."),
    EmbeddingInputInstance(id="doc_2", input="Grass is green."),
    # ... add thousands more documents
]

# 3. Use the manager to generate the batch file
manager = BatchJobManager()
manager.add_embedding_requests(
    inputs=documents_to_embed,
    common_request=common_embedding_config,
    save_file_path="embeddings_batch.jsonl"
)

Configuring the Request

The common_request objects (ResponsesRequest, EmbeddingsRequest, etc.) are Pydantic models that expose all available API parameters. You can configure any parameter by passing it to the constructor.

from openbatch import ResponsesRequest, ReasoningConfig

# Example of a more detailed configuration for the Responses API
detailed_config = ResponsesRequest(
    model="gpt-4o",
    service_tier="flex",
    reasoning=ReasoningConfig(effort="minimal"),
    max_output_tokens=500
)

You can also override any common setting on a per-instance basis by using the instance_request_options field.


What's Next?

OpenBatch helps you create the batch file. The next steps involve using that file with the OpenAI API:

  1. Upload File: Upload your generated .jsonl file to OpenAI.
  2. Create Batch Job: Create a new batch job pointing to your uploaded file.
  3. Retrieve Results: Monitor the job's status and, once completed, download the output file with the results.

For detailed instructions on these steps, please refer to the Official OpenAI Batch API Documentation.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

openbatch-0.0.2.tar.gz (16.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

openbatch-0.0.2-py3-none-any.whl (15.0 kB view details)

Uploaded Python 3

File details

Details for the file openbatch-0.0.2.tar.gz.

File metadata

  • Download URL: openbatch-0.0.2.tar.gz
  • Upload date:
  • Size: 16.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for openbatch-0.0.2.tar.gz
Algorithm Hash digest
SHA256 ab27cf09a7a458d29eb9ef3c782daa2590ab995ccfd81c566b5c894e58cc9089
MD5 0a9a4aa89335a23560809b77ebc78b1e
BLAKE2b-256 e012de25985f998fed376737dc7b2c86d08650ad4991e03e2e9f14de3de527e6

See more details on using hashes here.

Provenance

The following attestation bundles were made for openbatch-0.0.2.tar.gz:

Publisher: publish.yml on daniel-gomm/openbatch

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file openbatch-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: openbatch-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 15.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for openbatch-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 8a1842726e9170cc26b8f42afc39e86f0d80d937ca466183704cef26a9c541a7
MD5 5ca49077e87528a70a2915312f19a6f9
BLAKE2b-256 6fce0ed91097b77b5b2b289b853dc22d5b5974cd8a1d1308b445f10e81af2d41

See more details on using hashes here.

Provenance

The following attestation bundles were made for openbatch-0.0.2-py3-none-any.whl:

Publisher: publish.yml on daniel-gomm/openbatch

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page