Batch processing for Anthropic's Claude API with structured output

AI Batch

Python SDK for batch processing with structured output and citation mapping.

  • 50% cost savings via Anthropic's batch API pricing
  • Structured output with Pydantic models
  • Field-level citations map results to source documents
  • Type safety with full validation

Currently supports Anthropic Claude. OpenAI support coming soon.

API Reference

Quick Start

import time

from ai_batch import batch_files
from pydantic import BaseModel

class Invoice(BaseModel):
    company_name: str
    total_amount: str
    date: str

# Process PDFs with structured output + citations
job = batch_files(
    files=["invoice1.pdf", "invoice2.pdf", "invoice3.pdf"],
    prompt="Extract the company name, total amount, and date.",
    model="claude-3-5-sonnet-20241022",
    response_model=Invoice,
    enable_citations=True
)

# Wait for completion
while not job.is_complete():
    time.sleep(30)
    
results = job.results()
citations = job.citations()
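
The loop above polls until the job finishes, and batch jobs can take up to 24 hours. A bounded wait may be safer in scripts. The following is a minimal sketch, assuming only the is_complete() method shown above; wait_for_job, its parameters, and the FakeJob stand-in are hypothetical names for illustration and are not part of ai_batch.

```python
import time


def wait_for_job(job, poll_seconds=30.0, timeout_seconds=24 * 60 * 60):
    """Poll job.is_complete() until it returns True or the timeout elapses.

    Hypothetical helper, not part of ai_batch. Returns True if the job
    completed, False if the timeout was reached first.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        if job.is_complete():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(poll_seconds, remaining))


class FakeJob:
    """Stand-in for a batch job: reports completion on the third check."""

    def __init__(self, checks_until_done=3):
        self.checks = 0
        self.checks_until_done = checks_until_done

    def is_complete(self):
        self.checks += 1
        return self.checks >= self.checks_until_done


demo_job = FakeJob()
finished = wait_for_job(demo_job, poll_seconds=0.01, timeout_seconds=5)
```

With a real job, the same call would be followed by job.results() only when the helper returns True.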

Installation

pip install ai-batch

Usage

Create a .env file in your project root:

ANTHROPIC_API_KEY=your-api-key
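
Before submitting a batch, it can help to confirm that the file actually defines the key. The sketch below is an assumption-laden illustration using only the standard library: read_env_file and has_api_key are hypothetical helpers, not part of ai_batch, and the parser handles only simple KEY=VALUE lines.

```python
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_api_key(path=".env"):
    """Return True if the file exists and defines a non-empty ANTHROPIC_API_KEY."""
    env_path = Path(path)
    if not env_path.is_file():
        return False
    return bool(read_env_file(env_path).get("ANTHROPIC_API_KEY"))
```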

API Functions

batch()

Process multiple message conversations with optional structured output.

from ai_batch import batch
from pydantic import BaseModel

class SpamResult(BaseModel):
    is_spam: bool
    confidence: float
    reason: str

# Process messages
job = batch(
    messages=[
        [{"role": "user", "content": "Is this spam? You've won $1000!"}],
        [{"role": "user", "content": "Meeting at 3pm tomorrow"}],
        [{"role": "user", "content": "URGENT: Click here now!"}]
    ],
    model="claude-3-haiku-20240307",
    response_model=SpamResult
)

# Get results
results = job.results()

Response:

[
    SpamResult(is_spam=True, confidence=0.95, reason="Contains monetary prize claim"),
    SpamResult(is_spam=False, confidence=0.98, reason="Normal meeting reminder"),
    SpamResult(is_spam=True, confidence=0.92, reason="Urgent call-to-action pattern")
]
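
A typical next step is to act only on confident verdicts. The sketch below assumes results come back in the same order as the input messages, as the sample response suggests. flag_spam is a hypothetical helper, and the dataclass is a stand-in that mirrors the fields of the Pydantic model so the example runs without ai_batch installed.

```python
from dataclasses import dataclass


@dataclass
class SpamResult:
    """Dataclass stand-in mirroring the fields of the Pydantic model above."""
    is_spam: bool
    confidence: float
    reason: str


def flag_spam(messages, results, min_confidence=0.9):
    """Pair each input with its result and keep confident spam verdicts."""
    return [
        (text, result.reason)
        for text, result in zip(messages, results)
        if result.is_spam and result.confidence >= min_confidence
    ]


messages = [
    "Is this spam? You've won $1000!",
    "Meeting at 3pm tomorrow",
    "URGENT: Click here now!",
]
results = [
    SpamResult(is_spam=True, confidence=0.95, reason="Contains monetary prize claim"),
    SpamResult(is_spam=False, confidence=0.98, reason="Normal meeting reminder"),
    SpamResult(is_spam=True, confidence=0.92, reason="Urgent call-to-action pattern"),
]

# Only the first message clears the 0.93 threshold.
flagged = flag_spam(messages, results, min_confidence=0.93)
```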

batch_files()

Process PDF files with optional structured output and citations.

from ai_batch import batch_files
from pydantic import BaseModel

class Invoice(BaseModel):
    company_name: str
    total_amount: str
    date: str

# Process PDFs with citations
job = batch_files(
    files=["invoice1.pdf", "invoice2.pdf"],
    prompt="Extract the company name, total amount, and date.",
    model="claude-3-5-sonnet-20241022",
    response_model=Invoice,
    enable_citations=True
)

results = job.results()
citations = job.citations()

Response:

# Results
[
    Invoice(company_name="TechCorp Solutions", total_amount="$12,500.00", date="March 15, 2024"),
    Invoice(company_name="DataFlow Systems", total_amount="$8,750.00", date="March 18, 2024")
]

# Citations (field-level mapping)
[
    {
        "company_name": [Citation(cited_text="TechCorp Solutions", start_page=1)],
        "total_amount": [Citation(cited_text="TOTAL: $12,500.00", start_page=2)],
        "date": [Citation(cited_text="Date: March 15, 2024", start_page=1)]
    },
    {
        "company_name": [Citation(cited_text="DataFlow Systems", start_page=1)],
        "total_amount": [Citation(cited_text="Total Due: $8,750.00", start_page=3)],
        "date": [Citation(cited_text="Invoice Date: March 18, 2024", start_page=1)]
    }
]
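
Because results and citations are parallel lists, they can be zipped into audit rows that show each extracted value next to the pages it was cited from. This is a minimal sketch: audit_rows is a hypothetical helper, and Citation and Invoice are dataclass stand-ins carrying only the attributes shown above. With real Pydantic v2 models, model_dump() would take the place of asdict().

```python
from dataclasses import asdict, dataclass


@dataclass
class Citation:
    """Stand-in with the two attributes shown in the response above."""
    cited_text: str
    start_page: int


@dataclass
class Invoice:
    """Stand-in for the Pydantic model."""
    company_name: str
    total_amount: str
    date: str


def audit_rows(results, citations):
    """Return (document index, field, value, cited pages) for every field."""
    rows = []
    for index, (result, field_citations) in enumerate(zip(results, citations)):
        for field, value in asdict(result).items():
            pages = sorted({c.start_page for c in field_citations.get(field, [])})
            rows.append((index, field, value, pages))
    return rows


results = [
    Invoice("TechCorp Solutions", "$12,500.00", "March 15, 2024"),
    Invoice("DataFlow Systems", "$8,750.00", "March 18, 2024"),
]
citations = [
    {
        "company_name": [Citation("TechCorp Solutions", 1)],
        "total_amount": [Citation("TOTAL: $12,500.00", 2)],
        "date": [Citation("Date: March 15, 2024", 1)],
    },
    {
        "company_name": [Citation("DataFlow Systems", 1)],
        "total_amount": [Citation("Total Due: $8,750.00", 3)],
        "date": [Citation("Invoice Date: March 18, 2024", 1)],
    },
]

rows = audit_rows(results, citations)
```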

BatchJob

The job object returned by batch() and batch_files().

# Check completion status
if job.is_complete():
    results = job.results()

# Get processing statistics
stats = job.stats(print_stats=True)
# Output: {'batch_id': 'abc123', 'status': 'completed', 'total': 3, 'succeeded': 3, ...}

# Get citations (if enabled)
citations = job.citations()

# Save raw API responses: pass raw_results_dir when creating the job
job = batch(..., raw_results_dir="./raw_responses")
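
The stats dictionary can be reduced to a single success ratio for logging or alerting. The sketch below assumes only the 'total' and 'succeeded' keys visible in the sample output above; success_rate is a hypothetical helper, and any other keys are ignored.

```python
def success_rate(stats):
    """Return succeeded / total from a stats dict, or 0.0 when total is missing or zero."""
    total = stats.get("total", 0)
    if not total:
        return 0.0
    return stats.get("succeeded", 0) / total


# Sample values copied from the output comment above.
sample_stats = {"batch_id": "abc123", "status": "completed", "total": 3, "succeeded": 3}
rate = success_rate(sample_stats)
```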

Citations

Citations work in two modes depending on whether you use structured output:

1. Text + Citations (Flat List)

When enable_citations=True without a response model, citations are returned as a flat list:

job = batch_files(
    files=["document.pdf"],
    prompt="Summarize the key findings",
    enable_citations=True
)

results = job.results()   # List of strings
citations = job.citations()  # Flat list of Citation objects

# Example citations:
[
    Citation(cited_text="AI reduces errors by 30%", start_page=2),
    Citation(cited_text="Implementation cost: $50,000", start_page=5)
]
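
A flat list is easy to regroup, for example by page, when reviewing a summary against its source. A minimal sketch follows, assuming Citation objects expose cited_text and start_page as shown; citations_by_page is a hypothetical helper and the dataclass is a stand-in for the library's Citation type.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Citation:
    """Stand-in with the two attributes shown in the example above."""
    cited_text: str
    start_page: int


def citations_by_page(citations):
    """Group cited passages by start page, with pages in ascending order."""
    pages = defaultdict(list)
    for citation in citations:
        pages[citation.start_page].append(citation.cited_text)
    return dict(sorted(pages.items()))


citations = [
    Citation(cited_text="Implementation cost: $50,000", start_page=5),
    Citation(cited_text="AI reduces errors by 30%", start_page=2),
]
grouped = citations_by_page(citations)
```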

2. Structured + Field Citations (Mapping)

When using both response_model and enable_citations=True, citations are mapped to specific fields:

job = batch_files(
    files=["document.pdf"],
    prompt="Extract the data",
    response_model=MyModel,
    enable_citations=True
)

results = job.results()   # List of Pydantic models
citations = job.citations()  # List of dicts mapping fields to citations

# Example field-level citations:
[
    {
        "title": [Citation(cited_text="Annual Report 2024", start_page=1)],
        "revenue": [Citation(cited_text="Revenue: $1.2M", start_page=3)],
        "growth": [Citation(cited_text="YoY Growth: 25%", start_page=3)]
    }
]

The field mapping allows you to trace exactly which part of the source document was used to populate each field in your structured output.
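
The same mapping can flag fields that came back without any supporting passage, which may deserve manual review. This sketch makes assumptions: uncited_fields is a hypothetical helper, Citation is a dataclass stand-in, and the empty "growth" entry and the extra "ceo" field are invented to show both the empty-list and missing-key cases.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    """Stand-in with the two attributes shown in the examples above."""
    cited_text: str
    start_page: int


def uncited_fields(field_names, field_citations):
    """Return the field names that have no citation entries."""
    return [name for name in field_names if not field_citations.get(name)]


field_citations = {
    "title": [Citation(cited_text="Annual Report 2024", start_page=1)],
    "revenue": [Citation(cited_text="Revenue: $1.2M", start_page=3)],
    "growth": [],  # hypothetical case: no supporting passage was returned
}

# "ceo" is a hypothetical field that is absent from the mapping entirely.
missing = uncited_fields(["title", "revenue", "growth", "ceo"], field_citations)
```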

Example Scripts

Limitations

  • Citation mapping only works with flat Pydantic models (no nested models)
  • No support for OpenAI.
  • PDF processing works best with Opus or Sonnet models
  • Batch jobs can take up to 24 hours to process
  • Use job.is_complete() to check status before getting results
  • Citations may not be available in all batch API responses
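
Since citation mapping expects flat models, a quick pre-flight check can catch nested fields before a batch is submitted. The sketch below is deliberately conservative and is not how ai_batch validates models: non_scalar_fields is a hypothetical helper that treats any annotation other than str, int, float, or bool as non-flat, and it only inspects annotations declared directly on the class. Dataclasses stand in for Pydantic models so the example runs on the standard library alone.

```python
from dataclasses import dataclass

# Assumption: only these scalar types count as "flat" for this check.
SCALAR_TYPES = (str, int, float, bool)


def non_scalar_fields(model_cls):
    """Return names of fields whose annotation is not a plain scalar type."""
    annotations = getattr(model_cls, "__annotations__", {})
    return [name for name, tp in annotations.items() if tp not in SCALAR_TYPES]


@dataclass
class Address:
    city: str


@dataclass
class Invoice:
    company_name: str
    total_amount: str
    date: str


@dataclass
class InvoiceWithAddress:
    company_name: str
    billing_address: Address  # nested model: would not be flat


flat_issues = non_scalar_fields(Invoice)
nested_issues = non_scalar_fields(InvoiceWithAddress)
```

An empty list means every field is a scalar. Any names returned are candidates to flatten into separate top-level fields.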

License

MIT

Todos

  • Add pricing metadata and max_spend controls
  • Auto batch manager (parallel batches, retry, spend control)
  • Test mode to run on 1% sample before full batch
  • Quick batch - split into smaller chunks for faster results
  • Support text/other file types (not just PDFs)
  • Support for OpenAI


