
Batch processing for Anthropic's Claude API with structured output

Project description

AI Batch

Python SDK for batch processing with structured output and citation mapping.

  • 50% cost savings via Anthropic's batch API pricing
  • Automatic cost tracking with token usage and pricing
  • Structured output with Pydantic models
  • Field-level citations map results to source documents
  • Type safety with full validation

Currently supports Anthropic Claude. OpenAI support coming soon.

API Reference

  • batch() - Process message conversations or PDF files
  • BatchJob - Job status and results

Quick Start

import time

from ai_batch import batch
from pydantic import BaseModel

class Invoice(BaseModel):
    company_name: str
    total_amount: str
    date: str

# Process PDFs with structured output + citations
job = batch(
    files=["invoice1.pdf", "invoice2.pdf", "invoice3.pdf"],
    prompt="Extract the company name, total amount, and date.",
    model="claude-3-5-sonnet-20241022",
    response_model=Invoice,
    enable_citations=True
)

# Wait for completion
while not job.is_complete():
    time.sleep(30)
    
results = job.results()
citations = job.citations()

Installation

pip install ai-batch

Usage

Create a .env file in your project root:

ANTHROPIC_API_KEY=your-api-key
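The SDK presumably picks this key up from the environment. If your process doesn't load `.env` automatically, a minimal stdlib-only loader (a sketch that avoids the python-dotenv dependency; `load_env` is a hypothetical helper, not part of the SDK) looks like:

```python
import os

def load_env(path=".env"):
    # Minimal .env loader: KEY=value lines, '#' comment lines ignored.
    # Existing environment variables are not overwritten.
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())

if os.path.exists(".env"):
    load_env()
```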

API Functions

batch()

Process multiple message conversations with optional structured output.

from ai_batch import batch
from pydantic import BaseModel

class SpamResult(BaseModel):
    is_spam: bool
    confidence: float
    reason: str

# Process messages
job = batch(
    messages=[
        [{"role": "user", "content": "Is this spam? You've won $1000!"}],
        [{"role": "user", "content": "Meeting at 3pm tomorrow"}],
        [{"role": "user", "content": "URGENT: Click here now!"}]
    ],
    model="claude-3-haiku-20240307",
    response_model=SpamResult
)

# Get results
results = job.results()

Response:

[
    SpamResult(is_spam=True, confidence=0.95, reason="Contains monetary prize claim"),
    SpamResult(is_spam=False, confidence=0.98, reason="Normal meeting reminder"),
    SpamResult(is_spam=True, confidence=0.92, reason="Urgent call-to-action pattern")
]
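Because each result is a plain Pydantic model, downstream filtering is ordinary Python. A self-contained sketch, using a dataclass stand-in for `SpamResult` and the sample values above:

```python
from dataclasses import dataclass

@dataclass
class SpamResult:  # stand-in mirroring the Pydantic model above
    is_spam: bool
    confidence: float
    reason: str

results = [
    SpamResult(True, 0.95, "Contains monetary prize claim"),
    SpamResult(False, 0.98, "Normal meeting reminder"),
    SpamResult(True, 0.92, "Urgent call-to-action pattern"),
]

# Keep only high-confidence spam verdicts
flagged = [r for r in results if r.is_spam and r.confidence >= 0.9]
print(len(flagged))  # 2
```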

batch() with files

Process PDF files with optional structured output and citations.

from ai_batch import batch
from pydantic import BaseModel

class Invoice(BaseModel):
    company_name: str
    total_amount: str
    date: str

# Process PDFs with citations
job = batch(
    files=["invoice1.pdf", "invoice2.pdf"],
    prompt="Extract the company name, total amount, and date.",
    model="claude-3-5-sonnet-20241022",
    response_model=Invoice,
    enable_citations=True
)

results = job.results()
citations = job.citations()

Response:

# Results
[
    Invoice(company_name="TechCorp Solutions", total_amount="$12,500.00", date="March 15, 2024"),
    Invoice(company_name="DataFlow Systems", total_amount="$8,750.00", date="March 18, 2024")
]

# Citations (field-level mapping)
[
    {
        "company_name": [Citation(cited_text="TechCorp Solutions", start_page=1)],
        "total_amount": [Citation(cited_text="TOTAL: $12,500.00", start_page=2)],
        "date": [Citation(cited_text="Date: March 15, 2024", start_page=1)]
    },
    {
        "company_name": [Citation(cited_text="DataFlow Systems", start_page=1)],
        "total_amount": [Citation(cited_text="Total Due: $8,750.00", start_page=3)],
        "date": [Citation(cited_text="Invoice Date: March 18, 2024", start_page=1)]
    }
]

BatchJob

The job object returned by batch().

# Check completion status
if job.is_complete():
    results = job.results()

# Get processing statistics with cost tracking
stats = job.stats(print_stats=True)
# Output:
# 📊 Batch Statistics
#    ID: msgbatch_01BPtdnmEwxtaDcdJ2eUsq4T
#    Status: ended
#    Complete: ✅
#    Elapsed: 41.8s
#    Mode: Text + Citations
#    Results: 0
#    Citations: 0
#    Input tokens: 2,117
#    Output tokens: 81
#    Total cost: $0.0038
#    (50% batch discount applied)

# Get citations (if enabled)
citations = job.citations()

# Save raw API responses
job = batch(..., raw_results_dir="./raw_responses")

Citations

Citations work in two modes depending on whether you use structured output:

1. Text + Citations (Flat List)

When enable_citations=True without a response model, citations are returned as a flat list:

job = batch(
    files=["document.pdf"],
    prompt="Summarize the key findings",
    enable_citations=True
)

results = job.results()   # List of strings
citations = job.citations()  # Flat list of Citation objects

# Example citations:
[
    Citation(cited_text="AI reduces errors by 30%", start_page=2),
    Citation(cited_text="Implementation cost: $50,000", start_page=5)
]
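With the flat list it is often useful to group citations by page before review. A sketch using a dataclass stand-in for the library's `Citation` (only the `cited_text` and `start_page` fields shown above are assumed):

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Citation:  # stand-in with the fields shown above
    cited_text: str
    start_page: int

citations = [
    Citation("AI reduces errors by 30%", 2),
    Citation("Implementation cost: $50,000", 5),
]

# Group cited snippets by the page they came from
by_page = defaultdict(list)
for c in citations:
    by_page[c.start_page].append(c.cited_text)

print(dict(by_page))  # {2: ['AI reduces errors by 30%'], 5: ['Implementation cost: $50,000']}
```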

2. Structured + Field Citations (Mapping)

When using both response_model and enable_citations=True, citations are mapped to specific fields:

job = batch(
    files=["document.pdf"],
    prompt="Extract the data",
    response_model=MyModel,
    enable_citations=True
)

results = job.results()   # List of Pydantic models
citations = job.citations()  # List of dicts mapping fields to citations

# Example field-level citations:
[
    {
        "title": [Citation(cited_text="Annual Report 2024", start_page=1)],
        "revenue": [Citation(cited_text="Revenue: $1.2M", start_page=3)],
        "growth": [Citation(cited_text="YoY Growth: 25%", start_page=3)]
    }
]

The field mapping allows you to trace exactly which part of the source document was used to populate each field in your structured output.
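For example, a small helper can pull the provenance pairs for one field out of a citation mapping. This uses a stand-in `Citation` with the fields shown above; `trace_field` is a hypothetical name, not part of the SDK:

```python
from dataclasses import dataclass

@dataclass
class Citation:  # stand-in for ai_batch's Citation (fields per the examples above)
    cited_text: str
    start_page: int

def trace_field(field_citations: dict, field: str) -> list[tuple[str, int]]:
    """Return (cited_text, page) pairs backing one structured-output field."""
    return [(c.cited_text, c.start_page) for c in field_citations.get(field, [])]

row = {"revenue": [Citation("Revenue: $1.2M", 3)]}
print(trace_field(row, "revenue"))  # [('Revenue: $1.2M', 3)]
```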

Robust Citation Parsing

AI Batch uses proper JSON parsing for citation field mapping, ensuring reliability with complex JSON structures:

Handles Complex Scenarios:

  • ✅ Escaped quotes in JSON values: "name": "John \"The Great\" Doe"
  • ✅ URLs with colons: "website": "http://example.com:8080"
  • ✅ Nested objects and arrays: "metadata": {"nested": {"deep": "value"}}
  • ✅ Multi-line strings and special characters
  • ✅ Fields with numbers/underscores: user_name, age_2

Previous Limitations (Fixed): The old regex-based approach would fail on complex JSON patterns. The new JSON parser reliably handles any valid JSON structure that Claude produces, making citation mapping robust for production use.
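The tricky cases above are exactly what a standards-compliant JSON parser handles out of the box, e.g. with Python's stdlib:

```python
import json

# Escaped quotes, URLs with colons, and nested objects all parse cleanly
raw = ('{"name": "John \\"The Great\\" Doe", '
       '"website": "http://example.com:8080", '
       '"metadata": {"nested": {"deep": "value"}}}')
data = json.loads(raw)
print(data["name"])  # John "The Great" Doe
```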

Cost Tracking

AI Batch automatically tracks token usage and costs for all batch operations:

from ai_batch import batch

job = batch(
    messages=[...],
    model="claude-3-5-sonnet-20241022"
)

# Get cost information
stats = job.stats()
print(f"Total cost: ${stats['total_cost']:.4f}")
print(f"Input tokens: {stats['total_input_tokens']:,}")
print(f"Output tokens: {stats['total_output_tokens']:,}")

# Or print formatted statistics
job.stats(print_stats=True)
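To sanity-check a reported cost, the arithmetic is straightforward: tokens × per-token rate, then the 50% batch discount. The per-million-token rates below are illustrative assumptions, not official Anthropic pricing:

```python
# Assumed rates for illustration only ($ per token)
INPUT_RATE = 3.00 / 1_000_000
OUTPUT_RATE = 15.00 / 1_000_000
BATCH_DISCOUNT = 0.5  # Anthropic's 50% batch discount

def batch_cost(input_tokens: int, output_tokens: int) -> float:
    full_price = input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
    return full_price * BATCH_DISCOUNT

# Token counts from the sample stats above
print(f"${batch_cost(2_117, 81):.4f}")  # $0.0038
```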

Example Scripts

Limitations

  • Citation mapping only works with flat Pydantic models (nested models are not supported)
  • No OpenAI support yet
  • PDF processing works best with Opus or Sonnet models
  • Batch jobs can take up to 24 hours to process
  • Use job.is_complete() to check status before getting results
  • Citations may not be available in all batch API responses

License

MIT

Todos

  • Add pricing metadata and max_spend controls (cost tracking already implemented)
  • Auto batch manager (parallel batches, retry, spend control)
  • Test mode to run on 1% sample before full batch
  • Quick batch - split into smaller chunks for faster results
  • Support text/other file types (not just PDFs)
  • Support for OpenAI

Project details


Download files

Download the file for your platform.

Source Distribution

ai_batch-0.1.0.tar.gz (92.9 kB)

Uploaded Source

Built Distribution


ai_batch-0.1.0-py3-none-any.whl (18.0 kB)

Uploaded Python 3

File details

Details for the file ai_batch-0.1.0.tar.gz.

File metadata

  • Download URL: ai_batch-0.1.0.tar.gz
  • Upload date:
  • Size: 92.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for ai_batch-0.1.0.tar.gz
Algorithm Hash digest
SHA256 374dfc690552c615f75e7901a67a93b7f49dbf543c0da9f6055af3e01776e068
MD5 bcdd66398c80f8e26f5adf9573b5674d
BLAKE2b-256 8ad60955ec772b4a992d84ed63671c1fe5dcb1f29cbbe14846ba3a8226b7b80a


Provenance

The following attestation bundles were made for ai_batch-0.1.0.tar.gz:

Publisher: publish.yml on agamm/ai-batch

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ai_batch-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: ai_batch-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 18.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for ai_batch-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 8a21f44f9f3a76e8fd0c25e0a35673c77778b9e2e44663349ba5159ce5e8cb4d
MD5 39fba089020225d6d5027d9142635902
BLAKE2b-256 c7871ff4785f122d592f01fd7119e395cdbecca22867fce95615c298173c1120


Provenance

The following attestation bundles were made for ai_batch-0.1.0-py3-none-any.whl:

Publisher: publish.yml on agamm/ai-batch

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
