
LLM Batch Helper


A Python package that enables batch submission of prompts to LLM APIs, with built-in async capabilities, response caching, prompt verification, and more. This package is designed to streamline applications like LLM simulation, LLM-as-a-judge, and other batch processing scenarios.

📖 Complete Documentation | 🚀 Quick Start Guide

Why we designed this package

Imagine you have 5000 prompts you need to send to an LLM. Running them sequentially can be painfully slow, sometimes taking hours or even days. Worse, if the process fails midway, you're forced to start all over again. We've struggled with this exact frustration, which is why we built this package to tackle these pain points directly:

  1. Efficient Batch Processing: How do you run LLM calls in batches efficiently? Our async implementation is 3x-100x faster than multi-thread/multi-process approaches; in our experience, it cut a 24-hour job down to about 10 minutes.

  2. API Reliability: LLM APIs can be unstable, so we need robust retry mechanisms when calls get interrupted.

  3. Long-Running Simulations: During long-running LLM simulations, computers can crash and APIs can fail. Can we cache LLM API calls to avoid repeating completed work?

  4. Output Validation: LLM outputs often have format requirements. When an output doesn't meet them, it should be caught by validation and retried.

This package is designed to solve these exact pain points with async processing, intelligent caching, and comprehensive error handling. If you need additional features, please open an issue.

Features

  • 🚀 Dramatic Speed Improvements: 10-100x faster than sequential processing (see demo)
  • ⚡ Async Processing: Submit multiple prompts concurrently for maximum throughput
  • 💾 Smart Caching: Automatically cache responses and resume interrupted work seamlessly
  • 📁 Multiple Input Formats: Support for strings, tuples, dictionaries, and file-based prompts
  • 🌐 Multi-Provider Support: Works with OpenAI (all models), OpenRouter (100+ models), Together.ai, and Google Gemini
  • 🔄 Intelligent Retry Logic: Built-in retry mechanism with exponential backoff and detailed logging
  • ✅ Quality Control: Custom verification callbacks for response validation
  • 📊 Progress Tracking: Real-time progress bars and comprehensive statistics
  • 🎯 Simplified API: No async/await complexity - works seamlessly in Jupyter notebooks (v0.3.0+)
  • 🔧 Tunable Performance: Adjust concurrency on the fly for optimal speed vs. rate limits

Installation

# Install from PyPI
pip install llm_batch_helper

Quick Start

1. Set up environment variables

Option A: Environment Variables

# For OpenAI (all OpenAI models including GPT-5)
export OPENAI_API_KEY="your-openai-api-key"

# For OpenRouter (100+ models - Recommended)
export OPENROUTER_API_KEY="your-openrouter-api-key"

# For Together.ai
export TOGETHER_API_KEY="your-together-api-key"

# For Google Gemini
export GEMINI_API_KEY="your-gemini-api-key"
# OR alternatively:
export GOOGLE_API_KEY="your-gemini-api-key"

Option B: .env File (Recommended for Development)

Create a .env file in your project:

OPENAI_API_KEY=your-openai-api-key

Then, in your script, load it before using the package:

from dotenv import load_dotenv
load_dotenv()  # Load variables from the .env file

# Then use the package normally
from llm_batch_helper import LLMConfig, process_prompts_batch

2. Interactive Tutorials (Recommended)

🎯 NEW: Performance Comparison Tutorial See the dramatic speed improvements! Our Performance Comparison Tutorial demonstrates:

  • 10-100x speedup vs naive sequential processing
  • Processing 5,000 prompts in minutes instead of hours
  • Smart caching that lets you resume interrupted work
  • Tunable concurrency for optimal performance

📚 Complete Feature Tutorial Check out the comprehensive main tutorial covering all features with interactive examples!

3. Basic usage

from dotenv import load_dotenv  # Optional: for .env file support
from llm_batch_helper import LLMConfig, process_prompts_batch

# Optional: Load environment variables from .env file
load_dotenv()

# Create configuration
config = LLMConfig(
    model_name="gpt-4o-mini",
    temperature=1.0,
    max_completion_tokens=100,
    # Concurrent asyncio requests; this largely determines how fast the
    # pipeline runs. Use as large a value as possible (e.g., 300) while
    # staying within your LLM API's rate limits.
    max_concurrent_requests=100
)

# Process prompts
prompts = [
    "What is the capital of France?",
    "What is 2+2?",
    "Who wrote 'Hamlet'?"
]

results = process_prompts_batch(
    config=config,
    provider="openai",
    prompts=prompts,
    cache_dir="cache"
)

# Print results
for prompt_id, response in results.items():
    print(f"{prompt_id}: {response['response_text']}")

🎉 New in v0.3.0: process_prompts_batch now handles async operations implicitly - no more async/await syntax needed! Works seamlessly in Jupyter notebooks.
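The speedup comes from issuing many requests at once while a semaphore caps the number in flight at max_concurrent_requests. A minimal stdlib sketch of the pattern (illustrative only; process_all and fake_llm_call are stand-ins, not the package's internals):

```python
import asyncio

async def process_all(prompts, worker, max_concurrent_requests=100):
    """Run worker(prompt) for every prompt, at most N at a time."""
    sem = asyncio.Semaphore(max_concurrent_requests)

    async def bounded(prompt):
        async with sem:  # blocks while N calls are already in flight
            return await worker(prompt)

    return await asyncio.gather(*(bounded(p) for p in prompts))

async def fake_llm_call(prompt):
    await asyncio.sleep(0.01)  # stands in for an API round trip
    return f"echo: {prompt}"

results = asyncio.run(process_all(["a", "b", "c"], fake_llm_call, 2))
```

Note that asyncio.gather preserves input order, so results line up with prompts even though the underlying calls finish out of order.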

4. Multiple Input Formats

The package supports three different input formats for maximum flexibility:

from llm_batch_helper import LLMConfig, process_prompts_batch

config = LLMConfig(
    model_name="gpt-4o-mini",
    temperature=1.0,
    max_completion_tokens=100
)

# Mix different input formats in the same list
prompts = [
    # String format - ID will be auto-generated from hash
    "What is the capital of France?",
    
    # Tuple format - (custom_id, prompt_text)
    ("custom_id_1", "What is 2+2?"),
    
    # Dictionary format - {"id": custom_id, "text": prompt_text}
    {"id": "shakespeare_q", "text": "Who wrote 'Hamlet'?"},
    {"id": "science_q", "text": "Explain photosynthesis briefly."}
]

results = process_prompts_batch(
    config=config,
    provider="openai",
    prompts=prompts,
    cache_dir="cache"
)

# Print results with custom IDs
for prompt_id, response in results.items():
    print(f"{prompt_id}: {response['response_text']}")

Input Format Requirements:

  • String: Plain text prompt (ID auto-generated)
  • Tuple: (prompt_id, prompt_text) - both elements required
  • Dictionary: {"id": "prompt_id", "text": "prompt_text"} - both keys required
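For plain-string prompts, the auto-generated ID is a hash of the prompt text, so identical prompts map to the same ID (and cache entry) across runs. Roughly the idea (a sketch; auto_prompt_id and the exact hashing scheme are illustrative, not the package's actual function):

```python
import hashlib

def auto_prompt_id(prompt_text: str) -> str:
    """Derive a stable ID from the prompt text (illustrative scheme)."""
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()[:16]

# Identical text -> identical ID, so a cached response can be reused.
assert auto_prompt_id("What is 2+2?") == auto_prompt_id("What is 2+2?")
```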

🔄 Backward Compatibility

For users who prefer the async version or have existing code, the async API is still available:

import asyncio
from llm_batch_helper import process_prompts_batch_async

async def main():
    results = await process_prompts_batch_async(
        prompts=["Hello world!"],
        config=config,
        provider="openai"
    )
    return results

results = asyncio.run(main())

Usage Examples

OpenRouter (Recommended - 100+ Models)

from llm_batch_helper import LLMConfig, process_prompts_batch

# Access 100+ models through OpenRouter
config = LLMConfig(
    model_name="deepseek/deepseek-v3.1-base",  # or openai/gpt-4o, anthropic/claude-3-5-sonnet
    temperature=1.0,
    max_completion_tokens=500
)

prompts = [
    "Explain quantum computing briefly.",
    "What are the benefits of renewable energy?",
    "How does machine learning work?"
]

results = process_prompts_batch(
    prompts=prompts,
    config=config,
    provider="openrouter"  # Access to 100+ models!
)

for prompt_id, result in results.items():
    print(f"Response: {result['response_text']}")

Google Gemini Provider

from llm_batch_helper import LLMConfig, process_prompts_batch

config = LLMConfig(
    model_name="gemini-1.5-pro",  # or "gemini-1.5-flash"
    temperature=1.0,
    max_completion_tokens=200
)

prompts = [
    "Explain the theory of relativity.",
    "What are the main causes of climate change?",
    "How does photosynthesis work?"
]

results = process_prompts_batch(
    prompts=prompts,
    config=config,
    provider="gemini"  # Use Google Gemini!
)

for prompt_id, result in results.items():
    print(f"Response: {result['response_text']}")

File-based Prompts

from llm_batch_helper import LLMConfig, process_prompts_batch

config = LLMConfig(
    model_name="gpt-4o-mini",
    temperature=1.0,
    max_completion_tokens=200
)

# Process all .txt files in a directory
results = process_prompts_batch(
    config=config,
    provider="openai",
    input_dir="prompts",  # Directory containing .txt files
    cache_dir="cache",
    force=False  # Use cached responses if available
)

print(f"Processed {len(results)} prompts from files")

Custom Verification

from llm_batch_helper import LLMConfig

def verify_response(prompt_id, llm_response_data, original_prompt_text, **kwargs):
    """Custom verification callback"""
    response_text = llm_response_data.get("response_text", "")
    
    # Check minimum length
    if len(response_text) < kwargs.get("min_length", 10):
        return False
    
    # Check for specific keywords
    if "error" in response_text.lower():
        return False
    
    return True

config = LLMConfig(
    model_name="gpt-4o-mini",
    temperature=1.0,
    verification_callback=verify_response,
    verification_callback_args={"min_length": 20}
)
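A common use of verification callbacks is enforcing an output format, e.g. requiring valid JSON. Using the same callback signature as above, a format check might look like this (hypothetical example):

```python
import json

def verify_json_response(prompt_id, llm_response_data, original_prompt_text, **kwargs):
    """Accept only responses that parse as a JSON object."""
    response_text = llm_response_data.get("response_text", "")
    try:
        parsed = json.loads(response_text)
    except json.JSONDecodeError:
        return False  # rejected -> the prompt is retried
    return isinstance(parsed, dict)
```

Returning False makes the helper retry the prompt (up to max_retries), so transient formatting slips get another chance.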

API Reference

LLMConfig

Configuration class for LLM requests.

LLMConfig(
    model_name: str,
    temperature: float = 1.0,
    max_completion_tokens: Optional[int] = None,  # Preferred parameter
    max_tokens: Optional[int] = None,  # Deprecated, kept for backward compatibility
    system_instruction: Optional[str] = None,
    max_retries: int = 5,
    max_concurrent_requests: int = 30,
    verification_callback: Optional[Callable] = None,
    verification_callback_args: Optional[Dict] = None
)
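max_retries bounds a retry loop with exponential backoff: each failed attempt roughly doubles the wait before the next try. A stdlib sketch of the idea (illustrative only; backoff_delays and call_with_retries are not the package's internal functions):

```python
import random
import time

def backoff_delays(max_retries=5, base=1.0, cap=60.0):
    """Yield wait times that double after each failed attempt, capped at `cap`."""
    for attempt in range(max_retries):
        yield min(cap, base * (2 ** attempt))

def call_with_retries(fn, max_retries=5, base=1.0):
    """Call fn(); on failure, sleep with exponential backoff and try again."""
    last_exc = None
    for delay in backoff_delays(max_retries, base):
        try:
            return fn()
        except Exception as exc:  # real code would catch specific API errors
            last_exc = exc
            time.sleep(delay + random.uniform(0, 0.1 * base))  # small jitter
    raise last_exc
```

The jitter spreads retries out so many concurrent requests don't all hammer the API at the same instant.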

process_prompts_batch

Main function for batch processing of prompts (async operations handled implicitly).

def process_prompts_batch(
    config: LLMConfig,
    provider: str,  # "openai", "openrouter" (recommended), "together", or "gemini"
    prompts: Optional[List[str]] = None,
    input_dir: Optional[str] = None,
    cache_dir: str = "llm_cache",
    force: bool = False,
    desc: str = "Processing prompts"
) -> Dict[str, Dict[str, Any]]

process_prompts_batch_async

Async version for backward compatibility and advanced use cases.

async def process_prompts_batch_async(
    config: LLMConfig,
    provider: str,  # "openai", "openrouter" (recommended), "together", or "gemini"
    prompts: Optional[List[str]] = None,
    input_dir: Optional[str] = None,
    cache_dir: str = "llm_cache",
    force: bool = False,
    desc: str = "Processing prompts"
) -> Dict[str, Dict[str, Any]]

LLMCache

Caching functionality for responses.

cache = LLMCache(cache_dir="my_cache")

# Check for cached response
cached = cache.get_cached_response(prompt_id)

# Save response to cache
cache.save_response(prompt_id, prompt_text, response_data)

# Clear all cached responses
cache.clear_cache()
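Conceptually, the cache is a persistent mapping from prompt ID to response data, which is what lets interrupted runs resume without repeating completed work. A minimal file-backed version of the same idea (a sketch; TinyCache and its one-JSON-file-per-ID layout are illustrative, and the package's actual on-disk format may differ):

```python
import json
from pathlib import Path

class TinyCache:
    """Store one JSON file per prompt ID (sketch of the caching idea)."""

    def __init__(self, cache_dir="my_cache"):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, prompt_id):
        return self.cache_dir / f"{prompt_id}.json"

    def get_cached_response(self, prompt_id):
        """Return the cached record, or None on a cache miss."""
        p = self._path(prompt_id)
        return json.loads(p.read_text()) if p.exists() else None

    def save_response(self, prompt_id, prompt_text, response_data):
        """Persist one completed response so a rerun can skip it."""
        payload = {"prompt": prompt_text, "response": response_data}
        self._path(prompt_id).write_text(json.dumps(payload))

    def clear_cache(self):
        """Delete every cached record."""
        for p in self.cache_dir.glob("*.json"):
            p.unlink()
```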

Project Structure

llm_batch_helper/
├── pyproject.toml              # Poetry configuration
├── poetry.lock                 # Locked dependencies
├── README.md                   # This file
├── LICENSE                     # License file
├── llm_batch_helper/          # Main package
│   ├── __init__.py            # Package exports
│   ├── cache.py               # Response caching
│   ├── config.py              # Configuration classes
│   ├── providers.py           # LLM provider implementations
│   ├── input_handlers.py      # Input processing utilities
│   └── exceptions.py          # Custom exceptions
├── examples/                   # Usage examples
│   ├── example.py             # Basic usage example
│   ├── prompts/               # Sample prompt files
│   └── llm_cache/             # Example cache directory
└── tutorials/                 # Interactive tutorials
    ├── llm_batch_helper_tutorial.ipynb  # Comprehensive feature tutorial
    └── performance_comparison_tutorial.ipynb  # Performance demo (NEW!)

Supported Models

OpenAI

  • All OpenAI models

OpenRouter (Recommended - 100+ Models)

  • OpenAI models: openai/gpt-4o, openai/gpt-4o-mini
  • Anthropic models: anthropic/claude-3-5-sonnet, anthropic/claude-3-haiku
  • DeepSeek models: deepseek/deepseek-v3.1-base, deepseek/deepseek-chat
  • Meta models: meta-llama/llama-3.1-405b-instruct
  • Google models: google/gemini-pro-1.5
  • And 90+ more models from all major providers

Together.ai

  • meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo
  • meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
  • mistralai/Mixtral-8x7B-Instruct-v0.1
  • And many other open-source models

Google Gemini (Direct API)

  • gemini-1.5-pro: Most capable model for complex reasoning tasks
  • gemini-1.5-flash: Fast and cost-effective for most use cases
  • gemini-1.0-pro: Previous generation model

Note: Gemini models support multimodal inputs (text, images, audio) through the Google AI Studio API.

Documentation

📖 Complete Documentation - Comprehensive docs on Read the Docs


Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Run the test suite
  6. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Changelog

v0.3.3

  • ๐Ÿ› Bug Fix: Fixed caching issue that required verification_callback to be non-None
  • ๐Ÿ“ฆ Package Maintenance: Version sync and build improvements
  • Fixed version consistency across package files
  • Updated build process for improved reliability

v0.3.2

  • 📚 Documentation Updates: Enhanced README with performance focus
  • Added new performance comparison tutorial showcasing 10-100x speedups
  • Improved examples with simplified API usage (no async/await)
  • Updated installation and quick start guides
  • Enhanced content organization and clarity

v0.3.1

  • 🔧 Configuration Updates: Optimized default values for better performance
  • Updated max_retries from 10 to 5 for faster failure detection
  • Updated max_concurrent_requests from 5 to 30 for improved batch processing performance

v0.3.0

  • 🎉 Major Update: Simplified API - async operations handled implicitly, no async/await required!
  • 📓 Jupyter Support: Works seamlessly in notebooks without event loop issues
  • 🔍 Detailed Retry Logging: See exactly what happens during retries with timestamps
  • 🔄 Backward Compatibility: Original async API still available as process_prompts_batch_async
  • 📚 Updated Examples: All documentation updated to show simplified usage
  • ⚡ Smart Event Loop Handling: Automatically detects and handles different Python environments

v0.2.0

  • Enhanced API stability
  • Improved error handling
  • Better documentation

v0.1.5

  • Added Together.ai provider support
  • Support for open-source models (Llama, Mixtral, etc.)
  • Enhanced documentation with Read the Docs
  • Updated examples and tutorials

v0.1.0

  • Initial release
  • Support for OpenAI API
  • Async batch processing
  • Response caching
  • File and list-based input support
  • Custom verification callbacks
  • Poetry package management
