
LLMHandler

Unified LLM Interface with Typed & Unstructured Responses

LLMHandler is a Python package that provides a single, consistent interface to interact with multiple large language model (LLM) providers. It supports both structured (Pydantic‑validated) and unstructured free‑form responses, along with advanced features like rate limiting, batch processing, and now per‑prompt partial failure handling.


Overview

LLMHandler unifies access to various LLM providers by letting you specify a model using a provider prefix (e.g. openai:gpt-4o-mini). The package automatically appends JSON schema instructions when a Pydantic model is provided to validate and parse responses. Alternatively, you can request unstructured free‑form text. Advanced features include rate limiting, batch processing, and partial failure handling when processing multiple prompts.
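The structured-response mechanism can be pictured with a short sketch. This is not LLMHandler's internal code, just an illustration of the idea: a Pydantic model's JSON schema is appended to the prompt as an instruction, and the raw reply is validated back into the model (the `Slogan` model and the canned reply below are hypothetical).

```python
import json
from pydantic import BaseModel

# Illustrative stand-in for a structured response type
# (the package ships its own, e.g. SimpleResponse).
class Slogan(BaseModel):
    text: str

# Roughly the idea: append the model's JSON schema to the prompt
# so the LLM knows the exact shape to produce...
schema_instruction = (
    "Respond with JSON matching this schema:\n"
    + json.dumps(Slogan.model_json_schema())
)

# ...then validate the raw reply back into the typed model.
raw_reply = '{"text": "Wake up and smell the difference."}'  # stand-in for an LLM reply
parsed = Slogan.model_validate_json(raw_reply)
print(parsed.text)
```

If the reply does not match the schema, `model_validate_json` raises a validation error, which is the kind of failure reported back in the response's error field.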


Features

  • Multi‑Provider Support:
    Switch easily between providers (OpenAI, Anthropic, Gemini, DeepSeek, Ollama, etc.) using a simple model identifier.

  • Structured & Unstructured Responses:
    Validate outputs using Pydantic models or receive raw text.

  • Batch Processing:
    Process multiple prompts together with results written to JSONL files.

  • Rate Limiting:
    Optionally control the number of requests per minute.

  • Partial-Failure Handling:
    When multiple prompts are provided, each prompt is processed individually. If one prompt fails (for example, if the prompt exceeds the model’s token limit or is excessively long), its failure is captured in a dedicated result (a PromptResult) while the others succeed.
    Example: if you pass a prompt that repeats "word " 2,000,001 times (over two million words), it exceeds the provider’s maximum input length, and the API’s error message (e.g. a 400 error stating the input is “too long”) is returned in that prompt’s result. This lets you handle errors on a per‑prompt basis without aborting the entire call.

  • Easy Configuration:
    Automatically load API keys and settings from a .env file.


Installation

Requirements

  • Python 3.9 or later

Using PDM

pdm install

Using Pip (when available)

pip install llmhandler

Configuration

Create a .env file in your project’s root and add your API keys:

OPENAI_API_KEY=your_openai_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
GEMINI_API_KEY=your_gemini_api_key
OLLAMA_API_KEY=your_ollama_api_key
DEEPSEEK_API_KEY=your_deepseek_api_key

LLMHandler automatically loads these values at runtime.
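Conceptually, loading a .env file amounts to something like the sketch below. LLMHandler does this for you at runtime, so you normally never write this yourself; the function is illustrative only, not part of the package's API.

```python
import os

# Illustrative sketch: read KEY=value lines from a .env file into the
# process environment, skipping blanks and comments, without overwriting
# variables that are already set.
def load_env_file(path: str = ".env") -> None:
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```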


Model Format

Every model is passed as a string in the form:

<provider>:<model_name>
  • Provider Prefix: Identifies the integration class and loads the proper API key and settings.
  • Model Name: Often validated via a type alias (e.g. KnownModelName) to select the specific LLM.
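The convention is easy to see in code. The helper below is hypothetical (the package parses model strings internally); it just shows how a provider-prefixed identifier breaks down:

```python
# Illustrative: split "<provider>:<model_name>" into its two parts.
def split_model_id(model: str) -> tuple[str, str]:
    provider, _, name = model.partition(":")
    return provider, name

print(split_model_id("openai:gpt-4o-mini"))  # ('openai', 'gpt-4o-mini')
```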

Supported Providers and Their Models

  • OpenAI (prefix openai:)
    GPT‑4o series:
    openai:gpt-4o
    openai:gpt-4o-2024-05-13
    openai:gpt-4o-2024-08-06
    openai:gpt-4o-2024-11-20
    openai:gpt-4o-audio-preview
    openai:gpt-4o-audio-preview-2024-10-01
    openai:gpt-4o-audio-preview-2024-12-17
    openai:gpt-4o-mini
    openai:gpt-4o-mini-2024-07-18
    openai:gpt-4o-mini-audio-preview
    openai:gpt-4o-mini-audio-preview-2024-12-17

    o1 series:
    openai:o1
    openai:o1-2024-12-17
    openai:o1-mini
    openai:o1-mini-2024-09-12
    openai:o1-preview
    openai:o1-preview-2024-09-12

  • Anthropic (prefix anthropic:)
    anthropic:claude-3-5-haiku-latest
    anthropic:claude-3-5-sonnet-latest
    anthropic:claude-3-opus-latest

  • Gemini (prefix google-gla: for the Generative Language API, or google-vertex: for Vertex AI)
    gemini-1.0-pro
    gemini-1.5-flash
    gemini-1.5-flash-8b
    gemini-1.5-pro
    gemini-2.0-flash-exp
    gemini-2.0-flash-thinking-exp-01-21
    gemini-exp-1206

  • Ollama (prefix ollama:)
    Accepts any valid Ollama model. Common examples:
    ollama:llama3.2
    ollama:llama3.2-vision
    ollama:llama3.3-70b-specdec
    (See ollama.com/library)

  • DeepSeek (prefix deepseek:)
    deepseek:deepseek-chat

Note: For LLaMA-based models, Ollama (and providers like Groq, if available) are the primary options.


Usage Examples

Structured Response (Single Prompt)

import asyncio
from llmhandler.api_handler import UnifiedLLMHandler
from llmhandler._internal_models import SimpleResponse

async def structured_example():
    handler = UnifiedLLMHandler()  # API keys auto-loaded from .env
    result = await handler.process(
        prompts="Generate a catchy marketing slogan for a coffee brand.",
        model="openai:gpt-4o-mini",
        response_type=SimpleResponse
    )
    print("Structured Response:", result.data)

asyncio.run(structured_example())

Unstructured Response (Single Prompt)

import asyncio
from llmhandler.api_handler import UnifiedLLMHandler

async def unstructured_example():
    handler = UnifiedLLMHandler()
    result = await handler.process(
        prompts="Tell me a fun fact about dolphins.",
        model="openai:gpt-4o-mini"
        # No response_type provided: returns raw text.
    )
    print("Unstructured Response:", result)

asyncio.run(unstructured_example())

Multiple Prompts (Structured)

import asyncio
from llmhandler.api_handler import UnifiedLLMHandler
from llmhandler._internal_models import SimpleResponse

async def multiple_prompts_example():
    handler = UnifiedLLMHandler()
    prompts = [
        "Generate a slogan for a coffee brand.",
        "Create a tagline for a tea company."
    ]
    result = await handler.process(
        prompts=prompts,
        model="openai:gpt-4o-mini",
        response_type=SimpleResponse
    )
    print("Multiple Structured Responses:", result.data)

asyncio.run(multiple_prompts_example())

Batch Processing Example

import asyncio
from llmhandler.api_handler import UnifiedLLMHandler
from llmhandler._internal_models import SimpleResponse

async def batch_example():
    # Set a rate limit to avoid overwhelming the API
    handler = UnifiedLLMHandler(requests_per_minute=60)
    prompts = [
        "Generate a slogan for a coffee brand.",
        "Create a tagline for a tea company.",
        "Write a catchphrase for a juice brand."
    ]
    # Use batch_mode=True to process multiple prompts together (structured responses only)
    batch_result = await handler.process(
        prompts=prompts,
        model="openai:gpt-4o-mini",
        response_type=SimpleResponse,
        batch_mode=True
    )
    print("Batch Processing Result:", batch_result.data)

asyncio.run(batch_example())

Partial Failure Example

When processing multiple prompts, LLMHandler processes each prompt independently. If one prompt fails (for example, if the prompt is extremely long), its error is captured and returned along with the successful responses.

Below is an example that demonstrates this behavior. In this case, we deliberately send a “bad” prompt that repeats the word "word " 2,000,001 times (approximately 2 million words) so that it exceeds the model’s token limit. The resulting output will include an error for that prompt while still returning responses for the other prompts.

import asyncio
from llmhandler.api_handler import UnifiedLLMHandler
from llmhandler._internal_models import SimpleResponse

async def partial_failure_example():
    handler = UnifiedLLMHandler()
    # Two good prompts and one extremely long (bad) prompt.
    good_prompt = "Tell me a fun fact about penguins."
    # Construct a bad prompt that far exceeds any realistic token limit.
    # Here we repeat "word " 2,000,001 times (approximately 2 million words),
    # which should trigger a token limit error.
    bad_prompt = "word " * 2000001
    another_good = "What are the benefits of regular exercise?"
    partial_prompts = [good_prompt, bad_prompt, another_good]

    result = await handler.process(
        prompts=partial_prompts,
        model="openai:gpt-4o-mini",
        response_type=SimpleResponse
    )
    print("Partial Failure Real API Result:")
    # The returned object is a UnifiedResponse whose data is a list of PromptResult objects.
    results_list = result.data if hasattr(result, "data") else result
    for pr in results_list:
        display_prompt = pr.prompt if len(pr.prompt) < 60 else pr.prompt[:60] + "..."
        print(f"Prompt: {display_prompt}")
        if pr.error:
            print(f"  ERROR: {pr.error}")
        else:
            print(f"  Response: {pr.data}")
        print("-" * 40)

asyncio.run(partial_failure_example())

Advanced Features

  • Batch Processing & Rate Limiting:
    Initialize the handler with requests_per_minute to throttle calls. When processing a list of prompts, set batch_mode=True to handle them as a batch (supported only for structured responses).

  • Structured vs. Unstructured Responses:

    • Supply a Pydantic model as response_type for validated, structured output.
    • Omit or set response_type=None to receive raw, unstructured text.
  • Partial Failure Handling:
    When multiple prompts are submitted, each prompt is processed independently. If one prompt fails (for example, if you submit a prompt that far exceeds the maximum token limit—as with a prompt containing over 2 million words), the error is captured in its corresponding result. You will receive a list of results where each item (a PromptResult) contains the original prompt along with either a valid response or an error message. This lets you handle failures on a per‑prompt basis without aborting the entire request.

  • Troubleshooting:
    Error messages (such as schema validation failures, token limit errors, or overloaded service errors) are clearly reported in the error field of the UnifiedResponse or PromptResult. Make sure your model strings follow the <provider>:<model_name> format exactly.


Testing

A comprehensive test suite is included. To run it, execute:

pytest

Development & Contribution

Contributions are welcome! To set up your development environment:

  1. Clone the Repository:

    git clone https://github.com/yourusername/LLMHandler.git
    cd LLMHandler
    
  2. Install Dependencies:

    pdm install
    
  3. Run Tests:

    pytest
    
  4. Submit a Pull Request with your improvements or bug fixes.


License

This project is licensed under the MIT License.


Contact

For questions, feedback, or contributions, please reach out to:

Bryan Nsoh
Email: bryan.anye.5@gmail.com


Happy coding with LLMHandler!
