LLMHandler
Unified LLM Interface with Typed & Unstructured Responses
LLMHandler is a Python package that provides a single, consistent interface to interact with multiple large language model (LLM) providers. It supports both structured (Pydantic‑validated) and unstructured free‑form responses, along with advanced features like rate limiting, batch processing, and now per‑prompt partial failure handling.
Table of Contents
- Overview
- Features
- Installation
- Configuration
- Model Format
- Supported Providers and Their Models
- Usage Examples
- Advanced Features
- Testing
- Development & Contribution
- License
- Contact
Overview
LLMHandler unifies access to various LLM providers by letting you specify a model using a provider prefix (e.g. openai:gpt-4o-mini). The package automatically appends JSON schema instructions when a Pydantic model is provided to validate and parse responses. Alternatively, you can request unstructured free‑form text. Advanced features include rate limiting, batch processing, and partial failure handling when processing multiple prompts.
Features
- Multi-Provider Support: Switch easily between providers (OpenAI, Anthropic, Gemini, DeepSeek, Ollama, etc.) using a simple model identifier.
- Structured & Unstructured Responses: Validate outputs using Pydantic models, or receive raw text.
- Batch Processing: Process multiple prompts together, with results written to JSONL files.
- Rate Limiting: Optionally control the number of requests per minute.
- Partial-Failure Handling: When multiple prompts are provided, each prompt is processed individually. If one prompt fails (for example, because it exceeds the model's token limit or is excessively long), its failure is captured in a dedicated result (a PromptResult) while the others succeed. Example: if you intentionally pass a prompt that repeats "word " 2,000,001 times (over two million words), it will exceed the provider's maximum allowed input length, and the API's error message (e.g. a 400 error stating that the input is "too long") will be returned in that prompt's result. This lets you safely handle errors on a per-prompt basis without aborting the entire call.
- Easy Configuration: Automatically load API keys and settings from a .env file.
Installation
Requirements
- Python 3.9 or later
Using PDM:

```
pdm install
```

Using Pip (when available):

```
pip install llmhandler
```
Configuration
Create a .env file in your project's root and add your API keys:

```
OPENAI_API_KEY=your_openai_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
GEMINI_API_KEY=your_gemini_api_key
OLLAMA_API_KEY=your_ollama_api_key
DEEPSEEK_API_KEY=your_deepseek_api_key
```
LLMHandler automatically loads these values at runtime.
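Before constructing a handler, it can be useful to confirm the keys are actually visible to your process. The stdlib-only helper below is purely illustrative (the function name `missing_keys` is hypothetical, not part of LLMHandler's API); it just reports which of the expected environment variables are unset:

```python
import os

# Names of the provider keys LLMHandler reads from the environment.
EXPECTED_KEYS = [
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "GEMINI_API_KEY",
    "OLLAMA_API_KEY",
    "DEEPSEEK_API_KEY",
]

def missing_keys(env=os.environ):
    """Return the expected provider keys that are not set (or are empty)."""
    return [k for k in EXPECTED_KEYS if not env.get(k)]

# Example: with only OPENAI_API_KEY set, the other four are reported missing.
print(missing_keys({"OPENAI_API_KEY": "sk-..."}))
# ['ANTHROPIC_API_KEY', 'GEMINI_API_KEY', 'OLLAMA_API_KEY', 'DEEPSEEK_API_KEY']
```

You only need keys for the providers you actually call, so a non-empty result is not necessarily an error.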
Model Format
Every model is passed as a string of the form:

```
<provider>:<model_name>
```

- Provider Prefix: identifies the integration class and loads the proper API key and settings.
- Model Name: often validated via a type alias (e.g. KnownModelName) to select the specific LLM.
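As a sketch of what this format implies (the function name `split_model_id` is hypothetical; LLMHandler performs this parsing internally), splitting on the first colon recovers the two parts:

```python
def split_model_id(model: str):
    """Split '<provider>:<model_name>' into its two parts."""
    provider, _, model_name = model.partition(":")
    if not provider or not model_name:
        raise ValueError(f"Expected '<provider>:<model_name>', got {model!r}")
    return provider, model_name

print(split_model_id("openai:gpt-4o-mini"))  # ('openai', 'gpt-4o-mini')
```

Note that splitting on the first colon only means model names themselves may contain colons, as some Ollama tags do.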
Supported Providers and Their Models
| Provider | Prefix | Supported Models |
|---|---|---|
| OpenAI | `openai:` | GPT-4o series: `openai:gpt-4o` • `openai:gpt-4o-2024-05-13` • `openai:gpt-4o-2024-08-06` • `openai:gpt-4o-2024-11-20` • `openai:gpt-4o-audio-preview` • `openai:gpt-4o-audio-preview-2024-10-01` • `openai:gpt-4o-audio-preview-2024-12-17` • `openai:gpt-4o-mini` • `openai:gpt-4o-mini-2024-07-18` • `openai:gpt-4o-mini-audio-preview` • `openai:gpt-4o-mini-audio-preview-2024-12-17`; o1 series: `openai:o1` • `openai:o1-2024-12-17` • `openai:o1-mini` • `openai:o1-mini-2024-09-12` • `openai:o1-preview` • `openai:o1-preview-2024-09-12` |
| Anthropic | `anthropic:` | `anthropic:claude-3-5-haiku-latest` • `anthropic:claude-3-5-sonnet-latest` • `anthropic:claude-3-opus-latest` |
| Gemini | `google-gla:` (Generative Language API), `google-vertex:` (Vertex AI) | `gemini-1.0-pro` • `gemini-1.5-flash` • `gemini-1.5-flash-8b` • `gemini-1.5-pro` • `gemini-2.0-flash-exp` • `gemini-2.0-flash-thinking-exp-01-21` • `gemini-exp-1206` |
| Ollama | `ollama:` | Accepts any valid Ollama model. Common examples: `ollama:llama3.2` • `ollama:llama3.2-vision` • `ollama:llama3.3-70b-specdec` (see ollama.com/library) |
| Deepseek | `deepseek:` | `deepseek:deepseek-chat` |
Note: For LLaMA-based models, Ollama (and providers like Groq, if available) are the primary options.
Usage Examples
Structured Response (Single Prompt)
```python
import asyncio
from llmhandler.api_handler import UnifiedLLMHandler
from llmhandler._internal_models import SimpleResponse

async def structured_example():
    handler = UnifiedLLMHandler()  # API keys auto-loaded from .env
    result = await handler.process(
        prompts="Generate a catchy marketing slogan for a coffee brand.",
        model="openai:gpt-4o-mini",
        response_type=SimpleResponse
    )
    print("Structured Response:", result.data)

asyncio.run(structured_example())
```
Unstructured Response (Single Prompt)
```python
import asyncio
from llmhandler.api_handler import UnifiedLLMHandler

async def unstructured_example():
    handler = UnifiedLLMHandler()
    result = await handler.process(
        prompts="Tell me a fun fact about dolphins.",
        model="openai:gpt-4o-mini"
        # No response_type provided: returns raw text.
    )
    print("Unstructured Response:", result)

asyncio.run(unstructured_example())
```
Multiple Prompts (Structured)
```python
import asyncio
from llmhandler.api_handler import UnifiedLLMHandler
from llmhandler._internal_models import SimpleResponse

async def multiple_prompts_example():
    handler = UnifiedLLMHandler()
    prompts = [
        "Generate a slogan for a coffee brand.",
        "Create a tagline for a tea company."
    ]
    result = await handler.process(
        prompts=prompts,
        model="openai:gpt-4o-mini",
        response_type=SimpleResponse
    )
    print("Multiple Structured Responses:", result.data)

asyncio.run(multiple_prompts_example())
```
Batch Processing Example
```python
import asyncio
from llmhandler.api_handler import UnifiedLLMHandler
from llmhandler._internal_models import SimpleResponse

async def batch_example():
    # Set a rate limit to avoid overwhelming the API.
    handler = UnifiedLLMHandler(requests_per_minute=60)
    prompts = [
        "Generate a slogan for a coffee brand.",
        "Create a tagline for a tea company.",
        "Write a catchphrase for a juice brand."
    ]
    # batch_mode=True processes multiple prompts together
    # (structured responses only).
    batch_result = await handler.process(
        prompts=prompts,
        model="openai:gpt-4o-mini",
        response_type=SimpleResponse,
        batch_mode=True
    )
    print("Batch Processing Result:", batch_result.data)

asyncio.run(batch_example())
```
Partial Failure Example
When processing multiple prompts, LLMHandler processes each prompt independently. If one prompt fails (for example, if the prompt is extremely long), its error is captured and returned along with the successful responses.
Below is an example that demonstrates this behavior. In this case, we deliberately send a “bad” prompt that repeats the word "word " 2,000,001 times (approximately 2 million words) so that it exceeds the model’s token limit. The resulting output will include an error for that prompt while still returning responses for the other prompts.
```python
import asyncio
from llmhandler.api_handler import UnifiedLLMHandler
from llmhandler._internal_models import SimpleResponse

async def partial_failure_example():
    handler = UnifiedLLMHandler()
    # Two good prompts and one extremely long (bad) prompt.
    good_prompt = "Tell me a fun fact about penguins."
    # Construct a bad prompt that far exceeds any realistic token limit:
    # "word " repeated 2,000,001 times (roughly two million words),
    # which should trigger a token limit error.
    bad_prompt = "word " * 2_000_001
    another_good = "What are the benefits of regular exercise?"
    result = await handler.process(
        prompts=[good_prompt, bad_prompt, another_good],
        model="openai:gpt-4o-mini",
        response_type=SimpleResponse
    )
    print("Partial Failure Real API Result:")
    # result is a UnifiedResponse whose data is a list of PromptResult objects.
    for pr in result.data:
        display_prompt = pr.prompt if len(pr.prompt) < 60 else pr.prompt[:60] + "..."
        print(f"Prompt: {display_prompt}")
        if pr.error:
            print(f"  ERROR: {pr.error}")
        else:
            print(f"  Response: {pr.data}")
        print("-" * 40)

asyncio.run(partial_failure_example())
```
Advanced Features
- Batch Processing & Rate Limiting: initialize the handler with requests_per_minute to throttle calls. When processing a list of prompts, set batch_mode=True to handle them as a batch (supported only for structured responses).
- Structured vs. Unstructured Responses: supply a Pydantic model as response_type for validated, structured output; omit it (or set response_type=None) to receive raw, unstructured text.
- Partial-Failure Handling: when multiple prompts are submitted, each is processed independently. If one fails (for example, a prompt far exceeding the maximum token limit, such as one containing over two million words), the error is captured in its corresponding result. You receive a list in which each item (a PromptResult) contains the original prompt along with either a valid response or an error message, so you can handle failures on a per-prompt basis without aborting the entire request.
- Troubleshooting: error messages (such as schema validation failures, token limit errors, or overloaded-service errors) are reported in the error field of the UnifiedResponse or PromptResult. Make sure your model strings follow the <provider>:<model_name> format exactly.
Testing
A comprehensive test suite is included. To run the tests, execute:

```
pytest
```
Development & Contribution
Contributions are welcome! To set up your development environment:
- Clone the repository:

  ```
  git clone https://github.com/yourusername/LLMHandler.git
  cd LLMHandler
  ```

- Install dependencies:

  ```
  pdm install
  ```

- Run the tests:

  ```
  pytest
  ```

- Submit a pull request with your improvements or bug fixes.
License
This project is licensed under the MIT License.
Contact
For questions, feedback, or contributions, please reach out to:
Bryan Nsoh
Email: bryan.anye.5@gmail.com
Happy coding with LLMHandler!
File details
Details for the file llm_handler_validator-0.1.1.tar.gz.
File metadata
- Download URL: llm_handler_validator-0.1.1.tar.gz
- Upload date:
- Size: 16.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: pdm/2.22.3 CPython/3.13.1 Windows/11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c2663e3c2d61546acf67e48d8bea4e308700b8dfbd47edc3493c907140a6d568 |
| MD5 | 611077c358a4475d5d903750782e7151 |
| BLAKE2b-256 | 8f0d20639d9cba09b986ca616bd510e75236762cc3de73d94ea230d188e7358f |
File details
Details for the file llm_handler_validator-0.1.1-py3-none-any.whl.
File metadata
- Download URL: llm_handler_validator-0.1.1-py3-none-any.whl
- Upload date:
- Size: 11.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: pdm/2.22.3 CPython/3.13.1 Windows/11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 74f7370ff8ee7c130318b6b2a25ac9d0144b23951b4f988ad4a3e4c661f5ef64 |
| MD5 | 9df028b570d863b587d5bc8c44c6fed0 |
| BLAKE2b-256 | 0245924765fb3b80dab007b8a9daacfc943fa030c3ab4db239d000e662b52610 |