SmartLLM
A unified async Python wrapper for multiple LLM providers with a consistent interface.
Features
- Unified Interface - Single API for multiple LLM providers (OpenAI, AWS Bedrock)
- Async/Await - Built on asyncio for high-performance concurrent requests
- Smart Caching - Two-level cache (local + DynamoDB) to reduce costs and latency
- Auto Retry - Exponential backoff retry logic for transient failures
- Structured Output - Native Pydantic model support for type-safe responses
- Streaming - Real-time streaming responses for better UX
- Rate Limiting - Built-in concurrency control per model
- Decorator Logging - Automatic function logging via Logorator
- Progress Callbacks - Optional on_progress callback for real-time LLM events
- OpenAI Response API - Full support for OpenAI's primary API including reasoning models
Installation
pip install smartllm
Optional Dependencies
Install only the providers you need:
# For OpenAI
pip install smartllm[openai]
# For AWS Bedrock
pip install smartllm[bedrock]
# For all providers
pip install smartllm[all]
DynamoDB Caching (optional)
To enable shared two-level caching across machines:
async with LLMClient(provider="openai", dynamo_table_name="my-llm-cache") as client:
    ...
Requires AWS credentials with DynamoDB access. The table is auto-created if it doesn't exist. Local file cache is always used as the first layer.
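A minimal sketch combining the shared DynamoDB layer with a custom TTL; cache_ttl_days comes from the 0.1.5 changelog, and passing it directly to LLMClient here is an assumption:

import asyncio
from smartllm import LLMClient, TextRequest

async def main():
    # Shared DynamoDB cache table plus the always-on local file cache
    async with LLMClient(
        provider="openai",
        dynamo_table_name="my-llm-cache",
        cache_ttl_days=30,  # assumed keyword placement; default is 365 per the changelog
    ) as client:
        response = await client.generate_text(
            TextRequest(prompt="What is the capital of France?", temperature=0)
        )
        print(response.text)

asyncio.run(main())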
Quick Start
Basic Usage
import asyncio
from smartllm import LLMClient, TextRequest

async def main():
    # Auto-detects provider from environment variables
    async with LLMClient(provider="openai") as client:
        response = await client.generate_text(
            TextRequest(prompt="What is the capital of France?")
        )
        print(response.text)

asyncio.run(main())
Multi-turn Conversations
from smartllm import LLMClient, MessageRequest, Message

async with LLMClient(provider="openai") as client:
    messages = [
        Message(role="user", content="My name is Alice."),
        Message(role="assistant", content="Nice to meet you, Alice!"),
        Message(role="user", content="What's my name?"),
    ]
    response = await client.send_message(
        MessageRequest(messages=messages)
    )
    print(response.text)  # "Your name is Alice."
Streaming Responses
from smartllm import LLMClient, TextRequest

async with LLMClient(provider="openai") as client:
    request = TextRequest(
        prompt="Write a short poem about Python.",
        stream=True
    )
    async for chunk in client.generate_text_stream(request):
        print(chunk.text, end="", flush=True)
Structured Output with Pydantic
from pydantic import BaseModel
from smartllm import LLMClient, TextRequest

class Person(BaseModel):
    name: str
    age: int
    occupation: str

async with LLMClient(provider="openai") as client:
    response = await client.generate_text(
        TextRequest(
            prompt="Generate a person profile for a software engineer named John, age 30.",
            response_format=Person
        )
    )
    person = response.structured_data
    print(f"{person.name} is a {person.age} year old {person.occupation}")
Configuration
Environment Variables
OpenAI:
export OPENAI_API_KEY="your-api-key"
export OPENAI_MODEL="gpt-4o-mini" # Optional
AWS Bedrock:
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_REGION="us-east-1"
export BEDROCK_MODEL="anthropic.claude-3-sonnet-20240229-v1:0" # Optional
Programmatic Configuration
from smartllm import LLMClient, LLMConfig

config = LLMConfig(
    provider="openai",
    api_key="your-api-key",
    default_model="gpt-4o",
    temperature=0.7,
    max_tokens=2048,
    max_retries=3,
)

async with LLMClient(config) as client:
    # Use client...
    pass
Customizing Defaults
from smartllm import defaults
# Modify global defaults
defaults.DEFAULT_TEMPERATURE = 0.7
defaults.DEFAULT_MAX_TOKENS = 4096
defaults.DEFAULT_MAX_RETRIES = 5
OpenAI API Types
SmartLLM supports both OpenAI APIs via the api_type parameter:
"responses"(default) - OpenAI's primary Response API, recommended for all modern models"chat_completions"- Legacy Chat Completions API, supported indefinitely
# Response API (default)
response = await client.generate_text(
    TextRequest(prompt="Hello", api_type="responses")
)

# Chat Completions API (legacy)
response = await client.generate_text(
    TextRequest(prompt="Hello", api_type="chat_completions")
)
Reasoning Models
For models that support reasoning (e.g. GPT-5.x), use reasoning_effort to control how much the model reasons before responding. Reasoning tokens are returned in response.metadata:
response = await client.generate_text(
    TextRequest(
        prompt="Solve: what is the 100th Fibonacci number?",
        reasoning_effort="high",  # "low", "medium", or "high"
    )
)
print(response.text)
print(f"Reasoning tokens used: {response.metadata.get('reasoning_tokens', 0)}")
Note: reasoning models do not support temperature. Passing a value other than 1 will raise a ValueError.
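As a quick sketch of that constraint (the ValueError behavior is stated above; the exact error message is not documented):

try:
    response = await client.generate_text(
        TextRequest(
            prompt="Solve: what is the 100th Fibonacci number?",
            reasoning_effort="high",
            temperature=0.5,  # rejected: reasoning models only accept the default of 1
        )
    )
except ValueError as e:
    print(f"Invalid request: {e}")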
Reasoning with Structured Output
from pydantic import BaseModel
from smartllm import LLMClient, TextRequest

class Solution(BaseModel):
    answer: float
    unit: str
    explanation: str

async with LLMClient(provider="openai") as client:
    response = await client.generate_text(
        TextRequest(
            prompt="A train leaves city A at 60mph toward city B (300 miles away). Another leaves B at 90mph. When do they meet?",
            response_format=Solution,
            reasoning_effort="medium",
        )
    )
    solution = response.structured_data
    print(f"{solution.answer} {solution.unit}: {solution.explanation}")
    print(f"Reasoning tokens: {response.metadata.get('reasoning_tokens', 0)}")
Advanced Features
Caching
Responses are automatically cached when temperature=0:
# First call - hits the API
response1 = await client.generate_text(
    TextRequest(prompt="What is 2+2?", temperature=0)
)

# Second call - uses cache (instant, free)
response2 = await client.generate_text(
    TextRequest(prompt="What is 2+2?", temperature=0)
)

# Clear cache for specific request
response3 = await client.generate_text(
    TextRequest(prompt="What is 2+2?", temperature=0, clear_cache=True)
)
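If you need to know where a response came from, version 0.1.6 adds a cache_source attribute on TextResponse ("miss", "l1", or "l2"); a minimal sketch assuming that attribute:

# "miss" on the first live call, "l1" (local file) or "l2" (DynamoDB) afterwards
print(response1.cache_source)  # e.g. "miss"
print(response2.cache_source)  # e.g. "l1"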
Concurrent Requests
import asyncio
from smartllm import LLMClient, TextRequest

async with LLMClient(provider="openai") as client:
    prompts = ["Question 1", "Question 2", "Question 3"]
    tasks = [
        client.generate_text(TextRequest(prompt=p))
        for p in prompts
    ]
    responses = await asyncio.gather(*tasks)
Rate Limiting
# Limit concurrent requests
client = LLMClient(provider="openai", max_concurrent=5)
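Combined with asyncio.gather, max_concurrent caps how many requests are in flight at once; a minimal sketch using only parameters shown above:

import asyncio
from smartllm import LLMClient, TextRequest

async def main():
    # At most 5 requests run against the API at any one time
    async with LLMClient(provider="openai", max_concurrent=5) as client:
        tasks = [
            client.generate_text(TextRequest(prompt=f"Question {i}"))
            for i in range(20)
        ]
        responses = await asyncio.gather(*tasks)
        print(len(responses))

asyncio.run(main())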
Progress Callbacks
Pass an on_progress callable to TextRequest or MessageRequest to receive real-time events. Both sync and async callables are supported.
async def on_progress(event):
    print(event)

async with LLMClient(provider="openai") as client:
    response = await client.generate_text(
        TextRequest(prompt="What is the capital of France?", on_progress=on_progress)
    )
Each event is a dict with event, ts (Unix timestamp), prompt, model, and provider fields:
| event | additional fields | notes |
|---|---|---|
| llm_started | — | fired before API call / cache check |
| llm_done | — | fired after a live API call completes |
| cache_hit | cache_source | fired when response is served from cache; cache_source is "l1" (local) or "l2" (DynamoDB) |
| error | message | fired on exception |
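For example, a callback can branch on the event field and read the extra fields listed above (a sketch; only documented fields are used):

def on_progress(event):
    name = event["event"]
    if name == "cache_hit":
        print(f"cache hit ({event['cache_source']}) for model {event['model']}")
    elif name == "error":
        print(f"error: {event['message']}")
    else:
        print(f"{name} at {event['ts']} ({event['provider']}/{event['model']})")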
Provider-Specific Clients
For advanced use cases, access provider-specific clients:
from smartllm.openai import OpenAILLMClient, OpenAIConfig
from smartllm.bedrock import BedrockLLMClient, BedrockConfig

# OpenAI-specific features
openai_config = OpenAIConfig(api_key="...", organization="...")
async with OpenAILLMClient(openai_config) as client:
    models = await client.list_available_models()

# Bedrock-specific features
bedrock_config = BedrockConfig(aws_region="us-east-1")
async with BedrockLLMClient(bedrock_config) as client:
    models = await client.list_available_model_ids()
Supported Providers
- OpenAI - GPT models via OpenAI API
- AWS Bedrock - Claude, Llama, Mistral, and Titan models
API Reference
Core Classes
- LLMClient - Unified client for all providers
- LLMConfig - Unified configuration
- TextRequest - Single prompt request
- MessageRequest - Multi-turn conversation request
- TextResponse - LLM response with metadata
- Message - Conversation message
- StreamChunk - Streaming response chunk
Request Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| prompt | str | Input text prompt | Required |
| model | str | Model ID to use | Config default |
| temperature | float | Sampling temperature (0-1) | 0 |
| max_tokens | int | Maximum output tokens | 2048 |
| top_p | float | Nucleus sampling | 1.0 |
| system_prompt | str | System context | None |
| stream | bool | Enable streaming | False |
| response_format | BaseModel | Pydantic model for structured output | None |
| use_cache | bool | Enable caching | True |
| clear_cache | bool | Clear cache before request | False |
| api_type | str | OpenAI API type ("responses" or "chat_completions") | "responses" |
| reasoning_effort | str | Reasoning effort ("low", "medium", "high") | None |
| on_progress | Callable | Progress event callback (sync or async) | None |
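For illustration, a single request combining several of these parameters (all names are taken from the table above; the prompt is arbitrary):

request = TextRequest(
    prompt="Summarize the plot of Hamlet in two sentences.",
    model="gpt-4o-mini",
    temperature=0,          # deterministic, so the response is cacheable
    max_tokens=256,
    system_prompt="You are a concise literary assistant.",
    use_cache=True,
    api_type="responses",
)
response = await client.generate_text(request)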
Error Handling
from smartllm import LLMClient, TextRequest

async with LLMClient(provider="openai") as client:
    try:
        response = await client.generate_text(
            TextRequest(prompt="Hello")
        )
    except ValueError as e:
        print(f"Configuration error: {e}")
    except Exception as e:
        print(f"API error: {e}")
Development
Setup
git clone https://github.com/Redundando/smartllm.git
cd smartllm
pip install -e .[all,dev]
Running Tests
# Unit tests
pytest tests/unit/ -v
# Integration tests (select model interactively)
pytest tests/integration/
# Integration tests with a specific model
pytest tests/integration/ --model gpt-4o
# Integration tests with a reasoning model
pytest tests/integration/ --model gpt-5.2
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Changelog
Version 0.1.6
- Added on_progress callback to TextRequest and MessageRequest
- Events: llm_started, llm_done, cache_hit (with cache_source), error
- Both sync and async callables supported
- cache_source on TextResponse indicates cache origin: "miss", "l1", or "l2"
Version 0.1.5
- Replaced custom logging with Logorator decorator-based logging
- Added two-level cache: local JSON files + optional DynamoDB via Dynamorator
- DynamoDB cache configurable via dynamo_table_name and cache_ttl_days (default: 365 days)
- Cache write-back: DynamoDB hits are written to the local cache automatically
- Prompt stored in cache metadata
- Recursive Pydantic schema cleaning for OpenAI structured output compatibility
- logorator and dynamorator added as core dependencies in pyproject.toml
Version 0.1.4
- Fixed logger name from aws_llm_wrapper to smartllm
- Removed redundant response_format=json_object when using tool-based structured output
- Cache read failures now log a warning instead of silently returning None
- Added reasoning_effort warning when used with Bedrock models
- Test suite now supports model selection via --model CLI option or interactive prompt
- Integration tests support both OpenAI and AWS Bedrock models
- Bedrock streaming chunk parsing fixed for Claude models
Version 0.1.0
- Initial public release
- Unified interface for multiple providers
- OpenAI support (GPT models)
- AWS Bedrock support (Claude, Llama, Mistral, Titan)
- Async/await architecture
- Smart caching with temperature=0
- Auto retry with exponential backoff
- Structured output with Pydantic models
- Streaming responses
- Rate limiting and concurrency control
- OpenAI Response API support (primary interface)
- Reasoning model support with reasoning_effort parameter
- Comprehensive test suite
Support
- Issues: GitHub Issues
- Email: arved.kloehn@gmail.com
Acknowledgments
Built with:
- Pydantic for data validation
- Logorator for decorator-based logging
- Dynamorator for DynamoDB caching
- aioboto3 for AWS async support
- OpenAI Python SDK for OpenAI integration
Download files
Download the file for your platform.
Source Distribution
Built Distribution
File details
Details for the file smartllm-0.1.7.tar.gz.
File metadata
- Download URL: smartllm-0.1.7.tar.gz
- Upload date:
- Size: 77.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 37df17642dcffb04fa76ad597711c046c7fc174425f2c0b94b4709311525b9e4 |
| MD5 | 246679b9c88a8b7483e9ceb9e6335674 |
| BLAKE2b-256 | 6cebc34b17663a471fa347bf40d6c292ab21d32b2b0bb3545e59545f0ca227c9 |
File details
Details for the file smartllm-0.1.7-py3-none-any.whl.
File metadata
- Download URL: smartllm-0.1.7-py3-none-any.whl
- Upload date:
- Size: 32.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f487a28db56ff50b411961e35165cb4df8e92605ed9bb12272eee7b16b3a7ee3 |
| MD5 | 6dd5f8d51f74d477805ff112c4a9eb21 |
| BLAKE2b-256 | d3bb6271c9e74c96e3856d65f67f615b16da49efaf61385b86e49d1ff065daa1 |