
SmartLLM

A unified async Python wrapper for multiple LLM providers with a consistent interface.


Features

  • Unified Interface - Single API for multiple LLM providers (OpenAI, AWS Bedrock)
  • Async/Await - Built on asyncio for high-performance concurrent requests
  • Smart Caching - Two-level cache (local + DynamoDB) to reduce costs and latency
  • Auto Retry - Exponential backoff retry logic for transient failures
  • Structured Output - Native Pydantic model support for type-safe responses
  • Streaming - Real-time streaming responses for better UX
  • Rate Limiting - Built-in concurrency control per model
  • Decorator Logging - Automatic function logging via Logorator
  • Progress Callbacks - Optional on_progress callback for real-time LLM events
  • OpenAI Response API - Full support for OpenAI's primary API including reasoning models

Installation

pip install smartllm

Optional Dependencies

Install only the providers you need:

# For OpenAI
pip install smartllm[openai]

# For AWS Bedrock
pip install smartllm[bedrock]

# For all providers
pip install smartllm[all]

DynamoDB Caching (optional)

To enable shared two-level caching across machines:

async with LLMClient(provider="openai", dynamo_table_name="my-llm-cache") as client:
    ...

Requires AWS credentials with DynamoDB access. The table is auto-created if it doesn't exist. Local file cache is always used as the first layer.
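
A slightly fuller sketch, assuming cache_ttl_days is accepted alongside dynamo_table_name as described in the 0.1.5 changelog:

from smartllm import LLMClient, TextRequest

# The local JSON cache (level 1) is checked first, then the shared DynamoDB table (level 2)
async with LLMClient(
    provider="openai",
    dynamo_table_name="my-llm-cache",
    cache_ttl_days=30,  # assumed keyword, per the 0.1.5 changelog (default: 365 days)
) as client:
    response = await client.generate_text(
        TextRequest(prompt="What is the capital of France?", temperature=0)
    )
    print(response.text)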

Quick Start

Basic Usage

import asyncio
from smartllm import LLMClient, TextRequest

async def main():
    # Auto-detects provider from environment variables
    async with LLMClient(provider="openai") as client:
        response = await client.generate_text(
            TextRequest(prompt="What is the capital of France?")
        )
        print(response.text)

asyncio.run(main())

Multi-turn Conversations

from smartllm import LLMClient, MessageRequest, Message

async with LLMClient(provider="openai") as client:
    messages = [
        Message(role="user", content="My name is Alice."),
        Message(role="assistant", content="Nice to meet you, Alice!"),
        Message(role="user", content="What's my name?"),
    ]
    
    response = await client.send_message(
        MessageRequest(messages=messages)
    )
    print(response.text)  # "Your name is Alice."

Streaming Responses

from smartllm import LLMClient, TextRequest

async with LLMClient(provider="openai") as client:
    request = TextRequest(
        prompt="Write a short poem about Python.",
        stream=True
    )
    
    async for chunk in client.generate_text_stream(request):
        print(chunk.text, end="", flush=True)

Structured Output with Pydantic

from pydantic import BaseModel
from smartllm import LLMClient, TextRequest

class Person(BaseModel):
    name: str
    age: int
    occupation: str

async with LLMClient(provider="openai") as client:
    response = await client.generate_text(
        TextRequest(
            prompt="Generate a person profile for a software engineer named John, age 30.",
            response_format=Person
        )
    )
    
    person = response.structured_data
    print(f"{person.name} is a {person.age} year old {person.occupation}")

Configuration

Environment Variables

OpenAI:

export OPENAI_API_KEY="your-api-key"
export OPENAI_MODEL="gpt-4o-mini"  # Optional

AWS Bedrock:

export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_REGION="us-east-1"
export BEDROCK_MODEL="anthropic.claude-3-sonnet-20240229-v1:0"  # Optional

Programmatic Configuration

from smartllm import LLMClient, LLMConfig

config = LLMConfig(
    provider="openai",
    api_key="your-api-key",
    default_model="gpt-4o",
    temperature=0.7,
    max_tokens=2048,
    max_retries=3,
)

async with LLMClient(config) as client:
    # Use client...
    pass

Customizing Defaults

from smartllm import defaults

# Modify global defaults
defaults.DEFAULT_TEMPERATURE = 0.7
defaults.DEFAULT_MAX_TOKENS = 4096
defaults.DEFAULT_MAX_RETRIES = 5

OpenAI API Types

SmartLLM supports both OpenAI APIs via the api_type parameter:

  • "responses" (default) - OpenAI's primary Response API, recommended for all modern models
  • "chat_completions" - Legacy Chat Completions API, supported indefinitely
# Response API (default)
response = await client.generate_text(
    TextRequest(prompt="Hello", api_type="responses")
)

# Chat Completions API (legacy)
response = await client.generate_text(
    TextRequest(prompt="Hello", api_type="chat_completions")
)

Reasoning Models

For models that support reasoning (e.g. GPT-5.x), use reasoning_effort to control how much the model reasons before responding. Reasoning tokens are returned in response.metadata:

response = await client.generate_text(
    TextRequest(
        prompt="Solve: what is the 100th Fibonacci number?",
        reasoning_effort="high",  # "low", "medium", or "high"
    )
)

print(response.text)
print(f"Reasoning tokens used: {response.metadata.get('reasoning_tokens', 0)}")

Note: reasoning models do not support custom temperature values. Passing any temperature other than 1 raises a ValueError.
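
For example, a minimal sketch of catching that error:

try:
    response = await client.generate_text(
        TextRequest(
            prompt="What is 2+2?",
            reasoning_effort="high",
            temperature=0.5,  # not allowed with reasoning models -> ValueError
        )
    )
except ValueError as e:
    print(f"Invalid request: {e}")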

Reasoning with Structured Output

from pydantic import BaseModel
from smartllm import LLMClient, TextRequest

class Solution(BaseModel):
    answer: float
    unit: str
    explanation: str

async with LLMClient(provider="openai") as client:
    response = await client.generate_text(
        TextRequest(
            prompt="A train leaves city A at 60mph toward city B (300 miles away). Another leaves B at 90mph. When do they meet?",
            response_format=Solution,
            reasoning_effort="medium",
        )
    )

    solution = response.structured_data
    print(f"{solution.answer} {solution.unit}: {solution.explanation}")
    print(f"Reasoning tokens: {response.metadata.get('reasoning_tokens', 0)}")

Advanced Features

Caching

Responses are automatically cached when temperature=0:

# First call - hits API
response1 = await client.generate_text(
    TextRequest(prompt="What is 2+2?", temperature=0)
)

# Second call - uses cache (instant, free)
response2 = await client.generate_text(
    TextRequest(prompt="What is 2+2?", temperature=0)
)

# Clear cache for specific request
response3 = await client.generate_text(
    TextRequest(prompt="What is 2+2?", temperature=0, clear_cache=True)
)
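
Caching can also be bypassed for a single request with use_cache (documented in the request parameters below):

# Disable caching for this request
response4 = await client.generate_text(
    TextRequest(prompt="What is 2+2?", temperature=0, use_cache=False)
)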

Concurrent Requests

import asyncio
from smartllm import LLMClient, TextRequest

async with LLMClient(provider="openai") as client:
    prompts = ["Question 1", "Question 2", "Question 3"]
    
    tasks = [
        client.generate_text(TextRequest(prompt=p))
        for p in prompts
    ]
    
    responses = await asyncio.gather(*tasks)

Rate Limiting

# Limit concurrent requests
client = LLMClient(provider="openai", max_concurrent=5)
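
A sketch of how this combines with concurrent requests, assuming max_concurrent simply caps the number of in-flight calls per client while the remaining tasks wait their turn:

import asyncio
from smartllm import LLMClient, TextRequest

async with LLMClient(provider="openai", max_concurrent=5) as client:
    # Only 5 of these 20 requests run at once; the rest queue for a free slot
    tasks = [
        client.generate_text(TextRequest(prompt=f"Summarize topic {i}"))
        for i in range(20)
    ]
    responses = await asyncio.gather(*tasks)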

Progress Callbacks

Pass an on_progress callable to TextRequest or MessageRequest to receive real-time events. Both sync and async callables are supported.

async def on_progress(event):
    print(event)

async with LLMClient(provider="openai") as client:
    response = await client.generate_text(
        TextRequest(prompt="What is the capital of France?", on_progress=on_progress)
    )

Each event is a dict with event, ts (Unix timestamp), prompt, model, and provider fields:

event       | additional fields | notes
llm_started | -                 | fired before the API call / cache check
llm_done    | -                 | fired after a live API call completes
cache_hit   | cache_source      | fired when the response is served from cache; cache_source is "l1" (local) or "l2" (DynamoDB)
error       | message           | fired on exception
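
For example, a callback that reacts differently to cache hits, live completions, and errors (a minimal sketch using only the fields listed above):

def on_progress(event):
    kind = event["event"]
    if kind == "cache_hit":
        print(f"cache hit ({event['cache_source']}) for {event['model']}")
    elif kind == "llm_done":
        print(f"live call to {event['provider']} / {event['model']} finished")
    elif kind == "error":
        print(f"error at {event['ts']}: {event['message']}")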

Provider-Specific Clients

For advanced use cases, access provider-specific clients:

from smartllm.openai import OpenAILLMClient, OpenAIConfig
from smartllm.bedrock import BedrockLLMClient, BedrockConfig

# OpenAI-specific features
openai_config = OpenAIConfig(api_key="...", organization="...")
async with OpenAILLMClient(openai_config) as client:
    models = await client.list_available_models()

# Bedrock-specific features
bedrock_config = BedrockConfig(aws_region="us-east-1")
async with BedrockLLMClient(bedrock_config) as client:
    models = await client.list_available_model_ids()

Supported Providers

  • OpenAI - GPT models via OpenAI API
  • AWS Bedrock - Claude, Llama, Mistral, and Titan models
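
Both are available through the same unified client. A minimal sketch, assuming "bedrock" is the provider string (credentials come from the AWS environment variables above):

from smartllm import LLMClient, TextRequest

# provider="bedrock" is assumed here; see the AWS environment variables above
async with LLMClient(provider="bedrock") as client:
    response = await client.generate_text(
        TextRequest(prompt="Summarize the benefits of serverless computing.")
    )
    print(response.text)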

API Reference

Core Classes

  • LLMClient - Unified client for all providers
  • LLMConfig - Unified configuration
  • TextRequest - Single prompt request
  • MessageRequest - Multi-turn conversation request
  • TextResponse - LLM response with metadata
  • Message - Conversation message
  • StreamChunk - Streaming response chunk
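
A quick look at the fields on TextResponse (a sketch: text and structured_data appear in the examples above, metadata carries reasoning token counts, and cache_source is documented in the 0.1.6 changelog):

response = await client.generate_text(
    TextRequest(prompt="What is 2+2?", temperature=0)
)

print(response.text)          # generated text
print(response.cache_source)  # "miss", "l1", or "l2"
print(response.metadata)      # provider metadata, e.g. reasoning_tokens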

Request Parameters

Parameter        | Type      | Description                                          | Default
prompt           | str       | Input text prompt                                    | Required
model            | str       | Model ID to use                                      | Config default
temperature      | float     | Sampling temperature (0-1)                           | 0
max_tokens       | int       | Maximum output tokens                                | 2048
top_p            | float     | Nucleus sampling                                     | 1.0
system_prompt    | str       | System context                                       | None
stream           | bool      | Enable streaming                                     | False
response_format  | BaseModel | Pydantic model for structured output                 | None
use_cache        | bool      | Enable caching                                       | True
clear_cache      | bool      | Clear cache before request                           | False
api_type         | str       | OpenAI API type ("responses" or "chat_completions")  | "responses"
reasoning_effort | str       | Reasoning effort ("low", "medium", "high")           | None
on_progress      | Callable  | Progress event callback (sync or async)              | None
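
Putting several of these together in one request (every parameter shown is taken from the table above):

request = TextRequest(
    prompt="List three facts about the Moon.",
    model="gpt-4o-mini",
    temperature=0,          # temperature=0 also enables caching
    max_tokens=512,
    top_p=0.9,
    system_prompt="You are a concise assistant.",
    api_type="responses",
)

response = await client.generate_text(request)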

Error Handling

from smartllm import LLMClient, TextRequest

async with LLMClient(provider="openai") as client:
    try:
        response = await client.generate_text(
            TextRequest(prompt="Hello")
        )
    except ValueError as e:
        print(f"Configuration error: {e}")
    except Exception as e:
        print(f"API error: {e}")

Development

Setup

git clone https://github.com/Redundando/smartllm.git
cd smartllm
pip install -e .[all,dev]

Running Tests

# Unit tests
pytest tests/unit/ -v

# Integration tests (select model interactively)
pytest tests/integration/

# Integration tests with a specific model
pytest tests/integration/ --model gpt-4o

# Integration tests with a reasoning model
pytest tests/integration/ --model gpt-5.2

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Changelog

Version 0.1.6

  • Added on_progress callback to TextRequest and MessageRequest
  • Events: llm_started, llm_done, cache_hit (with cache_source), error
  • Both sync and async callables supported
  • cache_source on TextResponse indicates cache origin: "miss", "l1", or "l2"

Version 0.1.5

  • Replaced custom logging with Logorator decorator-based logging
  • Added two-level cache: local JSON files + optional DynamoDB via Dynamorator
  • DynamoDB cache configurable via dynamo_table_name and cache_ttl_days (default: 365 days)
  • Cache write-back: DynamoDB hits are written to local cache automatically
  • Prompt stored in cache metadata
  • Recursive Pydantic schema cleaning for OpenAI structured output compatibility
  • logorator and dynamorator added as core dependencies in pyproject.toml

Version 0.1.4

  • Fixed logger name from aws_llm_wrapper to smartllm
  • Removed redundant response_format=json_object when using tool-based structured output
  • Cache read failures now log a warning instead of silently returning None
  • Added reasoning_effort warning when used with Bedrock models
  • Test suite now supports model selection via --model CLI option or interactive prompt
  • Integration tests support both OpenAI and AWS Bedrock models
  • Bedrock streaming chunk parsing fixed for Claude models

Version 0.1.0

  • Initial public release
  • Unified interface for multiple providers
  • OpenAI support (GPT models)
  • AWS Bedrock support (Claude, Llama, Mistral, Titan)
  • Async/await architecture
  • Smart caching with temperature=0
  • Auto retry with exponential backoff
  • Structured output with Pydantic models
  • Streaming responses
  • Rate limiting and concurrency control
  • OpenAI Response API support (primary interface)
  • Reasoning model support with reasoning_effort parameter
  • Comprehensive test suite
