LLM Caching
A Redis-based caching library for PydanticAI LLM agents with cost tracking support.
Caching responses is particularly useful in testing and development scenarios.
Typically, developers mock LLM results in tests to avoid latency and cost. However, mocked data can leave tests blind to incorrect schemas in the mocked data and to changes in real LLM response schemas.
A cached response allows us to run the same prompts time and again without the cost or latency while being sure of real-world LLM responses.
Simply use cached_agent_run (async) or cached_agent_run_sync (sync) as drop-in replacements for PydanticAI's agent.run() and agent.run_sync() respectively, adding support for caching, rate limiting, and cost tracking.
NOTE: cached_agent_run and cached_agent_run_sync always return the complete result object, including data, usage information, and metadata.
Features
- Redis-based caching for PydanticAI Agent responses
- Flexible expense tracking
- Rate limit handling with exponential backoff
- Customizable cost tables for different models
- Type-safe implementation
- Comprehensive test coverage
Installation
pip install pyai-caching
Quick Start
Set an Environment variable to point to your redis cache:
export LLM_CACHE_REDIS_URL="redis://localhost:6379/0"
import os
from pydantic import BaseModel, Field
from pydantic_ai import Agent
from pyai_caching import cached_agent_run
from typing import List
class UserProfile(BaseModel):
name: str
age: int
interests: List[str]
profiler_agent = Agent(
model="anthropic:claude-3-5-haiku-latest",
output_type=UserProfile,
name="profiler",
system_prompt="You read transcripts and extract pertinent details for a profile record on a person."
)
# The function returns the complete result object
result = await cached_agent_run(
agent=profiler_agent,
prompt="Make a profile on the user",
task_name="make_profile",
message_history=[{
"role": "user",
"content": "Hi, my name is Alex. I'm 30 years old and I enjoy hiking and reading science fiction."
}]
)
# Access the typed data from the result
profile = result.output
print(type(profile))
# <class '__main__.UserProfile'> (or similar based on execution context)
print(profile)
# name='Alex' age=30 interests=['hiking', 'reading science fiction']
# Access metadata from the result object
print(result.model) # The model used
print(result.usage) # Token usage information
print(result.cost) # The cost of the request
Configuration
Redis Configuration
The library requires a Redis URL to be configured. You can provide it in two ways:
- Environment variable (recommended):
export LLM_CACHE_REDIS_URL="redis://localhost:6379/0"
- Direct configuration in code:
# Example using async version
result = await cached_agent_run(
agent=your_agent,
prompt="Hello",
task_name="chat",
redis_url="redis://localhost:6379/0"
)
Supported URL formats:
- redis://[[username]:[password]]@localhost:6379/0
- rediss://hostname:port/0 (SSL/TLS connection)
- redis+sentinel://localhost:26379/mymaster/0
Cost Configuration
The library comes with default cost tables for popular models. You can provide custom costs for your models:
from pyai_caching import ModelCosts  # cost-table entry type (import path may vary by version)

custom_costs = {
"my-custom-model": ModelCosts(
cost_per_million_input_tokens=1.0,
cost_per_million_output_tokens=2.0,
cost_per_million_caching_input_tokens=0.5,
cost_per_million_caching_hit_tokens=0.1,
)
}
# Use custom costs
result = await cached_agent_run(
agent=your_agent,
prompt="Hello",
task_name="chat",
custom_costs=custom_costs
)
Advanced Usage
Rate Limit Handling
The library includes built-in rate limit handling with exponential backoff:
result = await cached_agent_run(
agent=your_agent,
prompt="Hello",
task_name="chat",
max_wait=30.0, # Maximum wait time before giving up
initial_wait=1.0 # Initial wait time for exponential backoff
)
Expense Tracking
Implement custom expense tracking:
import logging
from datetime import datetime
async def expense_tracker(model: str, task_name: str, cost: float) -> None:
logging.info(f"Expense: {datetime.now()} - Model: {model}, Task: {task_name}, Cost: ${cost}")
# Add your expense tracking logic here
# e.g., save to database, send to monitoring service, etc.
result = await cached_agent_run(
agent=your_agent,
prompt="Hello",
task_name="chat",
expense_recorder=expense_tracker
)
Migration Guide
Version 0.2.0 Changes
- Complete Result Objects
  - Both `cached_agent_run` and `cached_agent_run_sync` now always return the complete result object
  - The result object includes:
    - `output`: the typed response data
    - `usage`: token usage information
    - `metadata`: any additional model-specific metadata
- Simplified Parameter Structure
  - Removed the `transcript_history` parameter (use `message_history` instead)
  - Removed the `message_converter` parameter (message conversion is now handled internally)
  - Removed the `full_result` parameter (the complete result object is always returned; use `result.output` to access just the data)
  - All additional parameters are passed directly to `agent.run` via `**kwargs`
- Message History Handling
  - Message history is now passed directly via the `message_history` parameter
  - Messages are automatically converted to the appropriate format
  - The cache key incorporates the message history to ensure unique caching per conversation context
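The exact key scheme is internal to the library, but the idea can be sketched as hashing the model, task, prompt, and serialized message history together, so any change in conversation context yields a distinct cache entry (illustrative only, not pyai_caching's actual implementation):

```python
import hashlib
import json

def make_cache_key(model: str, task_name: str, prompt: str, message_history: list) -> str:
    """Illustrative cache key: any change to the prompt or prior messages
    produces a different digest, so each conversation context is cached
    separately. (Not pyai_caching's actual internal scheme.)"""
    payload = json.dumps(
        {"model": model, "task": task_name, "prompt": prompt, "history": message_history},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

k1 = make_cache_key("claude-3-5-haiku-latest", "chat", "Hello", [])
k2 = make_cache_key("claude-3-5-haiku-latest", "chat", "Hello",
                    [{"role": "user", "content": "Hi"}])
print(k1 != k2)  # True: different history, different cache entry
```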
Example of migrating from 0.1.x to 0.2.0:
# Old code (0.1.x)
result = await cached_agent_run(
agent=agent,
prompt="Hello",
task_name="chat",
transcript_history=["User: Hi", "Assistant: Hello!"],
message_converter=my_converter,
full_result=True
)
# New code (0.2.0)
from pydantic_ai.messages import ModelRequest, ModelResponse, TextPart, UserPromptPart

result = await cached_agent_run(
    agent=agent,
    prompt="Hello",
    task_name="chat",
    message_history=[
        ModelRequest(parts=[UserPromptPart(content="Hi")]),
        ModelResponse(parts=[TextPart(content="Hello!")])
    ]
)
Error Handling
The library provides specific exceptions for different error cases:
from pyai_caching.exceptions import UsageLimitExceeded, ConfigurationError
try:
result = await cached_agent_run(
agent=your_agent,
prompt="Hello",
task_name="chat"
)
except UsageLimitExceeded:
print("Rate limit exceeded and max wait time reached")
except ConfigurationError:
print("Redis URL not configured")
except ValueError as e:
print(f"Invalid input: {e}")
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Changelog
See CHANGELOG.md for version history.