
Redis-based caching for Pydantic AI LLM agents with cost tracking

Project description

LLM Caching

A Redis-based caching library for PydanticAI LLM agents with cost tracking support.

Caching responses is particularly useful in testing and development scenarios.

Typically, developers mock LLM results in tests to avoid latency and cost. However, mocking can leave tests blind to incorrect schemas in the mocked data and to changes in the real LLM's response schema.

A cached response lets us run the same prompts again and again without the cost or latency, while still working with real-world LLM responses.

Simply use cached_agent_run (async) or cached_agent_run_sync (sync) as drop-in replacements for PydanticAI's agent.run() and agent.run_sync() respectively, adding support for caching, rate-limit handling, and cost tracking.

NOTE: cached_agent_run and cached_agent_run_sync always return the complete result object, including data, usage information, and metadata.
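
For example, a test can exercise the real agent once and replay the cached response on every later run. A minimal sketch using pytest and pytest-asyncio (the profiler_agent fixture and the assertion are placeholders):

import pytest

from pyai_caching import cached_agent_run

@pytest.mark.asyncio  # requires the pytest-asyncio plugin
async def test_profile_extraction(profiler_agent):  # hypothetical agent fixture
    # The first run hits the LLM and populates the Redis cache;
    # subsequent runs replay the cached response at no cost or latency.
    result = await cached_agent_run(
        agent=profiler_agent,
        prompt="Make a profile on the user",
        task_name="test_make_profile",
    )
    # The schema is validated against a real LLM response, not mocked data.
    assert result.output is not None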

License: MIT

Features

  • Redis-based caching for PydanticAI Agent responses
  • Flexible expense tracking
  • Rate limit handling with exponential backoff
  • Customizable cost tables for different models
  • Type-safe implementation
  • Comprehensive test coverage

Installation

pip install pyai-caching

Quick Start

Set an environment variable pointing to your Redis instance:

export LLM_CACHE_REDIS_URL="redis://localhost:6379/0"

Then, in Python:

from typing import List

from pydantic import BaseModel
from pydantic_ai import Agent

from pyai_caching import cached_agent_run

class UserProfile(BaseModel):
    name: str
    age: int
    interests: List[str]

profiler_agent = Agent(
    model="anthropic:claude-haiku-4-5",
    output_type=UserProfile,
    name="profiler",
    system_prompt="You read transcripts and extract pertinent details for a profile record on a person."
)

# The function returns the complete result object.
# Note: `await` must run inside an async function (e.g. driven by asyncio.run).
result = await cached_agent_run(
    agent=profiler_agent,
    prompt="Make a profile on the user",
    task_name="make_profile",
    message_history=[{
        "role": "user", 
        "content": "Hi, my name is Alex. I'm 30 years old and I enjoy hiking and reading science fiction."
    }]
)

# Access the typed data from the result
profile = result.output
print(type(profile))
# <class '__main__.UserProfile'> (or similar based on execution context)
print(profile)
# name='Alex' age=30 interests=['hiking', 'reading science fiction']

# Access metadata from the result object
print(result.model)  # The model used
print(result.usage)  # Token usage information
print(result.cost)   # The cost of the request
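
For synchronous code, cached_agent_run_sync takes the same arguments; a minimal sketch reusing the agent defined above:

from pyai_caching import cached_agent_run_sync

# Same call shape as the async version, usable outside an event loop.
result = cached_agent_run_sync(
    agent=profiler_agent,
    prompt="Make a profile on the user",
    task_name="make_profile",
)
print(result.output)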

Configuration

Redis Configuration

The library requires a Redis URL to be configured. You can provide it in two ways:

  1. Environment variable (recommended):

export LLM_CACHE_REDIS_URL="redis://localhost:6379/0"

  2. Direct configuration in code:
# Example using async version
result = await cached_agent_run(
    agent=your_agent,
    prompt="Hello",
    task_name="chat",
    redis_url="redis://localhost:6379/0"
)

Supported URL formats:

  • redis://[[username]:[password]]@localhost:6379/0
  • rediss://hostname:port/0 # SSL/TLS connection
  • redis+sentinel://localhost:26379/mymaster/0

Cost Configuration

The library comes with default cost tables for popular models. You can provide custom costs for your models:

from pyai_caching import ModelCosts  # import path assumed; adjust if ModelCosts lives elsewhere

custom_costs = {
    "my-custom-model": ModelCosts(
        cost_per_million_input_tokens=1.0,
        cost_per_million_output_tokens=2.0,
        cost_per_million_caching_input_tokens=0.5,
        cost_per_million_caching_hit_tokens=0.1,
    )
}

# Use custom costs
result = await cached_agent_run(
    agent=your_agent,
    prompt="Hello",
    task_name="chat",
    custom_costs=custom_costs
)
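
Given the table above, a request's cost presumably follows from its token counts. A worked example of the arithmetic (the exact formula is an assumption based on the field names):

# Hypothetical cost of one request against "my-custom-model"
input_tokens = 1_200
output_tokens = 350

cost = (
    input_tokens / 1_000_000 * 1.0     # cost_per_million_input_tokens
    + output_tokens / 1_000_000 * 2.0  # cost_per_million_output_tokens
)
print(f"${cost:.6f}")  # $0.001900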

Advanced Usage

Rate Limit Handling

The library includes built-in rate limit handling with exponential backoff:

result = await cached_agent_run(
    agent=your_agent,
    prompt="Hello",
    task_name="chat",
    max_wait=30.0,  # Maximum wait time before giving up
    initial_wait=1.0  # Initial wait time for exponential backoff
)
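
Conceptually, the wait time doubles after each rate-limited attempt until it exceeds max_wait. A simplified sketch of that loop (illustrative only; not the library's internal code):

import asyncio

class RateLimitError(Exception):
    """Placeholder for whatever rate-limit error the provider raises."""

async def run_with_backoff(call, initial_wait: float = 1.0, max_wait: float = 30.0):
    wait = initial_wait
    while True:
        try:
            return await call()
        except RateLimitError:
            if wait > max_wait:
                raise  # give up once the wait exceeds max_wait
            await asyncio.sleep(wait)
            wait *= 2  # exponential backoff: 1s, 2s, 4s, ...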

Expense Tracking

Implement custom expense tracking:

import logging
from datetime import datetime

async def expense_tracker(model: str, task_name: str, cost: float) -> None:
    logging.info(f"Expense: {datetime.now()} - Model: {model}, Task: {task_name}, Cost: ${cost}")
    # Add your expense tracking logic here
    # e.g., save to database, send to monitoring service, etc.

result = await cached_agent_run(
    agent=your_agent,
    prompt="Hello",
    task_name="chat",
    expense_recorder=expense_tracker
)
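
The same hook can aggregate rather than log. For instance, a small in-memory tracker that totals spend per task over a session (illustrative only):

from collections import defaultdict

totals: dict[str, float] = defaultdict(float)

async def aggregate_expenses(model: str, task_name: str, cost: float) -> None:
    # Accumulate spend per task; inspect `totals` after the run.
    totals[task_name] += cost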

Migration Guide

Version 0.2.0 Changes

  1. Complete Result Objects

    • Both cached_agent_run and cached_agent_run_sync now always return the complete result object
    • The result object includes:
      • data: The typed response data
      • usage: Token usage information
      • metadata: Any additional model-specific metadata
  2. Simplified Parameter Structure

    • Removed transcript_history parameter (use message_history instead)
    • Removed message_converter parameter (message conversion is now handled internally)
    • All additional parameters are passed directly to agent.run via **kwargs
  3. Message History Handling

    • Message history is now passed directly via the message_history parameter
    • Messages are automatically converted to the appropriate format
    • The cache key incorporates the message history, so each conversation context is cached separately (see the sketch below)
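
One plausible way such a key could be derived is to hash the model, prompt, task name, and serialized history together (a sketch only; the library's actual scheme may differ):

import hashlib
import json

def make_cache_key(model: str, prompt: str, task_name: str, history: list) -> str:
    # Illustrative cache-key derivation: any change to the conversation
    # context yields a different key, and therefore a cache miss.
    payload = json.dumps(
        {"model": model, "prompt": prompt, "task": task_name,
         "history": [str(m) for m in history]},
        sort_keys=True,
    )
    return "llm-cache:" + hashlib.sha256(payload.encode()).hexdigest()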

Example of migrating from 0.1.x to 0.2.0:

# Old code (0.1.x)
result = await cached_agent_run(
    agent=agent,
    prompt="Hello",
    task_name="chat",
    transcript_history=["User: Hi", "Assistant: Hello!"],
    message_converter=my_converter,
    full_result=True
)

# New code (0.2.0)
from pydantic_ai.messages import ModelRequest, ModelResponse, TextPart, UserPromptPart

result = await cached_agent_run(
    agent=agent,
    prompt="Hello",
    task_name="chat",
    message_history=[
        ModelRequest(parts=[UserPromptPart(content="Hi")]),
        ModelResponse(parts=[TextPart(content="Hello!")])
    ]
)
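
If existing code stored transcripts as plain strings, a small helper can convert them. A sketch assuming a "Role: text" convention (this helper is not part of the library):

from pydantic_ai.messages import ModelRequest, ModelResponse, TextPart, UserPromptPart

def transcript_to_messages(transcript: list[str]) -> list:
    # Convert "User: ..." / "Assistant: ..." strings into message objects.
    messages = []
    for line in transcript:
        role, _, text = line.partition(": ")
        if role == "User":
            messages.append(ModelRequest(parts=[UserPromptPart(content=text)]))
        else:
            messages.append(ModelResponse(parts=[TextPart(content=text)]))
    return messages

history = transcript_to_messages(["User: Hi", "Assistant: Hello!"])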

Error Handling

The library provides specific exceptions for different error cases:

from pyai_caching.exceptions import UsageLimitExceeded, ConfigurationError

try:
    result = await cached_agent_run(
        agent=your_agent,
        prompt="Hello",
        task_name="chat"
    )
except UsageLimitExceeded:
    print("Rate limit exceeded and max wait time reached")
except ConfigurationError:
    print("Redis URL not configured")
except ValueError as e:
    print(f"Invalid input: {e}")

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

For local development, install uv, then:

uv sync
uv run pre-commit install

Run the test suite with uv run pytest. See CONTRIBUTING.md for the full workflow.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Changelog

See CHANGELOG.md for version history.

Running Tests

# All tests
uv run pytest

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyai_caching-0.4.tar.gz (20.9 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pyai_caching-0.4-py3-none-any.whl (16.5 kB)

Uploaded Python 3

File details

Details for the file pyai_caching-0.4.tar.gz.

File metadata

  • Download URL: pyai_caching-0.4.tar.gz
  • Upload date:
  • Size: 20.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pyai_caching-0.4.tar.gz:

  • SHA256: 2367759152af374ab81679931a678147b94a27948724a22eeecfa9eb09f187d0
  • MD5: 5e75ad60262d6815e94f52a63a32c9e5
  • BLAKE2b-256: 63be17980f51f19dea2248d2ff0b0d80bf2343e29795c7ff67545d12fc246bfd

See more details on using hashes here.

Provenance

The following attestation bundles were made for pyai_caching-0.4.tar.gz:

Publisher: publish-to-pypi.yml on talkingtoaj/PydanticAI-llm-caching

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pyai_caching-0.4-py3-none-any.whl.

File metadata

  • Download URL: pyai_caching-0.4-py3-none-any.whl
  • Upload date:
  • Size: 16.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pyai_caching-0.4-py3-none-any.whl:

  • SHA256: 1f2e4a4ffe25988c5dfbe68cd0964c244bee58af1668a44ffed5f5bbeb1ad9f6
  • MD5: 870d8ecd142607174c9331343066693c
  • BLAKE2b-256: 87ad8d6cd58dc02980899eace39eab2c6e345d8a6ca56b9a86ae2f2e5044cdd7

See more details on using hashes here.

Provenance

The following attestation bundles were made for pyai_caching-0.4-py3-none-any.whl:

Publisher: publish-to-pypi.yml on talkingtoaj/PydanticAI-llm-caching

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
