Intelligent cache management for LangChain Anthropic models with advanced optimization strategies

LangChain Anthropic Smart Cache

🚀 Intelligent cache management for LangChain Anthropic models with advanced optimization strategies

📚 Learn about Anthropic's prompt caching: Official Documentation

⚡ What is this?

A sophisticated callback handler that automatically optimizes Anthropic Claude's cache usage to reduce costs and improve performance. It implements intelligent priority-based caching that ensures your most important content (tools, system prompts, large content blocks) gets cached first.

🎯 Key Features

Smart Priority System: Tools and system prompts get priority when not cached
Automatic Cache Management: 5-minute cache duration with intelligent refresh
Cost Optimization: Prioritizes larger content blocks for maximum savings
Detailed Analytics: Comprehensive logging and cache efficiency metrics
Zero Configuration: Works out of the box with sensible defaults
Anthropic Optimized: Built specifically for Claude's cache_control feature
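
For context, Anthropic's prompt caching is driven by a `cache_control` field placed on a content block. The sketch below shows what such a marker looks like on a plain text block. The helper name `with_cache_control` is hypothetical and is not part of this package's API; it only illustrates the block shape that a handler like this one would produce.

```python
from copy import deepcopy
from typing import Any, Dict, List, Union

Content = Union[str, List[Dict[str, Any]]]


def with_cache_control(content: Content) -> List[Dict[str, Any]]:
    """Return content as a list of blocks, with the last block marked cacheable.

    Hypothetical helper for illustration only; not this package's API.
    """
    if isinstance(content, str):
        # Plain strings are wrapped in a single text block.
        blocks = [{"type": "text", "text": content}]
    else:
        # Copy so the caller's message content is not mutated.
        blocks = deepcopy(content)
    if blocks:
        # Anthropic marks a cache breakpoint with an "ephemeral" cache_control.
        blocks[-1]["cache_control"] = {"type": "ephemeral"}
    return blocks


system_blocks = with_cache_control("You are a helpful assistant...")
print(system_blocks)
```

Everything up to and including the marked block becomes part of the cached prefix, which is why a handler has to choose carefully where its limited number of markers go.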

📦 Installation

pip install langchain-anthropic-smart-cache

🚀 Quick Start

from langchain_anthropic import ChatAnthropic
from langchain_anthropic_smart_cache import SmartCacheCallbackHandler

# Initialize the cache handler
cache_handler = SmartCacheCallbackHandler(
    cache_duration=300,  # 5 minutes
    max_cache_blocks=4,  # Anthropic's limit
    min_token_count=1024  # Minimum tokens to cache
)

# Add to your LangChain model
llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    callbacks=[cache_handler]
)

# Use normally - caching happens automatically!
response = llm.invoke([
    {"role": "system", "content": "You are a helpful assistant..."},
    {"role": "user", "content": "Hello!"}
])

🧠 How Smart Caching Works

🎯 Priority-Based Cache Management

The system uses a sophisticated 5-level priority system to ensure the most valuable content gets cached first:

graph TD
    A[Incoming Request] --> B{Analyze Content}
    B --> C[Tools Available?]
    B --> D[System Prompts?]
    B --> E[User Content]

    C --> F{Tools Cached?}
    F -->|No| G[Priority 1: Cache Tools]
    F -->|Yes, Fresh| H[Skip Tools]
    F -->|Yes, Expiring| I[Priority 4: Refresh Tools]

    D --> J{System Cached?}
    J -->|No| K[Priority 2: Cache System]
    J -->|Yes, Fresh| L[Skip System]
    J -->|Yes, Expiring| M[Priority 5: Refresh System]

    E --> N[Priority 3: Cache Content by Size]

    G --> O[Allocate Cache Slots]
    K --> O
    N --> O
    I --> O
    M --> O

    O --> P{Slots Available?}
    P -->|Yes| Q[Apply Cache Control]
    P -->|No| R[Skip Lower Priority Items]

🔄 Cache Lifecycle Flow

sequenceDiagram
    participant User
    participant Handler as SmartCacheHandler
    participant Cache as Cache Storage
    participant Claude as Anthropic API

    User->>Handler: Send Request with Tools + System + Content
    Handler->>Cache: Check existing cache status
    Cache-->>Handler: Return cache metadata

    Note over Handler: Priority Analysis
    Handler->>Handler: Priority 1: Uncached Tools
    Handler->>Handler: Priority 2: Uncached System
    Handler->>Handler: Priority 3: Content (by size)
    Handler->>Handler: Priority 4: Expiring Tools
    Handler->>Handler: Priority 5: Expiring System

    Note over Handler: Slot Allocation (Max 4)
    Handler->>Handler: Assign cache_control markers
    Handler->>Claude: Send optimized request
    Claude-->>Handler: Response with cache info
    Handler->>Cache: Update cache metadata
    Handler-->>User: Response + Cache Analytics
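
The "check existing cache status" and "update cache metadata" steps in the diagram above could be sketched roughly as follows. This is a minimal illustration, not the package's actual storage layer: the class name `CacheTracker`, the status strings, and the 60-second `refresh_window` that defines "expiring" are all assumptions. Only the 300-second duration comes from the documented default.

```python
import hashlib
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CacheTracker:
    """Hypothetical in-memory record of when each piece of content was cached."""

    cache_duration: float = 300.0  # documented default: 5 minutes
    refresh_window: float = 60.0   # assumption: "expiring" means under 60 s left
    _seen: Dict[str, float] = field(default_factory=dict)

    @staticmethod
    def key(content: str) -> str:
        # Identify content by a hash so large prompts are not stored twice.
        return hashlib.sha256(content.encode("utf-8")).hexdigest()

    def status(self, content: str, now: Optional[float] = None) -> str:
        """Return 'uncached', 'expiring', or 'fresh' for the given content."""
        now = time.time() if now is None else now
        cached_at = self._seen.get(self.key(content))
        if cached_at is None:
            return "uncached"
        age = now - cached_at
        if age >= self.cache_duration:
            return "uncached"  # the entry has expired
        if age >= self.cache_duration - self.refresh_window:
            return "expiring"
        return "fresh"

    def record(self, content: str, now: Optional[float] = None) -> None:
        """Remember that the content was sent with a cache marker."""
        self._seen[self.key(content)] = time.time() if now is None else now


tracker = CacheTracker()
tracker.record("system prompt", now=0.0)
print(tracker.status("system prompt", now=10.0))   # fresh
print(tracker.status("system prompt", now=250.0))  # expiring
print(tracker.status("system prompt", now=301.0))  # uncached
```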

🎲 Decision Algorithm

The cache decision algorithm follows this logic:

flowchart TD
    Start([New Request]) --> Clear[Clear Previous Cache Controls]
    Clear --> Parse[Parse Messages for Tools/System/Content]

    Parse --> CheckTools{Tools Present?}
    CheckTools -->|Yes| ToolsCached{Tools Cached & Fresh?}
    CheckTools -->|No| CheckSystem{System Prompts?}

    ToolsCached -->|No| AddTools[Add Tools - Priority 1]
    ToolsCached -->|Yes| CheckSystem

    CheckSystem -->|Yes| SystemCached{System Cached & Fresh?}
    CheckSystem -->|No| ProcessContent[Process Content Blocks]

    SystemCached -->|No| AddSystem[Add System - Priority 2]
    SystemCached -->|Yes| ProcessContent

    ProcessContent --> SortContent[Sort Content by Token Count]
    SortContent --> AddContent[Add Content - Priority 3]

    AddTools --> CheckSlots{Slots < 4?}
    AddSystem --> CheckSlots
    AddContent --> CheckSlots

    CheckSlots -->|Yes| MoreContent{More Items?}
    CheckSlots -->|No| RefreshCheck{Expiring Items to Refresh?}

    MoreContent -->|Yes| AddContent
    MoreContent -->|No| RefreshCheck

    RefreshCheck -->|Yes| AddRefresh[Add Refresh - Priority 4/5]
    RefreshCheck -->|No| Complete[Complete Cache Assignment]

    AddRefresh --> FinalCheck{Slots < 4?}
    FinalCheck -->|Yes| RefreshCheck
    FinalCheck -->|No| Complete

    Complete --> SendRequest[Send to Anthropic API]
    SendRequest --> UpdateCache[Update Cache Metadata]
    UpdateCache --> End([Return Response])

    style AddTools fill:#ff6b6b
    style AddSystem fill:#4ecdc4
    style AddContent fill:#45b7d1
    style AddRefresh fill:#96ceb4
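
One way to read the flowchart above is as a sort followed by a cut-off: rank candidates by priority, break ties by size, and keep the first four. The sketch below takes that reading. `Candidate` and `allocate_slots` are hypothetical names, and applying the minimum token count to every candidate type is an assumption; the package may handle tools and system prompts differently.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    name: str
    priority: int  # 1 is most important, 5 is least
    tokens: int


def allocate_slots(
    candidates: List[Candidate],
    max_cache_blocks: int = 4,
    min_token_count: int = 1024,
) -> Tuple[List[Candidate], List[Candidate]]:
    """Return (selected, skipped) under the slot limit.

    Assumption: the minimum token count applies to every candidate type.
    """
    eligible = [c for c in candidates if c.tokens >= min_token_count]
    too_small = [c for c in candidates if c.tokens < min_token_count]
    # Lower priority number first; within a priority, larger blocks first.
    ranked = sorted(eligible, key=lambda c: (c.priority, -c.tokens))
    return ranked[:max_cache_blocks], ranked[max_cache_blocks:] + too_small


# Illustrative data only.
selected, skipped = allocate_slots([
    Candidate("doc_mid", 3, 1500),
    Candidate("system", 2, 1100),
    Candidate("doc_tiny", 3, 300),
    Candidate("doc_big", 3, 6000),
    Candidate("tools", 1, 2500),
    Candidate("doc_other", 3, 1200),
])
print([c.name for c in selected])  # ['tools', 'system', 'doc_big', 'doc_mid']
print([c.name for c in skipped])   # ['doc_other', 'doc_tiny']
```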

💡 Priority System Explained

Priority	Type	Condition	Why?
1 🔴	Tools	Not cached or expired	Critical for function calling; tool definitions are resent with every request, so leaving them uncached costs the most
2 🟠	System	Not cached or expired	Core instructions that define AI behavior
3 🟡	Content	Always evaluated	User data, sorted by size for maximum cache efficiency
4 🟢	Tools	Cached but expiring soon	Refresh tools proactively to avoid cache misses
5 🔵	System	Cached but expiring soon	Refresh system prompts when slots available
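
The table above maps directly onto a small lookup. The function below is a sketch of that mapping only; the name `cache_priority` and the status strings `"uncached"`, `"expiring"`, and `"fresh"` are assumptions made for illustration, not identifiers from the package.

```python
from typing import Optional


def cache_priority(kind: str, status: str) -> Optional[int]:
    """Map a content kind and cache status to a priority from the table.

    Returns None when the item is cached and fresh, so no slot is needed.
    """
    if kind == "content":
        return 3  # content is always evaluated at priority 3
    if kind not in ("tools", "system"):
        raise ValueError(f"unknown kind: {kind!r}")
    if status == "uncached":  # never cached, or already expired
        return 1 if kind == "tools" else 2
    if status == "expiring":  # cached, but about to lapse
        return 4 if kind == "tools" else 5
    return None  # fresh: skip


for kind, status in [("tools", "uncached"), ("system", "expiring"), ("tools", "fresh")]:
    print(kind, status, cache_priority(kind, status))
```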

📊 Cache Efficiency Example

pie title Cache Slot Allocation Example
    "Tools (Priority 1)" : 25
    "System (Priority 2)" : 25
    "Large Content (Priority 3)" : 35
    "Medium Content (Priority 3)" : 15

Scenario: 4 available slots, competing content

🔴 Slot 1: Tools (3,000 tokens) - Priority 1 (uncached)
🟠 Slot 2: System prompt (1,200 tokens) - Priority 2 (uncached)
🟡 Slot 3: Large content (5,000 tokens) - Priority 3 (new)
🟡 Slot 4: Medium content (2,000 tokens) - Priority 3 (new)
❌ Skipped: Small content (800 tokens) - Priority 3 (below minimum)
❌ Skipped: Cached system refresh (1,200 tokens) - Priority 5 (no slots left)

Result: 11,200 tokens cached, optimizing for both functionality and cost savings.
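
The arithmetic behind that result can be checked in a few lines. The token counts are the ones listed in the scenario; the variable names are only for illustration.

```python
# Token counts for the four slots that were filled in the scenario above.
cached_slots = {
    "tools": 3000,
    "system prompt": 1200,
    "large content": 5000,
    "medium content": 2000,
}

total_cached = sum(cached_slots.values())
print(f"Total cached tokens: {total_cached:,}")  # Total cached tokens: 11,200

# Share of cached tokens held by each slot.
for name, tokens in cached_slots.items():
    print(f"{name}: {tokens / total_cached:.1%}")
```

Each slot counts equally against the four-slot limit, but the token share differs widely, which is why larger content blocks are preferred within priority 3.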

📊 Cache Analytics

The handler provides detailed logging:

💾 CACHED tools (slot 1/4) - NEW tools needed caching
⚡ CACHED content (slot 2/4, 3001 tokens) - MAINTAIN existing cache
🔄 CACHED content (slot 3/4, 2000 tokens) - REFRESH expiring cache
💾 CACHED content (slot 4/4, 1705 tokens) - NEW content block

🚫 SKIPPED ITEMS (2 items):
  ❌ content (priority 3, new, 1524 tokens) - smaller new content, larger cached content prioritized
  ❌ system (priority 5, cached, 1182 tokens) - system already cached, content got priority

📊 CACHE SUMMARY:
  🎯 Slots used: 4/4
  ⚡ Previously cached: 2 items (50.0%)
  💾 Newly cached: 2 items
  🚫 Skipped: 2 items
  📈 Cached tokens: 7,886 | Skipped tokens: 2,706
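
The summary figures follow from the per-item decisions listed above them. Below is a sketch of how such a summary could be derived; `Decision` and `summarize` are hypothetical names, not the package's API. The log does not print a token count for the tools slot, so the 1,180 used here is inferred from the summary total (7,886 minus the 6,706 tokens of the three content slots).

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Decision:
    kind: str
    tokens: int
    cached: bool             # True if the item received a cache slot
    previously_cached: bool  # True if it was already cached before this request


def summarize(decisions: List[Decision], max_cache_blocks: int = 4) -> Dict[str, Union[str, int, float]]:
    """Aggregate per-item decisions into the figures shown in the summary."""
    used = [d for d in decisions if d.cached]
    skipped = [d for d in decisions if not d.cached]
    previously = sum(1 for d in used if d.previously_cached)
    return {
        "slots_used": f"{len(used)}/{max_cache_blocks}",
        "previously_cached": previously,
        "previously_cached_pct": 100.0 * previously / len(used) if used else 0.0,
        "newly_cached": len(used) - previously,
        "skipped": len(skipped),
        "cached_tokens": sum(d.tokens for d in used),
        "skipped_tokens": sum(d.tokens for d in skipped),
    }


summary = summarize([
    Decision("tools", 1180, cached=True, previously_cached=False),  # inferred count
    Decision("content", 3001, cached=True, previously_cached=True),
    Decision("content", 2000, cached=True, previously_cached=True),
    Decision("content", 1705, cached=True, previously_cached=False),
    Decision("content", 1524, cached=False, previously_cached=False),
    Decision("system", 1182, cached=False, previously_cached=True),
])
print(summary)
```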

⚙️ Configuration

cache_handler = SmartCacheCallbackHandler(
    cache_duration=300,      # Cache validity in seconds (default: 5 minutes)
    max_cache_blocks=4,      # Max cache blocks (Anthropic limit: 4)
    min_token_count=1024,    # Minimum tokens to consider for caching
    enable_logging=True,     # Enable detailed cache logging
    log_level="INFO",        # Logging level
    cache_dir=None,          # Custom cache directory (default: temp)
)

🎯 Advanced Usage

With Tools

from langchain_core.tools import tool

@tool
def get_weather(location: str) -> str:
    """Get current weather for a location."""
    return f"Weather in {location}: Sunny, 72°F"

# Tools automatically get highest priority when not cached
llm_with_tools = llm.bind_tools([get_weather])

Cache Statistics

# Access cache statistics
stats = cache_handler.get_stats()
print(f"Cache hit rate: {stats.cache_hit_rate:.1f}%")
print(f"Total tokens cached: {stats.total_tokens_cached:,}")
print(f"Estimated cost savings: ${stats.estimated_savings:.2f}")

🔧 Requirements

Python 3.9+ (langchain-core 0.3.x requires Python 3.9 or newer)
langchain-core >= 0.3.62
langchain-anthropic >= 0.3.14
tiktoken >= 0.8.0

Note: This package is specifically designed for Anthropic Claude models that support the cache_control feature. Other providers may be added in future versions.

📈 Performance Benefits

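Cost Reduction: Up to 90% lower input cost on cache reads, which Anthropic bills at 10% of the base input price (cache writes cost 25% more than base input)
Latency Improvement: Anthropic reports up to 85% lower latency for long cached prompts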
Smart Prioritization: Ensures most valuable content stays cached
Automatic Management: No manual cache invalidation needed
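
To see where a figure like "up to 90%" comes from, here is a rough cost estimate for a prefix that is reused across several calls. It assumes Anthropic's 5-minute cache pricing (writes at 1.25x and reads at 0.10x the base input price) and that every call after the first lands within the cache lifetime. The function name `input_cost` and the example price of $3 per million tokens are illustrative; output tokens are ignored.

```python
def input_cost(prefix_tokens: int, calls: int, price_per_mtok: float, cached: bool) -> float:
    """Estimate the input cost in dollars of sending the same prefix `calls` times.

    Assumes every call after the first is a cache hit.
    """
    if calls < 1:
        return 0.0
    per_token = price_per_mtok / 1_000_000
    if not cached:
        return prefix_tokens * calls * per_token
    write = prefix_tokens * per_token * 1.25                # first call writes the cache
    reads = prefix_tokens * (calls - 1) * per_token * 0.10  # later calls read it
    return write + reads


# Illustrative numbers: a 10,000-token prefix sent 20 times at $3 per million tokens.
uncached = input_cost(10_000, 20, 3.0, cached=False)
cached = input_cost(10_000, 20, 3.0, cached=True)
print(f"Uncached: ${uncached:.4f}")
print(f"Cached:   ${cached:.4f}")
print(f"Savings:  {1 - cached / uncached:.1%}")
```

The saving approaches 90% only as the number of cache hits grows. With a single call, caching costs more than not caching because of the write premium.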

🤝 Contributing

Contributions are welcome! Please read our Contributing Guide for details.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Built for the LangChain ecosystem
Optimized for Anthropic Claude models
Inspired by modern caching strategies and cost optimization principles

Release history

0.2.3 (Aug 27, 2025)
0.2.2 (Aug 27, 2025)
0.2.1 (Aug 27, 2025)
0.2.0 (Jun 11, 2025, this version)
0.1.0 (Jun 2, 2025)

Download files

Source Distribution

langchain_anthropic_smart_cache-0.2.0.tar.gz (23.5 kB)

Uploaded Jun 11, 2025 Source

Built Distribution

langchain_anthropic_smart_cache-0.2.0-py3-none-any.whl (16.5 kB)

Uploaded Jun 11, 2025 Python 3

File details

Details for the file langchain_anthropic_smart_cache-0.2.0.tar.gz.

File metadata

Download URL: langchain_anthropic_smart_cache-0.2.0.tar.gz
Upload date: Jun 11, 2025
Size: 23.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for langchain_anthropic_smart_cache-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`4c600382ee544bf7cda5f8a8e95869d979527a379e62aa1cd85813ce230f8d30`
MD5	`644568a4c11658bf810646a4d1cdc445`
BLAKE2b-256	`c5600687a21d808a535ae37403359c5021e3ba96d7a771ccaf3a0cba002f89b9`

File details

Details for the file langchain_anthropic_smart_cache-0.2.0-py3-none-any.whl.

File metadata

Download URL: langchain_anthropic_smart_cache-0.2.0-py3-none-any.whl
Upload date: Jun 11, 2025
Size: 16.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for langchain_anthropic_smart_cache-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`44865489a7aca1eca91d264d8a60aa7ca2c0ded37de24574a889692fba206a30`
MD5	`b6eef736e7039cd7047be25162ad3703`
BLAKE2b-256	`bc09d03d7b40259b5589316f74f4c0c4d9b87bb4bf14e7d9f5a2612d0cc8b0a4`
