LangChain Anthropic Smart Cache
🚀 Intelligent cache management for LangChain Anthropic models with advanced optimization strategies
📚 Learn about Anthropic's prompt caching: Official Documentation
⚡ What is this?
A LangChain callback handler that automatically optimizes how Anthropic Claude's prompt cache is used, reducing cost and latency. It applies priority-based caching so that your most important content (tools, system prompts, large content blocks) is cached first.
🎯 Key Features
- Smart Priority System: Tools and system prompts get priority when not cached
- Automatic Cache Management: 5-minute cache duration with intelligent refresh
- Cost Optimization: Prioritizes larger content blocks for maximum savings
- Detailed Analytics: Comprehensive logging and cache efficiency metrics
- Zero Configuration: Works out of the box with sensible defaults
- Anthropic Optimized: Built specifically for Claude's cache_control feature
📦 Installation
pip install langchain-anthropic-smart-cache
🚀 Quick Start
from langchain_anthropic import ChatAnthropic
from langchain_anthropic_smart_cache import SmartCacheCallbackHandler

# Initialize the cache handler
cache_handler = SmartCacheCallbackHandler(
    cache_duration=300,    # 5 minutes
    max_cache_blocks=4,    # Anthropic's limit
    min_token_count=1024,  # Minimum tokens to cache
)

# Add to your LangChain model
llm = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    callbacks=[cache_handler],
)

# Use normally - caching happens automatically!
response = llm.invoke([
    {"role": "system", "content": "You are a helpful assistant..."},
    {"role": "user", "content": "Hello!"},
])
🧠 How Smart Caching Works
🎯 Priority-Based Cache Management
The handler ranks cacheable items on a five-level priority scale so that the most valuable content claims the limited cache slots first:
graph TD
A[Incoming Request] --> B{Analyze Content}
B --> C[Tools Available?]
B --> D[System Prompts?]
B --> E[User Content]
C --> F{Tools Cached?}
F -->|No| G[Priority 1: Cache Tools]
F -->|Yes, Fresh| H[Skip Tools]
F -->|Yes, Expiring| I[Priority 4: Refresh Tools]
D --> J{System Cached?}
J -->|No| K[Priority 2: Cache System]
J -->|Yes, Fresh| L[Skip System]
J -->|Yes, Expiring| M[Priority 5: Refresh System]
E --> N[Priority 3: Cache Content by Size]
G --> O[Allocate Cache Slots]
K --> O
N --> O
I --> O
M --> O
O --> P{Slots Available?}
P -->|Yes| Q[Apply Cache Control]
P -->|No| R[Skip Lower Priority Items]
🔄 Cache Lifecycle Flow
sequenceDiagram
participant User
participant Handler as SmartCacheHandler
participant Cache as Cache Storage
participant Claude as Anthropic API
User->>Handler: Send Request with Tools + System + Content
Handler->>Cache: Check existing cache status
Cache-->>Handler: Return cache metadata
Note over Handler: Priority Analysis
Handler->>Handler: Priority 1: Uncached Tools
Handler->>Handler: Priority 2: Uncached System
Handler->>Handler: Priority 3: Content (by size)
Handler->>Handler: Priority 4: Expiring Tools
Handler->>Handler: Priority 5: Expiring System
Note over Handler: Slot Allocation (Max 4)
Handler->>Handler: Assign cache_control markers
Handler->>Claude: Send optimized request
Claude-->>Handler: Response with cache info
Handler->>Cache: Update cache metadata
Handler-->>User: Response + Cache Analytics
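For orientation, Anthropic's prompt caching is driven by a `cache_control` field attached to individual content blocks. The sketch below shows what assigning and clearing that field could look like, assuming content blocks are plain dictionaries. `mark_for_caching` and `clear_cache_control` are hypothetical helpers written for illustration, not functions exported by this package.

```python
from copy import deepcopy


def mark_for_caching(block: dict) -> dict:
    """Return a copy of a content block carrying an ephemeral cache_control field."""
    marked = deepcopy(block)
    marked["cache_control"] = {"type": "ephemeral"}
    return marked


def clear_cache_control(block: dict) -> dict:
    """Return a copy of a content block with any cache_control field removed."""
    cleared = deepcopy(block)
    cleared.pop("cache_control", None)
    return cleared


system_block = {"type": "text", "text": "You are a helpful assistant..."}
cached_block = mark_for_caching(system_block)
print(cached_block["cache_control"])
```

Working on copies keeps the caller's original messages untouched, which matters if the same message list is reused across requests.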
🎲 Decision Algorithm
The cache decision algorithm follows this logic:
flowchart TD
Start([New Request]) --> Clear[Clear Previous Cache Controls]
Clear --> Parse[Parse Messages for Tools/System/Content]
Parse --> CheckTools{Tools Present?}
CheckTools -->|Yes| ToolsCached{Tools Cached & Fresh?}
CheckTools -->|No| CheckSystem{System Prompts?}
ToolsCached -->|No| AddTools[Add Tools - Priority 1]
ToolsCached -->|Yes| CheckSystem
CheckSystem -->|Yes| SystemCached{System Cached & Fresh?}
CheckSystem -->|No| ProcessContent[Process Content Blocks]
SystemCached -->|No| AddSystem[Add System - Priority 2]
SystemCached -->|Yes| ProcessContent
ProcessContent --> SortContent[Sort Content by Token Count]
SortContent --> AddContent[Add Content - Priority 3]
AddTools --> CheckSlots{Slots < 4?}
AddSystem --> CheckSlots
AddContent --> CheckSlots
CheckSlots -->|Yes| MoreContent{More Items?}
CheckSlots -->|No| RefreshCheck{Expiring Items to Refresh?}
MoreContent -->|Yes| AddContent
MoreContent -->|No| RefreshCheck
RefreshCheck -->|Yes| AddRefresh[Add Refresh - Priority 4/5]
RefreshCheck -->|No| Complete[Complete Cache Assignment]
AddRefresh --> FinalCheck{Slots < 4?}
FinalCheck -->|Yes| RefreshCheck
FinalCheck -->|No| Complete
Complete --> SendRequest[Send to Anthropic API]
SendRequest --> UpdateCache[Update Cache Metadata]
UpdateCache --> End([Return Response])
style AddTools fill:#ff6b6b
style AddSystem fill:#4ecdc4
style AddContent fill:#45b7d1
style AddRefresh fill:#96ceb4
💡 Priority System Explained
| Priority | Type | Condition | Why? |
|---|---|---|---|
| 1 🔴 | Tools | Not cached or expired | Critical for function calling - failures break functionality |
| 2 🟠 | System | Not cached or expired | Core instructions that define AI behavior |
| 3 🟡 | Content | Always evaluated | User data, sorted by size for maximum cache efficiency |
| 4 🟢 | Tools | Cached but expiring soon | Refresh tools proactively to avoid cache misses |
| 5 🔵 | System | Cached but expiring soon | Refresh system prompts when slots available |
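The table can be read as a small lookup from item type and cache state to a priority number. The following is a minimal sketch of that mapping, not the package's actual code; the `CacheState` enum and `assign_priority` function are names invented here for illustration.

```python
from enum import Enum
from typing import Optional


class CacheState(Enum):
    UNCACHED = "uncached"  # never cached, or the cache entry has expired
    EXPIRING = "expiring"  # cached, but close to expiry
    FRESH = "fresh"        # cached and not close to expiry


def assign_priority(kind: str, state: CacheState) -> Optional[int]:
    """Map (item type, cache state) to a priority from the table; None means skip."""
    if kind == "content":
        return 3  # content is always evaluated
    table = {
        ("tools", CacheState.UNCACHED): 1,
        ("system", CacheState.UNCACHED): 2,
        ("tools", CacheState.EXPIRING): 4,
        ("system", CacheState.EXPIRING): 5,
    }
    # Fresh tools and system prompts need no slot, so they fall through to None.
    return table.get((kind, state))


print(assign_priority("tools", CacheState.UNCACHED))
print(assign_priority("system", CacheState.FRESH))
```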
📊 Cache Efficiency Example
pie title Cache Slot Allocation Example
"Tools (Priority 1)" : 25
"System (Priority 2)" : 25
"Large Content (Priority 3)" : 35
"Medium Content (Priority 3)" : 15
Scenario: 4 available slots, competing content
- 🔴 Slot 1: Tools (3,000 tokens) - Priority 1 (uncached)
- 🟠 Slot 2: System prompt (1,200 tokens) - Priority 2 (uncached)
- 🟡 Slot 3: Large content (5,000 tokens) - Priority 3 (new)
- 🟡 Slot 4: Medium content (2,000 tokens) - Priority 3 (new)
- ❌ Skipped: Small content (800 tokens) - Priority 3 (below minimum)
- ❌ Skipped: Cached system refresh (1,200 tokens) - Priority 5 (no slots left)
Result: 11,200 tokens cached, optimizing for both functionality and cost savings.
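The scenario above can be reproduced with a few lines of Python. This is a simplified sketch that assumes candidates are filtered by a minimum token count, ordered by priority and then by size, and cut off at the slot limit; the real handler may weigh already-cached content differently. `Candidate` and `allocate_slots` are illustrative names, not part of the package.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    label: str
    priority: int
    tokens: int


def allocate_slots(
    candidates: List[Candidate], max_slots: int = 4, min_tokens: int = 1024
) -> Tuple[List[Candidate], List[Candidate]]:
    """Return (chosen, skipped) after filtering, ranking, and slot cut-off."""
    too_small = [c for c in candidates if c.tokens < min_tokens]
    eligible = [c for c in candidates if c.tokens >= min_tokens]
    # Lower priority number wins; within a priority, larger blocks come first.
    ranked = sorted(eligible, key=lambda c: (c.priority, -c.tokens))
    return ranked[:max_slots], ranked[max_slots:] + too_small


scenario = [
    Candidate("tools", 1, 3000),
    Candidate("system prompt", 2, 1200),
    Candidate("large content", 3, 5000),
    Candidate("medium content", 3, 2000),
    Candidate("small content", 3, 800),
    Candidate("cached system refresh", 5, 1200),
]
chosen, skipped = allocate_slots(scenario)
print(sum(c.tokens for c in chosen))
```

With the default limits, the four chosen items total 11,200 tokens, and the two skipped items are the 800-token block (below the minimum) and the priority 5 refresh (no slots left).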
📊 Cache Analytics
The handler provides detailed logging:
💾 CACHED tools (slot 1/4) - NEW tools needed caching
⚡ CACHED content (slot 2/4, 3001 tokens) - MAINTAIN existing cache
🔄 CACHED content (slot 3/4, 2000 tokens) - REFRESH expiring cache
💾 CACHED content (slot 4/4, 1705 tokens) - NEW content block
🚫 SKIPPED ITEMS (2 items):
❌ content (priority 3, new, 1524 tokens) - smaller new content, larger cached content prioritized
❌ system (priority 5, cached, 1182 tokens) - system already cached, content got priority
📊 CACHE SUMMARY:
🎯 Slots used: 4/4
⚡ Previously cached: 2 items (50.0%)
💾 Newly cached: 2 items
🚫 Skipped: 2 items
📈 Cached tokens: 7,886 | Skipped tokens: 2,706
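The summary figures follow directly from the per-slot lines. The sketch below recomputes them, assuming the percentage is taken over the slots in use and that MAINTAIN and REFRESH both count as previously cached. The `summarize` function is hypothetical, and the 1,180-token figure for tools is not printed in the log; it is inferred here as 7,886 - (3,001 + 2,000 + 1,705).

```python
from typing import Dict, List, Tuple


def summarize(
    cached: List[Tuple[str, int]], skipped: List[int], max_slots: int = 4
) -> Dict[str, float]:
    """Aggregate (status, tokens) pairs for cached items and token counts for skipped ones."""
    previously = sum(1 for status, _ in cached if status in ("MAINTAIN", "REFRESH"))
    newly = sum(1 for status, _ in cached if status == "NEW")
    return {
        "slots_used": len(cached),
        "max_slots": max_slots,
        "previously_cached": previously,
        "previously_cached_pct": 100.0 * previously / len(cached) if cached else 0.0,
        "newly_cached": newly,
        "skipped": len(skipped),
        "cached_tokens": sum(tokens for _, tokens in cached),
        "skipped_tokens": sum(skipped),
    }


example = summarize(
    # Tools token count (1180) is inferred from the summary total, not shown in the log.
    cached=[("NEW", 1180), ("MAINTAIN", 3001), ("REFRESH", 2000), ("NEW", 1705)],
    skipped=[1524, 1182],
)
print(f"Slots used: {example['slots_used']}/{example['max_slots']}")
print(f"Cached tokens: {example['cached_tokens']:,} | Skipped tokens: {example['skipped_tokens']:,}")
```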
⚙️ Configuration
cache_handler = SmartCacheCallbackHandler(
    cache_duration=300,    # Cache validity in seconds (default: 5 minutes)
    max_cache_blocks=4,    # Max cache blocks (Anthropic limit: 4)
    min_token_count=1024,  # Minimum tokens to consider for caching
    enable_logging=True,   # Enable detailed cache logging
    log_level="INFO",      # Logging level
    cache_dir=None,        # Custom cache directory (default: temp)
)
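To make the role of `cache_duration` concrete, here is one way a cache entry's age could be turned into the fresh / expiring / expired states used by the priority system. This is an illustrative sketch only: `classify_freshness` and its `refresh_window` parameter (how close to expiry counts as "expiring soon") are assumptions and do not correspond to documented options of this package.

```python
from typing import Optional


def classify_freshness(
    cached_at: Optional[float],
    now: float,
    cache_duration: float = 300.0,
    refresh_window: float = 60.0,  # hypothetical: last 60 s before expiry
) -> str:
    """Classify a cache entry by age; timestamps are seconds (e.g. from time.time())."""
    if cached_at is None:
        return "uncached"
    age = now - cached_at
    if age >= cache_duration:
        return "expired"
    if age >= cache_duration - refresh_window:
        return "expiring"
    return "fresh"


print(classify_freshness(cached_at=0.0, now=100.0))
print(classify_freshness(cached_at=0.0, now=250.0))
```

Passing `now` explicitly rather than calling the clock inside the function keeps the logic easy to test.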
🎯 Advanced Usage
With Tools
from langchain_core.tools import tool

@tool
def get_weather(location: str) -> str:
    """Get current weather for a location."""
    return f"Weather in {location}: Sunny, 72°F"

# Tools automatically get highest priority when not cached
llm_with_tools = llm.bind_tools([get_weather])
Cache Statistics
# Access cache statistics
stats = cache_handler.get_stats()
print(f"Cache hit rate: {stats.cache_hit_rate:.1f}%")
print(f"Total tokens cached: {stats.total_tokens_cached:,}")
print(f"Estimated cost savings: ${stats.estimated_savings:.2f}")
🔧 Requirements
- Python 3.8+
- langchain-core >= 0.3.62
- langchain-anthropic >= 0.3.14
- tiktoken >= 0.8.0
Note: This package is specifically designed for Anthropic Claude models that support the cache_control feature. Other providers may be added in future versions.
📈 Performance Benefits
- Cost Reduction: Up to 90% savings on input tokens that are served from cache on repeated requests
- Latency Improvement: Cached content loads ~10x faster
- Smart Prioritization: Ensures most valuable content stays cached
- Automatic Management: No manual cache invalidation needed
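The "up to 90%" figure is an upper bound that is approached only when the same prefix is reused many times. A rough back-of-the-envelope sketch follows, assuming the multipliers Anthropic has published for the 5-minute cache (writes billed at 1.25x and reads at 0.1x the base input price); treat both as assumptions and check current pricing. `input_cost_savings` is a hypothetical helper, and costs are expressed in units of one base-priced input token.

```python
def input_cost_savings(
    cached_tokens: int,
    uncached_tokens: int,
    n_requests: int,
    write_multiplier: float = 1.25,  # assumed cache-write surcharge
    read_multiplier: float = 0.10,   # assumed cache-read discount
) -> float:
    """Fraction of input cost saved versus sending everything uncached each time."""
    baseline = n_requests * (cached_tokens + uncached_tokens)
    # First request writes the cache; every later request reads from it.
    first = cached_tokens * write_multiplier + uncached_tokens
    later = (n_requests - 1) * (cached_tokens * read_multiplier + uncached_tokens)
    return 1.0 - (first + later) / baseline


for n in (1, 2, 10, 100):
    print(n, round(input_cost_savings(10_000, 0, n), 3))
```

Under these assumptions a single request costs more than no caching at all (the write surcharge), ten requests save about 78%, and the savings approach 90% as the number of cache hits grows. Any uncached portion of the prompt lowers the overall figure further.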
🤝 Contributing
Contributions are welcome! Please read our Contributing Guide for details.
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- Built for the LangChain ecosystem
- Optimized for Anthropic Claude models
- Inspired by modern caching strategies and cost optimization principles