cachedx 🚀
Unified HTTP caching with DuckDB mirroring and LLM helpers
cachedx provides intelligent HTTP caching with automatic database mirroring, making it easy to cache API responses and query them with SQL.
Why cachedx?
Most apps repeatedly hit REST APIs and lose visibility into response data:
```python
# Traditional approach ❌
response = await client.get("/api/users")
users = response.json()  # data is lost after processing

# With cachedx ✅
response = await cached_client.get("/api/users")  # automatically cached
users_df = client.query("SELECT * FROM users WHERE active = true")  # query with SQL!
```
Key Features
- 🚄 Zero-config caching - Works out of the box with sensible defaults
- 🔄 Dual storage - HTTP cache + normalized tables for fast queries
- 🧠 Auto-inference - Automatically creates schemas from JSON responses
- 🛡️ LLM-safe - Built-in SQL safety for LLM-generated queries
- ⚡ High performance - Cache hits < 1ms, powered by DuckDB
- 🏗️ Production ready - Comprehensive Pydantic validation throughout
Installation
```bash
# With pip
pip install cachedx

# With uv (recommended)
uv add cachedx

# With optional dependencies
pip install 'cachedx[pandas]'  # DataFrame support
pip install 'cachedx[dev]'     # development tools
```
Requires: Python 3.12+, DuckDB 1.0+, httpx 0.27+
Quick Start
Basic HTTP Caching
```python
from cachedx.httpcache import CachedClient

async with CachedClient(base_url="https://api.github.com") as client:
    # First call hits the API and caches the response
    response = await client.get("/users/octocat")

    # Second call returns cached data (< 1 ms)
    response = await client.get("/users/octocat")

    # Query cached data with SQL!
    users = client.query("SELECT * FROM users_octocat LIMIT 10")
    print(users)
```
Advanced Configuration
```python
from datetime import timedelta
from cachedx.httpcache import CachedClient, CacheConfig, CacheStrategy, EndpointConfig

config = CacheConfig(
    default_ttl=timedelta(minutes=5),
    enable_logging=True,
    endpoints={
        "/api/users": EndpointConfig(
            strategy=CacheStrategy.CACHED,
            ttl=timedelta(minutes=10),
            table_name="users",
        ),
        "/api/metadata": EndpointConfig(
            strategy=CacheStrategy.STATIC  # cache forever
        ),
        "/api/realtime/*": EndpointConfig(
            strategy=CacheStrategy.REALTIME  # always fetch, but store
        ),
    },
)

async with CachedClient(base_url="https://api.example.com", cache_config=config) as client:
    response = await client.get("/api/users")  # cached for 10 minutes
    df = client.query("SELECT name, email FROM users WHERE active = true")
```
Resource Mirroring with Auto-Inference
```python
from cachedx.mirror import hybrid_cache, register, Mapping

# Option 1: let cachedx infer the schema automatically
@hybrid_cache(resource="users", auto_register=True)
async def get_users(client):
    return await client.get("/api/users")

# Option 2: define an explicit schema mapping
register("forecasts", Mapping(
    table="forecasts",
    columns={
        "id": "$.id",
        "sku": "$.sku",
        "method": "$.method",
        "status": "$.status",
        "updated_at": "CAST(j->>'updated_at' AS TIMESTAMP)",
    },
    ddl="""
        CREATE TABLE forecasts (
            id TEXT PRIMARY KEY,
            sku TEXT NOT NULL,
            method TEXT,
            status TEXT,
            updated_at TIMESTAMP
        )
    """,
))

@hybrid_cache(resource="forecasts")
async def get_forecasts(client):
    return await client.get("/api/forecasts")

# Use the decorated functions
await get_users(client)      # data automatically mirrored
await get_forecasts(client)  # uses the explicit schema

# Query the mirrored data
from cachedx import safe_select

results = safe_select("""
    SELECT sku, status, updated_at
    FROM forecasts
    WHERE status = 'failed'
      AND updated_at > now() - INTERVAL 1 DAY
    ORDER BY updated_at DESC
""")
```
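Auto-inference (option 1 above) conceptually maps JSON value types to DuckDB column types. The sketch below illustrates that idea only; it is not cachedx's actual inference code, and `infer_columns` and its type choices are assumptions for this example (real inference would scan many records, not one):

```python
def infer_columns(record: dict) -> dict[str, str]:
    """Infer a DuckDB column type per JSON field from one sample record."""
    def duck_type(value) -> str:
        # bool must be checked before int: bool is a subclass of int in Python
        if isinstance(value, bool):
            return "BOOLEAN"
        if isinstance(value, int):
            return "BIGINT"
        if isinstance(value, float):
            return "DOUBLE"
        if isinstance(value, (list, dict)):
            return "JSON"  # keep nested structures as raw JSON
        return "TEXT"

    return {key: duck_type(value) for key, value in record.items()}

sample = {"id": 1, "sku": "A-100", "active": True, "price": 9.5, "tags": ["new"]}
# infer_columns(sample) → {'id': 'BIGINT', 'sku': 'TEXT', 'active': 'BOOLEAN',
#                          'price': 'DOUBLE', 'tags': 'JSON'}
```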
LLM Integration
```python
from cachedx import build_llm_context, safe_llm_query

# Build context for the LLM
context = build_llm_context(include_samples=True)
print(context)
# Output:
# # Database Schema and Context
# You have access to a DuckDB database with cached API responses.
#
# ## Available Tables (3 tables)
#
# ### Table: `users`
# **Columns:**
# - `id` (BIGINT, NOT NULL)
# - `name` (TEXT, NULL)
# - `email` (TEXT, NULL)
#
# **Sample data:**
# | id | name  | email             |
# |----|-------|-------------------|
# | 1  | Alice | alice@example.com |

# Use the context with your favorite LLM
prompt = f"""
Generate a SQL query to find the top 10 most active users.

{context}
"""

# Execute LLM-generated queries safely
llm_sql = "SELECT name, COUNT(*) AS activity FROM users GROUP BY name ORDER BY activity DESC LIMIT 10"
result = safe_llm_query(llm_sql)

if result["success"]:
    print(f"Found {result['row_count']} users")
    print(result["data"])  # pandas DataFrame or list of dicts
else:
    print(f"Query failed: {result['error']}")
```
Architecture
cachedx uses a dual storage architecture:
```mermaid
graph LR
    API[REST API] -->|JSON| CLIENT[CachedClient]
    CLIENT -->|Store| CACHE[(HTTP Cache<br/>TTL + ETag)]
    CLIENT -->|Mirror| TABLES[(Normalized Tables<br/>users, forecasts)]
    APP[Your App] -->|SQL| QUERY[Query Engine]
    QUERY --> CACHE
    QUERY --> TABLES
    LLM[LLM] -->|Safe SQL| QUERY
```
Benefits:
- HTTP Cache: Fast response serving with TTL/ETag support
- Normalized Tables: Structured data for complex queries and analytics
- LLM Safety: Prevents dangerous operations, adds automatic LIMIT
- Auto-Inference: Zero-config schema creation from JSON responses
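To make the LLM-safety point concrete, here is a minimal sketch of the kinds of checks such a layer performs: SELECT-only enforcement, keyword blocking, and automatic LIMIT injection. This is an illustration, not cachedx's actual implementation; the `validate_llm_sql` helper and its blocked-keyword list are invented for this example:

```python
import re

# Statement keywords that should never appear in an LLM-generated query
BLOCKED = {"drop", "delete", "update", "insert", "alter", "attach", "copy", "create"}

def validate_llm_sql(sql: str, default_limit: int = 100) -> str:
    """Validate an LLM-generated query: SELECT-only, no dangerous
    keywords, and an automatic LIMIT when the query is unbounded."""
    stripped = sql.strip().rstrip(";")
    if not stripped.lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    # Tokenize on whole words so 'deleted' does not trip the 'delete' check
    tokens = set(re.findall(r"[a-z_]+", stripped.lower()))
    dangerous = tokens & BLOCKED
    if dangerous:
        raise ValueError(f"blocked keywords: {sorted(dangerous)}")
    if "limit" not in tokens:
        stripped += f" LIMIT {default_limit}"
    return stripped
```

A real safety layer would also parse the SQL rather than rely on token matching, but the decision flow is the same.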
Cache Strategies
| Strategy | Behavior | Use Case |
|---|---|---|
| `CACHED` | Cache with TTL, supports ETag revalidation | Most API endpoints |
| `STATIC` | Cache forever, never expires | Metadata, configuration |
| `REALTIME` | Always fetch, but store for querying | Live data, real-time feeds |
| `DISABLED` | No caching | Debug, testing |
Performance
| Operation | Latency | Notes |
|---|---|---|
| Cache Hit | < 1ms | Served from DuckDB |
| Cache Miss | Network + 2ms | Store + mirror overhead |
| SQL Query (1K rows) | 5-10ms | DuckDB performance |
| Auto-inference | 2-5ms | Schema creation |
Examples
The examples/ directory contains comprehensive demonstrations of cachedx functionality:
Running Examples
```bash
# Clone the repository
git clone https://github.com/awhereai/cachedx
cd cachedx

# Install dependencies
uv sync  # or: pip install -e '.[dev]'

# Run individual examples
uv run python -m examples.simple_cache
uv run python -m examples.quickstart
uv run python -m examples.advanced_mirroring
uv run python -m examples.llm_safety_demo
uv run python -m examples.basic_demo
```
Example Descriptions
🚀 basic_demo.py - Core Features Walkthrough
What it does: Demonstrates all core cachedx features in one comprehensive example.
Features shown:
- Automatic HTTP caching with GitHub API
- View generation from cached JSON responses
- SQL querying of cached data
- LLM context generation for query assistance
- Cache statistics and monitoring
Key takeaways: Perfect introduction to cachedx - shows HTTP caching, SQL queries, and LLM integration working together seamlessly.
⚡ simple_cache.py - Basic HTTP Caching
What it does: Minimal example showing basic HTTP caching functionality.
Features shown:
- Drop-in replacement for httpx.AsyncClient
- Automatic response caching and cache hits
- SQL querying of cached data
- Cache statistics
Key takeaways: Start here if you just need HTTP caching. Shows how cachedx works as a simple httpx wrapper.
📚 quickstart.py - Three-Part Comprehensive Demo
What it does: Structured walkthrough of HTTP caching, resource mirroring, and LLM helpers.
Features shown:
- Part 1 - HTTP Cache: Basic caching with custom configurations
- Part 2 - Mirror Demo: Automatic schema inference and data mirroring
- Part 3 - LLM Helper: Safe query execution and context generation
Key takeaways: Best overview of all three layers working together. Great for understanding the full cachedx workflow.
🔧 advanced_mirroring.py - Schema Inference & Complex Mapping
What it does: Advanced resource mirroring with custom schemas and auto-inference.
Features shown:
- Custom schema registration for GitHub repositories
- Automatic mirroring with the `@hybrid_cache` decorator
- Auto-inference handling complex JSON with nested arrays
- Advanced SQL queries on mirrored data
- LLM context generation from multiple data sources
Key takeaways: For production usage with complex APIs. Shows both manual schema definition and auto-inference working with challenging data structures.
🛡️ llm_safety_demo.py - LLM Security Features
What it does: Comprehensive demonstration of SQL safety features for LLM integration.
Features shown:
- Safe query execution (SELECT-only enforcement)
- Dangerous keyword blocking (prevents DROP, DELETE, etc.)
- Query validation and error handling
- Automatic LIMIT injection for unbounded queries
- Execution timing and metadata collection
Key takeaways: Essential for LLM applications. Shows how cachedx prevents SQL injection and dangerous operations while enabling powerful query capabilities.
Real-World Use Cases
We've created two complete, runnable example applications that demonstrate cachedx in production-ready scenarios. Each app includes both backend (FastAPI + cachedx) and frontend (React) with full setup instructions.
🌐 Use Case 1: Data Dashboard UI App
Complete Example App: examples/dashboard-ui/
Scenario: Building a React dashboard that displays user analytics from your company's REST API with intelligent caching, offline capability, and custom SQL query capabilities.
Key Features Demonstrated:
- ⚡ 50x faster loading (100ms vs 5+ seconds)
- 🔄 Offline capability with cached data
- 📊 Custom SQL queries from the frontend
- 🛡️ SQL injection protection with safety layers
- 🚀 Real-time updates with intelligent caching
Quick Start:
```bash
# Backend
cd examples/dashboard-ui/backend
uv sync
uv run python main.py

# Frontend (new terminal)
cd examples/dashboard-ui/frontend
npm install && npm start
```
Architecture Highlights:
- Different caching strategies for different data types (30min for users, 10min for analytics, realtime for live metrics)
- FastAPI endpoints with cachedx integration
- React dashboard with SQL query builder
- Automatic schema inference and data mirroring
🤖 Use Case 2: PydanticAI Support Agent
Complete Example App: examples/support-agent/
Scenario: Intelligent customer support agent using PydanticAI that accesses live company data through cachedx for accurate, context-aware responses.
Key Features Demonstrated:
- 🧠 AI agent with real-time data access
- ⚡ Sub-second responses with cached data
- 🛡️ Safe operations (query-only, no data modification)
- 📊 Rich context from multiple data sources
- 🔄 Critical data updates every 30 seconds
- 📈 Scales to thousands of concurrent users
Quick Start:
```bash
# Backend
cd examples/support-agent/backend
uv sync
export OPENAI_API_KEY="your-api-key"
uv run python main.py

# Frontend (new terminal)
cd examples/support-agent/frontend
npm install && npm start
```
Architecture Highlights:
- PydanticAI agent with cachedx data access tools
- Multi-API integration with smart caching (15min users, 2min orders, 30sec inventory)
- Chat interface with confidence scoring and suggested actions
- Automatic data mirroring and SQL context generation
Example Agent Conversations:
- "What's the status of my recent orders?" → Agent queries orders table with user context
- "Is the iPhone 15 Pro in stock?" → Agent checks real-time inventory with 30-second cache
- "Show me my account information" → Agent retrieves user data with appropriate caching
Development
```bash
# Clone the repository
git clone https://github.com/awhereai/cachedx
cd cachedx

# Install with uv (recommended)
uv sync
uv run python examples/quickstart.py

# Or with pip
pip install -e '.[dev]'
python examples/quickstart.py

# Run tests
uv run pytest              # or: pytest

# Type checking
uv run mypy cachedx        # or: mypy cachedx

# Linting
uv run ruff check cachedx  # or: ruff check cachedx
```
API Reference
Core Functions
- `safe_select(sql, params, limit)` - Execute SELECT-only queries safely
- `build_llm_context()` - Generate LLM context from available data
- `safe_llm_query(sql)` - Execute LLM queries with validation and formatting
HTTP Cache Layer
- `CachedClient` - Drop-in replacement for `httpx.AsyncClient` with caching
- `CacheConfig` - Global cache configuration
- `EndpointConfig` - Per-endpoint cache settings
- `CacheStrategy` - Caching strategies (`CACHED`, `STATIC`, `REALTIME`, `DISABLED`)
Mirror Layer
- `@hybrid_cache(resource)` - Decorator for automatic response mirroring
- `register(name, mapping)` - Register an explicit resource mapping
- `Mapping` - Schema definition for JSON -> SQL transformation
- `infer_from_response(data, table)` - Auto-infer a mapping from JSON data
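As an illustration of how a `Mapping`'s `$.field` column expressions resolve against a JSON record, here is a toy resolver. `extract_path` is a hypothetical helper, not part of cachedx, and it deliberately ignores SQL column expressions like the `CAST(...)` shown earlier:

```python
def extract_path(record: dict, path: str):
    """Resolve a simple '$.field' or '$.nested.field' path against a
    JSON record (a toy resolver; real JSONPath supports much more)."""
    value = record
    for part in path.removeprefix("$.").split("."):
        value = value[part]
    return value

# Apply a Mapping-style columns dict to one record
columns = {"id": "$.id", "sku": "$.sku", "city": "$.address.city"}
record = {"id": 7, "sku": "A-100", "address": {"city": "Oslo"}}
row = {name: extract_path(record, path) for name, path in columns.items()}
# row → {'id': 7, 'sku': 'A-100', 'city': 'Oslo'}
```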
FAQ
Q: Why Python 3.12+? A: Modern type hints, better performance, and improved error messages.
Q: Do I need to define schemas? A: No! Auto-inference works great for most cases. Use explicit schemas for fine control.
Q: How does this compare to Redis? A: cachedx stores structured, queryable data. Redis is for key-value. Different use cases.
Q: Is it production ready? A: Yes! Comprehensive validation, type safety, and battle-tested architecture.
Q: Can I use it with my existing httpx code?
A: Yes! CachedClient is a drop-in replacement for httpx.AsyncClient.
License
MIT License - see LICENSE file.
Contributing
Contributions welcome! Please read our contributing guidelines and submit pull requests.
Copyright © 2025 Weavers @ Eternal Loom. All rights reserved.
File details
Details for the file cachedx-0.2.1.tar.gz.
File metadata
- Download URL: cachedx-0.2.1.tar.gz
- Upload date:
- Size: 430.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `77397a082e3947a2f8b2cdbd3485280aac12a8bfe76415d5adb31a2094783747` |
| MD5 | `f0be89ffe2cf35c1a879ab498a6d892a` |
| BLAKE2b-256 | `9e2ef8a69cf3162ff0e76a8a280123e68ea25899fc936c21c10fcb0f8530c186` |
File details
Details for the file cachedx-0.2.1-py3-none-any.whl.
File metadata
- Download URL: cachedx-0.2.1-py3-none-any.whl
- Upload date:
- Size: 41.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `76b08b92f7349e76f1f97c6eb83d0a807074ed22e434626e2a269a1108dee0d2` |
| MD5 | `b5cea75e5068247273f6a854a17bf628` |
| BLAKE2b-256 | `db24ee6aceea341ecff1cf3e40252ba17fe6ac546333f9dcea94172044a724ec` |