Unified httpx cache (TTL/ETag) + DuckDB mirror (raw+normalized) with SQL/LLM helpers

⧊where (awhere)*: cachedx

Unified HTTP caching with DuckDB mirroring and LLM helpers.

* ⧊where (awhere) is pronounced aware (uh-wehr).

Python 3.12+ · MIT License

cachedx 🚀

cachedx provides intelligent HTTP caching with automatic database mirroring, making it easy to cache API responses and query them with SQL.

Why cachedx?

Most apps repeatedly hit REST APIs and lose visibility into response data:

# Traditional approach ❌
response = await client.get("/api/users")
users = response.json()  # Data lost after processing

# With cachedx ✅
response = await cached_client.get("/api/users")  # Automatically cached
users_df = client.query("SELECT * FROM users WHERE active = true")  # Query with SQL!

Key Features

  • 🚄 Zero-config caching - Works out of the box with sensible defaults
  • 🔄 Dual storage - HTTP cache + normalized tables for fast queries
  • 🧠 Auto-inference - Automatically creates schemas from JSON responses
  • 🛡️ LLM-safe - Built-in SQL safety for LLM-generated queries
  • ⚡ High performance - Cache hits < 1ms, powered by DuckDB
  • 🏗️ Production ready - Comprehensive Pydantic validation throughout

Installation

# With pip
pip install cachedx

# With uv (recommended)
uv add cachedx

# With optional dependencies
pip install 'cachedx[pandas]'  # For DataFrame support
pip install 'cachedx[dev]'     # For development

Requires: Python 3.12+, DuckDB 1.0+, httpx 0.27+

Quick Start

Basic HTTP Caching

from cachedx.httpcache import CachedClient

async with CachedClient(base_url="https://api.github.com") as client:
    # First call hits API and caches response
    response = await client.get("/users/octocat")

    # Second call returns cached data (< 1ms)
    response = await client.get("/users/octocat")

    # Query cached data with SQL!
    users = client.query("SELECT * FROM users_octocat LIMIT 10")
    print(users)

Advanced Configuration

from datetime import timedelta
from cachedx.httpcache import CachedClient, CacheConfig, CacheStrategy, EndpointConfig

config = CacheConfig(
    default_ttl=timedelta(minutes=5),
    enable_logging=True,
    endpoints={
        "/api/users": EndpointConfig(
            strategy=CacheStrategy.CACHED,
            ttl=timedelta(minutes=10),
            table_name="users"
        ),
        "/api/metadata": EndpointConfig(
            strategy=CacheStrategy.STATIC  # Cache forever
        ),
        "/api/realtime/*": EndpointConfig(
            strategy=CacheStrategy.REALTIME  # Always fetch, but store
        ),
    }
)

async with CachedClient(base_url="https://api.example.com", cache_config=config) as client:
    response = await client.get("/api/users")  # Cached for 10 minutes
    df = client.query("SELECT name, email FROM users WHERE active = true")

Resource Mirroring with Auto-Inference

from cachedx.mirror import hybrid_cache, register, Mapping

# Option 1: Let cachedx infer the schema automatically
@hybrid_cache(resource="users", auto_register=True)
async def get_users(client):
    return await client.get("/api/users")

# Option 2: Define explicit schema mapping
register("forecasts", Mapping(
    table="forecasts",
    columns={
        "id": "$.id",
        "sku": "$.sku",
        "method": "$.method",
        "status": "$.status",
        "updated_at": "CAST(j->>'updated_at' AS TIMESTAMP)",
    },
    ddl="""
    CREATE TABLE forecasts (
        id TEXT PRIMARY KEY,
        sku TEXT NOT NULL,
        method TEXT,
        status TEXT,
        updated_at TIMESTAMP
    )
    """
))

@hybrid_cache(resource="forecasts")
async def get_forecasts(client):
    return await client.get("/api/forecasts")

# Use the decorated functions
await get_users(client)      # Data automatically mirrored
await get_forecasts(client)  # Uses explicit schema

# Query the mirrored data
from cachedx import safe_select
results = safe_select("""
    SELECT sku, status, updated_at
    FROM forecasts
    WHERE status = 'failed'
      AND updated_at > now() - INTERVAL 1 DAY
    ORDER BY updated_at DESC
""")
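
A Mapping pairs column names with JSONPath-style selectors (or raw SQL expressions). As a rough illustration of the idea (a hypothetical sketch, not cachedx's internals), simple `$.field` selectors can be resolved against a JSON record like this:

```python
def apply_mapping(record: dict, columns: dict[str, str]) -> dict:
    """Resolve simple '$.field' selectors against one JSON record.

    Column specs that are SQL expressions (anything not starting with '$.')
    are left for the database layer and skipped here.
    """
    row = {}
    for col, spec in columns.items():
        if spec.startswith("$."):
            row[col] = record.get(spec[2:])  # '$.sku' -> record['sku']
    return row

record = {"id": "f1", "sku": "A-100", "method": "ets", "status": "failed"}
columns = {"id": "$.id", "sku": "$.sku", "status": "$.status"}
print(apply_mapping(record, columns))
# {'id': 'f1', 'sku': 'A-100', 'status': 'failed'}
```

Real JSONPath supports nesting and filters; the point here is only how a declarative column map turns API JSON into table rows.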

LLM Integration

from cachedx import build_llm_context, safe_llm_query

# Build context for LLM
context = build_llm_context(include_samples=True)
print(context)
# Output:
# # Database Schema and Context
# You have access to a DuckDB database with cached API responses.
# ## Available Tables (3 tables)
# ### Table: `users`
# **Columns:**
# - `id` (BIGINT, NOT NULL)
# - `name` (TEXT, NULL)
# - `email` (TEXT, NULL)
# **Sample data:**
# | id | name     | email           |
# |----|----------|-----------------|
# | 1  | Alice    | alice@example.com |

# Use with your favorite LLM
prompt = f"""
Generate a SQL query to find the top 10 most active users.

{context}
"""

# Execute LLM-generated queries safely
llm_sql = "SELECT name, COUNT(*) as activity FROM users GROUP BY name ORDER BY activity DESC LIMIT 10"
result = safe_llm_query(llm_sql)

if result["success"]:
    print(f"Found {result['row_count']} users")
    print(result["data"])  # pandas DataFrame or list of dicts
else:
    print(f"Query failed: {result['error']}")

Architecture

cachedx uses a dual storage architecture:

graph LR
    API[REST API] -->|JSON| CLIENT[CachedClient]

    CLIENT -->|Store| CACHE[(HTTP Cache<br/>TTL + ETag)]
    CLIENT -->|Mirror| TABLES[(Normalized Tables<br/>users, forecasts)]

    APP[Your App] -->|SQL| QUERY[Query Engine]
    QUERY --> CACHE
    QUERY --> TABLES

    LLM[LLM] -->|Safe SQL| QUERY

Benefits:

  • HTTP Cache: Fast response serving with TTL/ETag support
  • Normalized Tables: Structured data for complex queries and analytics
  • LLM Safety: Prevents dangerous operations, adds automatic LIMIT
  • Auto-Inference: Zero-config schema creation from JSON responses

Cache Strategies

| Strategy | Behavior | Use Case |
|----------|----------|----------|
| CACHED   | Cache with TTL, supports ETag revalidation | Most API endpoints |
| STATIC   | Cache forever, never expires | Metadata, configuration |
| REALTIME | Always fetch, but store for querying | Live data, real-time feeds |
| DISABLED | No caching | Debug, testing |
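
The strategies boil down to a per-request freshness decision. A speculative sketch of that decision (not the library's actual implementation):

```python
from datetime import datetime, timedelta, timezone
from enum import Enum

class CacheStrategy(Enum):
    CACHED = "cached"
    STATIC = "static"
    REALTIME = "realtime"
    DISABLED = "disabled"

def should_fetch(strategy: CacheStrategy, cached_at, ttl: timedelta) -> bool:
    """Decide whether a request must hit the network."""
    if strategy in (CacheStrategy.REALTIME, CacheStrategy.DISABLED):
        return True                       # never served from cache
    if cached_at is None:
        return True                       # nothing cached yet
    if strategy is CacheStrategy.STATIC:
        return False                      # cached forever
    # CACHED: fetch only once the entry is older than its TTL
    return datetime.now(timezone.utc) - cached_at > ttl

now = datetime.now(timezone.utc)
print(should_fetch(CacheStrategy.CACHED, now - timedelta(minutes=11), timedelta(minutes=10)))  # True
print(should_fetch(CacheStrategy.STATIC, now - timedelta(days=365), timedelta(minutes=10)))    # False
```

For an expired CACHED entry the library can additionally revalidate with an ETag (standard If-None-Match semantics) instead of re-downloading the body.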

Performance

| Operation | Latency | Notes |
|-----------|---------|-------|
| Cache Hit | < 1 ms | Served from DuckDB |
| Cache Miss | Network + 2 ms | Store + mirror overhead |
| SQL Query (1K rows) | 5-10 ms | DuckDB performance |
| Auto-inference | 2-5 ms | Schema creation |

Examples

The examples/ directory contains comprehensive demonstrations of cachedx functionality.

Running Examples

# Clone the repository
git clone https://github.com/awhereai/cachedx
cd cachedx

# Install dependencies
uv sync  # or pip install -e '.[dev]'

# Run individual examples
uv run python -m examples.simple_cache
uv run python -m examples.quickstart
uv run python -m examples.advanced_mirroring
uv run python -m examples.llm_safety_demo
uv run python -m examples.basic_demo

Example Descriptions

🚀 basic_demo.py - Core Features Walkthrough

What it does: Demonstrates all core cachedx features in one comprehensive example.

Features shown:

  • Automatic HTTP caching with GitHub API
  • View generation from cached JSON responses
  • SQL querying of cached data
  • LLM context generation for query assistance
  • Cache statistics and monitoring

Key takeaways: Perfect introduction to cachedx - shows HTTP caching, SQL queries, and LLM integration working together seamlessly.


simple_cache.py - Basic HTTP Caching

What it does: Minimal example showing basic HTTP caching functionality.

Features shown:

  • Drop-in replacement for httpx.AsyncClient
  • Automatic response caching and cache hits
  • SQL querying of cached data
  • Cache statistics

Key takeaways: Start here if you just need HTTP caching. Shows how cachedx works as a simple httpx wrapper.


📚 quickstart.py - Three-Part Comprehensive Demo

What it does: Structured walkthrough of HTTP caching, resource mirroring, and LLM helpers.

Features shown:

  • Part 1 - HTTP Cache: Basic caching with custom configurations
  • Part 2 - Mirror Demo: Automatic schema inference and data mirroring
  • Part 3 - LLM Helper: Safe query execution and context generation

Key takeaways: Best overview of all three layers working together. Great for understanding the full cachedx workflow.


🔧 advanced_mirroring.py - Schema Inference & Complex Mapping

What it does: Advanced resource mirroring with custom schemas and auto-inference.

Features shown:

  • Custom schema registration for GitHub repositories
  • Automatic mirroring with @hybrid_cache decorator
  • Auto-inference handling complex JSON with nested arrays
  • Advanced SQL queries on mirrored data
  • LLM context generation from multiple data sources

Key takeaways: For production usage with complex APIs. Shows both manual schema definition and auto-inference working with challenging data structures.


🛡️ llm_safety_demo.py - LLM Security Features

What it does: Comprehensive demonstration of SQL safety features for LLM integration.

Features shown:

  • Safe query execution (SELECT-only enforcement)
  • Dangerous keyword blocking (prevents DROP, DELETE, etc.)
  • Query validation and error handling
  • Automatic LIMIT injection for unbounded queries
  • Execution timing and metadata collection

Key takeaways: Essential for LLM applications. Shows how cachedx prevents SQL injection and dangerous operations while enabling powerful query capabilities.
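
The checks this demo exercises can be approximated in a few lines: accept only SELECT statements, reject mutating keywords, and inject a LIMIT when the query is unbounded. A deliberately simplified sketch (cachedx's real validator is presumably more thorough, e.g. it may parse the SQL rather than scan tokens):

```python
import re

BLOCKED = {"drop", "delete", "insert", "update", "alter", "create", "attach", "copy"}

def check_llm_sql(sql: str, default_limit: int = 100) -> str:
    """Validate an LLM-generated query; raise on unsafe SQL, inject a LIMIT."""
    stripped = sql.strip().rstrip(";")
    if not stripped.lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    tokens = set(re.findall(r"[a-z_]+", stripped.lower()))
    bad = tokens & BLOCKED
    if bad:
        raise ValueError(f"blocked keywords: {sorted(bad)}")
    if "limit" not in tokens:
        stripped += f" LIMIT {default_limit}"  # bound unbounded result sets
    return stripped

print(check_llm_sql("SELECT name FROM users"))
# SELECT name FROM users LIMIT 100
```

Token scanning like this can false-positive on, say, a column literally named `delete`; that is the price of a fifteen-line sketch, and a reason real validators parse the statement.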

Real-World Use Cases

We've created two complete, runnable example applications that demonstrate cachedx in production-ready scenarios. Each app includes both backend (FastAPI + cachedx) and frontend (React) with full setup instructions.

🌐 Use Case 1: Data Dashboard UI App

Complete Example App: examples/dashboard-ui/

Scenario: Building a React dashboard that displays user analytics from your company's REST API, with intelligent caching, offline capability, and custom SQL queries.

Key Features Demonstrated:

  • ⚡ 50x faster loading (100ms vs 5+ seconds)
  • 🔄 Offline capability with cached data
  • 📊 Custom SQL queries from the frontend
  • 🛡️ SQL injection protection with safety layers
  • 🚀 Real-time updates with intelligent caching

Quick Start:

# Backend
cd examples/dashboard-ui/backend
uv sync
uv run python main.py

# Frontend (new terminal)
cd examples/dashboard-ui/frontend
npm install && npm start

Architecture Highlights:

  • Different caching strategies for different data types (30min for users, 10min for analytics, realtime for live metrics)
  • FastAPI endpoints with cachedx integration
  • React dashboard with SQL query builder
  • Automatic schema inference and data mirroring

🤖 Use Case 2: PydanticAI Support Agent

Complete Example App: examples/support-agent/

Scenario: Intelligent customer support agent using PydanticAI that accesses live company data through cachedx for accurate, context-aware responses.

Key Features Demonstrated:

  • 🧠 AI agent with real-time data access
  • ⚡ Sub-second responses with cached data
  • 🛡️ Safe operations (query-only, no data modification)
  • 📊 Rich context from multiple data sources
  • 🔄 Critical data updates every 30 seconds
  • 📈 Scales to thousands of concurrent users

Quick Start:

# Backend
cd examples/support-agent/backend
uv sync
export OPENAI_API_KEY="your-api-key"
uv run python main.py

# Frontend (new terminal)
cd examples/support-agent/frontend
npm install && npm start

Architecture Highlights:

  • PydanticAI agent with cachedx data access tools
  • Multi-API integration with smart caching (15min users, 2min orders, 30sec inventory)
  • Chat interface with confidence scoring and suggested actions
  • Automatic data mirroring and SQL context generation

Example Agent Conversations:

  • "What's the status of my recent orders?" → Agent queries orders table with user context
  • "Is the iPhone 15 Pro in stock?" → Agent checks real-time inventory with 30-second cache
  • "Show me my account information" → Agent retrieves user data with appropriate caching

Development

# Clone repository
git clone https://github.com/awhereai/cachedx
cd cachedx

# Install with uv (recommended)
uv sync
uv run python examples/quickstart.py

# Or with pip
pip install -e '.[dev]'
python examples/quickstart.py

# Run tests
uv run pytest
# or
pytest

# Type checking
uv run mypy cachedx
# or
mypy cachedx

# Linting
uv run ruff check cachedx
# or
ruff check cachedx

API Reference

Core Functions

  • safe_select(sql, params, limit) - Execute SELECT-only queries safely
  • build_llm_context() - Generate LLM context from available data
  • safe_llm_query(sql) - Execute LLM queries with validation and formatting

HTTP Cache Layer

  • CachedClient - Drop-in replacement for httpx.AsyncClient with caching
  • CacheConfig - Global cache configuration
  • EndpointConfig - Per-endpoint cache settings
  • CacheStrategy - Caching strategies (CACHED, STATIC, REALTIME, DISABLED)

Mirror Layer

  • @hybrid_cache(resource) - Decorator for automatic response mirroring
  • register(name, mapping) - Register explicit resource mapping
  • Mapping - Schema definition for JSON -> SQL transformation
  • infer_from_response(data, table) - Auto-infer mapping from JSON data
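
infer_from_response-style auto-inference amounts to mapping JSON value types onto SQL column types. A hypothetical sketch of that core idea (the actual function's behavior and signature may differ):

```python
def infer_columns(rows: list[dict]) -> dict[str, str]:
    """Guess a SQL type per key from a sample of JSON records."""
    type_map = {bool: "BOOLEAN", int: "BIGINT", float: "DOUBLE", str: "TEXT"}
    cols: dict[str, str] = {}
    for row in rows:
        for key, value in row.items():
            if value is None:
                cols.setdefault(key, "TEXT")   # unknown until a value shows up
            else:
                # nested lists/objects fall back to a JSON column
                cols[key] = type_map.get(type(value), "JSON")
    return cols

rows = [{"id": 1, "name": "Alice", "active": True, "score": 9.5, "tags": ["a"]}]
print(infer_columns(rows))
# {'id': 'BIGINT', 'name': 'TEXT', 'active': 'BOOLEAN', 'score': 'DOUBLE', 'tags': 'JSON'}
```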

FAQ

Q: Why Python 3.12+? A: Modern type hints, better performance, and improved error messages.

Q: Do I need to define schemas? A: No! Auto-inference works great for most cases. Use explicit schemas for fine control.

Q: How does this compare to Redis? A: cachedx stores structured, queryable data; Redis is a key-value store. They serve different use cases.

Q: Is it production ready? A: Yes! Comprehensive validation, type safety, and battle-tested architecture.

Q: Can I use it with my existing httpx code? A: Yes! CachedClient is a drop-in replacement for httpx.AsyncClient.

License

MIT License - see LICENSE file.

Contributing

Contributions welcome! Please read our contributing guidelines and submit pull requests.


Copyright © 2025 Weavers @ Eternal Loom. All rights reserved.

Download files


Source Distribution

cachedx-0.2.1.tar.gz (430.3 kB)

Built Distribution

cachedx-0.2.1-py3-none-any.whl (41.5 kB)

File details

Details for the file cachedx-0.2.1.tar.gz.

File metadata

  • Download URL: cachedx-0.2.1.tar.gz
  • Size: 430.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for cachedx-0.2.1.tar.gz

| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 77397a082e3947a2f8b2cdbd3485280aac12a8bfe76415d5adb31a2094783747 |
| MD5 | f0be89ffe2cf35c1a879ab498a6d892a |
| BLAKE2b-256 | 9e2ef8a69cf3162ff0e76a8a280123e68ea25899fc936c21c10fcb0f8530c186 |

File details

Details for the file cachedx-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: cachedx-0.2.1-py3-none-any.whl
  • Size: 41.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for cachedx-0.2.1-py3-none-any.whl

| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 76b08b92f7349e76f1f97c6eb83d0a807074ed22e434626e2a269a1108dee0d2 |
| MD5 | b5cea75e5068247273f6a854a17bf628 |
| BLAKE2b-256 | db24ee6aceea341ecff1cf3e40252ba17fe6ac546333f9dcea94172044a724ec |
