
cachecore

Python client for CacheCore — the LLM API caching proxy that reduces cost and latency for AI agent workloads.

CacheCore sits transparently between your application and LLM providers (OpenAI, Anthropic via OpenAI-compat, etc.) and caches responses at two levels: L1 exact-match and L2 semantic similarity. This client handles the CacheCore-specific plumbing — header injection, dependency encoding, invalidation — without replacing your LLM SDK.

Install

pip install cachecore-python

The distribution name is cachecore-python; the import name is cachecore:

import cachecore

Quick start

Rung 1 — no new code: swap base_url

Point your existing SDK at CacheCore and get L1 exact-match caching immediately. No import cachecore required.

from openai import AsyncOpenAI

oai = AsyncOpenAI(
    api_key="your-openai-key",
    base_url="https://gateway.cachecore.it/v1",  # ← only change
)

# Identical requests are now served from cache.
resp = await oai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
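
One way to sanity-check the cache from here is a quick timing comparison. An illustrative sketch, not part of the client (timed is a throwaway helper):

import time

async def timed(coro):
    start = time.perf_counter()
    result = await coro
    return result, time.perf_counter() - start

question = [{"role": "user", "content": "What is 2+2?"}]

# First call goes upstream and populates the cache.
_, cold = await timed(oai.chat.completions.create(model="gpt-4o", messages=question))
# An identical second call should be served from L1 and return much faster.
_, warm = await timed(oai.chat.completions.create(model="gpt-4o", messages=question))
print(f"cold={cold:.2f}s  warm={warm:.3f}s")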

Rung 2 — tenant isolation (3 lines)

Add CacheCoreClient to unlock tenant-scoped namespaces, L2 semantic caching, and per-tenant metrics. Three extra lines wired into the SDK's http_client.

from cachecore import CacheCoreClient
import httpx
from openai import AsyncOpenAI

cc = CacheCoreClient(
    gateway_url="https://gateway.cachecore.it",
    tenant_jwt="ey...",  # your tenant JWT from the CacheCore dashboard
)

oai = AsyncOpenAI(
    api_key="ignored",  # gateway injects its own upstream key
    base_url="https://gateway.cachecore.it/v1",
    http_client=httpx.AsyncClient(transport=cc.transport),
)

# Requests now carry your tenant identity.
# Semantically similar prompts hit L2 cache.
resp = await oai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain photosynthesis"}],
)
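
If you prefer explicit lifecycle management, CacheCoreClient also works as an async context manager (it closes its HTTP clients on exit; see .aclose() in the API reference). A sketch:

async with CacheCoreClient(
    gateway_url="https://gateway.cachecore.it",
    tenant_jwt="ey...",
) as cc:
    async with httpx.AsyncClient(transport=cc.transport) as http:
        oai = AsyncOpenAI(
            api_key="ignored",
            base_url="https://gateway.cachecore.it/v1",
            http_client=http,
        )
        resp = await oai.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Explain photosynthesis"}],
        )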

Rung 3 — dep invalidation

Declare which data a cached response depends on. When that data changes, invalidate the dep and all stale entries are evicted automatically.

from cachecore import CacheCoreClient, Dep
import httpx
from openai import AsyncOpenAI

cc = CacheCoreClient(
    gateway_url="https://gateway.cachecore.it",
    tenant_jwt="ey...",
)

oai = AsyncOpenAI(
    api_key="ignored",
    base_url="https://gateway.cachecore.it/v1",
    http_client=httpx.AsyncClient(transport=cc.transport),
)

# Read path — declare what data this response depends on
with cc.request_context(deps=[Dep("table:products"), Dep("table:orders")]):
    resp = await oai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "List all products under $50"}],
    )

# Write path — bypass cache for the LLM call, then invalidate
with cc.request_context(bypass=True):
    resp = await oai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Confirm order created."}],
    )
await cc.invalidate("table:products")

# Invalidate multiple deps at once
await cc.invalidate_many(["table:orders", "table:products"])
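
The bypass-then-invalidate pair above is common enough to wrap. write_path below is a hypothetical convenience helper built on the public API; the library does not ship it:

from contextlib import asynccontextmanager

@asynccontextmanager
async def write_path(cc, dep_ids):
    # Bypass the cache while the write-path call runs...
    with cc.request_context(bypass=True):
        yield
    # ...then evict everything tagged with the touched deps.
    await cc.invalidate_many(dep_ids)

async with write_path(cc, ["table:orders", "table:products"]):
    resp = await oai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Confirm order created."}],
    )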

Works with LangChain / LangGraph

The transport works with any SDK that accepts an httpx.AsyncClient:

from langchain_openai import ChatOpenAI
import httpx
from cachecore import CacheCoreClient, Dep

cc = CacheCoreClient(gateway_url="https://gateway.cachecore.it", tenant_jwt="ey...")

llm = ChatOpenAI(
    model="gpt-4o",
    api_key="ignored",
    base_url="https://gateway.cachecore.it/v1",
    http_async_client=httpx.AsyncClient(transport=cc.transport),
)

# Use request_context() around any ainvoke / astream call
with cc.request_context(deps=[Dep("doc:policy-42")]):
    result = await llm.ainvoke("Summarise the compliance policy")
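
Streaming follows the same pattern; keep the context open while the stream is consumed. A sketch using LangChain's astream:

with cc.request_context(deps=[Dep("doc:policy-42")]):
    async for chunk in llm.astream("Summarise the compliance policy"):
        print(chunk.content, end="", flush=True)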

API reference

CacheCoreClient

CacheCoreClient(
    gateway_url: str,       # "https://gateway.cachecore.it"
    tenant_jwt: str,        # tenant HS256/RS256 JWT
    timeout: float = 30.0,  # for invalidation calls
    debug: bool = False,    # log cache status per request
)
Property / Method                Description
.transport                       httpx.AsyncBaseTransport — pass to httpx.AsyncClient(transport=...)
.request_context(deps, bypass)   Context manager — sets per-request deps / bypass
await .invalidate(dep_id)        Evict all entries tagged with this dep
await .invalidate_many(dep_ids)  Invalidate multiple deps concurrently
await .aclose()                  Close HTTP clients; also works as async with CacheCoreClient(...):

Dep / DepDeclaration

Dep("table:products")                  # simple — hash defaults to "v1"
Dep("table:products", hash="abc123")   # explicit hash for versioned deps

CacheStatus

Parsed from response headers after a proxied request:

from cachecore import CacheStatus

status = CacheStatus.from_headers(response.headers)
# status.status      → "HIT_L1" | "HIT_L1_STALE" | "HIT_L2" | "MISS" | "BYPASS" | "UNKNOWN"
# status.similarity  → float 0.0–1.0  (non-zero on L2 hits)
# status.age_seconds → int
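
With the openai SDK, the response headers are reachable through its with_raw_response interface, so reading the status looks like this (a sketch):

from cachecore import CacheStatus

raw = await oai.chat.completions.with_raw_response.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
status = CacheStatus.from_headers(raw.headers)
print(status.status, status.similarity, status.age_seconds)
resp = raw.parse()  # the usual ChatCompletion object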

Exceptions

Exception                  When
CacheCoreError             Base class for all CacheCore errors
CacheCoreAuthError         401 / 403 from the gateway
CacheCoreRateLimitError    429 — check .retry_after attribute (seconds, or None)
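
A retry wrapper for invalidation calls might look like the sketch below (invalidate_with_retry is a hypothetical helper, not part of the package):

import asyncio

from cachecore import CacheCoreRateLimitError

async def invalidate_with_retry(cc, dep_id: str, attempts: int = 3) -> None:
    for attempt in range(attempts):
        try:
            await cc.invalidate(dep_id)
            return
        except CacheCoreRateLimitError as exc:
            if attempt == attempts - 1:
                raise
            # Prefer the server's hint; fall back to exponential backoff.
            await asyncio.sleep(exc.retry_after or 2 ** attempt)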

How it works

The client injects headers at the httpx transport layer — below the LLM SDK, above the network. Your SDK continues to work exactly as before:

Your code  →  openai SDK  →  httpx  →  [CacheCoreTransport]  →  CacheCore proxy  →  OpenAI API
                                              ↑
                                  injects X-CacheCore-Token
                                  injects X-CacheCore-Deps
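
Conceptually, the transport is a thin wrapper over httpx's transport API. The class below illustrates the technique; it is not CacheCore's actual implementation:

import httpx

class HeaderInjectingTransport(httpx.AsyncBaseTransport):
    """Illustrative only: adds gateway headers, then delegates to an inner transport."""

    def __init__(self, inner: httpx.AsyncBaseTransport, token: str, deps: str = ""):
        self._inner = inner
        self._token = token
        self._deps = deps

    async def handle_async_request(self, request: httpx.Request) -> httpx.Response:
        request.headers["X-CacheCore-Token"] = self._token
        if self._deps:
            request.headers["X-CacheCore-Deps"] = self._deps
        return await self._inner.handle_async_request(request)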

Requirements

  • Python 3.10+
  • httpx >= 0.25.0


License

MIT — see LICENSE
