A real-time, provenance-invalidated cognitive cache layer for AI agents and RAG.
Real-time, provenance-invalidated context for AI agents & RAG.
Build understanding once. Reuse it everywhere. Keep it fresh — automatically.
Your agent re-reads the same sources on every call — and the moment a source changes, every cached answer is silently wrong.
Coalent builds the understanding once, caches it by what the query means, and invalidates it surgically the instant an underlying source changes. As correct as re-reading everything, at a fraction of the cost — and never stale.
Why Coalent
Every context layer is forced to trade off three things. Coalent is built to hold all three at once:
- 🧠 Understanding, not chunks. It caches the decision-ready understanding your LLM produced — and retains the raw evidence with every unit, so it can never return less than plain retrieval.
- ♻️ Reuse across queries and agents. A semantic cache keyed by query meaning: ask again, or from another agent, and it's a warm hit — no re-retrieval, no re-synthesis.
- 🌿 Fresh by provenance. Every unit remembers the exact sources it used. When one changes, only the units that actually used it go stale — precisely, automatically, and lazily.
Coalent sits above retrieval — bring any retriever (vector DB, hybrid search, GraphRAG, tools, APIs). It's the freshness-and-reuse layer, not another retriever.
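As a rough illustration of what "keyed by query meaning" could look like, the sketch below matches a new query against cached queries by cosine similarity of their embeddings. The toy bag-of-words `embed` function, the `SemanticIndex` class, and the 0.6 threshold are all assumptions made for this example; they are not Coalent's API, and a real setup would use a proper embedding model.

```python
import math
from collections import Counter
from typing import Optional


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag of words (stand-in for a real model)."""
    return Counter(text.lower().replace("?", "").split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticIndex:
    """Hypothetical index: maps query embeddings to cached understanding."""

    def __init__(self, threshold: float = 0.6) -> None:
        self.threshold = threshold
        self.entries: list = []  # (embedding, understanding) pairs

    def put(self, query: str, understanding: str) -> None:
        self.entries.append((embed(query), understanding))

    def lookup(self, query: str) -> Optional[str]:
        """Return the best match at or above the threshold, else None."""
        q = embed(query)
        best_score, best_value = 0.0, None
        for vec, value in self.entries:
            score = cosine(q, vec)
            if score > best_score:
                best_score, best_value = score, value
        return best_value if best_score >= self.threshold else None


index = SemanticIndex()
index.put("what is our leave policy?", "21 days of annual leave per year")
warm = index.lookup("what is the leave policy")  # reworded, same meaning
cold = index.lookup("remote work rules")         # unrelated query
```

Here the reworded query shares four of five words with the cached one, so it clears the threshold and is served from the index, while the unrelated query misses and would fall through to retrieval.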
Install
pip install coalent # the core has zero required dependencies
Quickstart
Runs as-is — StubSynthesizer needs no API key, so you can feel the loop in ten seconds:
from coalent import SemanticCache, InMemoryRetriever, StubSynthesizer
# 1. Any retriever — a vector DB, a tool, an API. (In-memory here for the demo.)
retriever = InMemoryRetriever()
retriever.add("confluence:hr", "Leave policy: 21 days of annual leave per year.")
# 2. Build the cache. Swap StubSynthesizer for a real LLM below.
cache = SemanticCache(retriever, StubSynthesizer())
# 3. Ask. The first call builds understanding and caches it; the next is a warm hit.
result = cache.get("what is our leave policy?")
print(result.context["understanding"])
print(result.cache_hit) # False (cold) -> True on the next call
# 4. A source changed? Only the units that used it go stale — surgically.
cache.source_changed("confluence:hr", text="Leave policy: now 25 days.")
# the next matching read rebuilds just that one unit, lazily
Wire in a real model — any text-in / text-out LLM works:
from coalent import SemanticCache, LLMSynthesizer, OpenAIProvider
cache = SemanticCache(retriever, LLMSynthesizer(OpenAIProvider(), model="gpt-4o-mini"))
How it works
query ──► embed ──► semantic cache
│ hit & fresh? ──► serve cached understanding (no retrieval, no LLM)
│ miss / stale? ─┐
▼ ▼
your Retriever ──► your Synthesizer ──► Cognition unit
(vector/tool/API) (LLM or passthrough) { understanding
▲ + raw evidence
│ + provenance }
source changed ────────────┘ dirties ONLY the units that used that source
- Embed the query and look for an existing unit with similar meaning.
- Hit + fresh → return the cached understanding (no retrieval, no LLM call).
- Miss or stale → retrieve, synthesize understanding, retain the raw evidence, record provenance (the exact sources used), and cache it.
- A source changes → source_changed(id) marks only the units whose provenance includes that id; they rebuild lazily on the next read.
Unchanged content is skipped via a content-hash compare, so a no-op change costs nothing.
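The invalidation step above can be sketched as a reverse index from source id to the units that used it, plus a content hash per source. This is a minimal illustration under assumptions, not Coalent's internals: `ProvenanceIndex`, `record`, and the `cog:` unit ids are hypothetical names, and only `source_changed` echoes a name from the text.

```python
import hashlib
from collections import defaultdict


def _digest(text: str) -> str:
    """Stable content hash used to detect no-op changes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class ProvenanceIndex:
    """Hypothetical reverse index: source id -> unit ids that used it."""

    def __init__(self) -> None:
        self.units_by_source = defaultdict(set)
        self.source_hash = {}
        self.dirty = set()

    def record(self, unit_id: str, sources: dict) -> None:
        """Register a freshly built unit and the source texts it used."""
        self.dirty.discard(unit_id)
        for source_id, text in sources.items():
            self.units_by_source[source_id].add(unit_id)
            self.source_hash[source_id] = _digest(text)

    def source_changed(self, source_id: str, text: str) -> set:
        """Mark dependents dirty; return the unit ids that depend on the source."""
        new_hash = _digest(text)
        if self.source_hash.get(source_id) == new_hash:
            return set()  # identical content: nothing to invalidate
        self.source_hash[source_id] = new_hash
        affected = set(self.units_by_source.get(source_id, set()))
        self.dirty |= affected
        return affected


index = ProvenanceIndex()
index.record("cog:leave", {"confluence:hr": "21 days"})
index.record("cog:remote", {"confluence:it": "VPN required"})
noop = index.source_changed("confluence:hr", "21 days")     # same content
changed = index.source_changed("confluence:hr", "25 days")  # real change
```

Only `cog:leave` ends up dirty; `cog:remote` never referenced the changed source and stays fresh. Rebuilding is deferred: nothing is recomputed until a read finds a dirty unit.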
Bring your own stack
Coalent owns a tiny contract and passes everything else through to your tools.
Retrievers — a ladder from one-liner to full control:
| You have… | Use |
|---|---|
| Qdrant / Chroma / pgvector | a shipped adapter (bring-your-own-client) |
| another vector DB | extend BaseVectorRetriever |
| an existing search function | FunctionRetriever |
| several sources to fuse | CompositeRetriever |
| anything else | implement Retriever (one method) |
from coalent import QdrantRetriever
retriever = QdrantRetriever(client=my_client, collection="docs", embed=my_embed)
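To give a feel for the "one method" contract and for wrapping an existing search function, here is a self-contained sketch. The method name `retrieve`, the `Evidence` record, and the `RetrieverLike` / `FunctionBackedRetriever` classes are assumptions invented for this example; check the docs for the actual Retriever and FunctionRetriever signatures.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol, Tuple


@dataclass(frozen=True)
class Evidence:
    """Hypothetical evidence record; the source id is what provenance tracks."""
    source_id: str
    text: str


class RetrieverLike(Protocol):
    """Assumed one-method contract; the real method name may differ."""

    def retrieve(self, query: str) -> List[Evidence]: ...


class FunctionBackedRetriever:
    """Adapts an existing search function to the assumed contract."""

    def __init__(self, search: Callable[[str], List[Tuple[str, str]]]) -> None:
        self._search = search

    def retrieve(self, query: str) -> List[Evidence]:
        return [Evidence(sid, text) for sid, text in self._search(query)]


def keyword_search(query: str) -> List[Tuple[str, str]]:
    """Stand-in for a search function you already have."""
    corpus = {
        "confluence:hr": "Leave policy: 21 days of annual leave per year.",
        "confluence:it": "Remote work requires the company VPN.",
    }
    words = set(query.lower().split())
    return [
        (sid, text)
        for sid, text in corpus.items()
        if words & set(text.lower().replace(":", "").replace(".", "").split())
    ]


retriever: RetrieverLike = FunctionBackedRetriever(keyword_search)
hits = retriever.retrieve("leave policy")
```

The important detail is that every piece of evidence carries a stable source id, since that id is what a later change event has to match.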
Synthesizers — turn evidence into understanding:
- LLMSynthesizer: structured, citation-grounded understanding via your LLM (OpenAI, Anthropic, or any provider). You own the instruction and fields; Coalent owns the source / strict-JSON / citation envelope, so provenance is captured no matter what you ask for.
- JSONPassthroughSynthesizer: for already-structured tool/API JSON. It caches the JSON as the understanding, with no LLM call.
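One way to picture the citation envelope is as a strict-JSON wrapper whose citations are checked against the sources that were actually offered to the model. The envelope shape below (`understanding` and `citations` keys) and the `parse_envelope` helper are assumptions for illustration only, not Coalent's actual wire format.

```python
import json


def parse_envelope(raw: str, offered_sources: set) -> dict:
    """Parse assumed strict-JSON output and keep only valid citations."""
    data = json.loads(raw)  # raises json.JSONDecodeError on non-JSON output
    if not isinstance(data, dict) or "understanding" not in data:
        raise ValueError("envelope must be an object with 'understanding'")
    cited = data.get("citations", [])
    # Provenance is limited to ids that were really in the prompt.
    provenance = sorted(set(cited) & offered_sources)
    return {"understanding": data["understanding"], "provenance": provenance}


raw_output = json.dumps({
    "understanding": {"annual_leave_days": 21},
    "citations": ["confluence:hr", "made-up:source"],
})
unit = parse_envelope(raw_output, {"confluence:hr", "confluence:it"})
```

In this sketch the fields inside `understanding` are free-form, while the wrapper stays fixed, so provenance can be extracted the same way regardless of what the instruction asked for.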
Stores — durable and restart-safe (the invalidation graph rebuilds on startup):
from coalent import SemanticCache, SQLiteCognitionStore # stdlib, no server
from coalent import RedisCognitionStore # shared across processes / hosts
cache = SemanticCache(retriever, synthesizer, store=SQLiteCognitionStore("coalent.db"))
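The "invalidation graph rebuilds on startup" idea can be sketched with the standard-library `sqlite3` module: provenance is persisted alongside each unit, and the in-memory source-to-units map is reconstructed when the store opens. `SketchStore` and its table layout are hypothetical and are not how SQLiteCognitionStore is necessarily implemented.

```python
import json
import os
import sqlite3
import tempfile
from collections import defaultdict


class SketchStore:
    """Hypothetical durable store; not Coalent's SQLiteCognitionStore."""

    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS units ("
            "id TEXT PRIMARY KEY, understanding TEXT, sources TEXT)"
        )
        self.conn.commit()
        # Rebuild the in-memory invalidation graph from persisted provenance.
        self.units_by_source = defaultdict(set)
        for unit_id, sources in self.conn.execute("SELECT id, sources FROM units"):
            for source_id in json.loads(sources):
                self.units_by_source[source_id].add(unit_id)

    def put(self, unit_id: str, understanding: str, sources: list) -> None:
        """Persist a unit and add its edges to the in-memory graph."""
        self.conn.execute(
            "INSERT OR REPLACE INTO units VALUES (?, ?, ?)",
            (unit_id, understanding, json.dumps(sources)),
        )
        self.conn.commit()
        for source_id in sources:
            self.units_by_source[source_id].add(unit_id)


path = os.path.join(tempfile.mkdtemp(), "sketch.db")
first = SketchStore(path)
first.put("cog:leave", "21 days", ["confluence:hr"])
first.conn.close()

restarted = SketchStore(path)  # simulates a process restart
dependents = restarted.units_by_source["confluence:hr"]
```

This simplified version does not remove old edges when a unit is rewritten with different sources; a fuller implementation would need to handle that.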
Any agent framework — the read API is a single call, so it drops in anywhere. Shipped helpers for graph nodes and MCP tools:
from coalent import make_cognition_node, build_mcp_tools
node = make_cognition_node(cache) # a graph node: state -> { context: fresh understanding }
tools = build_mcp_tools(cache) # expose the cache as an MCP tool
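Because the read path is a single call, a graph node can be little more than a closure over the cache. The sketch below shows that shape with a stand-in cache; `FakeCache`, `FakeResult`, `make_node`, and the `question` state key are hypothetical, and only `get`, `context`, and `cache_hit` mirror names used in the Quickstart. The real make_cognition_node may behave differently.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class FakeResult:
    context: Dict[str, Any]
    cache_hit: bool


class FakeCache:
    """Stand-in exposing only get(query); matches by exact string, not meaning."""

    def __init__(self) -> None:
        self._seen: Dict[str, Dict[str, Any]] = {}

    def get(self, query: str) -> FakeResult:
        hit = query in self._seen
        if not hit:
            self._seen[query] = {"understanding": f"summary for: {query}"}
        return FakeResult(self._seen[query], hit)


def make_node(cache: FakeCache, query_key: str = "question") -> Callable[[dict], dict]:
    """Build a node that reads the query from state and returns a state update."""

    def node(state: dict) -> dict:
        result = cache.get(state[query_key])
        return {"context": result.context, "cache_hit": result.cache_hit}

    return node


node = make_node(FakeCache())
first = node({"question": "what is our leave policy?"})
second = node({"question": "what is our leave policy?"})
```

The node returns a partial state update rather than mutating its input, which is the convention most graph-style agent frameworks expect.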
Benchmark
A real-LLM, quality-first benchmark (gpt-4o-mini, graded by an independent gpt-4o judge) on number-dense documents — answering from Coalent's understanding vs the full raw context, with a source change midway:
| System | Accuracy | Stays fresh | Context tokens / read |
|---|---|---|---|
| Full-context RAG | 100% | ✓ | 283 |
| Normal cache (raw chunks) | 86% | ✗ stale | 283 |
| Coalent | 100% | ✓ | 96 |
Coalent matches full-context RAG accuracy (independently graded), never goes stale after a source change (a normal cache does), and sends ~66% fewer context tokens — up to 75% on large documents. Cost optimization without trading away quality. (gpt-4o corroborates within ~3%; full two-model breakdown in the docs.)
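The ~66% figure follows directly from the "Context tokens / read" column in the table; a quick check of the arithmetic:

```python
full_context_tokens = 283  # Full-context RAG, tokens per read
coalent_tokens = 96        # Coalent, tokens per read

# Fraction of context tokens saved per read.
reduction = (full_context_tokens - coalent_tokens) / full_context_tokens
reduction_pct = round(reduction * 100)
```

That is 187 fewer tokens out of 283, which rounds to 66%.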
CLI
Installing Coalent gives you a coalent command — a redis-cli for your cognition cache (over a SQLite store):
$ coalent ls
STATUS HITS AGE SRC ID QUERY
fresh 6 2m 2 cog:c95a9d2897e0af what is our leave policy?
dirty 1 12m 1 cog:7f1a0b9c3d2e4f remote work rules
$ coalent show cog:c95a9d2897e0af # understanding + provenance + raw evidence
$ coalent invalidate confluence:98231 # fire a change event
$ coalent stats
Documentation
📚 Full docs: coalent.ai/docs — concepts, provenance & freshness, retrievers, synthesizers, persistence, worked examples (vector search, MCP & tools, agents), and the complete get() / data-model reference.
Install options
pip install coalent # core, zero required deps
pip install "coalent[openai]" # OpenAI provider (also: anthropic)
pip install "coalent[qdrant]" # vector adapters (also: chroma, pgvector)
pip install "coalent[redis]" # distributed store
pip install "coalent[dev]" # tests + lint + types
Contributing
Issues and PRs welcome. Run the gate before pushing:
pip install -e ".[dev]"
pytest && ruff check src && mypy src
Status & license
Alpha — the API may change before 1.0. Fully typed (mypy --strict), linted, and tested.
Licensed under Apache-2.0.
Context that's trustworthy, not just cheap.