# ragmake

Stateful RAG kernel with corpus nectar — agents know what's in the corpus before any query runs.
RAG without corpus memory is blind guessing.
Standard retrieval pipelines give an agent whatever chunks scored highest for a query. The agent has no idea what the corpus actually contains — it reasons from fragments and hopes the retriever surfaced the right ones. Every run starts from zero, re-embedding everything, recomputing everything, knowing nothing.
ragmake changes the equation.
It compiles your documents into persistent state and distills a nectar — a corpus-level synthesis that tells the agent what the corpus is, not just what a query happened to surface. Nectar is compiled once and refreshed only when content changes. The agent walks into every conversation already knowing the terrain.
## The core idea: nectar
In a standard RAG pipeline the agent is reactive. It sees retrieved chunks and has to infer the shape of the corpus from those fragments alone. If retrieval misses something relevant, the agent never knew it existed.
ragmake adds a proactive layer. Before any query runs, the compiler reads the entire corpus and distills it into nectar: the scope, dominant concepts, recurring terminology, and document set — all in one compact, pre-built summary. Nectar is not a retrieval result. It exists independently of any query and persists across runs.
At query time ragmake assembles two things for the agent:
| Layer | What it is | When it's built |
|---|---|---|
| Nectar | What the corpus contains — its scope, concepts, and shape | Compile time. Cached. Rebuilt only on corpus change. |
| Evidence | What's specifically relevant to this query | Query time. Retrieved from the vector store. |
Together they give the agent both the map and the territory. The agent doesn't have to guess what the knowledge base covers — it already knows.
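The two-layer assembly can be sketched in a few lines. This is an illustrative model only: the names `Evidence` and `assemble_context` are invented here, while ragmake's real entry point is `StatefulRAGRuntime.build_context`.

```python
from dataclasses import dataclass, asdict

@dataclass
class Evidence:
    document_id: str
    text: str
    score: float

def assemble_context(nectar: str, evidence: list) -> dict:
    # nectar: cached at compile time, identical for every query against this corpus
    # evidence: retrieved at query time, different for every query
    return {"synopsis": nectar, "sources": [asdict(e) for e in evidence]}

context = assemble_context(
    "Corpus covers refund eligibility and API key management.",
    [Evidence("refund-policy.txt", "Enterprise refunds are allowed within 30 days...", 0.91)],
)
```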
## What "stateful" means
ragmake builds persistent state from your documents and reuses it. Nothing is recomputed unless content has actually changed:
- Document hash check — unchanged documents are skipped entirely
- Chunk embedding cache — unchanged chunks within a changed document reuse cached vectors
- Corpus signature — nectar is rebuilt only when the set of document hashes changes
Only changed content costs anything. Everything else is free on re-ingest.
This is the "make" in ragmake. Like a build system, it tracks what changed and only rebuilds those targets. The rest of the state is reused as-is.
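A minimal sketch of this change tracking, assuming SHA-256 content hashes. The helper names below are hypothetical, not ragmake's internals; they only illustrate the skip logic.

```python
import hashlib
import json

def document_hash(text: str) -> str:
    """Content hash used to decide whether a document must be re-processed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def corpus_signature(doc_hashes: dict) -> str:
    """Order-independent signature over the set of document hashes.
    Nectar would be rebuilt only when this value changes."""
    payload = json.dumps(sorted(doc_hashes.items())).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

previous = {"refund-policy.txt": document_hash("old refund text")}
current = {
    "refund-policy.txt": document_hash("old refund text"),  # unchanged: skipped
    "api-access.txt": document_hash("brand new document"),  # new: must be embedded
}

# Only documents whose hash differs from the stored one cost anything
changed = [doc_id for doc_id, h in current.items() if previous.get(doc_id) != h]
needs_nectar_rebuild = corpus_signature(previous) != corpus_signature(current)
```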
## Install

Python 3.11+ required.

```shell
pip install ragmake
```

Optional extras:

```shell
pip install 'ragmake[openai]'    # OpenAI-compatible embeddings and synopsis
pip install 'ragmake[azure]'     # Azure Blob + Azure AI Search
pip install 'ragmake[postgres]'  # Postgres + pgvector
```

Or from source:

```shell
pip install -e .
```
## Quick start
```python
from pathlib import Path

from stateful_rag import (
    FileArtifactStore,
    HashingEmbedder,
    HeuristicSynopsisCompiler,
    InMemoryVectorStore,
    SourceDocument,
    StatefulRAGCompiler,
    StatefulRAGRuntime,
    WordChunker,
)

# Build the compiler — this is what creates and maintains corpus state
compiler = StatefulRAGCompiler(
    artifact_store=FileArtifactStore(Path(".ragmake_state")),
    vector_store=InMemoryVectorStore(),
    embedder=HashingEmbedder(),
    synopsis_compiler=HeuristicSynopsisCompiler(),
    chunker=WordChunker(max_words=120, overlap_words=20),
)

# Ingest documents — nectar is compiled after this
compiler.ingest_documents([
    SourceDocument(
        corpus_id="support-kb",
        document_id="refund-policy.txt",
        text="Enterprise refunds are allowed within 30 days when the onboarding pack is unused.",
    ),
    SourceDocument(
        corpus_id="support-kb",
        document_id="api-access.txt",
        text="Workspace admins can rotate API keys from the admin console.",
    ),
])

# Build the runtime — this is what the agent uses at query time
runtime = StatefulRAGRuntime(
    compiler=compiler,
    vector_store=compiler.vector_store,
    embedder=compiler.embedder,
)

# The agent receives nectar (what the corpus is) + evidence (what's relevant)
context = runtime.build_context(
    "What do I need for an enterprise refund?",
    corpus_id="support-kb",
)
payload = runtime.render_prompt_payload(context)
# payload["synopsis"] — the nectar: corpus-level memory, query-independent
# payload["sources"]  — the evidence: top matching chunks for this query
```
Re-ingest the same documents unchanged and you pay nothing: zero embedder calls, nectar untouched, state fully reused.
## With a real embedder (OpenAI)
```python
from openai import OpenAI

from stateful_rag import StatefulRAGCompiler
from stateful_rag.adapters import (
    OpenAICompatibleEmbedder,
    OpenAISynopsisCompiler,
    SQLiteArtifactStore,
    SQLiteVectorStore,
)

client = OpenAI(api_key="...")

compiler = StatefulRAGCompiler(
    artifact_store=SQLiteArtifactStore(".ragmake/state.db"),
    vector_store=SQLiteVectorStore(".ragmake/state.db"),
    embedder=OpenAICompatibleEmbedder(client, model="text-embedding-3-large"),
    synopsis_compiler=OpenAISynopsisCompiler(client, model="gpt-4o-mini"),
)
```
The `OpenAISynopsisCompiler` uses a chat completion to write the nectar from representative corpus chunks. The result is stored and reused until the corpus changes.
## What the agent prompt payload looks like
```json
{
  "query": "What do I need for an enterprise refund?",
  "corpus_id": "support-kb",
  "content_signature": "a3f9...",
  "synopsis": "Corpus covers enterprise billing, refund eligibility, and API key management...",
  "sources": [
    {
      "document_id": "refund-policy.txt",
      "chunk_id": 0,
      "score": 0.91,
      "text": "Enterprise refunds are allowed within 30 days..."
    }
  ]
}
```
`synopsis` is the nectar. It is always present, always current, and always independent of the query. The agent can orient itself before reading a single chunk.
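As a sketch, the payload can be flattened into a single prompt string. `render_agent_prompt` is a hypothetical helper, not part of ragmake; the field names match the payload above, but the layout itself is just one option.

```python
def render_agent_prompt(payload: dict) -> str:
    """Flatten the payload: nectar first (orientation), evidence second (specifics)."""
    lines = ["## Corpus nectar (query-independent)", payload["synopsis"], ""]
    lines.append(f"## Evidence for: {payload['query']}")
    for src in payload["sources"]:
        lines.append(
            f"- [{src['document_id']}#{src['chunk_id']}] "
            f"(score {src['score']:.2f}) {src['text']}"
        )
    return "\n".join(lines)

prompt = render_agent_prompt({
    "query": "What do I need for an enterprise refund?",
    "synopsis": "Corpus covers enterprise billing and refund eligibility.",
    "sources": [{"document_id": "refund-policy.txt", "chunk_id": 0,
                 "score": 0.91, "text": "Enterprise refunds are allowed within 30 days..."}],
})
```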
## CLI

The demo entry point ingests plain UTF-8 text files into a persistent SQLite state directory and renders a prompt payload for a query:

```shell
ragmake-demo --query "What changed in the refund policy?" docs/refund.txt docs/api.txt
```
Options:
- `--corpus` — corpus id (default: `demo`)
- `--state-dir` — where the SQLite file lives
The benchmark compares stateful ingest against a stateless rebuild across cold, warm, and changed-document phases:
```shell
ragmake-benchmark --documents 200 --change-fraction 0.1
ragmake-benchmark --documents 200 --change-fraction 0.1 --backend sqlite --format json
```
## Included implementations
Core interfaces — implement any of these to swap a backend:
- `ArtifactStore` — read/write/iterate JSON artifacts
- `Embedder` — embed a batch of texts
- `VectorStore` — upsert, delete, list, search chunks
- `SynopsisCompiler` — compile nectar from corpus chunks
Local defaults (zero extra dependencies):
- `FileArtifactStore` — JSON files on disk, easy to inspect
- `InMemoryArtifactStore` — tests and ephemeral runs
- `InMemoryVectorStore` — tests and ephemeral runs
- `HashingEmbedder` — deterministic fallback for local demos and tests
- `HeuristicSynopsisCompiler` — LLM-free nectar for local demos and tests
- `SessionStateManager` — per-session learned concepts, entities, and intents
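The idea behind an LLM-free synopsis can be sketched with plain term frequencies. This is a rough approximation of the concept, not `HeuristicSynopsisCompiler`'s actual implementation; the stopword list and output format below are invented for illustration.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "are", "can", "from", "when", "within", "and", "to"}

def heuristic_nectar(chunks: list, top_k: int = 5) -> str:
    """Distill the dominant terms across all chunks into a one-line corpus synopsis."""
    words = Counter()
    for chunk in chunks:
        words.update(
            w for w in re.findall(r"[a-z]+", chunk.lower()) if w not in STOPWORDS
        )
    dominant = [w for w, _ in words.most_common(top_k)]
    return f"Corpus of {len(chunks)} chunks; dominant terms: {', '.join(dominant)}."

nectar = heuristic_nectar([
    "Enterprise refunds are allowed within 30 days when the onboarding pack is unused.",
    "Workspace admins can rotate API keys from the admin console.",
])
```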
Optional adapters:
- `OpenAICompatibleEmbedder` — OpenAI or Azure OpenAI embeddings
- `OpenAISynopsisCompiler` — LLM-backed nectar via chat completions
- `SQLiteArtifactStore` / `SQLiteVectorStore` — durable local backend, one file
- `AzureBlobArtifactStore` / `AzureAISearchVectorStore` — Azure cloud backend
- `PostgresArtifactStore` / `PgVectorStore` — Postgres + pgvector backend
## Session memory
ragmake can optionally carry per-session state into the prompt payload — learned concepts, discussed entities, recent intents, and recent decisions. This lets the agent layer short-term conversational memory on top of the long-term corpus memory (nectar).
```python
from stateful_rag import SessionStateManager

session_manager = SessionStateManager(artifact_store)
runtime = StatefulRAGRuntime(..., session_manager=session_manager)

session_manager.record_learning(
    "customer-42",
    summary="User is handling enterprise billing questions.",
    concepts=["enterprise refunds", "invoice workflow"],
    entities=["Acme Corp"],
)

context = runtime.build_context(
    "What do I need for an enterprise refund?",
    corpus_id="support-kb",
    session_id="customer-42",
)
# payload["session"] now carries learned_concepts, discussed_entities,
# recent_intents, recent_decisions
```
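The shape of that session state can be sketched as a small dataclass. This is illustrative only (the real behavior lives in `SessionStateManager`); the bounded deque for recent intents is an assumption about how "recent" might be kept small.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class SessionState:
    """Illustrative per-session memory: unbounded learned facts, bounded recency."""
    learned_concepts: set = field(default_factory=set)
    discussed_entities: set = field(default_factory=set)
    recent_intents: deque = field(default_factory=lambda: deque(maxlen=5))

    def record_learning(self, concepts=(), entities=()):
        self.learned_concepts.update(concepts)
        self.discussed_entities.update(entities)

state = SessionState()
state.record_learning(
    concepts=["enterprise refunds", "invoice workflow"],
    entities=["Acme Corp"],
)
state.recent_intents.append("refund-eligibility-question")
```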
## Current limits

- `WordChunker` is word-based, not tokenizer-aware.
- `HashingEmbedder` and `HeuristicSynopsisCompiler` are for tests and local demos, not production.
- Omitting a document from a later ingest does not delete it; the compiler only updates documents you pass in.
- `SQLiteVectorStore` uses Python-side cosine search — suitable for local and moderate corpus sizes.
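Python-side cosine search is just a linear scan. The sketch below shows the approach and why it only scales so far (not `SQLiteVectorStore`'s actual code; names are illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def search(query_vec, rows, top_k=3):
    """Score every stored vector against the query, then keep the best top_k.
    O(n) per query, which is why it suits local and moderate corpora only."""
    ranked = sorted(rows, key=lambda r: cosine(query_vec, r[1]), reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:top_k]]

rows = [("refund-policy.txt#0", [1.0, 0.0]), ("api-access.txt#0", [0.0, 1.0])]
best = search([0.9, 0.1], rows, top_k=1)
```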
## Reference

Full architecture, artifact model, adapter notes, a persistent local walkthrough, and benchmark behavior are covered in the project's reference documentation.
## License
MIT — see LICENSE.