Stateful RAG kernel with corpus nectar — agents know what's in the corpus before any query runs


ragmake


RAG without corpus memory is blind guessing.

Standard retrieval pipelines give an agent whatever chunks scored highest for a query. The agent has no idea what the corpus actually contains — it reasons from fragments and hopes the retriever surfaced the right ones. Every run starts from zero, re-embedding everything, recomputing everything, knowing nothing.

ragmake changes the equation.

It compiles your documents into persistent state and distills a nectar — a corpus-level synthesis that tells the agent what the corpus is, not just what a query happened to surface. Nectar is compiled once and refreshed only when content changes. The agent walks into every conversation already knowing the terrain.

(Diagram: compile plane and serve plane)


The core idea: nectar

In a standard RAG pipeline the agent is reactive. It sees retrieved chunks and has to infer the shape of the corpus from those fragments alone. If retrieval misses something relevant, the agent never knew it existed.

ragmake adds a proactive layer. Before any query runs, the compiler reads the entire corpus and distills it into nectar: the scope, dominant concepts, recurring terminology, and document set — all in one compact, pre-built summary. Nectar is not a retrieval result. It exists independently of any query and persists across runs.
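To make the idea concrete, here is a toy stand-in for corpus-level distillation: a term-frequency pass over all chunks. This is an illustrative sketch, not the actual logic of ragmake's HeuristicSynopsisCompiler; `distill_nectar` is a hypothetical name.

```python
from collections import Counter

def distill_nectar(chunks: list[str], top_terms: int = 8) -> str:
    """Toy corpus-level synthesis: surface the dominant terms across all chunks.

    Illustrative only; not ragmake's actual HeuristicSynopsisCompiler.
    """
    words: Counter[str] = Counter()
    for chunk in chunks:
        # Count longer words only, stripping trailing punctuation.
        words.update(w.lower().strip(".,") for w in chunk.split() if len(w) > 4)
    dominant = [term for term, _ in words.most_common(top_terms)]
    return f"Corpus of {len(chunks)} chunks; dominant terms: {', '.join(dominant)}."
```

The point is that this runs over the whole corpus once, before any query exists, and its output is cacheable.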

At query time ragmake assembles two things for the agent:

  • Nectar: what the corpus contains (its scope, concepts, and shape). Built at compile time, cached, and rebuilt only on corpus change.
  • Evidence: what's specifically relevant to this query. Retrieved from the vector store at query time.

Together they give the agent both the map and the territory. The agent doesn't have to guess what the knowledge base covers — it already knows.


What "stateful" means

ragmake builds persistent state from your documents and reuses it. Nothing is recomputed unless content has actually changed:

  • Document hash check — unchanged documents are skipped entirely
  • Chunk embedding cache — unchanged chunks within a changed document reuse cached vectors
  • Corpus signature — nectar is rebuilt only when the set of document hashes changes

Only changed content costs anything. Everything else is free on re-ingest.
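The change detection above can be sketched with content hashes: a per-document fingerprint plus an order-independent signature over the whole document set. A minimal sketch under assumed semantics; `doc_hash` and `corpus_signature` are hypothetical names, not ragmake's API.

```python
import hashlib

def doc_hash(text: str) -> str:
    # Per-document fingerprint: unchanged text -> unchanged hash -> skip re-embedding.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def corpus_signature(doc_hashes: dict[str, str]) -> str:
    # Order-independent fingerprint of the whole document set; nectar would
    # be rebuilt only when this value changes.
    joined = "\n".join(f"{doc_id}:{h}" for doc_id, h in sorted(doc_hashes.items()))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()
```

Any single changed document flips its own hash and therefore the corpus signature; re-ingesting identical content changes neither.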

Artifact stack

This is the "make" in ragmake. Like a build system, it tracks what changed and only rebuilds those targets. The rest of the state is reused as-is.


Install

Python 3.11+ required.

pip install ragmake

Optional extras:

pip install 'ragmake[openai]'    # OpenAI-compatible embeddings and synopsis
pip install 'ragmake[azure]'     # Azure Blob + Azure AI Search
pip install 'ragmake[postgres]'  # Postgres + pgvector

Or from source:

pip install -e .

Quick start

from pathlib import Path

from stateful_rag import (
    FileArtifactStore,
    HashingEmbedder,
    HeuristicSynopsisCompiler,
    InMemoryVectorStore,
    SourceDocument,
    StatefulRAGCompiler,
    StatefulRAGRuntime,
    WordChunker,
)

# Build the compiler — this is what creates and maintains corpus state
compiler = StatefulRAGCompiler(
    artifact_store=FileArtifactStore(Path(".ragmake_state")),
    vector_store=InMemoryVectorStore(),
    embedder=HashingEmbedder(),
    synopsis_compiler=HeuristicSynopsisCompiler(),
    chunker=WordChunker(max_words=120, overlap_words=20),
)

# Ingest documents — nectar is compiled after this
compiler.ingest_documents([
    SourceDocument(
        corpus_id="support-kb",
        document_id="refund-policy.txt",
        text="Enterprise refunds are allowed within 30 days when the onboarding pack is unused.",
    ),
    SourceDocument(
        corpus_id="support-kb",
        document_id="api-access.txt",
        text="Workspace admins can rotate API keys from the admin console.",
    ),
])

# Build the runtime — this is what the agent uses at query time
runtime = StatefulRAGRuntime(
    compiler=compiler,
    vector_store=compiler.vector_store,
    embedder=compiler.embedder,
)

# The agent receives nectar (what the corpus is) + evidence (what's relevant)
context = runtime.build_context(
    "What do I need for an enterprise refund?",
    corpus_id="support-kb",
)
payload = runtime.render_prompt_payload(context)

# payload["synopsis"] — the nectar: corpus-level memory, query-independent
# payload["sources"]  — the evidence: top matching chunks for this query

Re-ingest the same documents unchanged and the work is free: zero embedder calls, nectar untouched, state fully reused.
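The WordChunker used in the quick start can be pictured as a sliding word window, where each chunk repeats the tail of its predecessor. This sketch is one plausible reading of the `max_words`/`overlap_words` parameters, not ragmake's actual implementation.

```python
def word_chunks(text: str, max_words: int = 120, overlap_words: int = 20) -> list[str]:
    # Sliding window over whitespace-split words; each chunk repeats the last
    # `overlap_words` words of the previous one so context isn't cut mid-thought.
    words = text.split()
    step = max_words - overlap_words
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks
```

Because chunk boundaries depend only on the text, identical documents produce identical chunks, which is what makes the per-chunk embedding cache effective.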


With a real embedder (OpenAI)

from openai import OpenAI
from stateful_rag.adapters import (
    OpenAICompatibleEmbedder,
    OpenAISynopsisCompiler,
    SQLiteArtifactStore,
    SQLiteVectorStore,
)

client = OpenAI(api_key="...")

compiler = StatefulRAGCompiler(
    artifact_store=SQLiteArtifactStore(".ragmake/state.db"),
    vector_store=SQLiteVectorStore(".ragmake/state.db"),
    embedder=OpenAICompatibleEmbedder(client, model="text-embedding-3-large"),
    synopsis_compiler=OpenAISynopsisCompiler(client, model="gpt-4o-mini"),
)

The OpenAISynopsisCompiler uses a chat completion to write the nectar from representative corpus chunks. The result is stored and reused until the corpus changes.


What the agent prompt payload looks like

{
  "query": "What do I need for an enterprise refund?",
  "corpus_id": "support-kb",
  "content_signature": "a3f9...",
  "synopsis": "Corpus covers enterprise billing, refund eligibility, and API key management...",
  "sources": [
    {
      "document_id": "refund-policy.txt",
      "chunk_id": 0,
      "score": 0.91,
      "text": "Enterprise refunds are allowed within 30 days..."
    }
  ]
}

synopsis is the nectar. It is always present, always current, always independent of the query. The agent can orient itself before reading a single chunk.


CLI

The demo entry point ingests plain UTF-8 text files into a persistent SQLite state directory and renders a prompt payload for a query:

ragmake-demo --query "What changed in the refund policy?" docs/refund.txt docs/api.txt

Options:

  • --corpus — corpus id (default: demo)
  • --state-dir — where the SQLite file lives

The benchmark compares stateful ingest against a stateless rebuild across cold, warm, and changed-document phases:

ragmake-benchmark --documents 200 --change-fraction 0.1
ragmake-benchmark --documents 200 --change-fraction 0.1 --backend sqlite --format json

(Chart: stateful vs. stateless benchmark)


Included implementations

Core interfaces — implement any of these to swap a backend:

  • ArtifactStore — read/write/iterate JSON artifacts
  • Embedder — embed a batch of texts
  • VectorStore — upsert, delete, list, search chunks
  • SynopsisCompiler — compile nectar from corpus chunks

Local defaults (zero extra dependencies):

  • FileArtifactStore — JSON files on disk, easy to inspect
  • InMemoryArtifactStore — tests and ephemeral runs
  • InMemoryVectorStore — tests and ephemeral runs
  • HashingEmbedder — deterministic fallback for local demos and tests
  • HeuristicSynopsisCompiler — LLM-free nectar for local demos and tests
  • SessionStateManager — per-session learned concepts, entities, and intents

Optional adapters:

  • OpenAICompatibleEmbedder — OpenAI or Azure OpenAI embeddings
  • OpenAISynopsisCompiler — LLM-backed nectar via chat completions
  • SQLiteArtifactStore / SQLiteVectorStore — durable local backend, one file
  • AzureBlobArtifactStore / AzureAISearchVectorStore — Azure cloud backend
  • PostgresArtifactStore / PgVectorStore — Postgres + pgvector backend

Session memory

ragmake can optionally carry per-session state into the prompt payload — learned concepts, discussed entities, recent intents, and recent decisions. This lets the agent layer short-term conversational memory on top of the long-term corpus memory (nectar).

from stateful_rag import SessionStateManager

session_manager = SessionStateManager(artifact_store)
runtime = StatefulRAGRuntime(..., session_manager=session_manager)

session_manager.record_learning(
    "customer-42",
    summary="User is handling enterprise billing questions.",
    concepts=["enterprise refunds", "invoice workflow"],
    entities=["Acme Corp"],
)

context = runtime.build_context(
    "What do I need for an enterprise refund?",
    corpus_id="support-kb",
    session_id="customer-42",
)
# payload["session"] now carries learned_concepts, discussed_entities,
# recent_intents, recent_decisions

Current limits

  • WordChunker is word-based, not tokenizer-aware.
  • HashingEmbedder and HeuristicSynopsisCompiler are for tests and local demos, not production.
  • Omitting a document from a later ingest does not delete it; the compiler only updates documents you pass in.
  • SQLiteVectorStore uses Python-side cosine search — suitable for local and moderate corpus sizes.
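Python-side cosine search, as in the last point, amounts to brute-force scoring of every stored vector. A minimal sketch of the idea (illustrative; not SQLiteVectorStore's actual code), which makes the O(n·d)-per-query cost visible:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product over the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def search(query_vec: list[float], store: dict[str, list[float]], k: int = 3):
    # Brute-force scan: score every stored chunk, return the top-k.
    # O(n * d) per query; fine locally, not for large corpora.
    scored = sorted(store.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return scored[:k]
```

For larger corpora, the Azure AI Search or pgvector adapters push this scoring into the backend instead.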

Reference

Full architecture, artifact model, adapter notes, a persistent local walkthrough, and benchmark behavior are covered in the project's reference documentation.


License

MIT — see LICENSE.
