Production RAG pipeline — grounded retrieval, source-cited answers, Precision@k + MRR eval
axiom-rag
Production-grade Retrieval-Augmented Generation pipeline. Ingest documents, embed them, store vectors locally, retrieve semantically, and generate grounded answers — from a clean CLI or REST API.
Answers are grounded in what you put in, not the model's prior knowledge: the generation prompt restricts the model to retrieved context, and sources are cited inline.
rag ingest ./docs
rag query "what is our refund policy?"
Python 3.11+ · Gemini API · ChromaDB · Flask · MIT
What It Does
- Ingest: chunks documents, embeds them via Gemini gemini-embedding-001, and stores vectors in a local ChromaDB collection
- Retrieve: embeds the query and finds the top-k most semantically similar chunks above a configurable similarity threshold
- Generate: passes retrieved context to Gemini 2.5 Flash with a strict grounding prompt; the model is instructed to answer only from what was retrieved
Storage and retrieval run locally by default. No database server is required because ChromaDB runs embedded. The only network calls are to the Gemini API, for embedding and generation.
Installation
pip install axiom-rag
export GEMINI_API_KEY=your-key # or set it in a .env file (see below)
Or from source:
git clone https://github.com/axiom-llc/axiom-rag.git
cd axiom-rag
python3.11 -m venv .venv && source .venv/bin/activate
pip install -e .
# For development (includes pytest)
pip install -e ".[dev]"
Copy .env.example to .env and set your key:
cp .env.example .env
# Edit .env — set GEMINI_API_KEY at minimum
.env is loaded automatically at startup via python-dotenv.
CLI
rag ingest ./docs # ingest directory (.txt and .md)
rag ingest ./docs/policy.txt # ingest single file
rag ingest ./docs --strategy sentences # use sentence chunking
rag query "what is the cancellation window?"
rag list # list ingested documents
rag delete policy.txt # delete a document by ID
rag stats # collection statistics (JSON)
Store-only commands (list, delete, stats) do not require GEMINI_API_KEY.
REST API
python -m server.app # default: 0.0.0.0:8000
POST /ingest
{
"text": "Full document text...",
"doc_id": "policy-v2",
"metadata": {"category": "legal"},
"strategy": "fixed"
}
Response 201:
{"doc_id": "policy-v2", "chunks_stored": 14}
POST /query
{"question": "what is the cancellation window?"}
Response 200:
{
"answer": "The cancellation window is 30 days from purchase...",
"sources": ["policy-v2"],
"chunk_count": 3,
"chunks": [
{
"text": "...",
"metadata": {"doc_id": "policy-v2", "chunk_index": 4},
"score": 0.91
}
]
}
GET /documents # list ingested documents
DELETE /documents/<doc_id> # delete a document by ID
GET /stats # collection statistics (JSON)
Architecture
query / ingest
│
▼
cli.py / server/app.py ← entry points; no business logic
│
▼
rag/pipeline.py ← ingest() and query(); public interface
│
├─ rag/chunker.py ← fixed-size or sentence-boundary chunking
├─ rag/embedder.py ← Gemini gemini-embedding-001 (stateless)
├─ rag/store.py ← ChromaDB upsert / cosine retrieval
└─ rag/generator.py ← Gemini 2.5 Flash; context-grounded answers
rag/config.py ← frozen Config dataclass; env + override resolution
All modules are stateless. pipeline.py is the only file that calls more
than one module. Config is resolved once at startup and passed explicitly —
no globals, no module-level singletons.
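The orchestration rule can be sketched as follows. This is a hypothetical illustration, not the contents of rag/pipeline.py: the Components bundle and every signature in it are assumptions. It shows the shape described above: ingest() and query() are the only functions that touch more than one module, and settings such as top_k and the score threshold arrive as explicit arguments rather than globals.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Components:
    """Hypothetical bundle of the stateless modules' entry points."""
    chunk: Callable[[str], list[str]]                          # chunker
    embed_texts: Callable[[list[str]], list[list[float]]]      # embedder, document side
    embed_query: Callable[[str], list[float]]                  # embedder, query side
    upsert: Callable[[str, list[str], list[list[float]]], int]  # store write
    search: Callable[[list[float], int, float], list[dict]]    # store read
    generate: Callable[[str, list[dict]], str]                 # generator


def ingest(text: str, doc_id: str, c: Components) -> dict:
    """Chunk, embed and store one document."""
    chunks = c.chunk(text)
    vectors = c.embed_texts(chunks)
    stored = c.upsert(doc_id, chunks, vectors)
    return {"doc_id": doc_id, "chunks_stored": stored}


def query(question: str, c: Components, top_k: int = 5, threshold: float = 0.4) -> dict:
    """Embed the question, retrieve scored chunks, generate a grounded answer."""
    hits = c.search(c.embed_query(question), top_k, threshold)
    answer = c.generate(question, hits)
    # Unique doc_ids in retrieval order, mirroring the "sources" field of /query.
    sources = list(dict.fromkeys(h["metadata"]["doc_id"] for h in hits))
    return {"answer": answer, "sources": sources, "chunk_count": len(hits), "chunks": hits}
```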
Design Notes
Embedding model. This pipeline uses models/gemini-embedding-001 through the
google-genai SDK. The default in config.py and .env.example reflects this.
Set RAG_EMBEDDING_MODEL to override it if your API key has access to other
embedding models.
Embedding asymmetry. The Gemini embedding API distinguishes task_type:
RETRIEVAL_DOCUMENT for ingestion and RETRIEVAL_QUERY for queries. Using
the wrong type on either side can degrade retrieval precision, so both are
set explicitly in embedder.py.
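A sketch of how the two task types can be pinned in code, assuming the google-genai SDK's client.models.embed_content() with EmbedContentConfig(task_type=...). The function names mirror embedder.embed_texts() and embedder.embed_query() mentioned under Lazy API key validation, but their signatures here are assumptions; the embed parameter is a hypothetical injection point so the logic can be exercised without a network call. The SDK is imported lazily inside gemini_embed().

```python
import os
from typing import Callable, Sequence

DOC_TASK = "RETRIEVAL_DOCUMENT"   # ingestion side
QUERY_TASK = "RETRIEVAL_QUERY"    # query side

# Hypothetical injection point: (texts, task_type) -> one vector per text.
EmbedFn = Callable[[Sequence[str], str], list[list[float]]]


def gemini_embed(texts: Sequence[str], task_type: str,
                 model: str = "models/gemini-embedding-001") -> list[list[float]]:
    """Embed texts with the google-genai SDK (imported lazily; needs GEMINI_API_KEY)."""
    from google import genai
    from google.genai import types

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    resp = client.models.embed_content(
        model=model,
        contents=list(texts),
        config=types.EmbedContentConfig(task_type=task_type),
    )
    return [e.values for e in resp.embeddings]


def embed_texts(texts: Sequence[str], embed: EmbedFn = gemini_embed) -> list[list[float]]:
    """Document chunks are always embedded with RETRIEVAL_DOCUMENT."""
    return embed(texts, DOC_TASK)


def embed_query(question: str, embed: EmbedFn = gemini_embed) -> list[float]:
    """Queries are always embedded with RETRIEVAL_QUERY."""
    return embed([question], QUERY_TASK)[0]
```

Keeping the task type inside these two wrappers means callers cannot pass the wrong one by accident.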
Score threshold. Retrieved chunks below the configured cosine similarity
floor (RAG_SCORE_THRESHOLD, default 0.4) are dropped before generation.
This prevents low-relevance noise from polluting the context window. Tune
down for broader recall, up for stricter precision.
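A sketch of the filtering step, assuming the ChromaDB collection uses cosine space (hnsw:space set to "cosine"), where the returned distance is 1 minus cosine similarity. ChromaDB's collection.query() returns one nested list per query embedding; this hypothetical helper takes the inner lists for a single query.

```python
def to_scored_hits(documents: list[str], metadatas: list[dict], distances: list[float],
                   top_k: int = 5, threshold: float = 0.4) -> list[dict]:
    """Convert cosine distances to similarity scores, drop weak hits, sort best-first."""
    hits = [
        {"text": doc, "metadata": meta, "score": 1.0 - dist}  # cosine space: sim = 1 - dist
        for doc, meta, dist in zip(documents, metadatas, distances)
    ]
    kept = [h for h in hits if h["score"] >= threshold]  # below the floor: dropped
    kept.sort(key=lambda h: h["score"], reverse=True)
    return kept[:top_k]
```

Raising threshold shrinks the kept list (precision); lowering it lets more marginal chunks through (recall).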
Chunk overlap. Fixed-size chunking uses a configurable overlap window
(RAG_CHUNK_OVERLAP, default 64, in the same units as RAG_CHUNK_SIZE) between
consecutive chunks. Overlap preserves context at chunk boundaries at the cost
of a slightly larger index.
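A minimal sketch of both strategies, assuming size and overlap are counted in whitespace-separated words (the Configuration table describes RAG_CHUNK_SIZE as approximate words per chunk). Function names are hypothetical, and the sentence splitter is a simple regex on ., ! and ?, not necessarily what rag/chunker.py does. The sentence variant here applies no overlap.

```python
import re


def chunk_fixed(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def chunk_sentences(text: str, size: int = 512) -> list[str]:
    """Greedily pack whole sentences into chunks of at most ~`size` words."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > size:  # next sentence would overflow: close the chunk
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

With size=512 and overlap=64, each window starts 448 words after the previous one, so a 1,000-word document yields three chunks.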
Grounding discipline. The generation prompt instructs the model to answer
only from provided context, cite doc_id inline, and state explicitly when
context is insufficient rather than speculate. The system_prompt parameter
in generator.generate_answer() exists for domain adaptation, but changing it
to permit prior-knowledge use defeats the pipeline's purpose.
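An illustrative prompt builder along these lines. The rule wording, the [doc_id] labels and the build_prompt() name are assumptions, not the prompt shipped in rag/generator.py; the system_prompt argument stands in for the domain-adaptation hook described above.

```python
GROUNDING_RULES = (
    "Answer the question using ONLY the context below. "
    "Cite the doc_id of each passage you rely on inline, e.g. [policy-v2]. "
    "If the context is insufficient to answer, say so explicitly instead of guessing."
)


def build_prompt(question: str, chunks: list[dict],
                 system_prompt: str = GROUNDING_RULES) -> str:
    """Rules first, then each retrieved chunk labelled with its doc_id, then the question."""
    context = "\n\n".join(
        f"[{chunk['metadata']['doc_id']}]\n{chunk['text']}" for chunk in chunks
    )
    if not context:
        # Nothing passed the score threshold; make that visible to the model.
        context = "(no relevant context retrieved)"
    return f"{system_prompt}\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"
```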
pgvector swap path. store.py is the only file that references
ChromaDB. To swap in pgvector: implement upsert, query,
delete_document, list_documents, and collection_stats in a new
store_pg.py and update the single import in pipeline.py. Nothing else
changes.
Lazy API key validation. GEMINI_API_KEY is not required at config load
time. Validation fires at the entry to embedder.embed_texts() and
embedder.embed_query(). This allows store-only CLI commands to work
without a key present.
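A sketch of the lazy check as a decorator. MissingAPIKeyError, requires_gemini_key and the stand-in function bodies are hypothetical; only the behavior (validate on entry to embedding calls, never at config load) comes from the note above.

```python
import functools
import os


class MissingAPIKeyError(RuntimeError):
    """Raised when a Gemini-backed call runs without GEMINI_API_KEY."""


def requires_gemini_key(fn):
    """Check for the key when fn is called, not when config is loaded."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        if not os.environ.get("GEMINI_API_KEY", "").strip():
            raise MissingAPIKeyError(
                f"{fn.__name__}() needs GEMINI_API_KEY; store-only commands do not"
            )
        return fn(*args, **kwargs)
    return wrapper


@requires_gemini_key
def embed_query(question: str) -> list[float]:
    return [float(len(question))]  # stand-in for the real Gemini embedding call


def list_documents() -> list[str]:
    return ["policy.txt"]  # stand-in for a store-only read; no key needed
```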
Environment loading. Both cli.py and server/app.py call
load_dotenv() at startup. A .env file in the project root is loaded
automatically — no manual export required.
Configuration
| Variable | Default | Description |
|---|---|---|
| GEMINI_API_KEY | (required for embed/generate) | Gemini API key |
| RAG_CHROMA_PATH | ~/.rag/chroma | ChromaDB persistence directory |
| RAG_COLLECTION | documents | ChromaDB collection name |
| RAG_CHUNK_SIZE | 512 | Approximate words per chunk |
| RAG_CHUNK_OVERLAP | 64 | Overlap between consecutive chunks |
| RAG_TOP_K | 5 | Max chunks retrieved per query |
| RAG_SCORE_THRESHOLD | 0.4 | Min cosine similarity (0–1) |
| RAG_EMBEDDING_MODEL | models/gemini-embedding-001 | Gemini embedding model |
| RAG_GENERATION_MODEL | gemini-2.5-flash | Gemini generation model |
Tests
pytest tests/ -v --tb=short
All tests mock Gemini API calls. No live API or network access required.
tests/test_chunker.py — chunking strategies, overlap, edge cases
tests/test_store.py — score filtering, sort order, delete, list, stats
tests/test_pipeline.py — ingest/query integration, file and directory helpers
To run with coverage (requires pytest-cov):
pip install pytest-cov
pytest tests/ -v --tb=short --cov=rag --cov-report=term-missing
Evaluation
Retrieval quality is measured using Precision@k and MRR against a ground-truth dataset.
python eval/eval_retrieval.py --dataset eval/dataset.json
python eval/eval_retrieval.py --dataset eval/dataset.json --json
The eval harness requires documents to be ingested before running. The
relevant_doc_ids in eval/dataset.json must match the doc_id values
used at ingest time (i.e. the filename including extension when ingesting
via the CLI).
See eval/README.md for full usage and tuning guidance.
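The two metrics can be computed as below. This is a standalone sketch, not eval/eval_retrieval.py: it assumes each dataset entry reduces to a ranked list of retrieved doc_ids plus a set of relevant_doc_ids. Precision@k divides by k even when fewer than k chunks survive the score threshold, and a doc_id that appears in several retrieved chunks is counted at each position; the real harness may handle either case differently.

```python
from typing import Iterable, Sequence


def precision_at_k(retrieved: Sequence[str], relevant: set[str], k: int) -> float:
    """P@k = (# relevant doc_ids among the first k retrieved) / k."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc_id in retrieved[:k] if doc_id in relevant) / k


def reciprocal_rank(retrieved: Sequence[str], relevant: set[str]) -> float:
    """1 / (1-based rank of the first relevant doc_id), or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(cases: Iterable[tuple[Sequence[str], Iterable[str]]], k: int = 5) -> dict:
    """Average P@k and reciprocal rank (giving MRR) over (retrieved, relevant_doc_ids) pairs."""
    cases = [(list(r), set(rel)) for r, rel in cases]
    if not cases:
        raise ValueError("no evaluation cases")
    n = len(cases)
    return {
        "k": k,
        "n": n,
        "precision_at_k": sum(precision_at_k(r, rel, k) for r, rel in cases) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in cases) / n,
    }
```

With k=2, a question whose second-ranked doc_id is the only relevant one scores P@2 = 0.5 and a reciprocal rank of 0.5.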
License
MIT — Axiom LLC
Download files
File details
Details for the file axiom_rag-1.0.1.tar.gz.
File metadata
- Download URL: axiom_rag-1.0.1.tar.gz
- Size: 15.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 247247fc77fd8f6ebea52f6a144665734a5375af8eb2466d0825808dcbe5a5b0 |
| MD5 | dd27cdae6db5a5f1171e99f1adcfdcab |
| BLAKE2b-256 | ae7e057dbdced439ab55ca6cc26a95ff023defc1a6c983bcbda37d3c21c37d87 |
Provenance
The following attestation bundles were made for axiom_rag-1.0.1.tar.gz:
- Publisher: publish.yml on axiom-llc/axiom-rag
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: axiom_rag-1.0.1.tar.gz
- Subject digest: 247247fc77fd8f6ebea52f6a144665734a5375af8eb2466d0825808dcbe5a5b0
- Sigstore transparency entry: 1102637382
- Permalink: axiom-llc/axiom-rag@6832f5822ad29084ee7c0499fd9b9f1441d705e3
- Branch / Tag: refs/tags/v1.0.1
- Owner: https://github.com/axiom-llc
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6832f5822ad29084ee7c0499fd9b9f1441d705e3
- Trigger Event: release
File details
Details for the file axiom_rag-1.0.1-py3-none-any.whl.
File metadata
- Download URL: axiom_rag-1.0.1-py3-none-any.whl
- Size: 15.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a7ac7b18379733c37b5449e0586cf6b2c464d65b2490c4def15cf27785062640 |
| MD5 | 2d8a1dea8dd896e698602de70868cfe9 |
| BLAKE2b-256 | 7e9dbaf8bb8d425c7430d1b6086cfcebfabd1d55db45c3c6f658b4dcc978c98a |
Provenance
The following attestation bundles were made for axiom_rag-1.0.1-py3-none-any.whl:
- Publisher: publish.yml on axiom-llc/axiom-rag
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: axiom_rag-1.0.1-py3-none-any.whl
- Subject digest: a7ac7b18379733c37b5449e0586cf6b2c464d65b2490c4def15cf27785062640
- Sigstore transparency entry: 1102637383
- Permalink: axiom-llc/axiom-rag@6832f5822ad29084ee7c0499fd9b9f1441d705e3
- Branch / Tag: refs/tags/v1.0.1
- Owner: https://github.com/axiom-llc
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6832f5822ad29084ee7c0499fd9b9f1441d705e3
- Trigger Event: release