Production RAG pipeline — grounded retrieval, source-cited answers, Precision@k + MRR eval

axiom-rag

Production-grade Retrieval-Augmented Generation pipeline. Ingest documents, embed them, store vectors locally, retrieve semantically, and generate grounded answers — from a clean CLI or REST API.

The generation prompt restricts the model to retrieved context rather than prior knowledge, so every answer is bounded by what you ingest. Sources are cited inline.

rag ingest ./docs
rag query "what is our refund policy?"

Python 3.11+ · Gemini API · ChromaDB · Flask · MIT


What It Does

  1. Ingest — chunks documents, embeds them via Gemini gemini-embedding-001, and stores vectors in a local ChromaDB collection
  2. Retrieve — embeds the query and finds the top-k most semantically similar chunks above a configurable similarity threshold
  3. Generate — passes retrieved context to Gemini 2.5 Flash with a strict grounding prompt; the model answers only from what was retrieved

Storage is fully local by default: ChromaDB runs embedded, so no database server is required. The only network calls are to the Gemini API.


Installation

pip install axiom-rag
cp .env.example .env  # set GEMINI_API_KEY

Or from source:

git clone https://github.com/axiom-llc/axiom-rag.git
cd axiom-rag
python3.11 -m venv .venv && source .venv/bin/activate
pip install -e .

# For development (includes pytest)
pip install -e ".[dev]"

Copy .env.example to .env and set your key:

cp .env.example .env
# Edit .env — set GEMINI_API_KEY at minimum

.env is loaded automatically at startup via python-dotenv.


CLI

rag ingest ./docs                         # ingest directory (.txt and .md)
rag ingest ./docs/policy.txt              # ingest single file
rag ingest ./docs --strategy sentences    # use sentence chunking

rag query "what is the cancellation window?"

rag list                                  # list ingested documents
rag delete policy.txt                     # delete a document by ID
rag stats                                 # collection statistics (JSON)

Store-only commands (list, delete, stats) do not require GEMINI_API_KEY.


REST API

python -m server.app   # default: 0.0.0.0:8000

POST /ingest

{
  "text": "Full document text...",
  "doc_id": "policy-v2",
  "metadata": {"category": "legal"},
  "strategy": "fixed"
}

Response 201:

{"doc_id": "policy-v2", "chunks_stored": 14}

POST /query

{"question": "what is the cancellation window?"}

Response 200:

{
  "answer": "The cancellation window is 30 days from purchase...",
  "sources": ["policy-v2"],
  "chunk_count": 3,
  "chunks": [
    {
      "text": "...",
      "metadata": {"doc_id": "policy-v2", "chunk_index": 4},
      "score": 0.91
    }
  ]
}

GET /documents

DELETE /documents/<doc_id>

GET /stats
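
As a minimal sketch of calling the API from Python, the snippet below builds the POST /query request shown above using only the standard library. The base URL assumes a server running locally on the default port, and the function names (build_query_request, ask) are illustrative, not part of axiom-rag.

```python
import json
import urllib.request

# Assumption: the server was started locally on the default port.
BASE_URL = "http://localhost:8000"


def build_query_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /query request carrying the question as a JSON body."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, base_url: str = BASE_URL) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(build_query_request(question, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, ask("what is the cancellation window?") should return a dict with the answer, sources, chunk_count, and chunks keys shown in the response example.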


Architecture

query / ingest
      │
      ▼
  cli.py / server/app.py        ← entry points; no business logic
      │
      ▼
  rag/pipeline.py               ← ingest() and query(); public interface
      │
      ├─ rag/chunker.py         ← fixed-size or sentence-boundary chunking
      ├─ rag/embedder.py        ← Gemini gemini-embedding-001 (stateless)
      ├─ rag/store.py           ← ChromaDB upsert / cosine retrieval
      └─ rag/generator.py       ← Gemini 2.5 Flash; context-grounded answers

  rag/config.py                 ← frozen Config dataclass; env + override resolution

All modules are stateless. pipeline.py is the only file that calls more than one module. Config is resolved once at startup and passed explicitly — no globals, no module-level singletons.
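
The "frozen Config, resolved once, passed explicitly" pattern can be sketched as follows. This is not the contents of rag/config.py: the field names and the load_config helper are assumptions made for illustration, and only the RAG_* variable names and defaults come from the Configuration section.

```python
import os
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class Config:
    # Illustrative subset of settings; defaults mirror the Configuration table.
    chunk_size: int = 512
    chunk_overlap: int = 64
    top_k: int = 5
    score_threshold: float = 0.4


def load_config(env: Optional[Mapping[str, str]] = None, **overrides: Any) -> Config:
    """Resolve each field as: explicit override > RAG_* env var > default."""
    env = os.environ if env is None else env
    known = {f.name for f in fields(Config)}
    unknown = set(overrides) - known
    if unknown:
        raise TypeError(f"unknown config overrides: {sorted(unknown)}")

    values = {}
    for f in fields(Config):
        raw = env.get(f"RAG_{f.name.upper()}")
        if f.name in overrides:
            values[f.name] = overrides[f.name]
        elif raw is not None:
            # Coerce the env string to the type of the field's default.
            values[f.name] = type(f.default)(raw)
    return Config(**values)
```

Because the dataclass is frozen, a resolved Config cannot be mutated by downstream modules; anything that needs different settings has to be handed a new instance.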


Design Notes

Embedding model. This pipeline uses models/gemini-embedding-001, the Gemini embedding model available through the google-genai SDK at the time of writing. It is the default in both config.py and .env.example. Set RAG_EMBEDDING_MODEL to override it if your API key has access to other models.

Embedding asymmetry. The Gemini embedding API takes a task_type parameter: RETRIEVAL_DOCUMENT for ingestion and RETRIEVAL_QUERY for queries. Using the wrong type on either side degrades retrieval precision. Both are set explicitly in embedder.py.
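
A rough sketch of how the two task types might be wired up with the google-genai SDK is shown below. It is not the code in embedder.py: the function names and the "document"/"query" labels are hypothetical, and the SDK import is deferred so the mapping can be exercised without the package installed or a key present.

```python
from typing import List

# Hypothetical labels mapped to the task types named above.
TASK_TYPES = {
    "document": "RETRIEVAL_DOCUMENT",  # used at ingest time
    "query": "RETRIEVAL_QUERY",        # used at query time
}


def task_type_for(purpose: str) -> str:
    """Return the Gemini task_type for 'document' or 'query'."""
    if purpose not in TASK_TYPES:
        raise ValueError(f"purpose must be one of {sorted(TASK_TYPES)}")
    return TASK_TYPES[purpose]


def embed(texts: List[str], purpose: str, api_key: str,
          model: str = "models/gemini-embedding-001") -> List[List[float]]:
    """Embed texts with the task type that matches their role (network call)."""
    # Imported lazily; requires the google-genai package.
    from google import genai
    from google.genai import types

    client = genai.Client(api_key=api_key)
    response = client.models.embed_content(
        model=model,
        contents=texts,
        config=types.EmbedContentConfig(task_type=task_type_for(purpose)),
    )
    return [e.values for e in response.embeddings]
```

Keeping the mapping in one place makes it harder to embed a query with the document task type by accident.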

Score threshold. Retrieved chunks below the configured cosine similarity floor (RAG_SCORE_THRESHOLD, default 0.4) are dropped before generation. This prevents low-relevance noise from polluting the context window. Tune down for broader recall, up for stricter precision.
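
The filtering step described above can be sketched in a few lines. The chunk dict shape follows the /query response example; the function names are illustrative. The distance conversion assumes the collection uses ChromaDB's cosine space, where the reported distance is one minus the cosine similarity.

```python
from typing import Any, Dict, List


def similarity_from_distance(distance: float) -> float:
    """Convert a cosine distance (assumed: 1 - similarity) to a similarity score."""
    return 1.0 - distance


def filter_chunks(chunks: List[Dict[str, Any]],
                  threshold: float = 0.4,
                  top_k: int = 5) -> List[Dict[str, Any]]:
    """Drop chunks scoring below the floor, then keep the top_k best by score."""
    kept = [c for c in chunks if c["score"] >= threshold]
    kept.sort(key=lambda c: c["score"], reverse=True)
    return kept[:top_k]
```

With the default floor of 0.4, a chunk scoring 0.35 is discarded even if fewer than top_k chunks remain, so a query can legitimately reach the generator with no context at all.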

Chunk overlap. Fixed-size chunking uses a configurable overlap window (RAG_CHUNK_OVERLAP, default 64, in the same approximate-word units as RAG_CHUNK_SIZE) between consecutive chunks. Overlap preserves context at chunk boundaries at the cost of a slightly larger index.
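
To make the overlap concrete, here is a minimal sketch of fixed-size chunking with an overlap window. It assumes whitespace-delimited words as the unit, in line with the "approximate words per chunk" description of RAG_CHUNK_SIZE; the actual chunker.py may split differently, and chunk_words is a hypothetical name.

```python
from typing import List


def chunk_words(text: str, size: int = 512, overlap: int = 64) -> List[str]:
    """Split text into chunks of `size` words, each sharing `overlap` words
    with the previous chunk."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    words = text.split()
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

With size=4 and overlap=1, each chunk repeats the last word of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk when it is shorter than the overlap.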

Grounding discipline. The generation prompt instructs the model to answer only from provided context, cite doc_id inline, and state explicitly when context is insufficient rather than speculate. The system_prompt parameter in generator.generate_answer() exists for domain adaptation but changing it to permit prior-knowledge use defeats the pipeline's purpose.
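
For illustration, a grounding prompt along these lines could be assembled as below. The wording of SYSTEM_PROMPT and the build_prompt helper are assumptions, not the prompt shipped in generator.py; the chunk shape follows the /query response example.

```python
from typing import Any, Dict, List

# Illustrative wording only; the prompt used by the pipeline may differ.
SYSTEM_PROMPT = (
    "Answer using ONLY the context below. "
    "Cite the doc_id of each source inline, e.g. [policy-v2]. "
    "If the context is insufficient, say so explicitly instead of guessing."
)


def build_prompt(question: str, chunks: List[Dict[str, Any]],
                 system_prompt: str = SYSTEM_PROMPT) -> str:
    """Assemble a prompt whose context section lists each chunk with its doc_id."""
    if chunks:
        context = "\n\n".join(
            f"[{c['metadata']['doc_id']}] {c['text']}" for c in chunks
        )
    else:
        context = "(no context retrieved)"
    return f"{system_prompt}\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"
```

Tagging every chunk with its doc_id in the context is what lets the model cite sources inline without inventing identifiers.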

pgvector swap path. store.py is the only file that references ChromaDB. To swap in pgvector: implement upsert, query, delete_document, list_documents, and collection_stats in a new store_pg.py and update the single import in pipeline.py. Nothing else changes.
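
One way to picture the swap is as an interface with the five functions listed above. The sketch below expresses it as a typing.Protocol with guessed signatures (the real ones in store.py may differ) and uses a small in-memory stand-in rather than pgvector so that it runs without a database.

```python
import math
from typing import Any, Dict, List, Protocol, Sequence, Tuple


class VectorStore(Protocol):
    """Hypothetical interface; only the five method names come from the text."""
    def upsert(self, doc_id: str, texts: Sequence[str],
               vectors: Sequence[Sequence[float]]) -> int: ...
    def query(self, vector: Sequence[float], top_k: int,
              threshold: float) -> List[Dict[str, Any]]: ...
    def delete_document(self, doc_id: str) -> int: ...
    def list_documents(self) -> List[str]: ...
    def collection_stats(self) -> Dict[str, Any]: ...


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Stand-in backend that satisfies VectorStore; not a pgvector client."""

    def __init__(self) -> None:
        # (doc_id, chunk_index) -> (text, vector)
        self._rows: Dict[Tuple[str, int], Tuple[str, List[float]]] = {}

    def upsert(self, doc_id, texts, vectors):
        count = 0
        for i, (text, vec) in enumerate(zip(texts, vectors)):
            self._rows[(doc_id, i)] = (text, list(vec))
            count += 1
        return count

    def query(self, vector, top_k, threshold):
        hits = [
            {"text": text,
             "metadata": {"doc_id": doc_id, "chunk_index": idx},
             "score": _cosine(vector, vec)}
            for (doc_id, idx), (text, vec) in self._rows.items()
        ]
        hits = [h for h in hits if h["score"] >= threshold]
        hits.sort(key=lambda h: h["score"], reverse=True)
        return hits[:top_k]

    def delete_document(self, doc_id):
        keys = [k for k in self._rows if k[0] == doc_id]
        for k in keys:
            del self._rows[k]
        return len(keys)

    def list_documents(self):
        return sorted({doc_id for doc_id, _ in self._rows})

    def collection_stats(self):
        return {"documents": len(self.list_documents()), "chunks": len(self._rows)}
```

A pgvector-backed module would implement the same five operations against SQL; as long as the return shapes match, the caller in pipeline.py would not need to know which backend is in use.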

Lazy API key validation. GEMINI_API_KEY is not required at config load time. Validation fires at the entry to embedder.embed_texts() and embedder.embed_query(). This allows store-only CLI commands to work without a key present.
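
The lazy-validation idea can be sketched as a check that runs at the first call needing the key rather than at import or config load. The names below (MissingAPIKeyError, require_api_key, and the embed_fn parameter) are hypothetical and only illustrate the pattern.

```python
import os
from typing import Callable, List, Mapping, Optional


class MissingAPIKeyError(RuntimeError):
    """Raised when an operation that needs Gemini is called without a key."""


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GEMINI_API_KEY, or raise if it is missing or blank."""
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise MissingAPIKeyError("GEMINI_API_KEY is required for embed/generate")
    return key


def embed_query(text: str,
                embed_fn: Callable[[str, str], List[float]],
                env: Optional[Mapping[str, str]] = None) -> List[float]:
    """Validate the key at call time, then delegate to the embedding function."""
    api_key = require_api_key(env)  # validation fires here, not at startup
    return embed_fn(text, api_key)
```

Commands such as list, delete, and stats never reach require_api_key in this arrangement, which is why they can run on a machine with no key configured.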

Environment loading. Both cli.py and server/app.py call load_dotenv() at startup. A .env file in the project root is loaded automatically — no manual export required.


Configuration

Variable               Default                         Description
GEMINI_API_KEY         (required for embed/generate)   Gemini API key
RAG_CHROMA_PATH        ~/.rag/chroma                   ChromaDB persistence directory
RAG_COLLECTION         documents                       ChromaDB collection name
RAG_CHUNK_SIZE         512                             Approximate words per chunk
RAG_CHUNK_OVERLAP      64                              Overlap between consecutive chunks
RAG_TOP_K              5                               Max chunks retrieved per query
RAG_SCORE_THRESHOLD    0.4                             Min cosine similarity (0–1)
RAG_EMBEDDING_MODEL    models/gemini-embedding-001     Gemini embedding model
RAG_GENERATION_MODEL   gemini-2.5-flash                Gemini generation model

Tests

pytest tests/ -v --tb=short

All tests mock Gemini API calls. No live API or network access required.

tests/test_chunker.py    — chunking strategies, overlap, edge cases
tests/test_store.py      — score filtering, sort order, delete, list, stats
tests/test_pipeline.py   — ingest/query integration, file and directory helpers

To run with coverage (requires pytest-cov):

pip install pytest-cov
pytest tests/ -v --tb=short --cov=rag --cov-report=term-missing

Evaluation

Retrieval quality is measured using Precision@k and MRR against a ground-truth dataset.

python eval/eval_retrieval.py --dataset eval/dataset.json
python eval/eval_retrieval.py --dataset eval/dataset.json --json

The eval harness requires documents to be ingested before running. The relevant_doc_ids in eval/dataset.json must match the doc_id values used at ingest time (i.e. the filename including extension when ingesting via the CLI).
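
For readers unfamiliar with the two metrics, the sketch below shows the standard definitions of Precision@k and MRR over ranked doc_id lists. It is not the code in eval/eval_retrieval.py; the function names and input shapes are assumptions, with relevant sets standing in for relevant_doc_ids.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved doc_ids that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant doc_id, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (retrieved, relevant) pairs, one per question."""
    scores: List[float] = [reciprocal_rank(r, rel) for r, rel in runs]
    return sum(scores) / len(scores) if scores else 0.0
```

Both metrics compare doc_id strings exactly, which is why a mismatch such as policy versus policy.txt between the dataset and the ingested IDs would score as a miss.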

See eval/README.md for full usage and tuning guidance.


License

MIT — Axiom LLC
