Production RAG pipeline — grounded retrieval, source-cited answers, Precision@k + MRR eval

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

axiom-llc

These details have not been verified by PyPI

Project links

Homepage

Project description

axiom-rag

Production-grade Retrieval-Augmented Generation pipeline. Ingest documents, embed them, store vectors locally, retrieve semantically, and generate grounded answers — from a clean CLI or REST API.

No hallucination from prior knowledge. Every answer is bounded by what you put in. Sources are cited inline.

rag ingest ./docs
rag query "what is our refund policy?"

Python 3.11+ · Gemini API · ChromaDB · Flask · MIT

What It Does

Ingest — chunks documents, embeds them via Gemini gemini-embedding-001, and stores vectors in a local ChromaDB collection
Retrieve — embeds the query and finds the top-k most semantically similar chunks above a configurable similarity threshold
Generate — passes retrieved context to Gemini 2.5 Flash with a strict grounding prompt; the model answers only from what was retrieved

The pipeline is fully local by default. No database server required. ChromaDB runs embedded. The only network calls are to the Gemini API.

Installation

pip install axiom-rag
cp .env.example .env  # set GEMINI_API_KEY

Or from source:

git clone https://github.com/axiom-llc/axiom-rag.git
cd axiom-rag
python3.11 -m venv .venv && source .venv/bin/activate
pip install -e .

# For development (includes pytest)
pip install -e ".[dev]"

Copy .env.example to .env and set your key:

cp .env.example .env
# Edit .env — set GEMINI_API_KEY at minimum

.env is loaded automatically at startup via python-dotenv.

CLI

rag ingest ./docs                         # ingest directory (.txt and .md)
rag ingest ./docs/policy.txt              # ingest single file
rag ingest ./docs --strategy sentences    # use sentence chunking

rag query "what is the cancellation window?"

rag list                                  # list ingested documents
rag delete policy.txt                     # delete a document by ID
rag stats                                 # collection statistics (JSON)

Store-only commands (list, delete, stats) do not require GEMINI_API_KEY.

REST API

python -m server.app   # default: 0.0.0.0:8000

`POST /ingest`

{
  "text": "Full document text...",
  "doc_id": "policy-v2",
  "metadata": {"category": "legal"},
  "strategy": "fixed"
}

Response 201:

{"doc_id": "policy-v2", "chunks_stored": 14}

`POST /query`

{"question": "what is the cancellation window?"}

Response 200:

{
  "answer": "The cancellation window is 30 days from purchase...",
  "sources": ["policy-v2"],
  "chunk_count": 3,
  "chunks": [
    {
      "text": "...",
      "metadata": {"doc_id": "policy-v2", "chunk_index": 4},
      "score": 0.91
    }
  ]
}

`GET /documents`

`DELETE /documents/<doc_id>`

`GET /stats`

Architecture

query / ingest
      │
      ▼
  cli.py / server/app.py        ← entry points; no business logic
      │
      ▼
  rag/pipeline.py               ← ingest() and query(); public interface
      │
      ├─ rag/chunker.py         ← fixed-size or sentence-boundary chunking
      ├─ rag/embedder.py        ← Gemini gemini-embedding-001 (stateless)
      ├─ rag/store.py           ← ChromaDB upsert / cosine retrieval
      └─ rag/generator.py       ← Gemini 2.5 Flash; context-grounded answers

  rag/config.py                 ← frozen Config dataclass; env + override resolution

All modules are stateless. pipeline.py is the only file that calls more than one module. Config is resolved once at startup and passed explicitly — no globals, no module-level singletons.

Design Notes

Embedding model. This pipeline uses models/gemini-embedding-001, the currently available Gemini embedding model via the google-genai SDK. The default in config.py and .env.example reflects this. Use RAG_EMBEDDING_MODEL to override if your API key has access to additional models.

Embedding asymmetry. The Gemini embedding API distinguishes task_type: RETRIEVAL_DOCUMENT for ingestion and RETRIEVAL_QUERY for queries. Using the wrong type for either degrades retrieval precision measurably. Both are set explicitly in embedder.py.

Score threshold. Retrieved chunks below the configured cosine similarity floor (RAG_SCORE_THRESHOLD, default 0.4) are dropped before generation. This prevents low-relevance noise from polluting the context window. Tune down for broader recall, up for stricter precision.

Chunk overlap. Fixed-size chunking uses a configurable overlap window (RAG_CHUNK_OVERLAP, default 64 tokens) between consecutive chunks. Overlap preserves context at chunk boundaries at the cost of slight index size increase.

Grounding discipline. The generation prompt instructs the model to answer only from provided context, cite doc_id inline, and state explicitly when context is insufficient rather than speculate. The system_prompt parameter in generator.generate_answer() exists for domain adaptation but changing it to permit prior-knowledge use defeats the pipeline's purpose.

pgvector swap path. store.py is the only file that references ChromaDB. To swap in pgvector: implement upsert, query, delete_document, list_documents, and collection_stats in a new store_pg.py and update the single import in pipeline.py. Nothing else changes.

Lazy API key validation. GEMINI_API_KEY is not required at config load time. Validation fires at the entry to embedder.embed_texts() and embedder.embed_query(). This allows store-only CLI commands to work without a key present.

Environment loading. Both cli.py and server/app.py call load_dotenv() at startup. A .env file in the project root is loaded automatically — no manual export required.

Configuration

Variable	Default	Description
`GEMINI_API_KEY`	(required for embed/generate)	Gemini API key
`RAG_CHROMA_PATH`	`~/.rag/chroma`	ChromaDB persistence directory
`RAG_COLLECTION`	`documents`	ChromaDB collection name
`RAG_CHUNK_SIZE`	`512`	Approximate words per chunk
`RAG_CHUNK_OVERLAP`	`64`	Overlap between consecutive chunks
`RAG_TOP_K`	`5`	Max chunks retrieved per query
`RAG_SCORE_THRESHOLD`	`0.4`	Min cosine similarity (0–1)
`RAG_EMBEDDING_MODEL`	`models/gemini-embedding-001`	Gemini embedding model
`RAG_GENERATION_MODEL`	`gemini-2.5-flash`	Gemini generation model

Tests

pytest tests/ -v --tb=short

All tests mock Gemini API calls. No live API or network access required.

tests/test_chunker.py    — chunking strategies, overlap, edge cases
tests/test_store.py      — score filtering, sort order, delete, list, stats
tests/test_pipeline.py   — ingest/query integration, file and directory helpers

To run with coverage (requires pytest-cov):

pip install pytest-cov
pytest tests/ -v --tb=short --cov=rag --cov-report=term-missing

Evaluation

Retrieval quality is measured using Precision@k and MRR against a ground-truth dataset.

python eval/eval_retrieval.py --dataset eval/dataset.json
python eval/eval_retrieval.py --dataset eval/dataset.json --json

The eval harness requires documents to be ingested before running. The relevant_doc_ids in eval/dataset.json must match the doc_id values used at ingest time (i.e. the filename including extension when ingesting via the CLI).

See eval/README.md for full usage and tuning guidance.

License

MIT — Axiom LLC

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

axiom-llc

These details have not been verified by PyPI

Project links

Homepage

Release history Release notifications | RSS feed

This version

1.0.1

Mar 14, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

axiom_rag-1.0.1.tar.gz (15.8 kB view details)

Uploaded Mar 14, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

axiom_rag-1.0.1-py3-none-any.whl (15.5 kB view details)

Uploaded Mar 14, 2026 Python 3

File details

Details for the file axiom_rag-1.0.1.tar.gz.

File metadata

Download URL: axiom_rag-1.0.1.tar.gz
Upload date: Mar 14, 2026
Size: 15.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for axiom_rag-1.0.1.tar.gz
Algorithm	Hash digest
SHA256	`247247fc77fd8f6ebea52f6a144665734a5375af8eb2466d0825808dcbe5a5b0`
MD5	`dd27cdae6db5a5f1171e99f1adcfdcab`
BLAKE2b-256	`ae7e057dbdced439ab55ca6cc26a95ff023defc1a6c983bcbda37d3c21c37d87`

See more details on using hashes here.

Provenance

The following attestation bundles were made for axiom_rag-1.0.1.tar.gz:

Publisher: publish.yml on axiom-llc/axiom-rag

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: axiom_rag-1.0.1.tar.gz
- Subject digest: 247247fc77fd8f6ebea52f6a144665734a5375af8eb2466d0825808dcbe5a5b0
- Sigstore transparency entry: 1102637382
- Sigstore integration time: Mar 14, 2026
Source repository:
- Permalink: axiom-llc/axiom-rag@6832f5822ad29084ee7c0499fd9b9f1441d705e3
- Branch / Tag: refs/tags/v1.0.1
- Owner: https://github.com/axiom-llc
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6832f5822ad29084ee7c0499fd9b9f1441d705e3
- Trigger Event: release

File details

Details for the file axiom_rag-1.0.1-py3-none-any.whl.

File metadata

Download URL: axiom_rag-1.0.1-py3-none-any.whl
Upload date: Mar 14, 2026
Size: 15.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for axiom_rag-1.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a7ac7b18379733c37b5449e0586cf6b2c464d65b2490c4def15cf27785062640`
MD5	`2d8a1dea8dd896e698602de70868cfe9`
BLAKE2b-256	`7e9dbaf8bb8d425c7430d1b6086cfcebfabd1d55db45c3c6f658b4dcc978c98a`

See more details on using hashes here.

Provenance

The following attestation bundles were made for axiom_rag-1.0.1-py3-none-any.whl:

Publisher: publish.yml on axiom-llc/axiom-rag

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: axiom_rag-1.0.1-py3-none-any.whl
- Subject digest: a7ac7b18379733c37b5449e0586cf6b2c464d65b2490c4def15cf27785062640
- Sigstore transparency entry: 1102637383
- Sigstore integration time: Mar 14, 2026
Source repository:
- Permalink: axiom-llc/axiom-rag@6832f5822ad29084ee7c0499fd9b9f1441d705e3
- Branch / Tag: refs/tags/v1.0.1
- Owner: https://github.com/axiom-llc
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6832f5822ad29084ee7c0499fd9b9f1441d705e3
- Trigger Event: release

axiom-rag 1.0.1

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

axiom-rag

What It Does

Installation

CLI

REST API

POST /ingest

POST /query

GET /documents

DELETE /documents/<doc_id>

GET /stats

Architecture

Design Notes

Configuration

Tests

Evaluation

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance

`POST /ingest`

`POST /query`

`GET /documents`

`DELETE /documents/<doc_id>`

`GET /stats`