AIAR — local retrieval-augmented generation with LLM-judge grounding loop
AIAR — Local RAG with LLM-as-judge and a grounding loop
AIAR is a local-first retrieval-augmented generation framework for Python.
It runs against your own Ollama instance, ingests your
own documents, and ships three production-grade primitives out of the box:
hybrid retrieval, an LLM-as-judge that returns a structured Verdict, and a
grounding store that lets accepted corrections feed back into future answers.
It is built for developers, researchers, and AI hobbyists who want to own
their stack end to end — no cloud calls, no telemetry, no vendor lock-in.
Install
pip install aiar-rag
# or, with the full retrieval extras (BM25, cross-encoder reranker, HyDE):
pip install 'aiar-rag[rag]'
Note on the name: the distribution on PyPI is aiar-rag because aiar was already taken. The import package remains aiar, so your code uses import aiar, but you install with pip install aiar-rag.
Prerequisites: Python 3.10+ and a running Ollama
daemon (default http://127.0.0.1:11434). Pull at least one chat model and
one embedding model, for example:
ollama pull qwen2.5:7b-instruct
ollama pull nomic-embed-text
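Before the first answer_prompt call, it can help to confirm that both models are actually available. The sketch below is not part of AIAR. It queries Ollama's own GET /api/tags endpoint, which lists locally pulled models, and it assumes the default daemon address above and the two example model names. Ollama reports a model pulled without a tag under :latest, so bare names are normalized before comparison. fetch_local_models and missing_models are hypothetical helper names.

```python
import json
import urllib.request

# Default Ollama daemon address from the prerequisites (an assumption if yours differs).
OLLAMA_URL = "http://127.0.0.1:11434"


def fetch_local_models(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> list[str]:
    """Return the model names reported by Ollama's GET /api/tags endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        payload = json.load(resp)
    return [m["name"] for m in payload.get("models", [])]


def missing_models(available: list[str], required: list[str]) -> list[str]:
    """Return the required models that are not present locally.

    A bare name such as "nomic-embed-text" is compared as
    "nomic-embed-text:latest", matching how Ollama lists untagged pulls.
    """

    def normalize(name: str) -> str:
        return name if ":" in name else f"{name}:latest"

    have = {normalize(n) for n in available}
    return [r for r in required if normalize(r) not in have]
```

With the daemon running, missing_models(fetch_local_models(), ["qwen2.5:7b-instruct", "nomic-embed-text"]) should return an empty list once both ollama pull commands have completed; anything it returns is a model you still need to pull.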
Quickstart
from aiar.harness.pipeline import answer_prompt
result = answer_prompt("What did our Q3 deployment doc say about rollback?", judge=True)
print(result["answer"])
print(result["verdict"]) # {"label": "Supported" | "Unsupported" | ..., "rationale": "...", ...}
result also exposes the keys grounded, reground_applied, retrieval, and
latency_ms, so you can wire the loop into your own UI or pipeline.
Why AIAR
Most local-RAG stacks stop at "retrieve and stuff into a prompt." AIAR treats the answer as the beginning of the loop, not the end. The judge flags unsupported answers as they are produced; the grounding store reinjects accepted corrections so the same hallucination does not keep recurring on similar prompts. The whole system runs on a laptop with a Qwen-class model and no external API calls, which means you can ship it into environments where cloud calls are not allowed and audit every byte the model sees.
The three wedges
Hybrid retrieval
AIAR fuses lexical and semantic retrieval rather than picking one. Every
query runs through BM25 over a tokenized index and a vector search over
Ollama embeddings; the two ranked lists are merged with reciprocal-rank
fusion (RRF), then optionally reranked by a cross-encoder. HyDE-style query
rewriting and configurable top_k / fetch_k give you knobs without
forcing a tuning project on day one.
LLM-as-judge Verdict
Every answer can be graded by a second LLM call that returns a structured
Verdict: a label (Supported, Partially supported, Unsupported,
Off-topic), a rationale, and the citations actually relied on. The judge
sees the same retrieved context as the answerer, so its critique is grounded
in evidence — and downstream code can branch on verdict.label to gate,
retry, or escalate.
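The gating idea can be sketched as follows. This is an illustration, not AIAR's schema: the four labels and the label/rationale/citations fields come from the description above, but the JSON key name citations, the parse_verdict and route helpers, and the accept/retry/escalate policy are all hypothetical. The Quickstart prints the verdict as a dict, so a real integration may read verdict["label"] instead of an attribute.

```python
import json
from dataclasses import dataclass, field

# Labels named in the README; any other value is treated as malformed judge output.
LABELS = ("Supported", "Partially supported", "Unsupported", "Off-topic")


@dataclass(frozen=True)
class Verdict:
    label: str
    rationale: str
    citations: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.label not in LABELS:
            raise ValueError(f"unknown verdict label: {self.label!r}")


def parse_verdict(raw: str) -> Verdict:
    """Parse a judge's JSON reply into a Verdict (assumed key names)."""
    data = json.loads(raw)
    return Verdict(
        label=data["label"],
        rationale=data.get("rationale", ""),
        citations=list(data.get("citations", [])),
    )


def route(verdict: Verdict) -> str:
    """Hypothetical policy: accept, retry once, or hand off to a human."""
    if verdict.label == "Supported":
        return "accept"
    if verdict.label == "Partially supported":
        return "retry"
    return "escalate"  # Unsupported or Off-topic
```

Validating the label at parse time means a judge that drifts off-schema fails loudly instead of silently falling through to the escalate branch.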
Grounding loop
When a Verdict is accepted (by a human, by automation, or by policy), the
answer plus its supporting context is persisted to a grounding store keyed
on the prompt. Next time a similar prompt arrives, AIAR reinjects that
grounding before the answerer runs and flags reground_applied=True. The
system stops re-making the same mistake — your corrections compound.
Used by
- Errorta — the polished desktop product built on AIAR. Tauri + React shell, hardware-aware Ollama setup, drag-and-drop corpus management, and the judge-and-grounding review UX for end users. (Repo private until v1.0 launch.)
Building something on AIAR? Open a PR adding it here.
What is in the box
- aiar.harness.pipeline.answer_prompt: the one-call entry point used in the Quickstart above.
- aiar.rag: hybrid retrieval (BM25 + vector + RRF), optional cross-encoder reranker, HyDE rewriting.
- aiar.eval: the LLM-as-judge with a structured Verdict schema.
- aiar.grounding: the accepted-correction store and reground pipeline.
- aiar.harness.service: an optional FastAPI service exposing /services/prompt and /services/meta for other apps on the box.
- aiar.observability: call IDs, latency, and retrieval traces for every answer.
Configuration
AIAR reads AIAR_* environment variables for runtime configuration —
endpoints, model names, reranker toggles, grounding-store paths, instance
isolation, and so on. Sensible defaults work out of the box for a local
Ollama install; see PLAYBOOK.md for the full matrix.
Contributing
PRs welcome. The deep-dive operator guide lives at PLAYBOOK.md: an end-to-end walkthrough covering ingestion, the harness, the watcher GUI, regrounding, evals, and operational notes. Worked examples are in examples/feature-guides/improving-rag.md.
For framework-level discussion, file an issue. For polished-product feedback, see Errorta (above).
License
Apache-2.0. See LICENSE, NOTICE, or the license field in
pyproject.toml.
Download files
Source distribution: aiar_rag-0.2.0.tar.gz
Built distribution: aiar_rag-0.2.0-py3-none-any.whl
File details
Details for the file aiar_rag-0.2.0.tar.gz.
File metadata
- Download URL: aiar_rag-0.2.0.tar.gz
- Upload date:
- Size: 694.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 61b47a47de288f85be0e75eb12e4c4b5abc747808bbebcb6621a664111590c9c |
| MD5 | da088a4eb2e8f5138852e265051ecd9f |
| BLAKE2b-256 | 9627ac185ec44eea921170b23eef4385996723cada0c493804801722b48cdb9a |
File details
Details for the file aiar_rag-0.2.0-py3-none-any.whl.
File metadata
- Download URL: aiar_rag-0.2.0-py3-none-any.whl
- Upload date:
- Size: 686.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 119ca8558bcc170540053a58b504b438a792a62768818be8c48f882f176ab606 |
| MD5 | 86d92889237fa31558b6969eaf17092d |
| BLAKE2b-256 | b4267a3eca6e92f294c3453e006a09bdf0cd3e12c7186b908244359ed0cac455 |