AIAR — local retrieval-augmented generation with LLM-judge grounding loop

AIAR — Local RAG with LLM-as-judge and a grounding loop

AIAR is a local-first retrieval-augmented generation framework for Python. It runs against your own Ollama instance, ingests your own documents, and ships three production-grade primitives out of the box: hybrid retrieval, an LLM-as-judge that returns a structured Verdict, and a grounding store that lets accepted corrections feed back into future answers. It is built for developers, researchers, and AI hobbyists who want to own their stack end to end — no cloud calls, no telemetry, no vendor lock-in.

Install

```shell
pip install aiar-rag
# or, with the full retrieval extras (BM25, cross-encoder reranker, HyDE):
pip install 'aiar-rag[rag]'
```

Note on the name: the distribution on PyPI is aiar-rag because aiar was already taken. The import package remains aiar — so your code uses import aiar, but you install with pip install aiar-rag.

Prerequisites: Python 3.10+ and a running Ollama daemon (default http://127.0.0.1:11434). Pull at least one chat model and one embedding model, for example:

```shell
ollama pull qwen2.5:7b-instruct
ollama pull nomic-embed-text
```
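Before the first query, it can help to confirm that the daemon is reachable and that both models are pulled. The sketch below uses Ollama's own GET /api/tags endpoint, which lists locally available models, and only the Python standard library. The helper names fetch_model_names and missing_models are hypothetical and not part of AIAR.

```python
from __future__ import annotations

import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434"  # default daemon address from the prerequisites


def fetch_model_names(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> list[str]:
    """Return the names of locally pulled models via Ollama's GET /api/tags."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        payload = json.load(resp)
    return [model["name"] for model in payload.get("models", [])]


def missing_models(available: list[str], required: list[str]) -> list[str]:
    """Return required models that are not pulled yet.

    Ollama reports untagged pulls as "name:latest", so bare names are
    normalized to that form before comparing.
    """
    def normalize(name: str) -> str:
        return name if ":" in name else f"{name}:latest"

    have = {normalize(name) for name in available}
    return [name for name in required if normalize(name) not in have]
```

For example, missing_models(fetch_model_names(), ["qwen2.5:7b-instruct", "nomic-embed-text"]) returns an empty list once both pulls have finished; a connection error from fetch_model_names means the daemon is not running at that address.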

Quickstart

```python
from aiar.harness.pipeline import answer_prompt

result = answer_prompt("What did our Q3 deployment doc say about rollback?", judge=True)
print(result["answer"])
print(result["verdict"])   # {"label": "Supported" | "Unsupported" | ..., "rationale": "...", ...}
```

The result dict also carries grounded, reground_applied, retrieval, and latency_ms, so you can wire the loop into your own UI or pipeline.

Why AIAR

Most local-RAG stacks stop at "retrieve and stuff into a prompt." AIAR treats the answer as the beginning of the loop, not the end. The judge flags unsupported answers as soon as they are generated, and the grounding store feeds accepted corrections back in so the same hallucination is far less likely to happen twice. The whole system runs on a laptop with a Qwen-class model and no external API calls, which means you can ship it into environments where cloud calls are not allowed and audit every byte the model sees.

The three wedges

Hybrid retrieval

AIAR fuses lexical and semantic retrieval rather than picking one. Every query runs through BM25 over a tokenized index and a vector search over Ollama embeddings; the two ranked lists are merged with reciprocal-rank fusion (RRF), then optionally reranked by a cross-encoder. HyDE-style query rewriting and configurable top_k / fetch_k give you knobs without forcing a tuning project on day one.
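Reciprocal-rank fusion only needs each document's position in every ranked list: a document scores the sum of 1 / (k + rank) over the lists it appears in, so items that both BM25 and the vector search rank well rise to the top. The following is a minimal, generic sketch of that formula over document IDs, not AIAR's internal code; the function name and the commonly used k = 60 default are assumptions.

```python
from __future__ import annotations

from collections import defaultdict


def reciprocal_rank_fusion(
    ranked_lists: list[list[str]], k: int = 60, top_k: int | None = None
) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs with reciprocal-rank fusion (RRF).

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. A larger k flattens the gap between top ranks.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    fused = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return fused if top_k is None else fused[:top_k]
```

Fusing a BM25 list ["a", "b", "c"] with a vector list ["b", "c", "d"] ranks b first, then c, a, and d: b and c gain from appearing in both lists, while a edges out d because it sits higher in its single list. Because RRF ignores raw scores, BM25 and cosine similarities never need to be put on a common scale.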

LLM-as-judge Verdict

Every answer can be graded by a second LLM call that returns a structured Verdict: a label (Supported, Partially supported, Unsupported, Off-topic), a rationale, and the citations actually relied on. The judge sees the same retrieved context as the answerer, so its critique is grounded in evidence — and downstream code can branch on verdict.label to gate, retry, or escalate.
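Because the Verdict is structured, branching on it is ordinary code. This sketch assumes the judge output is JSON (or an already-parsed dict) with label, rationale, and citations keys; the Verdict class, the citations key name, and the route policy are hypothetical stand-ins rather than the schema defined in aiar.eval.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field

# Labels as listed in this README; the exact strings AIAR emits are an assumption.
LABELS = {"Supported", "Partially supported", "Unsupported", "Off-topic"}


@dataclass
class Verdict:
    """Hypothetical mirror of the judge output: label, rationale, cited sources."""

    label: str
    rationale: str = ""
    citations: list[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: dict) -> Verdict:
        """Build a Verdict from parsed judge output, rejecting unknown labels."""
        label = data.get("label", "")
        if label not in LABELS:
            raise ValueError(f"unknown verdict label: {label!r}")
        return cls(
            label=label,
            rationale=data.get("rationale", ""),
            citations=list(data.get("citations", [])),  # key name is an assumption
        )

    @classmethod
    def from_json(cls, raw: str) -> Verdict:
        return cls.from_dict(json.loads(raw))


def route(verdict: Verdict) -> str:
    """Example policy (not AIAR's): accept, retry, or escalate based on the label."""
    if verdict.label == "Supported":
        return "accept"
    if verdict.label == "Partially supported":
        return "retry"
    return "escalate"
```

Rejecting unknown labels up front keeps a malformed judge reply from silently falling through to the escalate branch as if it were a real Unsupported verdict.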

Grounding loop

When a Verdict is accepted (by a human, by automation, or by policy), the answer plus its supporting context is persisted to a grounding store keyed on the prompt. Next time a similar prompt arrives, AIAR reinjects that grounding before the answerer runs and flags reground_applied=True. The system stops re-making the same mistake — your corrections compound.
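The loop can be pictured as a store of accepted answers plus a lookup that runs before the answerer. This README does not say how AIAR measures prompt similarity or where the store lives, so the sketch below swaps in token-overlap (Jaccard) similarity with a hypothetical 0.6 threshold and keeps everything in memory. GroundingStore, GroundingEntry, and build_prompt are illustrative names, not the aiar.grounding API.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


def _tokens(text: str) -> set[str]:
    """Lowercased word tokens used for a cheap prompt-similarity check."""
    return set(re.findall(r"\w+", text.lower()))


@dataclass
class GroundingEntry:
    prompt: str
    answer: str
    context: str


class GroundingStore:
    """Hypothetical in-memory store of accepted answers, keyed on their prompt."""

    def __init__(self, threshold: float = 0.6) -> None:
        self.threshold = threshold
        self.entries: list[GroundingEntry] = []

    def accept(self, prompt: str, answer: str, context: str) -> None:
        """Persist an accepted answer together with its supporting context."""
        self.entries.append(GroundingEntry(prompt, answer, context))

    def lookup(self, prompt: str) -> GroundingEntry | None:
        """Return the most similar entry if its Jaccard score clears the threshold."""
        query = _tokens(prompt)
        best, best_score = None, 0.0
        for entry in self.entries:
            stored = _tokens(entry.prompt)
            union = query | stored
            score = len(query & stored) / len(union) if union else 0.0
            if score > best_score:
                best, best_score = entry, score
        return best if best_score >= self.threshold else None


def build_prompt(store: GroundingStore, prompt: str) -> tuple[str, bool]:
    """Prepend prior grounding when available; the bool plays the role of reground_applied."""
    hit = store.lookup(prompt)
    if hit is None:
        return prompt, False
    preamble = (
        f"Previously accepted answer:\n{hit.answer}\n"
        f"Supporting context:\n{hit.context}\n\n"
    )
    return preamble + prompt, True
```

A real store would more plausibly compare prompt embeddings (the same Ollama embedding model used for retrieval) and persist entries to disk, but the control flow stays the same: look up, reinject, and flag.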

Used by

Errorta — the polished desktop product built on AIAR. Tauri + React shell, hardware-aware Ollama setup, drag-and-drop corpus management, and the judge-and-grounding review UX for end users. (Repo private until v1.0 launch.)

Building something on AIAR? Open a PR adding it here.

What is in the box

aiar.harness.pipeline.answer_prompt — the one-call entry point used in the Quickstart above.
aiar.rag — hybrid retrieval, BM25 + vector + RRF, optional cross-encoder reranker, HyDE rewriting.
aiar.eval — the LLM-as-judge with structured Verdict schema.
aiar.grounding — accepted-correction store and reground pipeline.
aiar.harness.service — optional FastAPI service exposing /services/prompt and /services/meta for other apps on the box.
aiar.observability — call IDs, latency, and retrieval traces for every answer.

Configuration

AIAR reads AIAR_* environment variables for runtime configuration — endpoints, model names, reranker toggles, grounding-store paths, instance isolation, and so on. Sensible defaults work out of the box for a local Ollama install; see PLAYBOOK.md for the full matrix.
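Since every setting shares the AIAR_ prefix, a quick way to see which overrides are in effect (for example, when two instances behave differently) is to filter the environment. This is a stdlib-only sketch; aiar_settings is a hypothetical helper, and the actual variable names are documented in PLAYBOOK.md.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def aiar_settings(environ: Mapping[str, str] | None = None) -> dict[str, str]:
    """Collect every AIAR_* variable, sorted by name, so active overrides can be logged."""
    env = os.environ if environ is None else environ
    return {key: value for key, value in sorted(env.items()) if key.startswith("AIAR_")}
```

Passing an explicit mapping makes the helper easy to test; calling it with no argument reads the live process environment.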

Contributing

PRs welcome. The deep-dive operator guide lives at PLAYBOOK.md: an end-to-end walkthrough covering ingestion, the harness, the watcher GUI, regrounding, evals, and operational notes. Worked examples live in examples/feature-guides/improving-rag.md.

For framework-level discussion, file an issue. For polished-product feedback, see Errorta (above).

License

Apache-2.0. See LICENSE, NOTICE, or the license field in pyproject.toml.

Release history

0.2.0 (this version), uploaded Jun 8, 2026

Download files

Source Distribution

aiar_rag-0.2.0.tar.gz (694.0 kB)

Uploaded Jun 8, 2026 Source

Built Distribution

aiar_rag-0.2.0-py3-none-any.whl (686.9 kB)

Uploaded Jun 8, 2026 Python 3

File details

Details for the file aiar_rag-0.2.0.tar.gz.

File metadata

Download URL: aiar_rag-0.2.0.tar.gz
Upload date: Jun 8, 2026
Size: 694.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for aiar_rag-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`61b47a47de288f85be0e75eb12e4c4b5abc747808bbebcb6621a664111590c9c`
MD5	`da088a4eb2e8f5138852e265051ecd9f`
BLAKE2b-256	`9627ac185ec44eea921170b23eef4385996723cada0c493804801722b48cdb9a`

File details

Details for the file aiar_rag-0.2.0-py3-none-any.whl.

File metadata

Download URL: aiar_rag-0.2.0-py3-none-any.whl
Upload date: Jun 8, 2026
Size: 686.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for aiar_rag-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`119ca8558bcc170540053a58b504b438a792a62768818be8c48f882f176ab606`
MD5	`86d92889237fa31558b6969eaf17092d`
BLAKE2b-256	`b4267a3eca6e92f294c3453e006a09bdf0cd3e12c7186b908244359ed0cac455`