Lightweight, LLM-agnostic RAG pipeline with pluggable corpora. Works with Claude, Gemini, or any LLM.
Project description
attune-rag
Lightweight, LLM-agnostic RAG pipeline with pluggable corpora. Works with Claude, Gemini, or any LLM.
- No LLM SDK at install time. All provider deps are
optional extras. Two required runtime deps:
structlog,jinja2. - Pluggable corpus. Use attune-help (the default), any
markdown directory, or your own
CorpusProtocol. - Returns a prompt string + citation records by default
—
pipeline.run()never opens a network connection. You call your own LLM however you like. Optional provider adapters ship convenience wrappers. - Optional hybrid retrieval.
QueryExpanderandLLMRerankerlayer Claude Haiku on top of keyword retrieval to improve recall and precision — both opt-in, both fail-safe.
Why attune-rag
Most RAG libraries ship features. attune-rag ships measured
quality numbers and gates merges against them. The CI badge
isn't "tests pass" — it's P@1 ≥ 0.95, R@3 = 1.00, mean faithfulness ≥ 0.9686 (locked at
docs/specs/release-quality-baseline/baseline-1.md)
plus per-axis CPU + wall-clock perf thresholds (locked at
docs/specs/downstream-validation/perf-baseline.md).
A PR that drops mean_faithfulness below 0.9686 fails CI
automatically. Same for any latency hot-path regressing past
mean + 2σ. That's the differentiator.
vs LangChain / LlamaIndex
| attune-rag | LangChain | LlamaIndex | |
|---|---|---|---|
| Required runtime deps | 2 | many (transitively, ~30+) | many (~25+) |
| LLM SDK at install | none | bundled | bundled |
| Published quality regression thresholds | yes (P@1, R@3, faithfulness) | no | no |
| Published perf thresholds (wall + CPU) | yes | no | no |
| Citation primitives built-in | yes | add-on | add-on |
| "Get a string back, call your own LLM" | default | possible w/ effort | possible w/ effort |
LangChain and LlamaIndex are fantastic frameworks if you want batteries-included orchestration. attune-rag is the alternative when you want a RAG component you can drop into an existing app without buying into a framework — and want the quality bar quantified, not implied.
What attune-rag is not
Honest exclusions, so you can self-disqualify if you need any of these:
- Not an agent framework. No multi-step chains, no tool-use orchestration, no agent loops.
- Not a document-parsing toolkit. Bring your markdown
already-parsed; use
unstructured.ioor similar upstream. - Not a vector DB integration. Keyword retrieval is the
default; you wire your own vector store if you need one (an
EmbeddingRetrieveris on the post-freeze roadmap — see below). - Not a one-line-install batteries-included framework. That's LangChain / LlamaIndex. attune-rag is for the case where that's too much.
Install
pip install attune-rag # core only
pip install 'attune-rag[attune-help]' # + bundled help corpus
pip install 'attune-rag[claude]' # + Claude adapter
pip install 'attune-rag[gemini]' # + Gemini adapter
pip install 'attune-rag[all]' # everything
Quick start — Claude
pip install 'attune-rag[attune-help,claude]'
import asyncio
from attune_rag import RagPipeline
async def main():
pipeline = RagPipeline() # defaults to AttuneHelpCorpus
response, result = await pipeline.run_and_generate(
"How do I run a security audit with attune?",
provider="claude",
)
print(response)
print("\nSources:", [h.entry.path for h in result.citation.hits])
asyncio.run(main())
Quick start — Gemini
pip install 'attune-rag[attune-help,gemini]'
response, result = await pipeline.run_and_generate(
"...", provider="gemini", model="gemini-1.5-pro",
)
Quick start — custom corpus, any LLM
from pathlib import Path
from attune_rag import RagPipeline, DirectoryCorpus
pipeline = RagPipeline(corpus=DirectoryCorpus(Path("./my-docs")))
result = pipeline.run("How do I...?")
# Send result.augmented_prompt to whatever LLM you use.
# The pipeline itself does NOT call an LLM unless you use
# run_and_generate or call a provider adapter yourself.
Hybrid retrieval (optional)
QueryExpander and LLMReranker require the [claude] extra and an
ANTHROPIC_API_KEY. Both are opt-in and fail-safe — any API error
falls back to keyword-only order automatically.
from attune_rag import RagPipeline, LLMReranker, QueryExpander
# Reranker only (recommended for precision):
pipeline = RagPipeline(reranker=LLMReranker())
# Expander + reranker (max coverage):
pipeline = RagPipeline(
expander=QueryExpander(),
reranker=LLMReranker(),
)
Template editor primitives (attune_rag.editor)
Headless toolkit for tools that need to validate, lint, and refactor a
template corpus — used by the attune-gui
template editor and the attune-author
edit CLI, but works standalone with any
CorpusProtocol.
| API | What it does |
|---|---|
load_schema() |
Loads template_schema.json (the v1 frontmatter contract: required type enum + name; optional tags, aliases, summary, source, hash; additionalProperties: true). |
parse_frontmatter(text) / validate_frontmatter(data) |
Split a template into frontmatter + body and report typed FrontmatterIssues — used by linters and editors. |
lint_template(text, rel_path, corpus) |
Returns Diagnostic[] for schema violations, broken [[alias]] references, and depth-marker sequence errors. 1-indexed line/col ranges. |
autocomplete_tags(corpus, prefix, limit) / autocomplete_aliases(corpus, prefix, limit) |
Prefix-match completions ranked by frequency (tags) or lexical proximity (aliases). Sub-ms on 1k templates. |
find_references(corpus, name, kind) |
Locate every alias/tag/path occurrence across body, frontmatter, and cross_links.json. |
plan_rename(corpus, old, new, kind) |
Build a RenamePlan (one FileEdit per affected file with unified-diff hunks) for kind="alias" or "tag". Raises RenameCollisionError on existing alias targets. |
apply_rename(corpus, plan) |
Atomically apply the plan (tempfile-per-file + sequential rename + drift-detection rollback). Returns the list of affected paths. |
Schema, lint, and rename are pure functions over CorpusProtocol — no I/O,
no global state. All three pieces are tested as a unit and used live by the
attune-gui editor's /api/corpus/<id>/lint, /autocomplete, and
/refactor/rename/{preview,apply} routes.
from attune_rag import DirectoryCorpus
from attune_rag.editor import lint_template, plan_rename, apply_rename
corpus = DirectoryCorpus(Path("./templates")).load()
# Validate a template before saving
diagnostics = lint_template(
text=Path("./templates/concepts/foo.md").read_text(),
rel_path="concepts/foo.md",
corpus=corpus,
)
# Rename an alias across the whole corpus
plan = plan_rename(corpus, old="oldname", new="newname", kind="alias")
print(f"Affects {len(plan.edits)} files")
affected = apply_rename(corpus, plan)
Dashboard
attune-rag dashboard show # live terminal dashboard
attune-rag dashboard render --out report.html # HTML snapshot
Quality baselines
attune-rag locks two baselines, both gated by CI. Thresholds
are empirically derived (mean ± 2σ) from back-to-back
benchmark runs on an unchanged HEAD — grounded, not guessed.
Retrieval + faithfulness
| Metric | Threshold (current) | Source |
|---|---|---|
precision_at_1 |
≥ 0.95 | retrieval, deterministic |
recall_at_3 |
= 1.00 | retrieval, deterministic |
mean_faithfulness |
≥ 0.9686 | Claude judge, σ ≈ 0.005 |
Gated by .github/workflows/benchmark.yml.
Faithfulness gating engages when the PR touches retrieval,
reranker, expander, pipeline, prompts, or eval paths, or when
the PR title contains [full-bench]. Methodology + raw numbers
in docs/specs/release-quality-baseline/.
Per-hot-path latency
Locked dual-axis (wall-clock + CPU-time) thresholds on the four benchmarks. CPU-time is the gating axis (deterministic); wall-clock is advisory through Phase 4 burn-in, then revisited.
| Benchmark | Axis | Mean | Threshold |
|---|---|---|---|
keyword_retriever_retrieve |
cpu | 3,212 µs | 34,493 µs |
directory_corpus_load |
cpu | 47 µs | 66 µs |
rag_pipeline_run (retrieval-only) |
cpu | 537 µs | 625 µs |
llm_reranker_rerank |
wall | 728 ms | 1.07 s |
Numbers measured from N=30 back-to-back runs on the
locked-baseline runner (Linux ubuntu-latest, CPython 3.11.15).
Two thresholds reflect different noise profiles:
keyword_retriever_retrievehas a wide CPU band because its measured σ ≈ 15.6 ms reflects cold-cache effects on the first few iterations — the threshold formula ismean + 2σ, empirically derived rather than tuned for tightness.llm_reranker_rerankis wall-clock-only because Anthropic network variance (σ ≈ 170 ms) dominates the CPU axis; the gate is set generously and is advisory through Phase 4 W3.
Gated by .github/workflows/perf.yml
per-PR (advisory comment in Phase 4 W1–W2, blocking in W3.1).
Raw numbers + hardware fingerprint + the full 8-row dual-axis
table:
docs/specs/downstream-validation/perf-baseline.md.
Why this is the differentiator
Most RAG libraries A/B-test internally and ship the result. attune-rag publishes the thresholds, gates merges against them, and re-measures whenever the corpus, judge prompt, or hardware changes. The receipts are checked in.
Bundled .help/ corpus
The repo ships a polished .help/ corpus that documents
attune-rag's own surface — 143 templates across 13 features ×
11 kinds (concept, task, reference, quickstart, faq,
error, warning, tip, note, comparison,
troubleshooting). Generated by
attune-author with
strict fact-check; queryable via AttuneHelpCorpus or as the
bundled default for RagPipeline(). See
.help/features.yaml for the feature
map and .help/templates/ for the content.
The 13 features: pipeline, retrieval, corpus, prompts,
provenance, providers, eval, benchmark, cli, editor,
dashboard, expander, reranker.
What faithfulness measures
Faithfulness scores how well an answer is grounded in the retrieved
passages — 1.0 means every claim in the answer is supported by a
cited source; lower scores mean some claims have no support in the
context. It catches hallucination in a way that precision_at_k and
recall_at_k can't: those only measure whether the right documents
were retrieved, not whether the generated answer actually used them.
attune-rag uses Claude as the judge via Anthropic's tool-use API
to produce a structured score in [0.0, 1.0] for each
(query, answer, retrieved_context) triple. The reported metric is
the mean over the golden query set. Aggregate σ ≈ 0.005 over 40
queries even though per-query judge non-determinism can swing 40+
percentage points on individual queries — averaging absorbs the noise.
Run faithfulness manually
pip install 'attune-rag[claude]'
export ANTHROPIC_API_KEY=sk-ant-...
# Retrieval metrics only (free, deterministic):
attune-rag-benchmark --queries queries.yaml --json out.json
# Add faithfulness (~1 Claude API call per query, costs tokens):
attune-rag-benchmark --queries queries.yaml --with-faithfulness --json out.json
# Compare extended-thinking on vs off (2× judge cost):
attune-rag-benchmark --queries queries.yaml --with-faithfulness --compare-thinking --json out.json
The judge implementation lives at
attune_rag.eval.faithfulness.FaithfulnessJudge. Note: attune_rag.eval.*
is currently INTERNAL and may move — the attune-rag-benchmark --with-faithfulness CLI is the stable contract.
For the methodology behind the 0.9686 threshold, the v1/v2 ground-truth
calibration runs, and the extended-thinking-vs-default decision record, see
docs/rag/faithfulness-thinking-calibration.md.
Roadmap — embeddings (post-freeze 0.2.0+)
Keyword retrieval + optional Claude reranker currently meet
the locked P@1 ≥ 0.95, R@3 = 1.00 thresholds against the
attune-help golden set. The remaining hard queries
(3 of 28, currently xpass-gated under [no-embeddings])
have zero token overlap against their target doc (e.g.
"vulnerability scan" → tool-security-audit.md). Closing
that gap needs vector search.
The plan is to ship attune-rag[embeddings] using
fastembed for local,
CPU-only embeddings — no new network dependency, no API key
required at retrieval time. Keyword retrieval stays the default;
embeddings layer in opt-in, same shape as QueryExpander and
LLMReranker. Shipping is paced by the
Phase 4 feature-freeze
currently in progress — public surface additions wait for the
0.2.0 cut.
See CHANGELOG.md for the decision record and remaining-gap analysis.
Prompt caching (Claude only)
When using the Claude provider, run_and_generate automatically enables
Anthropic prompt caching
on the stable RAG context prefix (≥ 1 024 chars). This eliminates
repeated token costs on the corpus portion of the prompt when the same
context block is reused across calls.
No configuration needed — the provider handles the cache_control
header automatically.
Public API
attune-rag's public surface is documented below and snapshot-tested in tests/unit/test_api_surface.py. Formal SemVer commitments begin with the 0.2.0 release — see docs/POLICY.md for the deprecation policy. Until then the surface is honor-system: the lock test catches drift, but treat 0.1.x as still-evolving.
Top-level (from attune_rag import ...):
- Pipeline —
RagPipeline,RagResult - Corpus —
CorpusProtocol,RetrievalEntry,DirectoryCorpus,AttuneHelpCorpus - Retrieval —
KeywordRetriever,RetrievalHit,RetrieverProtocol - Provenance —
CitationRecord,CitedSource,ClaimCitation,format_citations_markdown,format_claim_citations_markdown - Prompting —
build_augmented_prompt,PROMPT_VARIANTS - Hybrid retrieval —
QueryExpander,LLMReranker
PUBLIC submodules (importable by qualified path):
attune_rag.corpus— exposesAliasInfo,DuplicateAliasErrorin addition to the top-level corpus namesattune_rag.corpus.attune_help—AttuneHelpCorpusattune_rag.corpus.help_adapter—HelpCorpusAdapterProtocolattune_rag.providers—LLMProvider,get_provider,list_availableattune_rag.editor— template-editor primitives (lint, schema, rename, autocomplete, references); see "Template editor primitives" above for the symbol listattune_rag.editor.{rename,schema,lint,autocomplete,references}— the individual editor submodules
Anything not listed above is INTERNAL and may change in any release.
The underscore-prefixed editor modules (attune_rag.editor._rename
etc.) shipped in 0.1.x are deprecation shims as of 0.2.0; they
re-export the new non-underscore names and emit DeprecationWarning.
They are removed in 0.3.0.
Status
v0.1.19 (alpha). In
Phase 4 of the v1.0 roadmap — a
four-week feature freeze + downstream validation soak; formal
0.2.0 SemVer cut follows. Until then the public surface is
honor-system + lock-tested; deprecation policy at
docs/POLICY.md.
Part of the attune ecosystem (attune-ai, attune-help, attune-author, attune-gui).
License
Apache 2.0. See LICENSE.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file attune_rag-0.1.22.tar.gz.
File metadata
- Download URL: attune_rag-0.1.22.tar.gz
- Upload date:
- Size: 90.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
43520f6cb77c7d44ce6f98a312d3c44f8432fbc9bddb004ad21fefc2c46ba4be
|
|
| MD5 |
2956b695931235465b92d9354afa2f70
|
|
| BLAKE2b-256 |
2a3076c2935db0c0db3ea4705f0b002893aea3cba4c4ca320f33811b15e2a296
|
Provenance
The following attestation bundles were made for attune_rag-0.1.22.tar.gz:
Publisher:
publish.yml on Smart-AI-Memory/attune-rag
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
attune_rag-0.1.22.tar.gz -
Subject digest:
43520f6cb77c7d44ce6f98a312d3c44f8432fbc9bddb004ad21fefc2c46ba4be - Sigstore transparency entry: 1587344862
- Sigstore integration time:
-
Permalink:
Smart-AI-Memory/attune-rag@fed00f514e99b1d35370b8c458ca4734527ae157 -
Branch / Tag:
refs/tags/v0.1.22 - Owner: https://github.com/Smart-AI-Memory
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@fed00f514e99b1d35370b8c458ca4734527ae157 -
Trigger Event:
release
-
Statement type:
File details
Details for the file attune_rag-0.1.22-py3-none-any.whl.
File metadata
- Download URL: attune_rag-0.1.22-py3-none-any.whl
- Upload date:
- Size: 100.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
184fba45df84903d280dc80c47c71648c9491f8b0b3a995c4837f7ba7c06e95d
|
|
| MD5 |
b3d47ebd45148f22310ddd25b55140e5
|
|
| BLAKE2b-256 |
b0d7d603f8a17f7138724b619f15f8b0d4f209680fe46e773cf588735ec209e8
|
Provenance
The following attestation bundles were made for attune_rag-0.1.22-py3-none-any.whl:
Publisher:
publish.yml on Smart-AI-Memory/attune-rag
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
attune_rag-0.1.22-py3-none-any.whl -
Subject digest:
184fba45df84903d280dc80c47c71648c9491f8b0b3a995c4837f7ba7c06e95d - Sigstore transparency entry: 1587345293
- Sigstore integration time:
-
Permalink:
Smart-AI-Memory/attune-rag@fed00f514e99b1d35370b8c458ca4734527ae157 -
Branch / Tag:
refs/tags/v0.1.22 - Owner: https://github.com/Smart-AI-Memory
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@fed00f514e99b1d35370b8c458ca4734527ae157 -
Trigger Event:
release
-
Statement type: