attune-rag
Lightweight, LLM-agnostic RAG pipeline with pluggable corpora. Works with Claude, Gemini, or any LLM.
- No LLM SDK at install time. All provider deps are optional extras. Two required runtime deps: structlog, jinja2.
- Pluggable corpus. Use attune-help (the default), any markdown directory, or your own CorpusProtocol.
- Returns a prompt string + citation records by default. pipeline.run() never opens a network connection. You call your own LLM however you like. Optional provider adapters ship convenience wrappers.
- Optional hybrid retrieval. QueryExpander and LLMReranker layer Claude Haiku on top of keyword retrieval to improve recall and precision; both are opt-in and fail-safe.
Why attune-rag
Most RAG libraries ship features. attune-rag ships measured
quality numbers and gates merges against them. The CI badge
isn't "tests pass" — it's P@1 ≥ 0.95, R@3 = 1.00, mean faithfulness ≥ 0.9686 (locked at
docs/specs/release-quality-baseline/baseline-1.md)
plus per-axis CPU + wall-clock perf thresholds (locked at
docs/specs/downstream-validation/perf-baseline.md).
A PR that drops mean_faithfulness below 0.9686 fails CI
automatically. Same for any latency hot-path regressing past
mean + 2σ. That's the differentiator.
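As a rough illustration of what such a gate does, the sketch below checks a benchmark result against the three quality thresholds quoted above. The function name and the dict layout are assumptions made for this example; the actual gate lives in the project's CI workflows and may be structured differently.

```python
# Hypothetical sketch of a quality gate; not attune-rag's actual CI code.
THRESHOLDS = {
    "precision_at_1": 0.95,
    "recall_at_3": 1.00,
    "mean_faithfulness": 0.9686,
}


def failed_metrics(metrics: dict[str, float]) -> list[str]:
    """Return the names of metrics that fall below their threshold."""
    return [
        name
        for name, floor in THRESHOLDS.items()
        if metrics.get(name, 0.0) < floor
    ]


good = {"precision_at_1": 0.96, "recall_at_3": 1.0, "mean_faithfulness": 0.97}
bad = {**good, "mean_faithfulness": 0.96}
print(failed_metrics(good))  # []
print(failed_metrics(bad))   # ['mean_faithfulness']
```

A CI job built on this idea would exit non-zero whenever the returned list is non-empty.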
vs LangChain / LlamaIndex
| | attune-rag | LangChain | LlamaIndex |
|---|---|---|---|
| Required runtime deps | 2 | many (transitively, ~30+) | many (~25+) |
| LLM SDK at install | none | bundled | bundled |
| Published quality regression thresholds | yes (P@1, R@3, faithfulness) | no | no |
| Published perf thresholds (wall + CPU) | yes | no | no |
| Citation primitives built-in | yes | add-on | add-on |
| "Get a string back, call your own LLM" | default | possible w/ effort | possible w/ effort |
LangChain and LlamaIndex are fantastic frameworks if you want batteries-included orchestration. attune-rag is the alternative when you want a RAG component you can drop into an existing app without buying into a framework — and want the quality bar quantified, not implied.
Beyond drop-in retrieval, attune-rag is the grounding foundation
for the attune-* family's content-quality discipline. The
attune-author polish/fact-check pipeline uses attune-rag's
retrieval + faithfulness primitives to verify generated help
content is grounded in source material before it's marked
authoritative — the same mean_faithfulness ≥ 0.9686 discipline
that gates this library's own benchmarks, extended to the
authoring loop.
What attune-rag is not
Honest exclusions, so you can self-disqualify if you need any of these:
- Not an agent framework. No multi-step chains, no tool-use orchestration, no agent loops.
- Not a document-parsing toolkit. Bring your markdown already parsed; use unstructured.io or similar upstream.
- Not a vector DB integration. Keyword retrieval is the default; you wire your own vector store if you need one (an EmbeddingRetriever is on the post-freeze roadmap; see below).
- Not a one-line-install batteries-included framework. That's LangChain / LlamaIndex. attune-rag is for the case where that's too much.
Install
pip install attune-rag # core only
pip install 'attune-rag[attune-help]' # + bundled help corpus
pip install 'attune-rag[claude]' # + Claude adapter
pip install 'attune-rag[gemini]' # + Gemini adapter
pip install 'attune-rag[all]' # everything
Quick start — Claude
pip install 'attune-rag[attune-help,claude]'
import asyncio

from attune_rag import RagPipeline


async def main():
    pipeline = RagPipeline()  # defaults to AttuneHelpCorpus
    response, result = await pipeline.run_and_generate(
        "How do I run a security audit with attune?",
        provider="claude",
    )
    print(response)
    print("\nSources:", [h.entry.path for h in result.citation.hits])


asyncio.run(main())
Quick start — Gemini
pip install 'attune-rag[attune-help,gemini]'
# Inside an async function, with pipeline = RagPipeline() as in the Claude example:
response, result = await pipeline.run_and_generate(
    "...", provider="gemini", model="gemini-1.5-pro",
)
Quick start — custom corpus, any LLM
from pathlib import Path
from attune_rag import RagPipeline, DirectoryCorpus
pipeline = RagPipeline(corpus=DirectoryCorpus(Path("./my-docs")))
result = pipeline.run("How do I...?")
# Send result.augmented_prompt to whatever LLM you use.
# The pipeline itself does NOT call an LLM unless you use
# run_and_generate or call a provider adapter yourself.
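Because pipeline.run() only returns data, wiring in an LLM can be a single call. The sketch below assumes only that the result exposes augmented_prompt, as in the snippet above. The names answer_with and echo_llm are hypothetical, and SimpleNamespace stands in for a real result object so the example runs without attune-rag installed.

```python
from types import SimpleNamespace
from typing import Callable


def answer_with(result, llm: Callable[[str], str]) -> str:
    """Send the augmented prompt to any callable that maps prompt -> text."""
    return llm(result.augmented_prompt)


# Stand-in for the object returned by pipeline.run(); only the
# augmented_prompt attribute is taken from this README.
fake_result = SimpleNamespace(
    augmented_prompt="Context: ...\n\nQuestion: How do I...?"
)


def echo_llm(prompt: str) -> str:
    # Hypothetical LLM client; replace the body with your own SDK call.
    return f"(model saw {len(prompt)} chars)"


print(answer_with(fake_result, echo_llm))
```

Keeping the LLM behind a plain callable means the retrieval side never needs to know which provider you use.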
📖 Building a quality corpus. See docs/USER_CORPUS_GUIDE.md for the corpus-authoring discipline that produced the bundled attune-help corpus's 100% / 100% baseline + 100% paraphrased R@3: frontmatter aliases, multi-token intent, the MIN_ALIAS_OVERLAP knob, stemmer traps, the override file pattern, and the strict-dominance measurement loop. The guide is the v0 forerunner of the v1.0.0 framework framing (user-corpus-onboarding spec).
Hybrid retrieval (optional)
QueryExpander and LLMReranker require the [claude] extra and an
ANTHROPIC_API_KEY. Both are opt-in and fail-safe — any API error
falls back to keyword-only order automatically.
from attune_rag import RagPipeline, LLMReranker, QueryExpander
# Reranker only (recommended for precision):
pipeline = RagPipeline(reranker=LLMReranker())
# Expander + reranker (max coverage):
pipeline = RagPipeline(
    expander=QueryExpander(),
    reranker=LLMReranker(),
)
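The fail-safe behaviour described above (any API error falls back to keyword-only order) can be pictured with a small wrapper. This is a generic sketch of the pattern, not the library's implementation; rerank_fail_safe and flaky_reranker are hypothetical names.

```python
from typing import Callable, Sequence


def rerank_fail_safe(
    hits: Sequence[str],
    rerank: Callable[[Sequence[str]], Sequence[str]],
) -> list[str]:
    """Try the reranker; on any error keep the keyword-only order."""
    try:
        return list(rerank(hits))
    except Exception:
        # Fall back to the order produced by keyword retrieval.
        return list(hits)


def flaky_reranker(hits):
    # Hypothetical reranker that simulates an API failure.
    raise ConnectionError("simulated API error")


keyword_order = ["a.md", "b.md", "c.md"]
print(rerank_fail_safe(keyword_order, flaky_reranker))
# ['a.md', 'b.md', 'c.md']
print(rerank_fail_safe(keyword_order, lambda h: list(reversed(h))))
# ['c.md', 'b.md', 'a.md']
```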
Template editor primitives (attune_rag.editor)
Headless toolkit for tools that need to validate, lint, and refactor a
template corpus — used by the attune-gui
template editor and the attune-author
edit CLI, but works standalone with any
CorpusProtocol.
| API | What it does |
|---|---|
| load_schema() | Loads template_schema.json (the v1 frontmatter contract: required type enum + name; optional tags, aliases, summary, source, hash; additionalProperties: true). |
| parse_frontmatter(text) / validate_frontmatter(data) | Split a template into frontmatter + body and report typed FrontmatterIssues; used by linters and editors. |
| lint_template(text, rel_path, corpus) | Returns Diagnostic[] for schema violations, broken [[alias]] references, and depth-marker sequence errors. 1-indexed line/col ranges. |
| autocomplete_tags(corpus, prefix, limit) / autocomplete_aliases(corpus, prefix, limit) | Prefix-match completions ranked by frequency (tags) or lexical proximity (aliases). Sub-ms on 1k templates. |
| find_references(corpus, name, kind) | Locate every alias/tag/path occurrence across body, frontmatter, and cross_links.json. |
| plan_rename(corpus, old, new, kind) | Build a RenamePlan (one FileEdit per affected file with unified-diff hunks) for kind="alias" or "tag". Raises RenameCollisionError on existing alias targets. |
| apply_rename(corpus, plan) | Atomically apply the plan (tempfile-per-file + sequential rename + drift-detection rollback). Returns the list of affected paths. |
Schema, lint, and rename are pure functions over CorpusProtocol — no I/O,
no global state. All three pieces are tested as a unit and used live by the
attune-gui editor's /api/corpus/<id>/lint, /autocomplete, and
/refactor/rename/{preview,apply} routes.
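To make "prefix-match completions ranked by frequency" concrete, here is a minimal standalone sketch of that ranking idea. It is not the code behind autocomplete_tags; the function name complete_tags, the flat list input, and the alphabetical tie-break are assumptions for illustration.

```python
from collections import Counter


def complete_tags(all_tags: list[str], prefix: str, limit: int = 5) -> list[str]:
    """Prefix-match tags, most frequent first, ties broken alphabetically."""
    counts = Counter(t for t in all_tags if t.startswith(prefix))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:limit]]


# Made-up tags as they might appear across template frontmatter.
tags = ["security", "setup", "security", "search", "setup", "security"]
print(complete_tags(tags, "se"))  # ['security', 'setup', 'search']
```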
from pathlib import Path

from attune_rag import DirectoryCorpus
from attune_rag.editor import lint_template, plan_rename, apply_rename

corpus = DirectoryCorpus(Path("./templates")).load()

# Validate a template before saving
diagnostics = lint_template(
    text=Path("./templates/concepts/foo.md").read_text(),
    rel_path="concepts/foo.md",
    corpus=corpus,
)

# Rename an alias across the whole corpus
plan = plan_rename(corpus, old="oldname", new="newname", kind="alias")
print(f"Affects {len(plan.edits)} files")
affected = apply_rename(corpus, plan)
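The "tempfile-per-file + rename" part of apply_rename follows a common pattern for atomic file updates. The sketch below shows that pattern for a single file using only the standard library; it is an illustration under that assumption and does not reproduce the library's drift detection or rollback. atomic_write is a hypothetical helper name.

```python
import os
import tempfile
from pathlib import Path


def atomic_write(path: Path, text: str) -> None:
    """Write text to a temp file in the same directory, then rename over path."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        # os.replace is atomic when source and target share a filesystem.
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


with tempfile.TemporaryDirectory() as root:
    target = Path(root) / "foo.md"
    target.write_text("see [[oldname]]", encoding="utf-8")
    updated = target.read_text(encoding="utf-8").replace("oldname", "newname")
    atomic_write(target, updated)
    print(target.read_text(encoding="utf-8"))  # see [[newname]]
```

Readers of the file see either the old content or the new content, never a half-written file.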
Dashboard
attune-rag dashboard show # live terminal dashboard
attune-rag dashboard render --out report.html # HTML snapshot
Quality baselines
attune-rag locks two baselines, both gated by CI. Thresholds
are empirically derived (mean ± 2σ) from back-to-back
benchmark runs on an unchanged HEAD — grounded, not guessed.
Retrieval + faithfulness
| Metric | Threshold (current) | Source |
|---|---|---|
| precision_at_1 | ≥ 0.95 | retrieval, deterministic |
| recall_at_3 | = 1.00 | retrieval, deterministic |
| mean_faithfulness | ≥ 0.9686 | Claude judge, σ ≈ 0.005 |
Gated by .github/workflows/benchmark.yml.
Faithfulness gating engages when the PR touches retrieval,
reranker, expander, pipeline, prompts, or eval paths, or when
the PR title contains [full-bench]. Methodology + raw numbers
in docs/specs/release-quality-baseline/.
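The trigger rule above can be read as a simple predicate over the changed files and the PR title. The sketch below is one hypothetical way to express it; the substring matching and the example paths are assumptions, and the authoritative rule is whatever the benchmark workflow encodes.

```python
# Hypothetical keywords derived from the list above; the real path
# filters live in the benchmark workflow and may differ.
GATED_KEYWORDS = ("retrieval", "reranker", "expander", "pipeline", "prompts", "eval")


def needs_faithfulness(changed_paths: list[str], pr_title: str) -> bool:
    """Decide whether the (paid) faithfulness benchmark should run."""
    if "[full-bench]" in pr_title:
        return True
    return any(key in path for path in changed_paths for key in GATED_KEYWORDS)


print(needs_faithfulness(["README.md"], "docs: fix typo"))                  # False
print(needs_faithfulness(["README.md"], "docs: fix typo [full-bench]"))     # True
print(needs_faithfulness(["src/attune_rag/reranker.py"], "tune reranker"))  # True
```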
Per-hot-path latency
Locked dual-axis (wall-clock + CPU-time) thresholds on the four benchmarks. CPU-time is the gating axis (deterministic); wall-clock is advisory.
Numbers measured under the V2 multi-run methodology (5
invocations × 20 runs = 100 measurements per metric) on the
locked-baseline runner (Linux ubuntu-latest, CPython 3.11.15).
Inter-run and intra-run variance are tracked separately;
thresholds are mean + 2 × inter_run_stdev (that is, mean + 2σ with
σ taken as the inter-run standard deviation). Full 8-row
dual-axis table + hardware fingerprint + per-metric noise
profile:
docs/specs/downstream-validation/perf-baseline.md.
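As a worked example of the mean + 2σ rule, the sketch below derives an upper latency bound from per-invocation means. The numbers are made up for illustration and lock_threshold is a hypothetical name; the locked values and the exact procedure are in the perf-baseline document.

```python
from statistics import mean, stdev


def lock_threshold(invocation_means: list[float], k: float = 2.0) -> float:
    """Upper latency bound: grand mean plus k inter-run standard deviations."""
    return mean(invocation_means) + k * stdev(invocation_means)


# Made-up CPU-time means (ms) for 5 invocations; not measured data.
runs_ms = [10.0, 10.4, 9.8, 10.2, 9.6]
threshold = lock_threshold(runs_ms)
print(round(threshold, 3))  # 10.632

# A later measurement regresses if it exceeds the locked bound.
print(10.9 > threshold)  # True
```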
Why two threshold styles in the locked table:
- keyword_retriever_retrieve has a wider CPU band because measured intra-run variance reflects cold-cache effects on the first few iterations; the band is empirically derived, not tuned for tightness.
- llm_reranker_rerank is wall-clock-only because Anthropic network variance dominates the CPU axis; the gate is set generously.
Gated by .github/workflows/perf.yml
per-PR (blocking on the CPU axis as of W3.1).
Why this is the differentiator
Most RAG libraries A/B-test internally and ship the result. attune-rag publishes the thresholds, gates merges against them, and re-measures whenever the corpus, judge prompt, or hardware changes. The receipts are checked in.
Bundled .help/ corpus
The repo ships a polished .help/ corpus that documents
attune-rag's own surface — 143 templates across 13 features ×
11 kinds (concept, task, reference, quickstart, faq,
error, warning, tip, note, comparison,
troubleshooting). Generated by
attune-author with
strict fact-check; queryable via AttuneHelpCorpus or as the
bundled default for RagPipeline(). See
.help/features.yaml for the feature
map and .help/templates/ for the content.
The 13 features: pipeline, retrieval, corpus, prompts,
provenance, providers, eval, benchmark, cli, editor,
dashboard, expander, reranker.
What faithfulness measures
Faithfulness scores how well an answer is grounded in the retrieved
passages — 1.0 means every claim in the answer is supported by a
cited source; lower scores mean some claims have no support in the
context. It catches hallucination in a way that precision_at_k and
recall_at_k can't: those only measure whether the right documents
were retrieved, not whether the generated answer actually used them.
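For contrast, the two retrieval metrics can be computed from ranked document lists alone, with no generated answer involved. The sketch below uses the textbook definitions; attune-rag's exact implementation may differ, and every file name other than tool-security-audit.md is made up.

```python
def precision_at_1(ranked: list[str], relevant: set[str]) -> float:
    """1.0 if the top-ranked document is relevant, else 0.0."""
    return 1.0 if ranked and ranked[0] in relevant else 0.0


def recall_at_k(ranked: list[str], relevant: set[str], k: int = 3) -> float:
    """Fraction of relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(relevant.intersection(ranked[:k])) / len(relevant)


ranked = ["tool-security-audit.md", "cli.md", "faq.md"]
print(precision_at_1(ranked, {"tool-security-audit.md"}))  # 1.0
print(recall_at_k(ranked, {"faq.md", "setup.md"}))         # 0.5
```

Neither function looks at the answer text, which is why a separate faithfulness score is needed.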
attune-rag uses Claude as the judge via Anthropic's tool-use API
to produce a structured score in [0.0, 1.0] for each
(query, answer, retrieved_context) triple. The reported metric is
the mean over the golden query set. Aggregate σ ≈ 0.005 over 40
queries even though per-query judge non-determinism can swing 40+
percentage points on individual queries — averaging absorbs the noise.
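A quick arithmetic check shows why averaging damps per-query swings: a change of 0.4 on one query out of 40 moves the mean by only 0.4 / 40 = 0.01. The scores below are made up for illustration and mean_shift is a hypothetical helper.

```python
from statistics import mean


def mean_shift(run_a: list[float], run_b: list[float]) -> float:
    """Difference between the mean scores of two benchmark runs."""
    return mean(run_a) - mean(run_b)


# Made-up scores: identical runs except one query drops by 0.4 on re-run.
run_a = [1.0] * 40
run_b = [1.0] * 39 + [0.6]
shift = mean_shift(run_a, run_b)
print(round(shift, 4))  # 0.01
```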
The same discipline powers attune-author's polish/fact-check
pipeline — generated help content is scored against retrieved
source passages before being marked authoritative. attune-rag's
faithfulness primitives aren't just instrumentation; they're the
contract the family's content-quality story is built on.
Run faithfulness manually
pip install 'attune-rag[claude]'
export ANTHROPIC_API_KEY=sk-ant-...
# Retrieval metrics only (free, deterministic):
attune-rag-benchmark --queries queries.yaml --json out.json
# Add faithfulness (~1 Claude API call per query, costs tokens):
attune-rag-benchmark --queries queries.yaml --with-faithfulness --json out.json
# Compare extended-thinking on vs off (2× judge cost):
attune-rag-benchmark --queries queries.yaml --with-faithfulness --compare-thinking --json out.json
The judge implementation lives at
attune_rag.eval.faithfulness.FaithfulnessJudge. Note: attune_rag.eval.*
is currently INTERNAL and may move — the attune-rag-benchmark --with-faithfulness CLI is the stable contract.
For the methodology behind the 0.9686 threshold, the v1/v2 ground-truth
calibration runs, and the extended-thinking-vs-default decision record, see
docs/rag/faithfulness-thinking-calibration.md.
Roadmap — embeddings (post-freeze 0.2.0+)
Keyword retrieval + optional Claude reranker currently meet
the locked P@1 ≥ 0.95, R@3 = 1.00 thresholds against the
attune-help golden set. The remaining hard queries
(3 of 28, currently xpass-gated under [no-embeddings])
have zero token overlap against their target doc (e.g.
"vulnerability scan" → tool-security-audit.md). Closing
that gap needs vector search.
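The zero-overlap problem is easy to see with a naive tokenizer. The sketch below compares the example query only against the target file name, using a simple lowercase split; attune-rag's real tokenizer, stemmer, and alias handling are not reproduced here.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens; no stemming or alias expansion."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


query = "vulnerability scan"
doc_name = "tool-security-audit.md"
overlap = tokens(query) & tokens(doc_name)
print(overlap)  # set()
```

With no shared tokens, a purely lexical scorer has nothing to rank on, whereas an embedding model can place "vulnerability scan" near "security audit".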
The plan is to ship attune-rag[embeddings] using
fastembed for local,
CPU-only embeddings — no new network dependency, no API key
required at retrieval time. Keyword retrieval stays the default;
embeddings layer in opt-in, same shape as QueryExpander and
LLMReranker. With 0.2.0 cut, embeddings are a Phase 5
candidate — see
docs/specs/ROADMAP-v1.md.
See CHANGELOG.md for the decision record and remaining-gap analysis.
Prompt caching (Claude only)
When using the Claude provider, run_and_generate automatically enables
Anthropic prompt caching
on the stable RAG context prefix (≥ 1024 chars). This cuts
repeated token costs on the corpus portion of the prompt when the same
context block is reused across calls.
No configuration needed: the provider sets cache_control
automatically.
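For readers calling Claude themselves instead of going through the provider adapter, the sketch below shows roughly what opting into prompt caching looks like at the request level. It only builds the system blocks as plain data and makes no network call. The helper name and the way the 1024-character threshold is applied are assumptions; this is not attune-rag's provider code.

```python
MIN_CACHEABLE_CHARS = 1024  # threshold quoted in this README


def build_system_blocks(rag_context: str) -> list[dict]:
    """Build Messages API system blocks, marking a long context as cacheable."""
    block: dict = {"type": "text", "text": rag_context}
    if len(rag_context) >= MIN_CACHEABLE_CHARS:
        # Anthropic's prompt caching is requested per content block.
        block["cache_control"] = {"type": "ephemeral"}
    return [block]


print(build_system_blocks("short context"))
print("cache_control" in build_system_blocks("x" * 2000)[0])  # True
```

The returned list would be passed as the system parameter of a Messages API request.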
Public API
attune-rag's public surface is documented below and snapshot-tested in tests/unit/test_api_surface.py. Formal SemVer commitments are in effect as of 0.2.0 — see docs/POLICY.md for the deprecation policy. Symbols PUBLIC in 0.2.x stay PUBLIC through every 0.2.z; the snapshot test catches drift.
Top-level (from attune_rag import ...):
- Pipeline: RagPipeline, RagResult
- Corpus: CorpusProtocol, RetrievalEntry, DirectoryCorpus, AttuneHelpCorpus
- Retrieval: KeywordRetriever, RetrievalHit, RetrieverProtocol
- Provenance: CitationRecord, CitedSource, ClaimCitation, format_citations_markdown, format_claim_citations_markdown
- Prompting: build_augmented_prompt, PROMPT_VARIANTS
- Hybrid retrieval: QueryExpander, LLMReranker
PUBLIC submodules (importable by qualified path):
- attune_rag.corpus: exposes AliasInfo, DuplicateAliasError, load_aliases_from_file in addition to the top-level corpus names
- attune_rag.corpus.attune_help: AttuneHelpCorpus
- attune_rag.corpus.help_adapter: HelpCorpusAdapterProtocol
- attune_rag.providers: LLMProvider, get_provider, list_available
- attune_rag.measure_corpus: measure(...) function + MeasureResult dataclass for scoring a corpus against a query set. CLI via python -m attune_rag.measure_corpus ... or the attune-rag-measure console script. See docs/USER_CORPUS_GUIDE.md §6 for the worked example.
- attune_rag.editor: template-editor primitives (lint, schema, rename, autocomplete, references); see "Template editor primitives" above for the symbol list
- attune_rag.editor.{rename,schema,lint,autocomplete,references}: the individual editor submodules
Console scripts:
- attune-rag: CLI entry point (attune_rag.cli:main)
- attune-rag-measure: quality measurement (attune_rag.measure_corpus:main); CI-suitable via --watermark-r3 (non-zero exit on fail)
Anything not listed above is INTERNAL and may change in any release.
The underscore-prefixed editor modules (attune_rag.editor._rename
etc.) shipped in 0.1.x are deprecation shims as of 0.2.0; they
re-export the new non-underscore names and emit DeprecationWarning.
They will be removed in 0.3.0.
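One common way to build such a shim in Python is a module-level __getattr__ (PEP 562) that forwards to the new module and warns on access. The sketch below demonstrates the pattern with throwaway module objects; the module names are placeholders, and the real shims may instead warn at import time.

```python
import types
import warnings


def make_shim(old_name: str, new_module: types.ModuleType) -> types.ModuleType:
    """Create a module that forwards attribute access and warns on use."""
    shim = types.ModuleType(old_name)

    def __getattr__(name: str):
        warnings.warn(
            f"{old_name} is deprecated; import from {new_module.__name__} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(new_module, name)

    # A module-level __getattr__ (PEP 562) handles names the shim lacks.
    shim.__getattr__ = __getattr__
    return shim


# Hypothetical stand-ins for the new module and its old underscore alias.
new_mod = types.ModuleType("pkg.editor.rename")
new_mod.plan_rename = lambda *args: "plan"
old_mod = make_shim("pkg.editor._rename", new_mod)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(old_mod.plan_rename())  # plan
print(caught[0].category.__name__)  # DeprecationWarning
```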
Status
0.2.0 — first SemVer-binding cut. Phase 4 of the v1.0 roadmap
landed cleanly: quality baselines (P@1 ≥ 0.95, R@3 = 1.00, mean
faithfulness ≥ 0.9686) hold; per-hot-path perf thresholds re-locked
under the V2 multi-run methodology (5 × 20 measurements); attune-gui
downstream blocking gate stayed green throughout. From 0.2.0 forward,
docs/POLICY.md §2 binds — symbols PUBLIC in 0.2.x
stay PUBLIC through every 0.2.z.
We hit our Phase 4 goals ~3 weeks ahead of the nominal calendar and
opted to ship early via the freeze-override mechanism rather than let
the cadence clock run out — getting the user-facing additions
(attune-rag-measure console script + attune_rag.measure_corpus
module for benchmarking your own corpus quality;
load_aliases_from_file() for file-based alias customization) into
your hands sooner. Override rationale + per-PR receipts at
docs/specs/api-v0.2.0-cut/.
Classifier stays at 3 - Alpha — the Production/Stable flip is a
Phase 5 deliverable.
Part of the attune ecosystem (attune-ai, attune-help, attune-author, attune-gui).
License
Apache 2.0. See LICENSE.