Skip to main content

Lightweight, LLM-agnostic RAG pipeline with pluggable corpora. Works with Claude, Gemini, or any LLM.

Project description

attune-rag

Lightweight, LLM-agnostic RAG pipeline with pluggable corpora. Works with Claude, Gemini, or any LLM.

  • No LLM SDK at install time. All provider deps are optional extras. Two required runtime deps: structlog, jinja2.
  • Pluggable corpus. Use attune-help (the default), any markdown directory, or your own CorpusProtocol.
  • Returns a prompt string + citation records by default — pipeline.run() never opens a network connection. You call your own LLM however you like. Optional provider adapters ship convenience wrappers.
  • Optional hybrid retrieval. QueryExpander and LLMReranker layer Claude Haiku on top of keyword retrieval to improve recall and precision — both opt-in, both fail-safe.

Why attune-rag

Most RAG libraries ship features. attune-rag ships measured quality numbers and gates merges against them. The CI badge isn't "tests pass" — it's P@1 ≥ 0.95, R@3 = 1.00, mean faithfulness ≥ 0.9686 (locked at docs/specs/release-quality-baseline/baseline-1.md) plus per-axis CPU + wall-clock perf thresholds (locked at docs/specs/downstream-validation/perf-baseline.md).

A PR that drops mean_faithfulness below 0.9686 fails CI automatically. Same for any latency hot-path regressing past mean + 2σ. That's the differentiator.

vs LangChain / LlamaIndex

attune-rag LangChain LlamaIndex
Required runtime deps 2 many (transitively, ~30+) many (~25+)
LLM SDK at install none bundled bundled
Published quality regression thresholds yes (P@1, R@3, faithfulness) no no
Published perf thresholds (wall + CPU) yes no no
Citation primitives built-in yes add-on add-on
"Get a string back, call your own LLM" default possible w/ effort possible w/ effort

LangChain and LlamaIndex are fantastic frameworks if you want batteries-included orchestration. attune-rag is the alternative when you want a RAG component you can drop into an existing app without buying into a framework — and want the quality bar quantified, not implied.

Beyond drop-in retrieval, attune-rag is the grounding foundation for the attune-* family's content-quality discipline. The attune-author polish/fact-check pipeline uses attune-rag's retrieval + faithfulness primitives to verify generated help content is grounded in source material before it's marked authoritative — the same mean_faithfulness ≥ 0.9686 discipline that gates this library's own benchmarks, extended to the authoring loop.

What attune-rag is not

Honest exclusions, so you can self-disqualify if you need any of these:

  • Not an agent framework. No multi-step chains, no tool-use orchestration, no agent loops.
  • Not a document-parsing toolkit. Bring your markdown already-parsed; use unstructured.io or similar upstream.
  • Not a vector DB integration. Keyword retrieval is the default; you wire your own vector store if you need one (an EmbeddingRetriever is on the post-freeze roadmap — see below).
  • Not a one-line-install batteries-included framework. That's LangChain / LlamaIndex. attune-rag is for the case where that's too much.

Install

pip install attune-rag                     # core only
pip install 'attune-rag[attune-help]'      # + bundled help corpus
pip install 'attune-rag[claude]'           # + Claude adapter
pip install 'attune-rag[gemini]'           # + Gemini adapter
pip install 'attune-rag[all]'              # everything

Quick start — Claude

pip install 'attune-rag[attune-help,claude]'
import asyncio
from attune_rag import RagPipeline

async def main():
    pipeline = RagPipeline()  # defaults to AttuneHelpCorpus
    response, result = await pipeline.run_and_generate(
        "How do I run a security audit with attune?",
        provider="claude",
    )
    print(response)
    print("\nSources:", [h.entry.path for h in result.citation.hits])

asyncio.run(main())

Quick start — Gemini

pip install 'attune-rag[attune-help,gemini]'
response, result = await pipeline.run_and_generate(
    "...", provider="gemini", model="gemini-1.5-pro",
)

Quick start — custom corpus, any LLM

from pathlib import Path
from attune_rag import RagPipeline, DirectoryCorpus

pipeline = RagPipeline(corpus=DirectoryCorpus(Path("./my-docs")))
result = pipeline.run("How do I...?")

# Send result.augmented_prompt to whatever LLM you use.
# The pipeline itself does NOT call an LLM unless you use
# run_and_generate or call a provider adapter yourself.

📖 Building a quality corpus. See docs/USER_CORPUS_GUIDE.md for the corpus-authoring discipline that produced the bundled attune-help corpus's 100% / 100% baseline + 100% paraphrased R@3: frontmatter aliases, multi-token intent, the MIN_ALIAS_OVERLAP knob, stemmer traps, the override file pattern, and the strict-dominance measurement loop. The guide is the v0 forerunner of the v1.0.0 framework framing (user-corpus-onboarding spec).

Hybrid retrieval (optional)

QueryExpander and LLMReranker require the [claude] extra and an ANTHROPIC_API_KEY. Both are opt-in and fail-safe — any API error falls back to keyword-only order automatically.

from attune_rag import RagPipeline, LLMReranker, QueryExpander

# Reranker only (recommended for precision):
pipeline = RagPipeline(reranker=LLMReranker())

# Expander + reranker (max coverage):
pipeline = RagPipeline(
    expander=QueryExpander(),
    reranker=LLMReranker(),
)

Template editor primitives (attune_rag.editor)

Headless toolkit for tools that need to validate, lint, and refactor a template corpus — used by the attune-gui template editor and the attune-author edit CLI, but works standalone with any CorpusProtocol.

API What it does
load_schema() Loads template_schema.json (the v1 frontmatter contract: required type enum + name; optional tags, aliases, summary, source, hash; additionalProperties: true).
parse_frontmatter(text) / validate_frontmatter(data) Split a template into frontmatter + body and report typed FrontmatterIssues — used by linters and editors.
lint_template(text, rel_path, corpus) Returns Diagnostic[] for schema violations, broken [[alias]] references, and depth-marker sequence errors. 1-indexed line/col ranges.
autocomplete_tags(corpus, prefix, limit) / autocomplete_aliases(corpus, prefix, limit) Prefix-match completions ranked by frequency (tags) or lexical proximity (aliases). Sub-ms on 1k templates.
find_references(corpus, name, kind) Locate every alias/tag/path occurrence across body, frontmatter, and cross_links.json.
plan_rename(corpus, old, new, kind) Build a RenamePlan (one FileEdit per affected file with unified-diff hunks) for kind="alias" or "tag". Raises RenameCollisionError on existing alias targets.
apply_rename(corpus, plan) Atomically apply the plan (tempfile-per-file + sequential rename + drift-detection rollback). Returns the list of affected paths.

Schema, lint, and rename are pure functions over CorpusProtocol — no I/O, no global state. All three pieces are tested as a unit and used live by the attune-gui editor's /api/corpus/<id>/lint, /autocomplete, and /refactor/rename/{preview,apply} routes.

from attune_rag import DirectoryCorpus
from attune_rag.editor import lint_template, plan_rename, apply_rename

corpus = DirectoryCorpus(Path("./templates")).load()

# Validate a template before saving
diagnostics = lint_template(
    text=Path("./templates/concepts/foo.md").read_text(),
    rel_path="concepts/foo.md",
    corpus=corpus,
)

# Rename an alias across the whole corpus
plan = plan_rename(corpus, old="oldname", new="newname", kind="alias")
print(f"Affects {len(plan.edits)} files")
affected = apply_rename(corpus, plan)

Dashboard

attune-rag dashboard show    # live terminal dashboard
attune-rag dashboard render --out report.html  # HTML snapshot

Quality baselines

attune-rag locks two baselines, both gated by CI. Thresholds are empirically derived (mean ± 2σ) from back-to-back benchmark runs on an unchanged HEAD — grounded, not guessed.

Retrieval + faithfulness

Metric Threshold (current) Source
precision_at_1 ≥ 0.95 retrieval, deterministic
recall_at_3 = 1.00 retrieval, deterministic
mean_faithfulness ≥ 0.9686 Claude judge, σ ≈ 0.005

Gated by .github/workflows/benchmark.yml. Faithfulness gating engages when the PR touches retrieval, reranker, expander, pipeline, prompts, or eval paths, or when the PR title contains [full-bench]. Methodology + raw numbers in docs/specs/release-quality-baseline/.

Per-hot-path latency

Locked dual-axis (wall-clock + CPU-time) thresholds on the four benchmarks. CPU-time is the gating axis (deterministic); wall-clock is advisory.

Numbers measured under the V2 multi-run methodology (5 invocations × 20 runs = 100 measurements per metric) on the locked-baseline runner (Linux ubuntu-latest, CPython 3.11.15). Inter-run and intra-run variance are tracked separately; thresholds are mean + 2σ × inter_run_stdev. Full 8-row dual-axis table + hardware fingerprint + per-metric noise profile: docs/specs/downstream-validation/perf-baseline.md.

Why two threshold styles in the locked table:

  • keyword_retriever_retrieve has a wider CPU band because measured intra-run variance reflects cold-cache effects on the first few iterations — empirically derived, not tuned for tightness.
  • llm_reranker_rerank is wall-clock-only because Anthropic network variance dominates the CPU axis; the gate is set generously.

Gated by .github/workflows/perf.yml per-PR (blocking on the CPU axis as of W3.1).

Why this is the differentiator

Most RAG libraries A/B-test internally and ship the result. attune-rag publishes the thresholds, gates merges against them, and re-measures whenever the corpus, judge prompt, or hardware changes. The receipts are checked in.

Bundled .help/ corpus

The repo ships a polished .help/ corpus that documents attune-rag's own surface — 143 templates across 13 features × 11 kinds (concept, task, reference, quickstart, faq, error, warning, tip, note, comparison, troubleshooting). Generated by attune-author with strict fact-check; queryable via AttuneHelpCorpus or as the bundled default for RagPipeline(). See .help/features.yaml for the feature map and .help/templates/ for the content.

The 13 features: pipeline, retrieval, corpus, prompts, provenance, providers, eval, benchmark, cli, editor, dashboard, expander, reranker.

What faithfulness measures

Faithfulness scores how well an answer is grounded in the retrieved passages1.0 means every claim in the answer is supported by a cited source; lower scores mean some claims have no support in the context. It catches hallucination in a way that precision_at_k and recall_at_k can't: those only measure whether the right documents were retrieved, not whether the generated answer actually used them.

attune-rag uses Claude as the judge via Anthropic's tool-use API to produce a structured score in [0.0, 1.0] for each (query, answer, retrieved_context) triple. The reported metric is the mean over the golden query set. Aggregate σ ≈ 0.005 over 40 queries even though per-query judge non-determinism can swing 40+ percentage points on individual queries — averaging absorbs the noise.

The same discipline powers attune-author's polish/fact-check pipeline — generated help content is scored against retrieved source passages before being marked authoritative. attune-rag's faithfulness primitives aren't just instrumentation; they're the contract the family's content-quality story is built on.

Run faithfulness manually

pip install 'attune-rag[claude]'
export ANTHROPIC_API_KEY=sk-ant-...

# Retrieval metrics only (free, deterministic):
attune-rag-benchmark --queries queries.yaml --json out.json

# Add faithfulness (~1 Claude API call per query, costs tokens):
attune-rag-benchmark --queries queries.yaml --with-faithfulness --json out.json

# Compare extended-thinking on vs off (2× judge cost):
attune-rag-benchmark --queries queries.yaml --with-faithfulness --compare-thinking --json out.json

The judge implementation lives at attune_rag.eval.faithfulness.FaithfulnessJudge. Note: attune_rag.eval.* is currently INTERNAL and may move — the attune-rag-benchmark --with-faithfulness CLI is the stable contract.

For the methodology behind the 0.9686 threshold, the v1/v2 ground-truth calibration runs, and the extended-thinking-vs-default decision record, see docs/rag/faithfulness-thinking-calibration.md.

Roadmap — embeddings (post-freeze 0.2.0+)

Keyword retrieval + optional Claude reranker currently meet the locked P@1 ≥ 0.95, R@3 = 1.00 thresholds against the attune-help golden set. The remaining hard queries (3 of 28, currently xpass-gated under [no-embeddings]) have zero token overlap against their target doc (e.g. "vulnerability scan" → tool-security-audit.md). Closing that gap needs vector search.

The plan is to ship attune-rag[embeddings] using fastembed for local, CPU-only embeddings — no new network dependency, no API key required at retrieval time. Keyword retrieval stays the default; embeddings layer in opt-in, same shape as QueryExpander and LLMReranker. With 0.2.0 cut, embeddings are a Phase 5 candidate — see docs/specs/ROADMAP-v1.md.

See CHANGELOG.md for the decision record and remaining-gap analysis.

Prompt caching (Claude only)

When using the Claude provider, run_and_generate automatically enables Anthropic prompt caching on the stable RAG context prefix (≥ 1 024 chars). This eliminates repeated token costs on the corpus portion of the prompt when the same context block is reused across calls.

No configuration needed — the provider handles the cache_control header automatically.

Public API

attune-rag's public surface is documented below and snapshot-tested in tests/unit/test_api_surface.py. Formal SemVer commitments are in effect as of 0.2.0 — see docs/POLICY.md for the deprecation policy. Symbols PUBLIC in 0.2.x stay PUBLIC through every 0.2.z; the snapshot test catches drift.

Top-level (from attune_rag import ...):

  • Pipeline — RagPipeline, RagResult
  • Corpus — CorpusProtocol, RetrievalEntry, DirectoryCorpus, AttuneHelpCorpus
  • Retrieval — KeywordRetriever, RetrievalHit, RetrieverProtocol
  • Provenance — CitationRecord, CitedSource, ClaimCitation, format_citations_markdown, format_claim_citations_markdown
  • Prompting — build_augmented_prompt, PROMPT_VARIANTS
  • Hybrid retrieval — QueryExpander, LLMReranker

PUBLIC submodules (importable by qualified path):

  • attune_rag.corpus — exposes AliasInfo, DuplicateAliasError, load_aliases_from_file in addition to the top-level corpus names
  • attune_rag.corpus.attune_helpAttuneHelpCorpus
  • attune_rag.corpus.help_adapterHelpCorpusAdapter Protocol
  • attune_rag.providersLLMProvider, get_provider, list_available
  • attune_rag.measure_corpusmeasure(...) function + MeasureResult dataclass for scoring a corpus against a query set. CLI via python -m attune_rag.measure_corpus ... or the attune-rag-measure console script. See docs/USER_CORPUS_GUIDE.md §6 for the worked example.
  • attune_rag.editor — template-editor primitives (lint, schema, rename, autocomplete, references); see "Template editor primitives" above for the symbol list
  • attune_rag.editor.{rename,schema,lint,autocomplete,references} — the individual editor submodules

Console scripts:

  • attune-rag — CLI entry point (attune_rag.cli:main)
  • attune-rag-measure — quality measurement (attune_rag.measure_corpus:main); CI-suitable via --watermark-r3 (non-zero exit on fail)

Anything not listed above is INTERNAL and may change in any release. The underscore-prefixed editor modules (attune_rag.editor._rename etc.) shipped in 0.1.x are deprecation shims as of 0.2.0; they re-export the new non-underscore names and emit DeprecationWarning. They are removed in 0.3.0.

Status

0.2.0 — first SemVer-binding cut. Phase 4 of the v1.0 roadmap landed cleanly: quality baselines (P@1 ≥ 0.95, R@3 = 1.00, mean faithfulness ≥ 0.9686) hold; per-hot-path perf thresholds re-locked under the V2 multi-run methodology (5 × 20 measurements); attune-gui downstream blocking gate stayed green throughout. From 0.2.0 forward, docs/POLICY.md §2 binds — symbols PUBLIC in 0.2.x stay PUBLIC through every 0.2.z.

We hit our Phase 4 goals ~3 weeks ahead of the nominal calendar and opted to ship early via the freeze-override mechanism rather than let the cadence clock run out — getting the user-facing additions (attune-rag-measure console script + attune_rag.measure_corpus module for benchmarking your own corpus quality; load_aliases_from_file() for file-based alias customization) into your hands sooner. Override rationale + per-PR receipts at docs/specs/api-v0.2.0-cut/.

Classifier stays at 3 - Alpha — the Production/Stable flip is a Phase 5 deliverable.

Part of the attune ecosystem (attune-ai, attune-help, attune-author, attune-gui).

License

Apache 2.0. See LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

attune_rag-0.2.0.tar.gz (102.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

attune_rag-0.2.0-py3-none-any.whl (114.1 kB view details)

Uploaded Python 3

File details

Details for the file attune_rag-0.2.0.tar.gz.

File metadata

  • Download URL: attune_rag-0.2.0.tar.gz
  • Upload date:
  • Size: 102.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for attune_rag-0.2.0.tar.gz
Algorithm Hash digest
SHA256 d8d5f26bfa4b2379845492d2c36d29359e09ef8c7d03f47435cd06baacba4711
MD5 a58a3fc325afd6532801adc8f436625a
BLAKE2b-256 82da5de2d3a5e6f9f1830c19998b504f4fbd8e24d77e2a32709403389166bb30

See more details on using hashes here.

Provenance

The following attestation bundles were made for attune_rag-0.2.0.tar.gz:

Publisher: publish.yml on Smart-AI-Memory/attune-rag

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file attune_rag-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: attune_rag-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 114.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for attune_rag-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a8c1b46e18c3336a654cf3609d628fcf2087d4e00459a22ad6ede3c4d88e33ae
MD5 49aa8da3a58270ac47e5bcb68da9c58e
BLAKE2b-256 ed49005cf1581d99227c15bf4afddfa115cada1d666d9836296fd28f3619de29

See more details on using hashes here.

Provenance

The following attestation bundles were made for attune_rag-0.2.0-py3-none-any.whl:

Publisher: publish.yml on Smart-AI-Memory/attune-rag

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page