Abstention-Aware RAG Decision Layer — answer, clarify, or abstain
A2RAG - Abstention-Aware RAG Decision Layer
Decides when your RAG system should answer, ask for clarification, or abstain.
Standard RAG systems answer every question - even when they shouldn't. A2RAG adds a decision layer that prevents unsafe or hallucinated answers.
The Problem
User: "Can I return this item?"
RAG: "Yes, returns are accepted within 14 days." - confident but WRONG
(user has a digital item - not returnable)
The Solution
User: "Can I return this item?"
A2RAG: CLARIFY - "Was this a physical product or a digital item?"
Installation
```bash
pip install a2rag
```
Zero required dependencies. Works with any RAG system and any LLM.
Quick Start
```python
from a2rag import A2RAGClient

client = A2RAGClient(api_key="your_key_here")

# Step 1: Your existing RAG pipeline (unchanged)
# your_rag, your_llm and user_query stand in for your own components
contexts = your_rag.retrieve(user_query)
draft_answer = your_llm.generate(user_query, contexts)

# Step 2: A2RAG decides what to do
decision = client.decide(
    query=user_query,
    contexts=contexts,
    draft_answer=draft_answer,
)

# Step 3: Act on the decision
if decision.should_answer:
    show_to_user(draft_answer)        # safe to show
elif decision.should_clarify:
    ask_user(decision.clarification)  # ask a specific follow-up
elif decision.should_abstain:
    escalate_to_human()               # route to a human agent
```
How It Works
A2RAG uses two independent scores to make every decision:
| Score | Question |
|---|---|
| Evidence Score | Does the knowledge base actually support this answer? |
| Completeness Score | Did the user provide enough context for a specific answer? |
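This separation catches a failure mode that a single retrieval-confidence threshold misses: retrieval can score highly on a relevant-looking policy passage even when the corpus doesn't actually support the specific answer, or when the user hasn't said enough to pick the right policy (as in the physical vs. digital return example above).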
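A minimal sketch of how two independent scores can be combined into a single action. The thresholds `tau_evidence` and `tau_completeness`, their default values, and the order of the checks are illustrative assumptions, not A2RAG's actual decision rule.

```python
from typing import Literal

Action = Literal["answer", "clarify", "abstain"]


def decide_action(
    evidence_score: float,
    completeness_score: float,
    tau_evidence: float = 0.6,      # hypothetical threshold
    tau_completeness: float = 0.5,  # hypothetical threshold
) -> Action:
    """Map two independent scores in [0, 1] to an action (illustrative only)."""
    # If the knowledge base does not support an answer, asking the user
    # for more detail will not help: abstain.
    if evidence_score < tau_evidence:
        return "abstain"
    # The corpus covers the topic, but the query is underspecified
    # (e.g. physical vs. digital item): ask a follow-up question.
    if completeness_score < tau_completeness:
        return "clarify"
    # Both checks pass: the draft answer is safe to show.
    return "answer"


print(decide_action(0.9, 0.9))  # answer
print(decide_action(0.9, 0.3))  # clarify
print(decide_action(0.2, 0.9))  # abstain
```

Checking evidence before completeness is one design choice: if the corpus does not cover the topic at all, no follow-up question can make the answer supportable.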
Decision Object
```python
decision.action              # "answer" | "clarify" | "abstain"
decision.confidence          # 0.0 - 1.0
decision.clarification       # Specific question to ask (if action="clarify")
decision.missing_fields      # What information is missing
decision.should_answer       # bool shortcut
decision.should_clarify      # bool shortcut
decision.should_abstain      # bool shortcut
decision.evidence_score      # How well the corpus supports the answer
decision.query_type          # "generic_policy" | "instance_specific"
decision.is_high_confidence  # True when confidence >= 0.80
```
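The boolean shortcuts follow directly from `action`. The hypothetical `DecisionSketch` dataclass below is not A2RAG's class; it only mirrors the documented fields to show how the shortcuts and the documented 0.80 `is_high_confidence` cutoff relate to `action` and `confidence`. The `"item_type"` missing field is an invented example value.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DecisionSketch:
    """Hypothetical stand-in mirroring the documented Decision fields."""
    action: str                      # "answer" | "clarify" | "abstain"
    confidence: float                # 0.0 - 1.0
    evidence_score: float = 0.0
    query_type: str = "generic_policy"
    clarification: Optional[str] = None
    missing_fields: List[str] = field(default_factory=list)

    @property
    def should_answer(self) -> bool:
        return self.action == "answer"

    @property
    def should_clarify(self) -> bool:
        return self.action == "clarify"

    @property
    def should_abstain(self) -> bool:
        return self.action == "abstain"

    @property
    def is_high_confidence(self) -> bool:
        # Documented cutoff: True when confidence >= 0.80
        return self.confidence >= 0.80


d = DecisionSketch(
    action="clarify",
    confidence=0.85,
    query_type="instance_specific",
    clarification="Was this a physical product or a digital item?",
    missing_fields=["item_type"],  # hypothetical field name
)
print(d.should_clarify, d.should_answer, d.is_high_confidence)  # True False True
```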
Benchmark Results
Tested on a controlled benchmark of 40 scenarios across 6 domains and 5 languages (EN, HE, AR, FR, ES):
| System | UAR (↓ lower is better) | Safe Answers | Abstain Precision |
|---|---|---|---|
| Standard RAG | 80% | 20% | 0% |
| RAG + confidence threshold | 80% | 20% | 0% |
| A2RAG | 0% | 91% | 100% |
UAR (Unsafe Answer Rate): the percentage of answers that were factually wrong or unsupported by the knowledge base. On this benchmark, A2RAG scored 0% UAR: it did not confidently answer any question the corpus could not support.
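A minimal sketch of the UAR computation as defined above, assuming each evaluated query is recorded as a hypothetical `(action, supported)` pair, where `supported` says whether the answer was correct and backed by the corpus. Only queries with `action == "answer"` count toward the denominator, so abstaining on a query removes it from the rate rather than counting as a failure.

```python
from typing import Iterable, Tuple


def unsafe_answer_rate(records: Iterable[Tuple[str, bool]]) -> float:
    """Share of given answers that were wrong or unsupported."""
    answered = [supported for action, supported in records if action == "answer"]
    if not answered:
        return 0.0  # no answers shown, so no unsafe answers
    unsafe = sum(1 for supported in answered if not supported)
    return unsafe / len(answered)


# 5 queries answered, 4 of them unsupported -> 80% UAR
baseline = [("answer", False)] * 4 + [("answer", True)]
print(f"{unsafe_answer_rate(baseline):.0%}")  # 80%

# Same queries, but the 4 unsupported ones are abstained on -> 0% UAR
gated = [("abstain", False)] * 4 + [("answer", True)]
print(f"{unsafe_answer_rate(gated):.0%}")  # 0%
```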
Metrics & Analytics
All metrics are computed from local storage - your data never leaves your machine.
```python
m = client.metrics(days=30)
print(f"Answer rate: {m.answer_rate:.1%}")  # % of queries answered
print(f"UAR: {m.uar:.1%}")                  # Unsafe Answer Rate (0% = perfect)
print(f"ORS: {m.ors:.1%}")                  # Overall Reliability Score
print(f"Avg latency: {m.avg_latency_ms:.0f}ms")
print(f"Avg confidence: {m.avg_confidence:.1%}")

# Break down by domain or language
by_domain = client.metrics_by_domain()
by_language = client.metrics_by_language()

# Trend over time
trends = client.trends(days=30, interval="day")
```
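Conceptually, a daily trend is a group-by over the decision log. The sketch below assumes a hypothetical log of `(timestamp, action)` tuples; it does not read A2RAG's local database and is not how `client.trends()` is implemented.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple


def daily_rates(log: Iterable[Tuple[datetime, str]]) -> Dict[str, Dict[str, float]]:
    """Return {ISO date: {action: share of that day's decisions}}."""
    per_day = defaultdict(Counter)
    for ts, action in log:
        per_day[ts.date().isoformat()][action] += 1
    rates = {}
    for day, counts in sorted(per_day.items()):
        total = sum(counts.values())
        rates[day] = {action: n / total for action, n in counts.items()}
    return rates


log = [  # hypothetical decision log
    (datetime(2026, 1, 5, 9, 0), "answer"),
    (datetime(2026, 1, 5, 11, 30), "clarify"),
    (datetime(2026, 1, 6, 10, 0), "answer"),
    (datetime(2026, 1, 6, 14, 0), "answer"),
    (datetime(2026, 1, 6, 16, 0), "abstain"),
    (datetime(2026, 1, 6, 17, 0), "answer"),
]
for day, r in daily_rates(log).items():
    print(day, f"answer rate {r.get('answer', 0.0):.0%}")
# 2026-01-05 answer rate 50%
# 2026-01-06 answer rate 75%
```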
Key Metrics Explained
| Metric | What It Measures | Good Value |
|---|---|---|
| UAR | Unsafe Answer Rate — wrong answers shown to users | < 5% |
| ORS | Overall Reliability Score — combined quality metric | > 70% |
| Abstain Precision | Share of abstentions that were correct (the query genuinely could not be answered safely) | > 90% |
| Coverage | % of queries that receive an answer | > 50% |
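Coverage and Abstain Precision are simple ratios over labeled outcomes. The sketch below assumes each record is a hypothetical `(action, should_abstain)` pair, where `should_abstain` is a human label saying the query really could not be answered safely; it covers only these two metrics.

```python
from typing import List, Optional, Tuple


def coverage(actions: List[str]) -> float:
    """Share of queries that received an answer."""
    return actions.count("answer") / len(actions) if actions else 0.0


def abstain_precision(records: List[Tuple[str, bool]]) -> Optional[float]:
    """Of the abstentions, the share where abstaining was the right call."""
    labels = [should_abstain for action, should_abstain in records if action == "abstain"]
    if not labels:
        return None  # undefined when the system never abstains
    return sum(labels) / len(labels)


records = [
    ("answer", False), ("answer", False),
    ("abstain", True), ("abstain", True), ("abstain", False),
    ("clarify", False),
]
print(f"Coverage: {coverage([a for a, _ in records]):.0%}")     # Coverage: 33%
print(f"Abstain precision: {abstain_precision(records):.0%}")  # Abstain precision: 67%
```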
Local Dashboard
```python
client.dashboard()  # opens browser at http://localhost:7860
```
Or from terminal:
```bash
a2rag dashboard
```
Shows answer/clarify/abstain rates, trends, latency, and confidence distribution — all from local data, nothing sent externally.
Calibration
Find optimal thresholds for your specific domain and corpus:
```python
labeled_data = [
    {
        "query": "What is the refund window?",
        "contexts": ["Refunds available within 14 days for unused items."],
        "draft_answer": "14 days.",
        "label": "answer",  # answer | clarify | abstain
    },
    # ... 50+ examples recommended
]

result = client.calibrate(labeled_data, domain="insurance")
print(f"tau_evidence: {result.tau_evidence}")
print(f"Expected accuracy: {result.expected_accuracy:.1%}")
print(f"Expected UAR: {result.expected_uar:.1%}")
```
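Threshold calibration of this kind is often done with a grid search over labeled examples. The sketch below is not `client.calibrate()`: it assumes each example already carries a precomputed `evidence_score` and a binary `answer`/`abstain` label (clarify labels are ignored), and picks the `tau_evidence` with the highest accuracy, breaking ties toward the stricter (higher) threshold.

```python
from typing import Dict, List, Tuple


def calibrate_tau(examples: List[Dict], steps: int = 101) -> Tuple[float, float]:
    """Grid-search tau_evidence over [0, 1]; return (best_tau, accuracy)."""
    if not examples:
        raise ValueError("need at least one labeled example")
    best_tau, best_acc = 0.0, -1.0
    for i in range(steps):
        tau = i / (steps - 1)
        correct = 0
        for ex in examples:
            predicted = "answer" if ex["evidence_score"] >= tau else "abstain"
            correct += predicted == ex["label"]
        acc = correct / len(examples)
        # ">=" keeps the later (higher) tau on ties: prefer the stricter threshold
        if acc >= best_acc:
            best_tau, best_acc = tau, acc
    return best_tau, best_acc


examples = [  # hypothetical scored examples
    {"evidence_score": 0.92, "label": "answer"},
    {"evidence_score": 0.81, "label": "answer"},
    {"evidence_score": 0.74, "label": "answer"},
    {"evidence_score": 0.55, "label": "abstain"},
    {"evidence_score": 0.40, "label": "abstain"},
    {"evidence_score": 0.78, "label": "abstain"},
]
tau, acc = calibrate_tau(examples)
print(f"tau_evidence={tau:.2f}, accuracy={acc:.1%}")  # tau_evidence=0.81, accuracy=83.3%
```

Preferring the higher threshold on ties trades some coverage for a lower chance of unsafe answers, which matches the conservative intent of the decision layer.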
Supported Context Formats
Works with any RAG output format - no changes to your pipeline:
```python
# Plain strings
contexts = ["Policy text here..."]

# Dicts
contexts = [{"text": "...", "score": 0.9, "source": "doc1.pdf"}]

# LangChain Documents (older releases: from langchain.schema import Document)
from langchain_core.documents import Document
contexts = [Document(page_content="...", metadata={"source": "doc1"})]

# LlamaIndex Nodes - works automatically
contexts = [node]      # node: a node returned by your LlamaIndex retriever

# Custom objects with .text attribute
contexts = [my_chunk]  # my_chunk: any object exposing .text
```
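A minimal sketch of how mixed context formats could be normalized to plain text by duck typing. `to_text` and `normalize` are hypothetical helpers, not A2RAG's internal code; they check for a dict's `"text"` key, then `page_content` (LangChain `Document`), then a string `.text` attribute, then a `get_content()` method (as on LlamaIndex nodes). `SimpleNamespace` objects stand in for real `Document` and custom chunk objects.

```python
from types import SimpleNamespace
from typing import Any, List


def to_text(ctx: Any) -> str:
    """Extract plain text from one retrieved context (illustrative only)."""
    if isinstance(ctx, str):
        return ctx
    if isinstance(ctx, dict):
        return str(ctx.get("text", ""))
    if hasattr(ctx, "page_content"):                  # LangChain Document
        return ctx.page_content
    if isinstance(getattr(ctx, "text", None), str):   # custom chunk or node with .text
        return ctx.text
    if callable(getattr(ctx, "get_content", None)):   # other LlamaIndex node types
        return ctx.get_content()
    raise TypeError(f"Unsupported context type: {type(ctx).__name__}")


def normalize(contexts: List[Any]) -> List[str]:
    return [to_text(c) for c in contexts]


mixed = [
    "Policy text here...",
    {"text": "Refunds within 14 days.", "score": 0.9, "source": "doc1.pdf"},
    SimpleNamespace(page_content="Digital items are final sale.", metadata={}),
    SimpleNamespace(text="Custom chunk text."),
]
print(normalize(mixed))
```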
Supported Languages
Language is auto-detected. No configuration needed.
English, Hebrew, Arabic, French, Spanish, and more.
Domain Profiles
Pre-configured thresholds per domain:
```python
decision = client.decide(query, contexts, draft, domain="insurance")
# Options: insurance | legal | medical | support | hr | generic
```
| Domain | Risk Tolerance | Typical Use Case |
|---|---|---|
| insurance | Conservative | Claims, policy questions |
| legal | Very conservative | Contracts, compliance |
| medical | Very conservative | Clinical information |
| support | Moderate | Customer service |
| hr | Moderate | Employee policies |
| generic | Balanced | General purpose |
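One way to represent per-domain profiles is a lookup table of thresholds with a fallback to `generic`. The numeric values and the `tau_completeness` name below are hypothetical placeholders chosen only to reflect the conservative-to-moderate ordering in the table; they are not A2RAG's shipped defaults.

```python
from typing import Dict, NamedTuple


class Thresholds(NamedTuple):
    tau_evidence: float      # minimum evidence score to answer
    tau_completeness: float  # minimum completeness score to avoid clarifying


# Hypothetical values: stricter (higher) thresholds for riskier domains.
DOMAIN_PROFILES: Dict[str, Thresholds] = {
    "legal":     Thresholds(0.85, 0.75),
    "medical":   Thresholds(0.85, 0.75),
    "insurance": Thresholds(0.75, 0.65),
    "support":   Thresholds(0.60, 0.50),
    "hr":        Thresholds(0.60, 0.50),
    "generic":   Thresholds(0.65, 0.55),
}


def thresholds_for(domain: str) -> Thresholds:
    """Look up a domain profile, falling back to 'generic' for unknown names."""
    return DOMAIN_PROFILES.get(domain, DOMAIN_PROFILES["generic"])


print(thresholds_for("legal"))
print(thresholds_for("retail"))  # unknown domain falls back to generic
```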
Privacy
| Data | Stored | Sent to A2RAG? |
|---|---|---|
| Query content | Never | Never |
| Retrieved contexts | Never | Never |
| Draft answers | Never | Never |
| Decision metadata | ~/.a2rag/decisions.db (local only) | Free tier: anonymous only |
| Feedback & comments | Local only | Never |
```python
# Disable all telemetry (paid plans)
client = A2RAGClient(api_key="...", telemetry=False)
```
Getting Started
- Contact stav@aibee.co.il to request an API key
- Install with `pip install a2rag`
- Free tier: 1,000 requests/month, no credit card required

Status: Private Beta
License
MIT License
Copyright (c) 2026 Stav Vaknin - aibee.co.il
Permission is hereby granted, free of charge, to any person obtaining a copy of this software to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, subject to the standard MIT terms.
File details
Details for the file a2rag-0.2.2.tar.gz.
File metadata
- Download URL: a2rag-0.2.2.tar.gz
- Upload date:
- Size: 21.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dd2db5f2ae21b30c2f271d716c9fa03df18acbb80ff8f76c1f121f6fa2542003 |
| MD5 | e829dc561c325949cd31b995eb78c93e |
| BLAKE2b-256 | e540ce7aa9af6bf1ccfd31cf986630e3e687cf7e3bfd80b4f2f31282a9038154 |
File details
Details for the file a2rag-0.2.2-py3-none-any.whl.
File metadata
- Download URL: a2rag-0.2.2-py3-none-any.whl
- Upload date:
- Size: 22.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4d8abab81c76682e8d9ff089fef81840e6e22d813c4139fb5fb4c1b13aa9d121 |
| MD5 | 20d7efea335dc1a63ddedf5f3de1f71e |
| BLAKE2b-256 | 6bd838bcc8f3efb1a2f255939e4d4723ec6516d6a4d8c49fecd0c5319aa054b7 |