Abstention-Aware RAG Decision Layer — answer, clarify, or abstain

A2RAG - Abstention-Aware RAG Decision Layer

Decides when your RAG system should answer, ask for clarification, or abstain.

Standard RAG systems answer every question - even when they shouldn't. A2RAG adds a decision layer that prevents unsafe or hallucinated answers.

The Problem

User: "Can I return this item?"
RAG:  "Yes, returns are accepted within 14 days."  - confident but WRONG
                                                      (user has a digital item - not returnable)

The Solution

User: "Can I return this item?"
A2RAG: CLARIFY - "Was this a physical product or a digital item?"

Installation

pip install a2rag

Zero required dependencies. Works with any RAG system and any LLM.

Quick Start

from a2rag import A2RAGClient

client = A2RAGClient(api_key="your_key_here")

# Step 1: Your existing RAG pipeline (unchanged)
contexts     = your_rag.retrieve(user_query)
draft_answer = your_llm.generate(user_query, contexts)

# Step 2: A2RAG decides what to do
decision = client.decide(
    query=user_query,
    contexts=contexts,
    draft_answer=draft_answer,
)

# Step 3: Act on the decision
if decision.should_answer:
    show_to_user(draft_answer)          # safe to show
elif decision.should_clarify:
    ask_user(decision.clarification)    # ask specific follow-up
elif decision.should_abstain:
    escalate_to_human()                 # route to human agent

How It Works

A2RAG uses two independent scores to make every decision:

Score	Question
Evidence Score	Does the knowledge base actually support this answer?
Completeness Score	Did the user provide enough context for a specific answer?

This separation prevents a common failure mode: high retrieval confidence on a question the corpus doesn't actually cover.
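
As a rough illustration of why two scores are kept apart, here is a minimal sketch of a decision rule over them. It is not A2RAG's actual logic: the function `decide`, the threshold values, and the name `TAU_COMPLETENESS` are assumptions made for this example (only the name `tau_evidence` appears elsewhere in this document).

```python
# Hypothetical thresholds, chosen only for illustration.
TAU_EVIDENCE = 0.6
TAU_COMPLETENESS = 0.5


def decide(evidence_score: float, completeness_score: float) -> str:
    """Map two independent scores to "answer", "clarify", or "abstain"."""
    if evidence_score < TAU_EVIDENCE:
        # The corpus does not support an answer, however clear the question is.
        return "abstain"
    if completeness_score < TAU_COMPLETENESS:
        # The corpus covers the topic, but the user has not said enough.
        return "clarify"
    return "answer"


print(decide(0.9, 0.2))  # clarify
print(decide(0.2, 0.9))  # abstain
print(decide(0.9, 0.9))  # answer
```

Under this sketch, the "Can I return this item?" example lands in the clarify branch: the return policy is well supported by the corpus, but the query does not say which kind of item is involved.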

Decision Object

decision.action             # "answer" | "clarify" | "abstain"
decision.confidence         # 0.0 - 1.0
decision.clarification      # Specific question to ask (if action="clarify")
decision.missing_fields     # What information is missing
decision.should_answer      # bool shortcut
decision.should_clarify     # bool shortcut
decision.should_abstain     # bool shortcut
decision.evidence_score     # How well corpus supports the answer
decision.query_type         # "generic_policy" | "instance_specific"
decision.is_high_confidence # True when confidence >= 0.80
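
For unit-testing your own routing code without calling the service, a stand-in object with the same attribute names may be enough. The sketch below is a test double, not the library's class: `FakeDecision`, `route`, the fallback messages, and the `"item_type"` field are hypothetical, and deriving the bool shortcuts from `action` is an assumption based on the attribute list above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeDecision:
    """Test double exposing the attributes listed above; not the real class."""
    action: str                      # "answer" | "clarify" | "abstain"
    confidence: float                # 0.0 - 1.0
    clarification: Optional[str] = None
    missing_fields: List[str] = field(default_factory=list)
    evidence_score: float = 0.0
    query_type: str = "generic_policy"

    @property
    def should_answer(self) -> bool:
        return self.action == "answer"

    @property
    def should_clarify(self) -> bool:
        return self.action == "clarify"

    @property
    def should_abstain(self) -> bool:
        return self.action == "abstain"

    @property
    def is_high_confidence(self) -> bool:
        return self.confidence >= 0.80


def route(decision, draft_answer: str) -> str:
    """Return the text to show the user for a given decision."""
    if decision.should_answer:
        return draft_answer
    if decision.should_clarify:
        return decision.clarification or "Could you share more details?"
    return "Let me connect you with a human agent."


d = FakeDecision(
    action="clarify",
    confidence=0.85,
    clarification="Was this a physical product or a digital item?",
    missing_fields=["item_type"],  # hypothetical field name
)
print(route(d, "Yes, returns are accepted within 14 days."))
```

Because `route` only reads attributes, the same function should work with any object that exposes `should_answer`, `should_clarify`, and `clarification`.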

Benchmark Results

Tested on a controlled benchmark of 40 scenarios across 6 domains and 5 languages (EN, HE, AR, FR, ES):

System	UAR (↓ lower is better)	Safe Answers	Abstain Precision
Standard RAG	80%	20%	0%
RAG + confidence threshold	80%	20%	0%
A2RAG	0%	91%	100%

UAR (Unsafe Answer Rate): percentage of answers that were factually wrong or unsupported. On this benchmark, A2RAG reached 0% UAR: it did not confidently answer any question it could not support.

Metrics & Analytics

All metrics are computed from local storage: query content, retrieved contexts, and draft answers never leave your machine (the Privacy section covers decision-metadata telemetry).

m = client.metrics(days=30)

print(f"Answer rate:    {m.answer_rate:.1%}")   # % of queries answered
print(f"UAR:            {m.uar:.1%}")            # Unsafe Answer Rate (0% = perfect)
print(f"ORS:            {m.ors:.1%}")            # Overall Reliability Score
print(f"Avg latency:    {m.avg_latency_ms:.0f}ms")
print(f"Avg confidence: {m.avg_confidence:.1%}")

# Break down by domain or language
by_domain   = client.metrics_by_domain()
by_language = client.metrics_by_language()

# Trend over time
trends = client.trends(days=30, interval="day")

Key Metrics Explained

Metric	What It Measures	Good Value
UAR	Unsafe Answer Rate — wrong answers shown to users	< 5%
ORS	Overall Reliability Score — combined quality metric	> 70%
AbstainPrecision	When we refuse to answer, are we right?	> 90%
Coverage	% of queries that receive an answer	> 50%
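
To make the table concrete, the sketch below computes three of these metrics from a small hand-labeled log. The definitions are assumptions for illustration, not A2RAG's exact formulas: UAR is taken as unsafe answers over all queries, an answer counts as unsafe when a reviewer judged that the system should not have answered, and `summarize` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# Each record: (action the system took, action a reviewer judged correct).
Record = Tuple[str, str]


def summarize(records: List[Record]) -> Dict[str, float]:
    """Compute UAR, coverage, and abstain precision under the stated assumptions."""
    total = len(records)
    answered = [r for r in records if r[0] == "answer"]
    abstained = [r for r in records if r[0] == "abstain"]
    unsafe = [r for r in answered if r[1] != "answer"]
    correct_abstains = [r for r in abstained if r[1] == "abstain"]
    return {
        "uar": len(unsafe) / total if total else 0.0,
        "coverage": len(answered) / total if total else 0.0,
        "abstain_precision": (
            len(correct_abstains) / len(abstained) if abstained else 0.0
        ),
    }


records = [
    ("answer", "answer"),
    ("answer", "clarify"),   # unsafe: answered when a follow-up was needed
    ("clarify", "clarify"),
    ("abstain", "abstain"),
    ("abstain", "answer"),   # over-cautious abstention
]
print(summarize(records))
# {'uar': 0.2, 'coverage': 0.4, 'abstain_precision': 0.5}
```

The example shows the trade-off the table implies: abstaining more lowers UAR but also lowers coverage, and abstain precision tells you how much of that caution was justified.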

Local Dashboard

client.dashboard()   # opens browser at http://localhost:7860

Or from terminal:

a2rag dashboard

Shows answer/clarify/abstain rates, trends, latency, and confidence distribution — all from local data, nothing sent externally.

Calibration

Find optimal thresholds for your specific domain and corpus:

labeled_data = [
    {
        "query":        "What is the refund window?",
        "contexts":     ["Refunds available within 14 days for unused items."],
        "draft_answer": "14 days.",
        "label":        "answer",   # answer | clarify | abstain
    },
    # ... 50+ examples recommended
]

result = client.calibrate(labeled_data, domain="insurance")
print(f"tau_evidence:     {result.tau_evidence}")
print(f"Expected accuracy: {result.expected_accuracy:.1%}")
print(f"Expected UAR:      {result.expected_uar:.1%}")
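
For intuition about what threshold calibration involves, here is a minimal sketch of a one-dimensional threshold sweep. It is not how `client.calibrate` is implemented: `best_threshold`, the scores, and the rule "answer if and only if the evidence score is at least tau" are assumptions made for this example.

```python
from typing import List, Tuple


def best_threshold(scored: List[Tuple[float, str]]) -> Tuple[float, float]:
    """Return (tau, accuracy) for the rule: answer iff score >= tau."""
    if not scored:
        raise ValueError("need at least one labeled example")
    # Only observed scores can change the outcome, so try each one as tau.
    candidates = sorted({score for score, _ in scored})
    best_tau, best_acc = candidates[0], -1.0
    for tau in candidates:
        correct = sum(
            (score >= tau) == (label == "answer") for score, label in scored
        )
        acc = correct / len(scored)
        if acc > best_acc:
            best_tau, best_acc = tau, acc
    return best_tau, best_acc


# Hypothetical (evidence score, human label) pairs.
scored = [
    (0.92, "answer"),
    (0.81, "answer"),
    (0.64, "clarify"),
    (0.40, "abstain"),
    (0.35, "abstain"),
]
tau, acc = best_threshold(scored)
print(tau, acc)  # 0.81 1.0
```

With only a handful of examples the chosen threshold is fragile, which is one reason a larger labeled set, such as the 50+ examples recommended above, is worth collecting.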

Supported Context Formats

Works with any RAG output format - no changes to your pipeline:

# Plain strings
contexts = ["Policy text here..."]

# Dicts
contexts = [{"text": "...", "score": 0.9, "source": "doc1.pdf"}]

# LangChain Documents
from langchain_core.documents import Document  # older LangChain releases: from langchain.schema import Document
contexts = [Document(page_content="...", metadata={"source": "doc1"})]

# LlamaIndex Nodes - works automatically
contexts = [node]

# Custom objects with .text attribute
contexts = [my_chunk]
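
If you want to inspect or log contexts yourself before passing them on, a small normalizer over the same shapes can help. This is a sketch under assumptions, not the library's internal logic: `context_text` is a hypothetical helper, and `SimpleNamespace` objects stand in for framework classes so the example runs without extra dependencies.

```python
from types import SimpleNamespace
from typing import Any


def context_text(ctx: Any) -> str:
    """Best-effort extraction of the text from one retrieved context."""
    if isinstance(ctx, str):
        return ctx
    if isinstance(ctx, dict):
        return str(ctx.get("text", ""))
    # Objects: try a LangChain-style attribute first, then a generic .text.
    for attr in ("page_content", "text"):
        value = getattr(ctx, attr, None)
        if isinstance(value, str):
            return value
    raise TypeError(f"Unsupported context type: {type(ctx).__name__}")


contexts = [
    "Policy text here...",
    {"text": "Refunds within 14 days.", "score": 0.9, "source": "doc1.pdf"},
    SimpleNamespace(page_content="LangChain-style document"),
    SimpleNamespace(text="Custom chunk"),
]
print([context_text(c) for c in contexts])
```

Raising `TypeError` on unknown shapes, rather than returning an empty string, surfaces pipeline mismatches early instead of silently scoring against empty evidence.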

Supported Languages

Language is auto-detected. No configuration needed.

English, Arabic, French, Spanish, and more.

Domain Profiles

Pre-configured thresholds per domain:

decision = client.decide(
    query=user_query,
    contexts=contexts,
    draft_answer=draft_answer,
    domain="insurance",
)
# Options: insurance | legal | medical | support | hr | generic

Domain	Risk Tolerance	Typical Use Case
`insurance`	Conservative	Claims, policy questions
`legal`	Very conservative	Contracts, compliance
`medical`	Very conservative	Clinical information
`support`	Moderate	Customer service
`hr`	Moderate	Employee policies
`generic`	Balanced	General purpose

Privacy

Data	Stored	Sent to A2RAG?
Query content	Never	Never
Retrieved contexts	Never	Never
Draft answers	Never	Never
Decision metadata	`~/.a2rag/decisions.db` (local only)	Free tier: anonymous only
Feedback & comments	Local only	Never

# Disable all telemetry (paid plans)
client = A2RAGClient(api_key="...", telemetry=False)

Getting Started

1. Contact stav@aibee.co.il to request an API key.
2. Run pip install a2rag.
3. Start on the free tier: 1,000 requests/month, no credit card required.

Status: Private Beta

License

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, subject to the standard MIT terms.

Release history

Version	Release date
0.2.3 (this version)	May 26, 2026
0.2.2	May 19, 2026
0.2.1	May 19, 2026
0.2.0	May 18, 2026
0.1.3	May 17, 2026
0.1.2	May 17, 2026
0.1.1	May 17, 2026
0.1.0	May 17, 2026

Download files

Source Distribution

a2rag-0.2.3.tar.gz (22.1 kB)

Uploaded May 26, 2026 Source

Built Distribution

a2rag-0.2.3-py3-none-any.whl (22.5 kB)

Uploaded May 26, 2026 Python 3

File details

Details for the file a2rag-0.2.3.tar.gz.

File metadata

Download URL: a2rag-0.2.3.tar.gz
Upload date: May 26, 2026
Size: 22.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for a2rag-0.2.3.tar.gz
Algorithm	Hash digest
SHA256	`3faa4d29d7bf2cd2861e2bf1b3c1b244ad5da6e48645f43e8ce2d0c31fc9d878`
MD5	`cfd345e7fddbedf8c4e2401674a92d96`
BLAKE2b-256	`7534946c48f2cd36b353068f6693da0f2bd39ab66f2f43fc69bd967c8a4bc1da`

File details

Details for the file a2rag-0.2.3-py3-none-any.whl.

File metadata

Download URL: a2rag-0.2.3-py3-none-any.whl
Upload date: May 26, 2026
Size: 22.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for a2rag-0.2.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ddc5e72b5d9ed19d9140ea27ee7aacba824cc2a76620d01e27f399c2fc0ee315`
MD5	`2422136ba03858c289130ffe752a8800`
BLAKE2b-256	`478675f24f3f4a0070d87e073a99cbacf611c300453e589f975d0aa429209b55`
