
Abstention-Aware RAG Decision Layer — answer, clarify, or abstain

A2RAG - Abstention-Aware RAG Decision Layer

Python 3.8+ | License: MIT

Decides when your RAG system should answer, ask for clarification, or abstain.

Standard RAG systems answer every question - even when they shouldn't. A2RAG adds a decision layer that prevents unsafe or hallucinated answers.


The Problem

User: "Can I return this item?"
RAG:  "Yes, returns are accepted within 14 days."  - confident but WRONG
                                                      (user has a digital item - not returnable)

The Solution

User: "Can I return this item?"
A2RAG: CLARIFY - "Was this a physical product or a digital item?"

Installation

pip install a2rag

Zero required dependencies. Works with any RAG system and any LLM.


Quick Start

from a2rag import A2RAGClient

client = A2RAGClient(api_key="your_key_here")

# Step 1: Your existing RAG pipeline (unchanged)
contexts     = your_rag.retrieve(user_query)
draft_answer = your_llm.generate(user_query, contexts)

# Step 2: A2RAG decides what to do
decision = client.decide(
    query=user_query,
    contexts=contexts,
    draft_answer=draft_answer,
)

# Step 3: Act on the decision
if decision.should_answer:
    show_to_user(draft_answer)          # safe to show
elif decision.should_clarify:
    ask_user(decision.clarification)    # ask specific follow-up
elif decision.should_abstain:
    escalate_to_human()                 # route to human agent

How It Works

A2RAG uses two independent scores to make every decision:

Score | Question it answers
Evidence Score | Does the knowledge base actually support this answer?
Completeness Score | Did the user provide enough context for a specific answer?

This separation prevents a common failure mode: high retrieval confidence on a question the corpus doesn't actually cover.
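
The two-score idea can be illustrated with a minimal sketch. This is not A2RAG's internal logic: the function name `decide_action`, the `tau_completeness` threshold, and both default values are assumptions chosen for illustration. Only the name `tau_evidence` also appears in the Calibration section.

```python
def decide_action(evidence_score, completeness_score,
                  tau_evidence=0.6, tau_completeness=0.5):
    """Hypothetical rule combining two independent scores.

    Illustrative only; thresholds and ordering are assumptions,
    not the library's actual decision logic.
    """
    if evidence_score < tau_evidence:
        # The corpus does not support an answer, no matter how
        # detailed the user's question is.
        return "abstain"
    if completeness_score < tau_completeness:
        # The corpus covers the topic, but the query lacks the
        # details needed for a specific answer.
        return "clarify"
    return "answer"
```

In the returns example above, the policy text supports an answer (high evidence) but the item type is unknown (low completeness), so a rule of this shape would yield "clarify" rather than a confident answer.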


Decision Object

decision.action             # "answer" | "clarify" | "abstain"
decision.confidence         # 0.0 - 1.0
decision.clarification      # Specific question to ask (if action="clarify")
decision.missing_fields     # What information is missing
decision.should_answer      # bool shortcut
decision.should_clarify     # bool shortcut
decision.should_abstain     # bool shortcut
decision.evidence_score     # How well corpus supports the answer
decision.query_type         # "generic_policy" | "instance_specific"
decision.is_high_confidence # True when confidence >= 0.80
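
For unit-testing code that consumes a decision without calling the API, a stand-in object can be useful. The sketch below mirrors the attributes listed above; the class name `DecisionSketch`, the default values, and the `route` helper are hypothetical and are not part of the a2rag package.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DecisionSketch:
    """Test stand-in mirroring the documented fields; not the library's class."""
    action: str                       # "answer" | "clarify" | "abstain"
    confidence: float                 # 0.0 - 1.0
    evidence_score: float = 0.0       # assumed default
    query_type: str = "generic_policy"
    clarification: Optional[str] = None
    missing_fields: List[str] = field(default_factory=list)

    @property
    def should_answer(self) -> bool:
        return self.action == "answer"

    @property
    def should_clarify(self) -> bool:
        return self.action == "clarify"

    @property
    def should_abstain(self) -> bool:
        return self.action == "abstain"

    @property
    def is_high_confidence(self) -> bool:
        # Documented rule: True when confidence >= 0.80
        return self.confidence >= 0.80


def route(decision, draft_answer):
    """Return a (channel, payload) pair describing what to do next."""
    if decision.should_answer:
        return ("user", draft_answer)
    if decision.should_clarify:
        return ("user", decision.clarification)
    # Anything else is treated as an abstention and escalated.
    return ("human", None)
```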

Benchmark Results

Tested on a controlled benchmark of 40 scenarios across 6 domains and 5 languages (EN, HE, AR, FR, ES):

System | UAR (lower is better) | Safe Answers | Abstain Precision
Standard RAG | 80% | 20% | 0%
RAG + confidence threshold | 80% | 20% | 0%
A2RAG | 0% | 91% | 100%

UAR (Unsafe Answer Rate): percentage of answers that were factually wrong or unsupported. On this benchmark, A2RAG achieved 0% UAR: it did not confidently answer any question it could not support.


Metrics & Analytics

All metrics are computed from local storage. Query content, retrieved contexts, and draft answers never leave your machine (see Privacy for what decision metadata is shared).

m = client.metrics(days=30)

print(f"Answer rate:    {m.answer_rate:.1%}")   # % of queries answered
print(f"UAR:            {m.uar:.1%}")            # Unsafe Answer Rate (0% = perfect)
print(f"ORS:            {m.ors:.1%}")            # Overall Reliability Score
print(f"Avg latency:    {m.avg_latency_ms:.0f}ms")
print(f"Avg confidence: {m.avg_confidence:.1%}")

# Break down by domain or language
by_domain   = client.metrics_by_domain()
by_language = client.metrics_by_language()

# Trend over time
trends = client.trends(days=30, interval="day")

Key Metrics Explained

Metric | What It Measures | Good Value
UAR | Unsafe Answer Rate: wrong answers shown to users | < 5%
ORS | Overall Reliability Score: combined quality metric | > 70%
AbstainPrecision | When we refuse to answer, are we right? | > 90%
Coverage | % of queries that receive an answer | > 50%
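
As a rough illustration of how three of these metrics relate, the sketch below computes them from a list of reviewed decisions. The record schema (`action`, `correct`), the function name `summarize`, and the choice of total queries as the UAR denominator are assumptions; the library's own definitions may differ, and ORS is omitted because its formula is not given here.

```python
def summarize(records):
    """Compute UAR, coverage, and abstain precision from reviewed records.

    Each record is a dict with (assumed schema):
      'action'  - "answer", "clarify", or "abstain"
      'correct' - True if the answer was right and supported, or,
                  for an abstention, if refusing was the right call.
    """
    total = len(records)
    answers = [r for r in records if r["action"] == "answer"]
    abstains = [r for r in records if r["action"] == "abstain"]

    unsafe = sum(1 for r in answers if not r["correct"])
    right_abstains = sum(1 for r in abstains if r["correct"])

    return {
        # Assumption: unsafe answers are measured against all queries.
        "uar": unsafe / total if total else 0.0,
        "coverage": len(answers) / total if total else 0.0,
        # Undefined when the system never abstains.
        "abstain_precision": (
            right_abstains / len(abstains) if abstains else None
        ),
    }
```

Returning `None` for abstain precision when there are no abstentions keeps "never abstained" distinct from "abstained and was always wrong".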

Local Dashboard

client.dashboard()   # opens browser at http://localhost:7860

Or from terminal:

a2rag dashboard

Shows answer/clarify/abstain rates, trends, latency, and confidence distribution — all from local data, nothing sent externally.


Calibration

Find optimal thresholds for your specific domain and corpus:

labeled_data = [
    {
        "query":        "What is the refund window?",
        "contexts":     ["Refunds available within 14 days for unused items."],
        "draft_answer": "14 days.",
        "label":        "answer",   # answer | clarify | abstain
    },
    # ... 50+ examples recommended
]

result = client.calibrate(labeled_data, domain="insurance")
print(f"tau_evidence:     {result.tau_evidence}")
print(f"Expected accuracy: {result.expected_accuracy:.1%}")
print(f"Expected UAR:      {result.expected_uar:.1%}")
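
To give a feel for what threshold calibration involves, here is a simplified sketch of a grid search over a single evidence threshold. It is not the algorithm behind `client.calibrate`: it assumes evidence scores are already computed, ignores the "clarify" label, and the function name `sweep_tau_evidence` and the 0.05 grid step are hypothetical.

```python
def sweep_tau_evidence(examples, candidates=None):
    """Pick the threshold that best separates 'answer' from 'abstain'.

    examples: list of (evidence_score, label) pairs, where label is
    "answer" or "abstain". Returns (best_tau, accuracy). On ties,
    the lowest candidate wins.
    """
    if not examples:
        raise ValueError("need at least one labeled example")
    if candidates is None:
        candidates = [i / 20 for i in range(21)]  # 0.00, 0.05, ..., 1.00

    best_tau, best_acc = None, -1.0
    for tau in candidates:
        # Predict "answer" when the score clears the threshold.
        correct = sum(
            1 for score, label in examples
            if ("answer" if score >= tau else "abstain") == label
        )
        acc = correct / len(examples)
        if acc > best_acc:
            best_tau, best_acc = tau, acc
    return best_tau, best_acc
```

With only a handful of examples many thresholds tie, which is one reason a larger labeled set (the 50+ examples recommended above) gives a more stable result.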

Supported Context Formats

Works with any RAG output format - no changes to your pipeline:

# Plain strings
contexts = ["Policy text here..."]

# Dicts
contexts = [{"text": "...", "score": 0.9, "source": "doc1.pdf"}]

# LangChain Documents
from langchain.schema import Document
contexts = [Document(page_content="...", metadata={"source": "doc1"})]

# LlamaIndex Nodes - works automatically
contexts = [node]

# Custom objects with .text attribute
contexts = [my_chunk]
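
Accepting all of these shapes usually comes down to duck typing. The sketch below shows one way such normalization could work; it is an illustration, not A2RAG's implementation, and the function name `context_to_text` is hypothetical. `SimpleNamespace` objects stand in for LangChain Documents (`page_content`) and custom chunks (`text`) so the example runs without those libraries.

```python
from types import SimpleNamespace


def context_to_text(ctx):
    """Best-effort text extraction from a retrieved context (illustrative)."""
    if isinstance(ctx, str):
        return ctx
    if isinstance(ctx, dict):
        if "text" in ctx:
            return str(ctx["text"])
        raise ValueError("dict context has no 'text' key")
    # Objects: try the attribute names used in the examples above.
    for attr in ("page_content", "text"):
        value = getattr(ctx, attr, None)
        if isinstance(value, str):
            return value
    raise TypeError(f"unsupported context type: {type(ctx).__name__}")


mixed = [
    "Policy text here...",
    {"text": "Refunds within 14 days.", "score": 0.9, "source": "doc1.pdf"},
    SimpleNamespace(page_content="Document-style chunk."),
    SimpleNamespace(text="Custom chunk."),
]
texts = [context_to_text(c) for c in mixed]
```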

Supported Languages

Language is auto-detected. No configuration needed.

English, Arabic, French, Spanish, and more.


Domain Profiles

Pre-configured thresholds per domain:

decision = client.decide(query, contexts, draft, domain="insurance")
# Options: insurance | legal | medical | support | hr | generic

Domain | Risk Tolerance | Typical Use Case
insurance | Conservative | Claims, policy questions
legal | Very conservative | Contracts, compliance
medical | Very conservative | Clinical information
support | Moderate | Customer service
hr | Moderate | Employee policies
generic | Balanced | General purpose

Privacy

Data | Stored | Sent to A2RAG?
Query content | Never | Never
Retrieved contexts | Never | Never
Draft answers | Never | Never
Decision metadata | ~/.a2rag/decisions.db (local only) | Free tier: anonymous only
Feedback & comments | Local only | Never

# Disable all telemetry (paid plans)
client = A2RAGClient(api_key="...", telemetry=False)

Getting Started

  1. Contact stav@aibee.co.il to request an API key
  2. pip install a2rag
  3. Free tier: 1,000 requests/month - no credit card required

Status: Private Beta


License

MIT License

Copyright (c) 2026 Stav Vaknin - aibee.co.il

Permission is hereby granted, free of charge, to any person obtaining a copy of this software to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, subject to the standard MIT terms.
