Abstention-Aware RAG Decision Layer — answer, clarify, or abstain

A2RAG - Abstention-Aware RAG Decision Layer

PyPI version Python 3.8+ License: MIT

Decides when your RAG system should answer, ask for clarification, or abstain.

Standard RAG systems answer every question - even when they shouldn't. A2RAG adds a decision layer that prevents unsafe or hallucinated answers.


The Problem

User: "Can I return this item?"
RAG:  "Yes, returns are accepted within 14 days."  - confident but WRONG
                                                      (user has a digital item - not returnable)

The Solution

User: "Can I return this item?"
A2RAG: CLARIFY - "Was this a physical product or a digital item?"

Installation

pip install a2rag

Zero required dependencies. Works with any RAG system and any LLM.


Quick Start

from a2rag import A2RAGClient

client = A2RAGClient(api_key="your_key_here")

# Step 1: Your existing RAG pipeline (unchanged)
contexts     = your_rag.retrieve(user_query)
draft_answer = your_llm.generate(user_query, contexts)

# Step 2: A2RAG decides what to do
decision = client.decide(
    query=user_query,
    contexts=contexts,
    draft_answer=draft_answer,
)

# Step 3: Act on the decision
if decision.should_answer:
    show_to_user(draft_answer)          # safe to show
elif decision.should_clarify:
    ask_user(decision.clarification)    # ask specific follow-up
elif decision.should_abstain:
    escalate_to_human()                 # route to human agent

How It Works

A2RAG uses two independent scores to make every decision:

| Score | Question |
|---|---|
| Evidence Score | Does the knowledge base actually support this answer? |
| Completeness Score | Did the user provide enough context for a specific answer? |

This separation prevents a common failure mode: high retrieval confidence on a question the corpus doesn't actually cover.
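
To make the two-score idea concrete, here is a minimal sketch of how two independent scores could map to the three actions. It is an illustration only, not A2RAG's actual decision logic: the function name `decide_action`, the parameter `tau_completeness`, and both default threshold values are assumptions made for this example.

```python
from typing import Literal

Action = Literal["answer", "clarify", "abstain"]


def decide_action(
    evidence_score: float,
    completeness_score: float,
    tau_evidence: float = 0.6,       # hypothetical threshold, not a library default
    tau_completeness: float = 0.5,   # hypothetical threshold, not a library default
) -> Action:
    """Toy two-score rule: check corpus support first, then query completeness."""
    if evidence_score < tau_evidence:
        # The corpus does not support the draft answer well enough.
        return "abstain"
    if completeness_score < tau_completeness:
        # Evidence exists, but the user has not said enough to pick one answer.
        return "clarify"
    return "answer"


# Strong evidence for a returns policy, but the item type is unknown.
print(decide_action(evidence_score=0.9, completeness_score=0.2))  # prints "clarify"
```

Keeping the two checks separate is what lets a well-supported but underspecified question, such as the returns example above, end in a clarification rather than an answer or a refusal.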


Decision Object

decision.action             # "answer" | "clarify" | "abstain"
decision.confidence         # 0.0 - 1.0
decision.clarification      # Specific question to ask (if action="clarify")
decision.missing_fields     # What information is missing
decision.should_answer      # bool shortcut
decision.should_clarify     # bool shortcut
decision.should_abstain     # bool shortcut
decision.evidence_score     # How well corpus supports the answer
decision.query_type         # "generic_policy" | "instance_specific"
decision.is_high_confidence # True when confidence >= 0.80
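
As a rough sketch of how these fields might be consumed, the helper below turns a decision into the text shown to the user. `StubDecision` is a hypothetical stand-in that mirrors a few of the attribute names listed above so the example runs without an API key; `route` and the fallback wording are likewise assumptions, not part of the library.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StubDecision:
    """Hypothetical stand-in mirroring a few documented fields, for illustration."""
    action: str                      # "answer" | "clarify" | "abstain"
    confidence: float
    clarification: Optional[str] = None
    missing_fields: List[str] = field(default_factory=list)


def route(decision, draft_answer: str) -> str:
    """Return the text to show the user for a given decision."""
    if decision.action == "answer":
        return draft_answer
    if decision.action == "clarify":
        if decision.clarification:
            return decision.clarification
        # No specific question available: build a generic one from missing_fields.
        return "Could you tell me more about: " + ", ".join(decision.missing_fields) + "?"
    # Anything else is treated as an abstention.
    return "I can't answer that reliably; routing you to a human agent."


d = StubDecision("clarify", 0.72, "Was this a physical product or a digital item?")
print(route(d, "Yes, returns are accepted within 14 days."))
```

Because `route` only reads attributes, the same function should work with any object exposing `action`, `clarification`, and `missing_fields`.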

Benchmark Results

Tested on a controlled benchmark of 40 scenarios across 6 domains and 5 languages (EN, HE, AR, FR, ES):

| System | UAR (↓ lower is better) | Safe Answers | Abstain Precision |
|---|---|---|---|
| Standard RAG | 80% | 20% | 0% |
| RAG + confidence threshold | 80% | 20% | 0% |
| A2RAG | 0% | 91% | 100% |

UAR (Unsafe Answer Rate): percentage of answers that were factually wrong or unsupported. On this benchmark, A2RAG achieved 0% UAR: it did not confidently answer any question it could not support.
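
For readers who want to reproduce these metrics on their own labeled runs, the sketch below computes UAR and Abstain Precision from a list of per-query records. The record keys (`action`, `answer_correct`, `should_abstain`) and the choice of denominators (answers shown for UAR, abstentions made for Abstain Precision) are assumptions for this example; the benchmark's exact definitions may differ.

```python
from typing import Dict, List


def benchmark_metrics(results: List[Dict]) -> Dict[str, float]:
    """Compute UAR and abstain precision from labeled per-query records."""
    answered = [r for r in results if r["action"] == "answer"]
    abstained = [r for r in results if r["action"] == "abstain"]

    # Unsafe answer: the system answered, and the answer was wrong or unsupported.
    unsafe = [r for r in answered if not r.get("answer_correct", False)]
    # Correct abstention: the system abstained on a query it should not answer.
    justified = [r for r in abstained if r.get("should_abstain", False)]

    return {
        "uar": len(unsafe) / len(answered) if answered else 0.0,
        "abstain_precision": len(justified) / len(abstained) if abstained else 0.0,
    }


# Made-up records for illustration; these are not benchmark data.
sample = [
    {"action": "answer", "answer_correct": True},
    {"action": "answer", "answer_correct": False},
    {"action": "abstain", "should_abstain": True},
    {"action": "abstain", "should_abstain": True},
]
print(benchmark_metrics(sample))
```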


Metrics & Analytics

All metrics are computed from local storage. Query content, retrieved contexts, and draft answers never leave your machine.

m = client.metrics(days=30)

print(f"Answer rate:    {m.answer_rate:.1%}")   # % of queries answered
print(f"UAR:            {m.uar:.1%}")            # Unsafe Answer Rate (0% = perfect)
print(f"ORS:            {m.ors:.1%}")            # Overall Reliability Score
print(f"Avg latency:    {m.avg_latency_ms:.0f}ms")
print(f"Avg confidence: {m.avg_confidence:.1%}")

# Break down by domain or language
by_domain   = client.metrics_by_domain()
by_language = client.metrics_by_language()

# Trend over time
trends = client.trends(days=30, interval="day")

Key Metrics Explained

| Metric | What It Measures | Good Value |
|---|---|---|
| UAR | Unsafe Answer Rate: wrong answers shown to users | < 5% |
| ORS | Overall Reliability Score: combined quality metric | > 70% |
| AbstainPrecision | When we refuse to answer, are we right? | > 90% |
| Coverage | % of queries that receive an answer | > 50% |

Local Dashboard

client.dashboard()   # opens browser at http://localhost:7860

Or from terminal:

a2rag dashboard

Shows answer/clarify/abstain rates, trends, latency, and confidence distribution. Everything is computed from local data; nothing is sent externally.


Calibration

Find optimal thresholds for your specific domain and corpus:

labeled_data = [
    {
        "query":        "What is the refund window?",
        "contexts":     ["Refunds available within 14 days for unused items."],
        "draft_answer": "14 days.",
        "label":        "answer",   # answer | clarify | abstain
    },
    # ... 50+ examples recommended
]

result = client.calibrate(labeled_data, domain="insurance")
print(f"tau_evidence:     {result.tau_evidence}")
print(f"Expected accuracy: {result.expected_accuracy:.1%}")
print(f"Expected UAR:      {result.expected_uar:.1%}")
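
Conceptually, calibration of this kind can be pictured as a threshold sweep over labeled examples. The sketch below is a simplified assumption about the idea, not the implementation behind `client.calibrate`: it reduces the three labels to a binary "should answer" flag, assumes an evidence score is already available for each example, and uses a hypothetical helper named `best_threshold`.

```python
from typing import List, Tuple


def best_threshold(
    examples: List[Tuple[float, bool]],
    steps: int = 20,
) -> Tuple[float, float]:
    """Sweep tau over [0, 1] and return (tau, accuracy) for the best setting.

    Each example is (evidence_score, should_answer). The rule under test is:
    answer when evidence_score >= tau, otherwise do not answer.
    """
    if not examples:
        raise ValueError("need at least one labeled example")
    best_tau, best_acc = 0.0, -1.0
    for i in range(steps + 1):
        tau = i / steps
        correct = sum((score >= tau) == should_answer for score, should_answer in examples)
        acc = correct / len(examples)
        if acc > best_acc:  # keep the lowest tau among ties
            best_tau, best_acc = tau, acc
    return best_tau, best_acc


# Made-up scores for illustration only.
examples = [(0.92, True), (0.81, True), (0.40, False), (0.15, False)]
tau, acc = best_threshold(examples)
print(f"tau={tau:.2f} accuracy={acc:.0%}")
```

A real calibration would likely weigh unsafe answers more heavily than missed answers, which is one reason conservative domains end up with higher thresholds.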

Supported Context Formats

Works with any RAG output format - no changes to your pipeline:

# Plain strings
contexts = ["Policy text here..."]

# Dicts
contexts = [{"text": "...", "score": 0.9, "source": "doc1.pdf"}]

# LangChain Documents
from langchain.schema import Document
contexts = [Document(page_content="...", metadata={"source": "doc1"})]

# LlamaIndex Nodes - works automatically
contexts = [node]

# Custom objects with .text attribute
contexts = [my_chunk]
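
One way to picture this flexibility is a small normalizer that reduces each supported shape to plain text. This is a sketch under assumptions, not the library's code: the names `context_text` and `normalize_contexts` are hypothetical, and the attribute lookup order (`page_content`, then `text`) is a guess at what would cover the formats listed above.

```python
from typing import Any, List


def context_text(ctx: Any) -> str:
    """Extract plain text from a string, a dict, or an object with a text attribute."""
    if isinstance(ctx, str):
        return ctx
    if isinstance(ctx, dict):
        return str(ctx.get("text", ""))
    # LangChain-style documents expose page_content; other chunk objects expose text.
    for attr in ("page_content", "text"):
        value = getattr(ctx, attr, None)
        if isinstance(value, str):
            return value
    raise TypeError(f"Unsupported context type: {type(ctx).__name__}")


def normalize_contexts(contexts: List[Any]) -> List[str]:
    """Apply context_text to every item, preserving order."""
    return [context_text(c) for c in contexts]


class Chunk:
    """Minimal custom object with a .text attribute, used only for this example."""
    def __init__(self, text: str) -> None:
        self.text = text


mixed = ["Plain policy text", {"text": "Dict text", "score": 0.9}, Chunk("Chunk text")]
print(normalize_contexts(mixed))
```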

Supported Languages

Language is auto-detected. No configuration needed.

English, Hebrew, Arabic, French, Spanish, and more.


Domain Profiles

Pre-configured thresholds per domain:

decision = client.decide(query, contexts, draft, domain="insurance")
# Options: insurance | legal | medical | support | hr | generic

| Domain | Risk Tolerance | Typical Use Case |
|---|---|---|
| insurance | Conservative | Claims, policy questions |
| legal | Very conservative | Contracts, compliance |
| medical | Very conservative | Clinical information |
| support | Moderate | Customer service |
| hr | Moderate | Employee policies |
| generic | Balanced | General purpose |

Privacy

| Data | Stored | Sent to A2RAG? |
|---|---|---|
| Query content | Never | Never |
| Retrieved contexts | Never | Never |
| Draft answers | Never | Never |
| Decision metadata | ~/.a2rag/decisions.db (local only) | Free tier: anonymous only |
| Feedback & comments | Local only | Never |

# Disable all telemetry (paid plans)
client = A2RAGClient(api_key="...", telemetry=False)

Getting Started

  1. Contact stav@aibee.co.il to request an API key
  2. pip install a2rag
  3. Free tier: 1,000 requests/month - no credit card required

Status: Private Beta


License

MIT License

Copyright (c) 2026 Stav Vaknin - aibee.co.il

Permission is hereby granted, free of charge, to any person obtaining a copy of this software to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, subject to the standard MIT terms.
