
Abstention-Aware RAG Decision Layer — answer, clarify, or abstain


A2RAG - Abstention-Aware RAG Decision Layer

Requires Python 3.8+. License: MIT.

Decides when your RAG system should answer, ask for clarification, or abstain.

Standard RAG systems answer every question, even when they shouldn't. A2RAG adds a decision layer between your LLM's draft answer and the user, designed to stop unsafe or hallucinated answers from being shown.


The Problem

User: "Can I return this item?"
RAG:  "Yes, returns are accepted within 14 days."
      (confident but WRONG: the user bought a digital item, which is not returnable)

The Solution

User: "Can I return this item?"
A2RAG: CLARIFY - "Was this a physical product or a digital item?"

Installation

pip install a2rag

Zero required dependencies. Works with any RAG system and any LLM.


Quick Start

from a2rag import A2RAGClient

client = A2RAGClient(api_key="your_key_here")

# Step 1: Your existing RAG pipeline (unchanged).
# your_rag, your_llm and user_query stand in for your own retriever, LLM and input.
contexts     = your_rag.retrieve(user_query)
draft_answer = your_llm.generate(user_query, contexts)

# Step 2: A2RAG decides what to do
decision = client.decide(
    query=user_query,
    contexts=contexts,
    draft_answer=draft_answer,
)

# Step 3: Act on the decision (show_to_user, ask_user, escalate_to_human are your own handlers)
if decision.should_answer:
    show_to_user(draft_answer)          # safe to show
elif decision.should_clarify:
    ask_user(decision.clarification)    # ask specific follow-up
elif decision.should_abstain:
    escalate_to_human()                 # route to human agent

How It Works

A2RAG uses two independent scores to make every decision:

Score                Question
Evidence Score       Does the knowledge base actually support this answer?
Completeness Score   Did the user provide enough context for a specific answer?

Keeping the two scores separate targets a common failure mode: retrieval returns high-similarity passages for a question the corpus doesn't actually cover, so a single confidence threshold lets the answer through.
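To make the split concrete, here is a minimal sketch of how two independent scores could drive the three-way choice. The threshold names and values (`TAU_EVIDENCE`, `TAU_COMPLETENESS`) and the `route` function are hypothetical; A2RAG's actual scoring and decision order are not documented here.

```python
# Hypothetical thresholds for illustration only; not A2RAG's real values.
TAU_EVIDENCE = 0.6
TAU_COMPLETENESS = 0.5


def route(evidence_score: float, completeness_score: float) -> str:
    """Map two independent scores in [0, 1] to answer / clarify / abstain.

    A sketch of the idea, not the library's implementation.
    """
    if not (0.0 <= evidence_score <= 1.0 and 0.0 <= completeness_score <= 1.0):
        raise ValueError("scores must be in [0, 1]")
    # Weak evidence: the corpus does not support an answer, and asking the
    # user for more detail would not change that.
    if evidence_score < TAU_EVIDENCE:
        return "abstain"
    # Evidence exists, but the query is underspecified (e.g. physical vs
    # digital item), so ask a follow-up instead of guessing.
    if completeness_score < TAU_COMPLETENESS:
        return "clarify"
    return "answer"


# The returns example: policy text was retrieved, but the item type is unknown.
print(route(evidence_score=0.85, completeness_score=0.30))  # clarify
```

Checking evidence first is one possible design choice: if the corpus cannot support any answer, a clarifying question would only delay an abstention.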


Decision Object

decision.action             # "answer" | "clarify" | "abstain"
decision.confidence         # 0.0 - 1.0
decision.clarification      # Specific question to ask (if action="clarify")
decision.missing_fields     # What information is missing
decision.should_answer      # bool shortcut
decision.should_clarify     # bool shortcut
decision.should_abstain     # bool shortcut
decision.evidence_score     # How well the corpus supports the answer
decision.query_type         # "generic_policy" | "instance_specific"
decision.is_high_confidence # True when confidence >= 0.80
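When unit-testing code that consumes decisions (for example, without an API key), a small stand-in object with the same attribute names can help. The `FakeDecision` class below is hypothetical and only mirrors the fields listed above; it assumes the boolean shortcuts follow directly from `action` and applies the documented 0.80 cutoff for `is_high_confidence`.

```python
from dataclasses import dataclass, field
from typing import List, Optional

HIGH_CONFIDENCE_CUTOFF = 0.80  # the documented is_high_confidence rule


@dataclass
class FakeDecision:
    """Hypothetical test double mirroring the documented decision attributes."""

    action: str                      # "answer" | "clarify" | "abstain"
    confidence: float                # 0.0 - 1.0
    clarification: Optional[str] = None
    missing_fields: List[str] = field(default_factory=list)
    evidence_score: float = 0.0
    query_type: str = "generic_policy"

    def __post_init__(self) -> None:
        if self.action not in ("answer", "clarify", "abstain"):
            raise ValueError(f"unknown action: {self.action!r}")

    @property
    def should_answer(self) -> bool:
        return self.action == "answer"

    @property
    def should_clarify(self) -> bool:
        return self.action == "clarify"

    @property
    def should_abstain(self) -> bool:
        return self.action == "abstain"

    @property
    def is_high_confidence(self) -> bool:
        return self.confidence >= HIGH_CONFIDENCE_CUTOFF


d = FakeDecision(
    action="clarify",
    confidence=0.72,
    clarification="Was this a physical product or a digital item?",
    missing_fields=["item_type"],  # hypothetical field name
    query_type="instance_specific",
)
print(d.should_clarify, d.is_high_confidence)  # True False
```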

Benchmark Results

Tested on a controlled benchmark of 40 scenarios across 6 domains and 5 languages (EN, HE, AR, FR, ES):

System                       UAR (lower is better)   Safe Answers   Abstain Precision
Standard RAG                 80%                     20%            0%
RAG + confidence threshold   80%                     20%            0%
A2RAG                        0%                      91%            100%

UAR (Unsafe Answer Rate): the percentage of answers that were factually wrong or unsupported. On this benchmark A2RAG recorded 0% UAR: none of the answers it let through were unsupported.


Metrics & Analytics

All metrics are computed from local storage on your machine; query content, retrieved contexts, and draft answers are never sent to A2RAG (see Privacy).

m = client.metrics(days=30)

print(f"Answer rate:    {m.answer_rate:.1%}")   # % of queries answered
print(f"UAR:            {m.uar:.1%}")            # Unsafe Answer Rate (0% = perfect)
print(f"ORS:            {m.ors:.1%}")            # Overall Reliability Score
print(f"Avg latency:    {m.avg_latency_ms:.0f}ms")
print(f"Avg confidence: {m.avg_confidence:.1%}")

# Break down by domain or language
by_domain   = client.metrics_by_domain()
by_language = client.metrics_by_language()

# Trend over time
trends = client.trends(days=30, interval="day")

Key Metrics Explained

Metric             What It Measures                                      Good Value
UAR                Unsafe Answer Rate: wrong answers shown to users      < 5%
ORS                Overall Reliability Score: combined quality metric    > 70%
AbstainPrecision   Share of abstentions that were the right call         > 90%
Coverage           % of queries that receive an answer                   > 50%
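Three of these definitions can be computed by hand from a log of reviewed decisions. The record format (an `action` plus a reviewer-assigned `correct` flag), the `summarize` helper, and the exact denominators are assumptions for illustration; A2RAG's own formulas, and ORS in particular, are not documented here.

```python
from typing import Dict, List


def summarize(records: List[Dict]) -> Dict[str, float]:
    """Compute UAR, abstain precision, and coverage from reviewed decisions.

    Each record is assumed to look like {"action": "answer", "correct": True},
    where "correct" is a human judgment of whether the action was right.
    Each metric is reported as 0.0 when its denominator is empty.
    """
    answered = [r for r in records if r["action"] == "answer"]
    abstained = [r for r in records if r["action"] == "abstain"]

    # UAR: share of shown answers that were wrong or unsupported.
    uar = sum(not r["correct"] for r in answered) / len(answered) if answered else 0.0
    # Abstain precision: share of abstentions that were the right call.
    abstain_precision = (
        sum(r["correct"] for r in abstained) / len(abstained) if abstained else 0.0
    )
    # Coverage: share of all queries that received an answer.
    coverage = len(answered) / len(records) if records else 0.0
    return {"uar": uar, "abstain_precision": abstain_precision, "coverage": coverage}


log = [
    {"action": "answer", "correct": True},
    {"action": "answer", "correct": True},
    {"action": "answer", "correct": False},
    {"action": "answer", "correct": True},
    {"action": "clarify", "correct": True},
    {"action": "abstain", "correct": True},
    {"action": "abstain", "correct": False},
    {"action": "answer", "correct": True},
]
print(summarize(log))  # {'uar': 0.2, 'abstain_precision': 0.5, 'coverage': 0.625}
```

Note that in this sketch UAR is relative to the answers shown, while coverage is relative to all queries, so a system can lower UAR simply by answering less.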

Local Dashboard

client.dashboard()   # opens browser at http://localhost:7860

Or from terminal:

a2rag dashboard

Shows answer/clarify/abstain rates, trends, latency, and confidence distribution — all from local data, nothing sent externally.


Calibration

Tune the decision thresholds for your specific domain and corpus using labeled examples:

labeled_data = [
    {
        "query":        "What is the refund window?",
        "contexts":     ["Refunds available within 14 days for unused items."],
        "draft_answer": "14 days.",
        "label":        "answer",   # answer | clarify | abstain
    },
    # ... 50+ examples recommended
]

result = client.calibrate(labeled_data, domain="insurance")
print(f"tau_evidence:      {result.tau_evidence}")
print(f"Expected accuracy: {result.expected_accuracy:.1%}")
print(f"Expected UAR:      {result.expected_uar:.1%}")
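Conceptually, calibration searches for the threshold that best reproduces your labels. The sketch below shows one simple way to do that for a single evidence threshold, assuming each labeled example already carries an evidence score. The `calibrate_tau` function, the restriction to `answer`/`abstain` labels, and the grid search are illustrative assumptions, not how `client.calibrate` works internally.

```python
from typing import List, Tuple


def calibrate_tau(examples: List[Tuple[float, str]], grid_steps: int = 101) -> Tuple[float, float]:
    """Pick the evidence threshold that maximizes agreement with labels.

    examples: (evidence_score, label) pairs, label in {"answer", "abstain"}.
    Returns (best_tau, accuracy). On ties, the lowest threshold wins.
    """
    if not examples:
        raise ValueError("need at least one labeled example")
    best_tau, best_acc = 0.0, -1.0
    for i in range(grid_steps):
        tau = i / (grid_steps - 1)  # candidate thresholds 0.00, 0.01, ..., 1.00
        hits = sum(
            ("answer" if score >= tau else "abstain") == label
            for score, label in examples
        )
        acc = hits / len(examples)
        if acc > best_acc:
            best_tau, best_acc = tau, acc
    return best_tau, best_acc


labeled = [(0.92, "answer"), (0.81, "answer"), (0.55, "abstain"),
           (0.40, "abstain"), (0.74, "answer"), (0.60, "abstain")]
tau, acc = calibrate_tau(labeled)
print(f"tau_evidence: {tau:.2f}, accuracy: {acc:.0%}")  # tau_evidence: 0.61, accuracy: 100%
```

With only a handful of examples, many thresholds fit equally well, which is one reason to collect the 50+ examples recommended above.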

Supported Context Formats

Accepts the context formats common RAG pipelines already produce, with no changes to your pipeline:

# Plain strings
contexts = ["Policy text here..."]

# Dicts
contexts = [{"text": "...", "score": 0.9, "source": "doc1.pdf"}]

# LangChain Documents
from langchain_core.documents import Document
contexts = [Document(page_content="...", metadata={"source": "doc1"})]

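# LlamaIndex Nodes - works automatically
contexts = [node]        # node: a node returned by your LlamaIndex retriever

# Custom objects with .text attribute
contexts = [my_chunk]    # my_chunk: any object exposing a .text attribute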
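Accepting these formats amounts to extracting text from each item. The `context_text` and `normalize` helpers below are a hypothetical sketch of that normalization, not A2RAG's code: they check for a plain string, a dict with a `"text"` key, a LangChain-style `page_content` attribute, and finally a `.text` attribute (which custom chunk types and some LlamaIndex node classes expose).

```python
from typing import Any, Iterable, List


def context_text(item: Any) -> str:
    """Best-effort extraction of the text from one retrieved context."""
    if isinstance(item, str):
        return item
    if isinstance(item, dict) and "text" in item:
        return str(item["text"])
    # LangChain Document-style objects keep their text in .page_content.
    if hasattr(item, "page_content"):
        return str(item.page_content)
    # Custom chunks and many node types expose a .text attribute.
    if hasattr(item, "text"):
        return str(item.text)
    raise TypeError(f"cannot extract text from {type(item).__name__}")


def normalize(contexts: Iterable[Any]) -> List[str]:
    """Convert a mixed list of contexts into plain strings."""
    return [context_text(c) for c in contexts]


class Chunk:  # hypothetical stand-in for a custom chunk object
    def __init__(self, text: str) -> None:
        self.text = text


mixed = ["Plain policy text.", {"text": "Dict text.", "score": 0.9}, Chunk("Custom chunk text.")]
print(normalize(mixed))  # ['Plain policy text.', 'Dict text.', 'Custom chunk text.']
```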

Supported Languages

Language is auto-detected. No configuration needed.

English, Hebrew, Arabic, French, Spanish, and more.


Domain Profiles

Pre-configured thresholds per domain:

decision = client.decide(query=query, contexts=contexts, draft_answer=draft, domain="insurance")
# Options: insurance | legal | medical | support | hr | generic

Domain      Risk Tolerance      Typical Use Case
insurance   Conservative        Claims, policy questions
legal       Very conservative   Contracts, compliance
medical     Very conservative   Clinical information
support     Moderate            Customer service
hr          Moderate            Employee policies
generic     Balanced            General purpose
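One way to picture a domain profile is as a set of thresholds that tighten as risk tolerance drops. The `DOMAIN_TAU_EVIDENCE` values and the helper functions below are invented for illustration only; A2RAG's actual per-domain values are not published here, and `calibrate` is the documented way to tune them.

```python
# Hypothetical evidence thresholds per domain (illustrative values only).
# Lower risk tolerance means a higher bar before an answer is shown.
DOMAIN_TAU_EVIDENCE = {
    "generic": 0.55,    # balanced
    "support": 0.60,    # moderate
    "hr": 0.60,         # moderate
    "insurance": 0.70,  # conservative
    "legal": 0.80,      # very conservative
    "medical": 0.80,    # very conservative
}


def tau_for(domain: str) -> float:
    """Return the evidence threshold for a domain, falling back to generic."""
    return DOMAIN_TAU_EVIDENCE.get(domain, DOMAIN_TAU_EVIDENCE["generic"])


def passes_evidence(evidence_score: float, domain: str) -> bool:
    """True if the score clears the (hypothetical) bar for this domain."""
    return evidence_score >= tau_for(domain)


# The same evidence score can clear one profile and fail a stricter one.
for domain in ("support", "insurance", "medical"):
    print(domain, passes_evidence(0.72, domain))
# support True
# insurance True
# medical False
```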

Privacy

Data                  Stored                               Sent to A2RAG?
Query content         Never                                Never
Retrieved contexts    Never                                Never
Draft answers         Never                                Never
Decision metadata     ~/.a2rag/decisions.db (local only)   Free tier: anonymous only
Feedback & comments   Local only                           Never

# Disable all telemetry (paid plans)
client = A2RAGClient(api_key="...", telemetry=False)

Getting Started

  1. Contact stav@aibee.co.il to request an API key
  2. pip install a2rag
  3. Free tier: 1,000 requests/month - no credit card required

Status: Private Beta


License

MIT License

Copyright (c) 2026 Stav Vaknin - aibee.co.il

Permission is hereby granted, free of charge, to any person obtaining a copy of this software to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, subject to the standard MIT terms.



Download files


Source Distribution

a2rag-0.2.2.tar.gz (21.8 kB)

Uploaded Source

Built Distribution


a2rag-0.2.2-py3-none-any.whl (22.2 kB)

Uploaded Python 3

File details

Details for the file a2rag-0.2.2.tar.gz.

File metadata

  • Download URL: a2rag-0.2.2.tar.gz
  • Upload date:
  • Size: 21.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for a2rag-0.2.2.tar.gz
Algorithm Hash digest
SHA256 dd2db5f2ae21b30c2f271d716c9fa03df18acbb80ff8f76c1f121f6fa2542003
MD5 e829dc561c325949cd31b995eb78c93e
BLAKE2b-256 e540ce7aa9af6bf1ccfd31cf986630e3e687cf7e3bfd80b4f2f31282a9038154


File details

Details for the file a2rag-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: a2rag-0.2.2-py3-none-any.whl
  • Upload date:
  • Size: 22.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for a2rag-0.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 4d8abab81c76682e8d9ff089fef81840e6e22d813c4139fb5fb4c1b13aa9d121
MD5 20d7efea335dc1a63ddedf5f3de1f71e
BLAKE2b-256 6bd838bcc8f3efb1a2f255939e4d4723ec6516d6a4d8c49fecd0c5319aa054b7

