
confidence-escalation

Framework-agnostic confidence-gated escalation middleware for LLM agents.

Python 3.9+ · License: MIT

Multi-signal confidence scoring (logprob + verbalized + ASR + tool risk) with threshold-based escalation policies and pluggable handlers. Works with LangChain, LangGraph, CrewAI, AutoGen, Google ADK, and any Python agent framework.

Addresses OWASP Agentic AI Top 10 ASI-09: Human-Agent Trust Exploitation — prevents agents from taking high-stakes actions when confidence is insufficient.


The Problem

LLM agents fail silently. When an agent is uncertain, it still returns a response, often confidently worded, with no mechanism to:

  • Detect that confidence is low before executing a high-risk tool call
  • Route uncertain responses to a human reviewer
  • Escalate to a stronger model when needed
  • Produce a compliance audit trail of every escalation event

confidence-escalation solves all four.


Features

  • Multi-signal scoring — combine logprobs, verbalized confidence, and tool-call risk into a single composite score
  • Threshold policies — single-threshold, dual-threshold (normal + critical), composite multi-policy chains
  • Pluggable handlers — human-in-loop, model upgrade, tool restriction, compliance logging
  • Framework adapters — LangChain callbacks, CrewAI step_callback, AutoGen reply function wrapper, Google ADK event interceptor
  • EU AI Act Article 12 audit logging — structured JSON compliance log on every escalation
  • Zero required dependencies — core library runs with no dependencies; framework integrations are optional extras

Quick Start

Installation

pip install confidence-escalation
# With LangChain:
pip install "confidence-escalation[langchain]"
# With all frameworks:
pip install "confidence-escalation[all]"

Basic Scoring

from confidence_escalation import MultiSignalConfidenceScorer

scorer = MultiSignalConfidenceScorer(
    weights={"logprob": 0.5, "verbalized": 0.3, "tool_risk": -0.2}
)

score = scorer.score(
    logprobs=[-0.1, -0.3, -0.2],
    verbalized_response="I am 70% confident about this answer.",
    tool_call_risk=0.15,
)

print(f"Confidence: {score.value:.3f}")   # e.g. 0.712
print(f"Reliable: {score.is_reliable()}")  # True (above 0.6 default)
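The README does not spell out how each signal is normalized before weighting. As a rough mental model (assuming the logprob signal is the exponential of the mean token logprob and the verbalized signal is a parsed percentage; these are illustrative choices, not the library's documented normalization), the weighted combination looks like:

```python
import math
import re

def composite_confidence(logprobs, verbalized_response, tool_call_risk, weights):
    """Hypothetical sketch of a weighted multi-signal confidence score.

    The real MultiSignalConfidenceScorer may normalize differently; this only
    illustrates how three signals can fold into one value in [0, 1].
    """
    # Logprob signal: exp(mean logprob) maps average token probability to (0, 1].
    logprob_conf = math.exp(sum(logprobs) / len(logprobs))
    # Verbalized signal: parse "70%" out of the model's own confidence statement.
    match = re.search(r"(\d+(?:\.\d+)?)\s*%", verbalized_response)
    verbalized_conf = float(match.group(1)) / 100 if match else 0.5
    # Weighted sum; tool risk carries a negative weight, so risk lowers the score.
    raw = (weights["logprob"] * logprob_conf
           + weights["verbalized"] * verbalized_conf
           + weights["tool_risk"] * tool_call_risk)
    return max(0.0, min(1.0, raw))

score = composite_confidence(
    logprobs=[-0.1, -0.3, -0.2],
    verbalized_response="I am 70% confident about this answer.",
    tool_call_risk=0.15,
    weights={"logprob": 0.5, "verbalized": 0.3, "tool_risk": -0.2},
)
```

Note the negative tool_risk weight: riskier tool calls pull the composite score down rather than up.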

Threshold Policy + Human-in-Loop

from confidence_escalation import (
    ThresholdPolicy,
    EscalationAction,
    HumanInLoopHandler,
    ComplianceLoggingHandler,
    ConfidenceEscalationMiddleware,
)

def notify_human(ctx, result):
    print(f"Routing to human review: session={ctx['session_id']}, confidence={result.confidence_score:.3f}")

policy = ThresholdPolicy(
    threshold=0.65,
    action=EscalationAction.HUMAN_IN_LOOP,
    critical_threshold=0.3,
    critical_action=EscalationAction.ABORT,
)

middleware = ConfidenceEscalationMiddleware(
    policy=policy,
    handlers=[
        HumanInLoopHandler(callback=notify_human),
        ComplianceLoggingHandler(),
    ],
)

result = middleware.call(
    agent_step=lambda: my_llm.invoke(messages),
    context={"session_id": "abc123", "model": "claude-sonnet-4-6"},
    logprobs=[-0.4, -0.5],
)

if result["escalation"]["triggered"]:
    print("Escalated — stopping agent execution.")
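The dual-threshold policy above is effectively a three-way decision: proceed, escalate to a human, or abort. A minimal sketch of that decision order, assuming the critical band is checked first (illustrative, not the library's code):

```python
def decide(confidence, threshold=0.65, critical_threshold=0.3):
    """Sketch of dual-threshold gating with the critical band checked first."""
    if confidence < critical_threshold:
        return "ABORT"           # far below the critical band: stop outright
    if confidence < threshold:
        return "HUMAN_IN_LOOP"   # below the normal band: route to a reviewer
    return "PROCEED"             # confident enough: no escalation

decision = decide(0.42)  # falls in the middle band: human review
```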

Model Upgrade Handler

from confidence_escalation import ModelUpgradeHandler, ThresholdPolicy, EscalationAction

handler = ModelUpgradeHandler(
    upgrade_map={
        "claude-haiku-4-5": "claude-sonnet-4-6",
        "claude-sonnet-4-6": "claude-opus-4-7",
    }
)

policy = ThresholdPolicy(threshold=0.7, action=EscalationAction.MODEL_UPGRADE)
result = policy.evaluate(score, context={"model": "claude-haiku-4-5"})  # `score` from the scorer example above

if result.triggered:
    upgrade_info = handler.handle(result, context={"model": "claude-haiku-4-5"})
    print(f"Retry with: {upgrade_info['upgraded_model']}")

Tool Restriction

from confidence_escalation import ToolRestrictionHandler, ThresholdPolicy, EscalationAction

handler = ToolRestrictionHandler(
    high_risk_tools=["delete_record", "send_email", "execute_sql"],
    allow_read_only=True,
)

agent_tools = ["get_customer", "delete_record"]

policy = ThresholdPolicy(threshold=0.65, action=EscalationAction.TOOL_RESTRICTION)
result = policy.evaluate(score, context={"available_tools": agent_tools})  # `score` from the scorer example above

if result.triggered:
    restriction = handler.handle(result, context={"available_tools": agent_tools})
    safe_tools = restriction["allowed_tools"]
    # Re-invoke the agent with only safe_tools
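Conceptually, tool restriction at low confidence is set subtraction over tool names. A sketch (the real ToolRestrictionHandler also honors allow_read_only; here every surviving tool is simply assumed safe):

```python
def restrict_tools(available_tools, high_risk_tools):
    """Sketch: drop any tool on the high-risk list, keep the rest,
    preserving the original ordering of available_tools."""
    blocked = set(high_risk_tools)
    return [t for t in available_tools if t not in blocked]

safe = restrict_tools(
    ["get_customer", "delete_record"],
    ["delete_record", "send_email", "execute_sql"],
)
# safe == ["get_customer"]
```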

LangChain Integration

from confidence_escalation.adapters.langchain import LangChainEscalationAdapter
from confidence_escalation.handlers import HumanInLoopHandler

adapter = LangChainEscalationAdapter(
    threshold=0.65,
    handlers=[HumanInLoopHandler(raise_on_trigger=True)],
)

# Attach as a LangChain callback (LLMChain is the legacy chain API)
from langchain.chains import LLMChain

chain = LLMChain(llm=llm, callbacks=[adapter.as_callback()])

# Or call directly from a LangGraph node
def research_node(state):
    response = llm.invoke(state["messages"])
    try:
        adapter.on_llm_end(response.content, logprobs=response.response_metadata.get("logprobs"))
    except HumanInLoopHandler.HumanReviewRequired:
        return {"status": "escalated"}
    return {"response": response.content}

CrewAI Integration

from crewai import Agent
from confidence_escalation.adapters.crewai import CrewAIEscalationAdapter

adapter = CrewAIEscalationAdapter(threshold=0.65)

agent = Agent(
    role="Research Specialist",
    goal="Analyze market trends",
    backstory="...",
    step_callback=adapter.step_callback,
)

Google ADK Integration

from google.adk.agents import BaseAgent
from confidence_escalation.adapters.google_adk import ADKEscalationAdapter

class GovernedAgent(BaseAgent):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._escalation = ADKEscalationAdapter(threshold=0.65)

    async def _run_async_impl(self, ctx):
        # Delegate to a wrapped inner LLM agent (self._llm_agent, assumed to be set elsewhere)
        async for event in self._llm_agent._run_async_impl(ctx):
            if event.is_final_response():
                result = self._escalation.evaluate_event(event, ctx)
                if result["triggered"]:
                    yield self._escalation.build_escalation_event(result)
                    return
            yield event

Composite Policy Chains

from confidence_escalation import ThresholdPolicy, EscalationAction
from confidence_escalation.policy import CompositePolicy

policy = CompositePolicy(policies=[
    ThresholdPolicy(threshold=0.25, action=EscalationAction.ABORT),
    ThresholdPolicy(threshold=0.55, action=EscalationAction.HUMAN_IN_LOOP),
    ThresholdPolicy(threshold=0.75, action=EscalationAction.COMPLIANCE_LOG),
])

result = policy.evaluate(score, context={"session_id": "abc"})
# First matching threshold wins
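"First matching threshold wins" means the chain is ordered from most to least severe: the score is tested against each threshold in turn, and the first one it fails to clear fires. A sketch of that evaluation order (illustrative, not the library's implementation):

```python
def evaluate_chain(score, chain):
    """chain: (threshold, action) pairs ordered by ascending threshold,
    i.e. most severe action first. Returns the first action whose
    threshold the score fails to clear, or None if all are cleared."""
    for threshold, action in chain:
        if score < threshold:
            return action
    return None

chain = [(0.25, "ABORT"), (0.55, "HUMAN_IN_LOOP"), (0.75, "COMPLIANCE_LOG")]
```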

OWASP Agentic AI Coverage

| OWASP ASI ID | Risk | Coverage |
| --- | --- | --- |
| ASI-09 | Human-Agent Trust Exploitation | Confidence gating before high-stakes actions |
| ASI-02 | Tool Misuse | Tool restriction handler removes high-risk tools at low confidence |
| ASI-03 | Identity/Privilege Abuse | ComplianceLoggingHandler creates an immutable audit trail |
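The page does not publish the ComplianceLoggingHandler's log schema; the record below only illustrates the kind of structured JSON entry an Article 12-style audit log typically carries (all field names here are hypothetical, not the library's actual format):

```python
import json
from datetime import datetime, timezone

# Hypothetical escalation record; field names are illustrative only.
record = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "session_id": "abc123",
    "confidence_score": 0.42,
    "policy": "ThresholdPolicy(threshold=0.65)",
    "action": "HUMAN_IN_LOOP",
    "model": "claude-sonnet-4-6",
}
line = json.dumps(record, sort_keys=True)  # one JSON object per log line
```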

License

MIT License. See LICENSE.
