Multi-Factor Generative-Deterministic Confidence (MFGC) scoring and safety gates for AI agents. Zero dependencies.

murphy-confidence

Should your AI agent act? murphy-confidence answers that question with math, not vibes.

Zero dependencies · Pure Python 3.10+ · pip install murphy-confidence


The problem

Every AI agent framework gives you a way to call tools. None of them give you a principled way to decide whether to call them.

You end up with one of:

  • A hardcoded threshold (if confidence > 0.7: execute()): no phase awareness, no hazard weighting, no audit trail
  • A vibe check — asking the LLM "are you sure?" and hoping it says no when it should
  • Nothing — just letting the agent do whatever it calculates and hoping for the best

When you're automating actions that touch real data, real money, or real people, none of those options are acceptable.


The solution

murphy-confidence implements the Multi-Factor Generative-Deterministic Confidence (MFGC) formula:

C(t) = w_g · G(x) + w_d · D(x) − κ · H(x)

Where:

| Symbol | Meaning | Range |
| --- | --- | --- |
| G(x) | Generative quality score: how good is the LLM output? | [0, 1] |
| D(x) | Domain-deterministic score: does this match the rules? | [0, 1] |
| H(x) | Hazard factor: how bad if this is wrong? | [0, 1] |
| w_g, w_d, κ | Phase-locked weights that shift toward determinism as execution approaches | |

The weights are phase-locked: as your pipeline moves from brainstorming to executing, the formula automatically shifts trust away from the LLM and toward your domain rules. At EXECUTE phase, the threshold is 0.85. At EXPAND phase, it's 0.50.
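
To make the arithmetic concrete, here is a minimal sketch of the formula as a plain function. The function name and both weight sets are assumptions made for this example; they are not the library's actual phase-locked schedule, which this README does not list.

```python
def mfgc_score(g: float, d: float, h: float,
               w_g: float, w_d: float, kappa: float) -> float:
    """C = w_g * G + w_d * D - kappa * H, with G, D, H each in [0, 1]."""
    for name, value in (("g", g), ("d", d), ("h", h)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return w_g * g + w_d * d - kappa * h


# Hypothetical weights, for illustration only.
EARLY = dict(w_g=0.6, w_d=0.4, kappa=0.2)  # leans on the generative score
LATE = dict(w_g=0.3, w_d=0.7, kappa=0.8)   # leans on domain rules, penalises hazard harder

early = mfgc_score(0.9, 0.6, 0.3, **EARLY)
late = mfgc_score(0.9, 0.6, 0.3, **LATE)
print(round(early, 4), round(late, 4))  # prints: 0.72 0.45
```

With identical inputs, the later weight set yields a lower score because it trusts the generative score less and penalises the hazard more, which is the effect the phase-locked schedule is described as having.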


5-second quickstart

pip install murphy-confidence

from murphy_confidence import compute_confidence
from murphy_confidence.types import Phase

result = compute_confidence(
    goodness=0.82,   # How good is the AI output?  [0-1]
    domain=0.75,     # How well does it match domain rules?  [0-1]
    hazard=0.10,     # How risky is this action?  [0-1]
    phase=Phase.EXECUTE,
)

print(result.score)    # 0.7585
print(result.action)   # GateAction.PROCEED_WITH_MONITORING
print(result.allowed)  # True
print(result.rationale)
# [ALLOWED] Phase=EXECUTE | C=0.7585 (threshold=0.85) | Action=PROCEED_WITH_MONITORING | ...

Complete feature walkthrough

The Confidence Engine

The engine is stateless. Call it anywhere, in any thread, with any inputs:

from murphy_confidence import ConfidenceEngine
from murphy_confidence.types import Phase

engine = ConfidenceEngine()

# Low hazard, high quality — proceeds automatically at EXECUTE
result = engine.compute(goodness=0.95, domain=0.90, hazard=0.02, phase=Phase.EXECUTE)
assert result.action.value == "PROCEED_AUTOMATICALLY"

# High hazard — blocked even with good quality
result = engine.compute(goodness=0.90, domain=0.85, hazard=0.80, phase=Phase.EXECUTE)
assert not result.allowed

The phase-locked weight schedule means the same inputs produce different outcomes at different phases — early phases are lenient, EXECUTE is strict:

| Phase | Score (goodness=0.78, domain=0.72, hazard=0.15) | Allowed |
| --- | --- | --- |
| EXPAND | 0.6570 | |
| TYPE | 0.6410 | |
| ENUMERATE | 0.6250 | |
| CONSTRAIN | 0.6045 | |
| COLLAPSE | 0.5885 | |
| BIND | 0.5745 | |
| EXECUTE | 0.5555 | |

Safety Gates

Gates wrap a confidence result in a domain-specific policy check:

from murphy_confidence import SafetyGate, compute_confidence
from murphy_confidence.types import GateType, Phase

# A compliance gate at 0.90 — blocking by default
gate = SafetyGate("hipaa_compliance", GateType.COMPLIANCE)

result = compute_confidence(0.82, 0.78, 0.08, Phase.EXECUTE)
gr = gate.evaluate(result)

if not gr.passed and gr.blocking:
    raise RuntimeError(gr.message)
    # Gate 'hipaa_compliance' (COMPLIANCE) FAILED [BLOCKING] — confidence 0.7368 < threshold 0.9000

Six gate types, each with sensible defaults:

| Gate Type | Default Threshold | Blocking |
| --- | --- | --- |
| EXECUTIVE | 0.85 | |
| OPERATIONS | 0.70 | |
| QA | 0.75 | |
| HITL | 0.80 | |
| COMPLIANCE | 0.90 | |
| BUDGET | 0.65 | |
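
The sketch below illustrates how a threshold gate with blocking semantics might behave, using the default thresholds from the table above. `SketchGate` and `SketchGateResult` are hypothetical stand-ins written for this example, not the library's SafetyGate implementation. The sketch assumes that a score at or above the threshold passes and that every gate is blocking unless told otherwise.

```python
from dataclasses import dataclass

# Default thresholds copied from the gate type table.
DEFAULT_THRESHOLDS = {
    "EXECUTIVE": 0.85,
    "OPERATIONS": 0.70,
    "QA": 0.75,
    "HITL": 0.80,
    "COMPLIANCE": 0.90,
    "BUDGET": 0.65,
}


@dataclass(frozen=True)
class SketchGateResult:
    gate_id: str
    passed: bool
    blocking: bool
    message: str


@dataclass(frozen=True)
class SketchGate:
    gate_id: str
    gate_type: str
    blocking: bool = True  # assumption: blocking unless told otherwise

    def evaluate(self, score: float) -> SketchGateResult:
        threshold = DEFAULT_THRESHOLDS[self.gate_type]
        passed = score >= threshold  # assumption: meeting the threshold passes
        status = "PASSED" if passed else "FAILED"
        mode = "BLOCKING" if self.blocking else "ADVISORY"
        comparison = ">=" if passed else "<"
        message = (
            f"Gate '{self.gate_id}' ({self.gate_type}) {status} [{mode}]: "
            f"confidence {score:.4f} {comparison} threshold {threshold:.4f}"
        )
        return SketchGateResult(self.gate_id, passed, self.blocking, message)


gr = SketchGate("hipaa_compliance", "COMPLIANCE").evaluate(0.7368)
print(gr.message)
```

A caller would stop only when a result is both failed and blocking, which matches the `if not gr.passed and gr.blocking` check used in the examples here.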

Gate Compiler

Don't know which gates you need? The compiler figures it out:

from murphy_confidence import GateCompiler, compute_confidence
from murphy_confidence.types import Phase

result = compute_confidence(0.72, 0.68, 0.18, Phase.EXECUTE)
compiler = GateCompiler()
gates = compiler.compile_gates(result, context={"compliance_required": True})

for gate in gates:
    gr = gate.evaluate(result)
    print(f"{gr.gate_id}: {'PASS' if gr.passed else 'FAIL'}")

The compiler uses a rule table that maps (phase, action) pairs to gate sets — so the right gates are automatically included for EXECUTE phase, for blocking actions, for compliance contexts, etc.
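
One way to picture such a rule table is a dictionary keyed by (phase, action), with context flags adding further gates. Every entry below, the fallback, and the function name are invented for illustration; the real rule table lives inside GateCompiler and is not reproduced in this README.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rule table: (phase, action) -> gate types to apply.
RULES: Dict[Tuple[str, str], List[str]] = {
    ("EXECUTE", "PROCEED_AUTOMATICALLY"): ["EXECUTIVE"],
    ("EXECUTE", "REQUEST_HUMAN_REVIEW"): ["EXECUTIVE", "HITL"],
    ("EXPAND", "PROCEED_WITH_CAUTION"): ["QA"],
}
FALLBACK = ["OPERATIONS"]  # assumed default when no rule matches


def compile_gate_types(phase: str, action: str,
                       context: Optional[dict] = None) -> List[str]:
    """Look up the gate set for (phase, action), then apply context flags."""
    gates = list(RULES.get((phase, action), FALLBACK))
    if (context or {}).get("compliance_required") and "COMPLIANCE" not in gates:
        gates.append("COMPLIANCE")
    return gates


print(compile_gate_types("EXECUTE", "REQUEST_HUMAN_REVIEW",
                         {"compliance_required": True}))
```

Keeping the rules as data rather than nested conditionals makes the policy easy to review and audit, which is the usual reason to prefer a table here.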

Domain Models

For vertical-specific scoring, the domain sub-package provides ready-made scorers for healthcare, financial, and manufacturing scenarios:

from murphy_confidence.domain.healthcare import HealthcareDomainEngine
from murphy_confidence import compute_confidence
from murphy_confidence.types import Phase

engine = HealthcareDomainEngine()
g, d, h = engine.compute(patient_record, prescription)

result = compute_confidence(g, d, h, Phase.EXECUTE)

Integration examples

FastAPI middleware

Gate every AI agent action before it hits your handler:

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from murphy_confidence import GateCompiler, compute_confidence
from murphy_confidence.types import Phase

app = FastAPI()
compiler = GateCompiler()

@app.middleware("http")
async def confidence_gate(request: Request, call_next):
    if request.url.path == "/agent/action":
        body = await request.json()
        result = compute_confidence(
            body["goodness"], body["domain"], body["hazard"], Phase.EXECUTE
        )
        gates = compiler.compile_gates(result, context={"compliance_required": True})
        for gate in gates:
            gr = gate.evaluate(result)
            if not gr.passed and gr.blocking:
                return JSONResponse({"blocked": True, "reason": gr.message}, status_code=403)
    return await call_next(request)

See examples/fastapi_middleware.py for the full runnable example.

LangChain callback

Intercept every tool call and gate it:

from murphy_confidence import GateCompiler, compute_confidence
from murphy_confidence.types import Phase

class MurphyConfidenceCallback:
    def on_tool_start(self, serialized, input_str, **kwargs):
        result = compute_confidence(
            kwargs.get("goodness", 0.70),
            kwargs.get("domain", 0.65),
            kwargs.get("hazard", 0.15),
            Phase.EXECUTE,
        )
        gates = GateCompiler().compile_gates(result)
        for gate in gates:
            gr = gate.evaluate(result)
            if not gr.passed and gr.blocking:
                raise RuntimeError(f"Tool blocked: {gr.message}")

See examples/langchain_callback.py for the full runnable example (no LangChain install required for the demo).

Raw Python

from murphy_confidence import compute_confidence, SafetyGate
from murphy_confidence.types import GateType, Phase

# Score the action
result = compute_confidence(
    goodness=0.88,
    domain=0.82,
    hazard=0.05,
    phase=Phase.EXECUTE,
)

# Create a domain-specific gate
gate = SafetyGate("production_deploy", GateType.EXECUTIVE, blocking=True)
gr = gate.evaluate(result)

if gr.passed:
    deploy_to_production()
else:
    notify_human(gr.message)

Why not just use a threshold?

A simple "if confidence > 0.7: proceed" check has six failure modes that murphy-confidence fixes:

| Problem | Simple threshold | murphy-confidence |
| --- | --- | --- |
| Same threshold at brainstorm and execute | ✗ both same | ✓ 0.50 → 0.85 ramp |
| No hazard awareness | ✗ ignored | ✓ κ · H(x) penalty |
| No domain validation | ✗ only LLM score | ✓ w_d · D(x) component |
| No audit trail | ✗ silent pass/fail | ✓ rationale string on every result |
| No gate composition | ✗ one boolean | ✓ gate pipeline with blocking semantics |
| No serialisation | ✗ raw float | ✓ as_dict() on all results |

Part of Murphy System

murphy-confidence was extracted from Murphy System, an autonomous AI orchestration platform. Inside Murphy, every agent decision — from executing a campaign to deploying code — passes through this confidence gate before it's allowed to act.

We extracted it because the gating problem is universal: if you're building any AI agent that takes real-world actions, you need this layer. A confidence gate stops your agent from acting when it shouldn't and lets it act when it can — with an auditable score behind every decision.

⚠️ Murphy System is currently beta software. We're being honest about that so you can set expectations accordingly.

If you find this library useful, check out the full system at github.com/IKNOWINOT/Murphy-System.


Pipeline phases

| Phase | Description | Threshold |
| --- | --- | --- |
| EXPAND | Brainstorming, ideation | 0.50 |
| TYPE | Classifying and labelling | 0.55 |
| ENUMERATE | Listing options | 0.60 |
| CONSTRAIN | Applying rules and limits | 0.65 |
| COLLAPSE | Selecting the best option | 0.70 |
| BIND | Binding to specific resources | 0.78 |
| EXECUTE | Taking real-world action | 0.85 |
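
The thresholds in this table can be checked against a score with a simple lookup. This sketch assumes a score clears a phase when it is at or above that phase's threshold; the library may apply further rules, so treat it as an illustration rather than its actual decision logic. `SketchPhase` and both helper functions are hypothetical names, not part of murphy_confidence.

```python
from enum import Enum
from typing import Optional


class SketchPhase(Enum):
    """Phase thresholds copied from the pipeline phases table."""
    EXPAND = 0.50
    TYPE = 0.55
    ENUMERATE = 0.60
    CONSTRAIN = 0.65
    COLLAPSE = 0.70
    BIND = 0.78
    EXECUTE = 0.85


def clears_threshold(score: float, phase: SketchPhase) -> bool:
    # Assumption: meeting the threshold exactly counts as clearing it.
    return score >= phase.value


def latest_phase_cleared(score: float) -> Optional[SketchPhase]:
    """Return the strictest phase whose threshold the score meets, or None."""
    cleared = [p for p in SketchPhase if clears_threshold(score, p)]
    return cleared[-1] if cleared else None


print(latest_phase_cleared(0.72))  # prints: SketchPhase.COLLAPSE
```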

Action classification

| Action | Score range | Meaning |
| --- | --- | --- |
| PROCEED_AUTOMATICALLY | ≥ 0.90 | Full autonomy |
| PROCEED_WITH_MONITORING | ≥ 0.80 | Execute + log |
| PROCEED_WITH_CAUTION | ≥ 0.70 | Execute with extra checks |
| REQUEST_HUMAN_REVIEW | ≥ 0.55 | Flag for human, don't block |
| REQUIRE_HUMAN_APPROVAL | ≥ 0.40 | Block until approved |
| BLOCK_EXECUTION | < 0.40 | Hard stop |
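
Read literally, the table is a set of descending score bands. The sketch below maps a score to an action using only those bands; `classify` and `ACTION_BANDS` are hypothetical names, and the library's own classifier may take the phase or other inputs into account.

```python
# (lower bound, action) pairs from the action classification table, highest first.
ACTION_BANDS = [
    (0.90, "PROCEED_AUTOMATICALLY"),
    (0.80, "PROCEED_WITH_MONITORING"),
    (0.70, "PROCEED_WITH_CAUTION"),
    (0.55, "REQUEST_HUMAN_REVIEW"),
    (0.40, "REQUIRE_HUMAN_APPROVAL"),
]


def classify(score: float) -> str:
    """Return the first action whose lower bound the score meets."""
    for lower_bound, action in ACTION_BANDS:
        if score >= lower_bound:
            return action
    return "BLOCK_EXECUTION"  # anything below 0.40


for s in (0.95, 0.80, 0.62, 0.39):
    print(s, classify(s))
```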

License

Apache License 2.0 — see LICENSE.

Copyright © 2020-2026 Inoni Limited Liability Company (Corey Post)
