
Black-box AI reliability certification via self-consistency sampling and conformal calibration


Know if your AI is ready to ship — one number, one guarantee.


TrustGate certifies the reliability of any AI endpoint: LLMs, agents, RAG pipelines, or any system you can ask a question to. It uses self-consistency sampling and conformal prediction to produce a single reliability level (e.g., 94.6%) backed by a formal, distribution-free statistical guarantee. Not a vibe, not a leaderboard score: a number with a finite-sample coverage guarantee.

What's included:

  • Self-consistency sampling — ask the same question K times, measure agreement
  • Conformal calibration — formal coverage guarantee, distribution-free
  • Human calibration — shareable HTML questionnaire for domain experts (no server needed)
  • Runtime trust layer — wrap any endpoint with reliability metadata
  • Sequential stopping — Hoeffding bounds reduce API costs by ~50%
  • Profile diagnostics — automatic detection of canonicalization failures
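The sequential-stopping idea can be sketched with a standard two-sided Hoeffding bound. This is a generic illustration of the technique, not TrustGate's internal implementation; the function names, the `threshold`, and the `delta` confidence parameter are all assumptions:

```python
import math

def hoeffding_radius(n: int, delta: float) -> float:
    """Two-sided Hoeffding confidence radius after n Bernoulli samples."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def sequential_agreement(samples, threshold=0.9, delta=0.05):
    """Stop sampling as soon as the Hoeffding interval around the empirical
    agreement rate lies entirely above or below `threshold`.
    Returns (decision, samples_used)."""
    agree = 0
    n = 0
    for s in samples:
        n += 1
        agree += int(s)
        p_hat = agree / n
        eps = hoeffding_radius(n, delta)
        if p_hat - eps > threshold:
            return "PASS", n
        if p_hat + eps < threshold:
            return "FAIL", n
    return "UNDECIDED", n

# A clearly unreliable answer stream is rejected after a handful of samples,
# long before all 200 budgeted API calls are spent.
decision, used = sequential_agreement([0] * 200, threshold=0.9)
```

Stopping early on clear-cut questions is where the advertised API-cost savings come from: most of the budget is spent only on borderline items.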

[!NOTE] Part of the theaios ecosystem. Install with pip install theaios-trustgate.

Quickstart

pip install theaios-trustgate

from theaios import trustgate

result = trustgate.certify(config_path="trustgate.yaml")
print(result.reliability_level)  # 0.946

The pipeline: sample K responses → canonicalize → calibrate with conformal prediction → get a reliability level with a guarantee. Works with any provider (OpenAI, Anthropic, self-hosted), any task type, fully black-box.
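The first two pipeline stages can be sketched in a few lines: majority-vote consensus over K canonicalized samples, then the standard split-conformal quantile over calibration nonconformity scores. This is textbook split conformal prediction, not necessarily TrustGate's exact internals; the scores and `alpha` below are illustrative:

```python
import math
from collections import Counter

def consensus(responses):
    """Majority-vote answer and its agreement rate among K canonicalized responses."""
    answer, count = Counter(responses).most_common(1)[0]
    return answer, count / len(responses)

def conformal_quantile(scores, alpha=0.1):
    """Finite-sample split-conformal threshold: the ceil((n+1)(1-alpha))-th
    smallest calibration nonconformity score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return sorted(scores)[min(k, n) - 1]

# K = 5 sampled answers to one calibration question, already canonicalized:
answer, agreement = consensus(["42", "42", "42", "41", "42"])

# Nonconformity = 1 - agreement; calibrating over many such scores yields
# a threshold with a guaranteed coverage level of at least 1 - alpha:
threshold = conformal_quantile(
    [0.0, 0.2, 0.2, 0.4, 0.0, 0.2, 0.6, 0.0, 0.2, 0.4], alpha=0.2
)
```

The coverage guarantee of the quantile step is distribution-free: it holds for any exchangeable calibration set, which is why no assumptions about the model are needed.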

[!TIP] For the full theory, see our paper: Black-Box Reliability Certification for AI Agents via Self-Consistency Sampling and Conformal Calibration (Mouzouni, 2026).

Three Ways to Use TrustGate

1. Deployment gate — certify before shipping

trustgate certify --yes
# Exit code 0 = PASS, 1 = FAIL
     TrustGate Certification Result
┌──────────────────────┬──────────┐
│ Reliability Level    │ 94.6%    │
│ M* (prediction set)  │ 1        │
│ Empirical Coverage   │ 0.956    │
│ Capability Gap       │ 2.4%     │
│ Status               │ PASS     │
└──────────────────────┴──────────┘

2. Runtime trust layer — confidence on every query

from theaios.trustgate import TrustGate, certify

result = certify(config_path="trustgate.yaml")
# `config` is your TrustGate configuration object (the settings from trustgate.yaml)
gate = TrustGate(config=config, certification=result)

# Passthrough (1 API call): attaches reliability metadata
response = gate.query("What is the treatment for X?")
response.reliability_level  # 0.946

# Sampled (K API calls): per-query prediction set
gate = TrustGate(config=config, certification=result, mode="sampled")
response = gate.query("What is the treatment for X?")
response.prediction_set  # ["Aspirin + PCI"]
response.consensus       # 0.8

3. Human calibration — no ground truth needed

Generate a questionnaire, share it with a domain expert, certify with their labels:

trustgate calibrate --export questionnaire.html
# Share via email/Slack → reviewer opens in browser → downloads labels.json
trustgate certify --ground-truth labels.json

Where Do the Questions Come From?

You don't need a gold-standard dataset to use TrustGate. Three ways to get started:

1. Generate questions with AI. Ask an LLM to produce realistic questions for your system:

"Generate 100 realistic customer support questions that users would ask
our e-commerce chatbot, covering orders, returns, shipping, and products."

2. Extract from production logs. Pull real queries from your observability stack (Datadog, Langfuse, LangSmith, custom logs). These are the actual questions your system faces — the most representative test set possible.

3. Use built-in benchmarks. For standard tasks, TrustGate ships dataset loaders:

from theaios.trustgate.datasets import load_gsm8k, load_mmlu
questions = load_mmlu(subjects=["abstract_algebra"], n=100)

No ground truth labels? No problem — use human calibration. A domain expert reviews 50 items in 10 minutes.

Full guide: Getting Your Questions

Works With Any Endpoint

LLMs, agents, RAG pipelines — anything with an HTTP API:

# LLM
endpoint:
  url: "https://api.openai.com/v1/chat/completions"
  model: "gpt-4.1"
  api_key_env: "OPENAI_API_KEY"

# Generic agent / RAG / custom API
endpoint:
  url: "https://my-agent.example.com/api/ask"
  temperature: null
  request_template:
    query: "{{question}}"
  response_path: "answer"
  cost_per_request: 0.03
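A rough sketch of how a `request_template` and a dotted `response_path` could be resolved against a custom endpoint. The field names come from the config above, but the resolution logic itself is an assumption for illustration, not TrustGate's actual code:

```python
def render_template(template: dict, question: str) -> dict:
    """Substitute {{question}} into every string value of the request template."""
    return {
        k: v.replace("{{question}}", question) if isinstance(v, str) else v
        for k, v in template.items()
    }

def extract_path(payload: dict, path: str):
    """Walk a dotted path (e.g. 'answer' or 'result.text') into a JSON response."""
    node = payload
    for key in path.split("."):
        node = node[key]
    return node

body = render_template({"query": "{{question}}"}, "What is the treatment for X?")
answer = extract_path({"answer": "Aspirin + PCI", "meta": {"latency_ms": 120}}, "answer")
```

Because the request body and the answer extraction are both declarative, any HTTP service that answers questions can be certified without writing adapter code.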

Certify Each Component, Not Just the Final Output

Complex AI systems are pipelines — retriever, reranker, reasoning, generation. Don't certify the whole pipeline as a black box. Certify each component independently to find exactly where reliability breaks down.

Query → [Retriever] → [Reranker] → [Generator] → Answer
            ↑              ↑             ↑
      certify: 94%    certify: 91%   certify: 87%
      "right docs?"  "right order?" "right answer?"

Each component is just an endpoint — TrustGate certifies it independently with its own questions, canonicalization, and reliability level. This lets you:

  • Pinpoint failures: the generator is the weak link, not the retriever
  • Iterate efficiently: improve one component, re-certify just that one
  • Stay agnostic: document changes don't invalidate the retriever certification

| Component           | Certify on               | Canonicalization    |
|---------------------|--------------------------|---------------------|
| RAG retriever       | Retrieved document IDs   | Exact match         |
| SQL agent           | Generated SQL query      | Normalized SQL      |
| Classification step | Category label           | MCQ                 |
| Reasoning step      | Intermediate conclusion  | LLM-judge or custom |
| Final answer        | Short structured output  | Numeric / MCQ       |

TrustGate warns you automatically when outputs are too long or diverse for meaningful self-consistency measurement.
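Canonicalization is what makes exact-match agreement meaningful: surface variants of the same answer must collapse to one string before self-consistency is measured. A minimal numeric canonicalizer, as one hedged example (the function name and regex are illustrative, not TrustGate's implementation):

```python
import re

def canonicalize_numeric(text: str) -> str:
    """Reduce a free-form numeric answer to a bare number so that
    '$1,200.00', '1200' and '1,200 dollars' all agree exactly."""
    match = re.search(r"-?[\d,]*\d(?:\.\d+)?", text)
    if match is None:
        return text.strip().lower()
    value = float(match.group().replace(",", ""))
    return f"{value:g}"  # :g drops trailing zeros: 1200.00 -> '1200'

answers = ["$1,200.00", "1200", "1,200 dollars"]
canon = {canonicalize_numeric(a) for a in answers}  # collapses to one canonical form
```

When a canonicalizer fails (e.g., long free-text outputs that never collapse), agreement is spuriously low; that is the failure mode the profile diagnostics are there to flag.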

Pre-flight Cost Estimate

Before spending money, TrustGate shows the cost/reliability tradeoff:

     Cost / Reliability Tradeoff
┌────┬───────────┬──────────┬────────────┐
│  K │ Est. Cost │ Max Cost │ Resolution │
│  3 │ $9.00     │ $18.00   │   coarse   │
│  5 │ $15.00    │ $30.00   │  moderate  │
│ 10←│ $30.00    │ $60.00   │    fine    │
│ 20 │ $60.00    │ $120.00  │    fine    │
└────┴───────────┴──────────┴────────────┘
Proceed? [Y/n]:
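The numbers in the table follow directly from the configured per-request cost: the worst case is N × K × cost_per_request, and the estimate reflects the roughly 50% savings from sequential stopping. A sketch of that arithmetic, assuming `cost_per_request: 0.03` (as in the config example above) and a calibration set of 200 questions (an assumed count; the helper is illustrative, not TrustGate's code):

```python
def cost_table(n_questions: int, cost_per_request: float, ks=(3, 5, 10, 20), savings=0.5):
    """Worst-case cost is n_questions * K * cost_per_request; the estimate
    assumes sequential stopping saves roughly `savings` of the samples."""
    rows = []
    for k in ks:
        max_cost = n_questions * k * cost_per_request
        rows.append((k, max_cost * (1.0 - savings), max_cost))
    return rows

# 200 questions at $0.03/request reproduces the table above:
rows = cost_table(200, 0.03)
```

Larger K buys finer resolution of the reliability level (agreement can only take K+1 values per question), which is the tradeoff the prompt is asking you to confirm.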

Documentation

Additional Resources

  • Examples — Working certification scripts
  • FAQ — Common questions
  • Paper — The research behind TrustGate

Why TrustGate?

  • Formal guarantee — conformal coverage bound, not a heuristic score
  • Black-box — no model internals, no logprobs, just API access
  • Any endpoint — LLMs, agents, RAG, custom APIs
  • Human-in-the-loop — shareable questionnaire, no server needed
  • Cost-aware — pre-flight estimates, sequential stopping saves ~50%
  • Production-ready — passthrough mode, CI/CD gating, periodic recalibration

Citation

@article{mouzouni2026trustgate,
  title={TrustGate: Black-Box AI Reliability Certification via
         Self-Consistency Sampling and Conformal Calibration},
  author={Mouzouni, Charafeddine},
  year={2026}
}

License

Apache 2.0. See LICENSE.
