
Black-box AI reliability certification via self-consistency sampling and conformal calibration


Know if your AI is ready to ship — one number, one guarantee.


TrustGate certifies the reliability of any AI endpoint — LLMs, agents, RAG pipelines, or any system you can ask a question to. It uses self-consistency sampling and conformal prediction to produce a single reliability level (e.g., 94.6%) backed by a formal statistical guarantee. Not a vibe, not a leaderboard score: a distribution-free coverage bound.

What's included:

  • Self-consistency sampling — ask the same question K times, measure agreement
  • Conformal calibration — formal coverage guarantee, distribution-free
  • Human calibration — shareable HTML questionnaire for domain experts (no server needed)
  • Runtime trust layer — wrap any endpoint with reliability metadata
  • Sequential stopping — Hoeffding bounds reduce API costs by ~50%
  • Profile diagnostics — automatic detection of canonicalization failures
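The sequential-stopping idea from the list above can be sketched in a few lines. This is an illustration of the Hoeffding-bound early-exit pattern, not TrustGate's actual implementation; the function names, `delta`, and `min_n` are assumptions:

```python
import math

def hoeffding_radius(n: int, delta: float = 0.05) -> float:
    """Two-sided Hoeffding confidence radius for the mean of n samples in [0, 1]."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def sequential_agreement(samples, threshold=0.9, delta=0.05, min_n=5):
    """Consume 0/1 agreement indicators one at a time; stop as soon as the
    Hoeffding interval around the running agreement rate clears the
    pass/fail threshold in either direction, saving the remaining calls."""
    total = 0.0
    for n, x in enumerate(samples, start=1):
        total += x
        if n < min_n:
            continue
        p_hat = total / n
        eps = hoeffding_radius(n, delta)
        if p_hat - eps >= threshold:   # confidently above: early PASS
            return "pass", n, p_hat
        if p_hat + eps < threshold:    # confidently below: early FAIL
            return "fail", n, p_hat
    return "undecided", n, total / n
```

On clear-cut streams the loop exits after a handful of samples instead of exhausting the budget, which is where the advertised cost savings come from.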

[!NOTE] Part of the theaios ecosystem. Install with pip install theaios-trustgate.

Quickstart

pip install theaios-trustgate

from theaios import trustgate

result = trustgate.certify(config_path="trustgate.yaml")
print(result.reliability_level)  # 0.946

The pipeline: sample K responses → canonicalize → calibrate with conformal prediction → get a reliability level with a guarantee. Works with any provider (OpenAI, Anthropic, self-hosted), any task type, fully black-box.
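That pipeline can be sketched end to end under toy assumptions — a whitespace/case canonicalizer and the standard split-conformal quantile. TrustGate's real canonicalizers and scoring are richer; this only shows the shape of the computation:

```python
import math
from collections import Counter

def canonicalize(ans: str) -> str:
    """Toy canonicalizer: trim, lowercase, collapse internal whitespace."""
    return " ".join(ans.lower().split())

def agreement(answers) -> float:
    """Fraction of the K sampled answers that match the modal answer."""
    canon = [canonicalize(a) for a in answers]
    return Counter(canon).most_common(1)[0][1] / len(canon)

def conformal_quantile(scores, alpha=0.1):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest
    calibration nonconformity score (here, 1 - agreement). This is the
    step that carries the distribution-free coverage guarantee."""
    s = sorted(scores)
    n = len(s)
    k = math.ceil((n + 1) * (1 - alpha))
    return s[min(k, n) - 1]
```

Usage: compute `1 - agreement(samples)` for every calibration question, feed the scores to `conformal_quantile`, and use the resulting threshold to form prediction sets at query time.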

[!TIP] For the full theory, see our paper: Black-Box Reliability Certification for AI Agents via Self-Consistency Sampling and Conformal Calibration (Mouzouni, 2026).

Three Ways to Use TrustGate

1. Deployment gate — certify before shipping

trustgate certify --yes
# Exit code 0 = PASS, 1 = FAIL
     TrustGate Certification Result
┌──────────────────────┬──────────┐
│ Reliability Level    │ 94.6%    │
│ M* (prediction set)  │ 1        │
│ Empirical Coverage   │ 0.956    │
│ Capability Gap       │ 2.4%     │
│ Status               │ PASS     │
└──────────────────────┴──────────┘

2. Runtime trust layer — confidence on every query

from theaios.trustgate import TrustGate, certify

result = certify(config_path="trustgate.yaml")
# `config` below is your loaded trustgate.yaml configuration object
gate = TrustGate(config=config, certification=result)

# Passthrough (1 API call): attaches reliability metadata
response = gate.query("What is the treatment for X?")
response.reliability_level  # 0.946

# Sampled (K API calls): per-query prediction set
gate = TrustGate(config=config, certification=result, mode="sampled")
response = gate.query("What is the treatment for X?")
response.prediction_set  # ["Aspirin + PCI"]
response.consensus       # 0.8
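Sampled mode's prediction set and consensus can be illustrated like this. The share-based nonconformity score and the function name are assumptions made for the sketch, not the library's API:

```python
from collections import Counter

def prediction_set(answers, score_threshold):
    """Toy sampled-mode logic: each candidate answer's nonconformity is
    1 minus its share of the K samples; keep every candidate whose score
    falls at or below the calibrated threshold. Also report consensus,
    the modal answer's share of the K samples."""
    counts = Counter(answers)
    k = len(answers)
    consensus = counts.most_common(1)[0][1] / k
    kept = [a for a, c in counts.items() if 1 - c / k <= score_threshold]
    return kept, consensus
```

With 4 of 5 samples agreeing and a threshold of 0.3, only the majority answer survives and consensus is 0.8 — the same shape of output as the example above.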

3. Human calibration — no ground truth needed

Generate a questionnaire, share it with a domain expert, certify with their labels:

trustgate calibrate --export questionnaire.html
# Share via email/Slack → reviewer opens in browser → downloads labels.json
trustgate certify --ground-truth labels.json

Where Do the Questions Come From?

You don't need a gold-standard dataset to use TrustGate. Three ways to get started:

1. Generate questions with AI. Ask an LLM to produce realistic questions for your system:

"Generate 100 realistic customer support questions that users would ask
our e-commerce chatbot, covering orders, returns, shipping, and products."

2. Extract from production logs. Pull real queries from your observability stack (Datadog, Langfuse, LangSmith, custom logs). These are the actual questions your system faces — the most representative test set possible.

3. Use built-in benchmarks. For standard tasks, TrustGate ships dataset loaders:

from theaios.trustgate.datasets import load_gsm8k, load_mmlu
questions = load_mmlu(subjects=["abstract_algebra"], n=100)

No ground truth labels? No problem — use human calibration. A domain expert reviews 50 items in 10 minutes.

Full guide: Getting Your Questions

Works With Any Endpoint

LLMs, agents, RAG pipelines — anything with an HTTP API:

# LLM
endpoint:
  url: "https://api.openai.com/v1/chat/completions"
  model: "gpt-4.1"
  api_key_env: "OPENAI_API_KEY"

# Generic agent / RAG / custom API
endpoint:
  url: "https://my-agent.example.com/api/ask"
  temperature: null
  request_template:
    query: "{{question}}"
  response_path: "answer"
  cost_per_request: 0.03
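One way such a generic adapter can work, sketched under assumptions (the library's actual request handling may differ, and dotted `response_path` support is assumed for illustration): substitute `{{question}}` into the request template, POST it, then walk `response_path` into the JSON reply.

```python
def render_template(template, question: str):
    """Recursively substitute {{question}} anywhere in a request-template
    structure of nested dicts and strings."""
    if isinstance(template, dict):
        return {k: render_template(v, question) for k, v in template.items()}
    if isinstance(template, str):
        return template.replace("{{question}}", question)
    return template

def extract(payload, path: str):
    """Walk a response_path (e.g. "answer" or "data.answer") into a
    decoded JSON payload and return the value found there."""
    for key in path.split("."):
        payload = payload[key]
    return payload
```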

Certify Each Component, Not Just the Final Output

Complex AI systems are pipelines — retriever, reranker, reasoning, generation. Don't certify the whole pipeline as a black box. Certify each component independently to find exactly where reliability breaks down.

Query → [Retriever] → [Reranker] → [Generator] → Answer
            ↑              ↑             ↑
      certify: 94%    certify: 91%   certify: 87%
      "right docs?"  "right order?" "right answer?"

Each component is just an endpoint — TrustGate certifies it independently with its own questions, canonicalization, and reliability level. This lets you:

  • Pinpoint failures: the generator is the weak link, not the retriever
  • Iterate efficiently: improve one component, re-certify just that one
  • Stay agnostic: document changes don't invalidate the retriever certification
┌─────────────────────┬─────────────────────────┬─────────────────────┐
│ Component           │ Certify on              │ Canonicalization    │
├─────────────────────┼─────────────────────────┼─────────────────────┤
│ RAG retriever       │ Retrieved document IDs  │ Exact match         │
│ SQL agent           │ Generated SQL query     │ Normalized SQL      │
│ Classification step │ Category label          │ MCQ                 │
│ Reasoning step      │ Intermediate conclusion │ LLM-judge or custom │
│ Final answer        │ Short structured output │ Numeric / MCQ       │
└─────────────────────┴─────────────────────────┴─────────────────────┘

TrustGate warns you automatically when outputs are too long or diverse for meaningful self-consistency measurement.
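For the simplest cases — numeric and MCQ outputs — canonicalizers might look like the following. These are illustrative helpers, not TrustGate's built-ins:

```python
import re

def canon_numeric(text: str):
    """Pull the last number out of a free-form answer, e.g.
    'The total is $1,200.' -> 1200.0; None when no number is present."""
    matches = re.findall(r"-?\d[\d,]*\.?\d*", text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))

def canon_mcq(text: str):
    """Reduce an answer to a single standalone option letter A-E,
    if one is found; None otherwise."""
    m = re.search(r"\b([A-E])\b", text.strip().upper())
    return m.group(1) if m else None
```

Collapsing diverse surface forms onto a small canonical space like this is what makes agreement between samples measurable at all.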

Pre-flight Cost Estimate

Before spending money, TrustGate shows the cost/reliability tradeoff:

     Cost / Reliability Tradeoff
┌────┬───────────┬──────────┬────────────┐
│  K │ Est. Cost │ Max Cost │ Resolution │
│  3 │ $9.00     │ $18.00   │   coarse   │
│  5 │ $15.00    │ $30.00   │  moderate  │
│ 10←│ $30.00    │ $60.00   │    fine    │
│ 20 │ $60.00    │ $120.00  │    fine    │
└────┴───────────┴──────────┴────────────┘
Proceed? [Y/n]:
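The arithmetic behind a table like this is straightforward. A sketch, assuming 200 calibration questions at $0.03 per request with sequential stopping expected to trim roughly half the calls — all three numbers are assumptions chosen to reproduce the figures shown, not values TrustGate hardcodes:

```python
def cost_rows(n_questions=200, cost_per_request=0.03, savings=0.5):
    """Max cost assumes every question is sampled K times; the estimate
    assumes early stopping skips roughly `savings` of those calls."""
    rows = []
    for k in (3, 5, 10, 20):
        max_cost = n_questions * k * cost_per_request
        est_cost = max_cost * (1 - savings)
        rows.append((k, round(est_cost, 2), round(max_cost, 2)))
    return rows
```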

Documentation

Additional Resources

  • Examples — Working certification scripts
  • FAQ — Common questions
  • Paper — The research behind TrustGate

Why TrustGate?

  • Formal guarantee — conformal coverage bound, not a heuristic score
  • Black-box — no model internals, no logprobs, just API access
  • Any endpoint — LLMs, agents, RAG, custom APIs
  • Human-in-the-loop — shareable questionnaire, no server needed
  • Cost-aware — pre-flight estimates, sequential stopping saves ~50%
  • Production-ready — passthrough mode, CI/CD gating, periodic recalibration

Citation

@article{mouzouni2026trustgate,
  title={TrustGate: Black-Box AI Reliability Certification via
         Self-Consistency Sampling and Conformal Calibration},
  author={Mouzouni, Charafeddine},
  year={2026}
}

License

Apache 2.0. See LICENSE.
