Ground Truth Verification System — federated fact verifier for LLM agents

Ground Truth Verification System (gtv)

A federated, open-source fact verifier that AI agents call as an MCP tool. Every response is evidence-backed, signed, and anchored in public transparency logs.

See docs/ARCHITECTURE.md for system design, threat model, and implementation details.

Status

Phase 1 complete — v0.1.0 pilot release.

  • 757 tests passing, 20 skipped (live/opt-in), 0 failing.
  • 73 source files, ruff + mypy-strict clean.
  • 8 evidence adapters: arXiv, INSPIRE-HEP, Crossref, OpenAlex, Semantic Scholar, PubMed, NIST, Retraction Watch.
  • 9 CLIs shipped (see below). Distroless Docker image published to ghcr.io/groundtruth/gtv, multi-arch, Cosign-signed.
  • DID resolution: did:web + did:key (Ed25519, P-256, secp256k1).
  • Federation: persistent SQLite trust registry with signed update log; tier-weighted consensus across independent verifiers.
  • Revocation: signed revocation lists, 410 Gone enforcement.
  • Audit: hash-chained, Ed25519-signed append-only log of admin actions.

See CHANGELOG.md for the full v0.1.0 changelog.

What It Does

Claim: "The Higgs boson has a mass of 125 GeV."
       ↓
verdict: VERIFIED
confidence: 0.87
evidence: [3 peer-reviewed sources from Crossref + arXiv + INSPIRE-HEP]
signature: ECDSA-P256 (Sigstore keyless)
rekor_uuid: 24c05a...  (transparency log entry)
tsa_token: [RFC 3161 timestamp, primary + secondary]
prov_graph: [PROV-O JSON-LD trace: sources → evidence → verdict]

Every response is a sealed Envelope: signed over a canonical JSON form (RFC 8785), inclusion-proved in Rekor, and timestamped by two redundant RFC 3161 TSAs.
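
The canonical form matters because the signer and every verifier must hash exactly the same bytes. The sketch below illustrates the idea with the standard library only. It approximates RFC 8785 for simple payloads (strings, integers, booleans) and does not implement the RFC's number serialization or UTF-16 key ordering rules. The function names and the sample payload are hypothetical, not gtv's own API.

```python
import hashlib
import json


def canonical_bytes(payload: dict) -> bytes:
    """Approximate canonical JSON: sorted keys, no insignificant whitespace, UTF-8."""
    return json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def canonical_sha256(payload: dict) -> str:
    """Hex digest of the canonical form; this is the value a signature would cover."""
    return hashlib.sha256(canonical_bytes(payload)).hexdigest()


# Two dicts with the same content but different key order hash identically.
a = {"verdict": "VERIFIED", "confidence_pct": 87}
b = {"confidence_pct": 87, "verdict": "VERIFIED"}
assert canonical_bytes(a) == b'{"confidence_pct":87,"verdict":"VERIFIED"}'
assert canonical_sha256(a) == canonical_sha256(b)
```

A production verifier should use a full RFC 8785 implementation rather than this approximation.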

Install

Requires Python 3.12+.

# Once v0.1.0 lands on PyPI:
pip install gtv

# Development install (editable + dev extras):
pip install -e '.[dev]'
# or: uv sync

Or run the prebuilt distroless image:

docker pull ghcr.io/groundtruth/gtv:v0.1.0
docker run --rm -p 8080:8080 ghcr.io/groundtruth/gtv:v0.1.0

See pyproject.toml for full dependency list and CHANGELOG.md for release notes.

Try it in 30 seconds

No server setup, no OIDC, no config:

pip install -e .
gtv-demo "The Higgs boson mass is 125 GeV"

Output:

Subclaim 1:
  The Higgs boson mass is 125 GeV

  Verdict: VERIFIED (87%)

  Evidence:
    - did:web:arxiv.org
      Stance: SUPPORTS
      URL: https://arxiv.org/abs/hep-ex/0302009
      Discovery of a Higgs boson candidate by the ATLAS collaboration...

    - did:web:inspire-hep
      Stance: SUPPORTS
      URL: https://inspirehep.net/record/1234567
      The Higgs boson: precision measurements from LHC...

For offline testing (no network):

gtv-demo --offline "The speed of light is 299792458 m/s"
gtv-demo --json "Einstein's E=mc²"  # Machine-readable output

Deploy a pilot (15 minutes)

For a commercial pilot, deploy gtv as a containerized service. No manual infrastructure setup required.

Prerequisites: Docker + Docker Compose (v2+).

git clone <repo>
cd gtv
cp .env.example .env
docker compose up -d
curl http://localhost:8080/health

That's it. The service is ready on port 8080. HTTP API, Prometheus metrics, and healthcheck are all live.

For federation setup, issuer DID, Sigstore keyless signing, RFC 3161 timestamping, and observability details, see docs/pilot-onboarding.md.

Quick Start

HTTP API (FastAPI)

uvicorn gtv.mcp.server:app --reload --port 8080

POST to /tools/verify_claim:

curl -X POST http://localhost:8080/tools/verify_claim \
  -H 'Content-Type: application/json' \
  -d '{
    "claim": "E=mc^2",
    "domain": "physics",
    "max_sources": 5
  }' | jq .envelope

Response shape:

{
  "envelope": {
    "verdict": "VERIFIED",
    "confidence": 0.87,
    "evidence": [
      {
        "source_did": "did:web:arxiv.org",
        "excerpt": "...",
        "stance": "SUPPORTS",
        "url": "https://...",
        "retrieved_at": "2026-04-21T..."
      }
    ],
    "signature": "<base64>",
    "rekor_uuid": "...",
    "rekor_log_index": 12345,
    "tsa_token": "<base64>",
    "anchor_proof": {
      "rekor_inclusion_proof": "{\"log_index\": 12345, ...}",
      "tsa_tsr_primary": "<base64>",
      "tsa_tsr_secondary": "<base64>"
    },
    "prov_graph": "<PROV-O JSON-LD>"
  }
}
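
The same call can be made from Python. This is a minimal standard-library sketch, assuming the endpoint and field names shown above. Treating `domain` as optional is an assumption, and `build_verify_request` and `verify_claim` are hypothetical helper names.

```python
import json
import urllib.request


def build_verify_request(claim, domain=None, max_sources=5,
                         base_url="http://localhost:8080"):
    """Build the POST request for /tools/verify_claim without sending it."""
    body = {"claim": claim, "max_sources": max_sources}
    if domain is not None:  # assumption: the server accepts requests without a domain
        body["domain"] = domain
    return urllib.request.Request(
        f"{base_url}/tools/verify_claim",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def verify_claim(claim, **kwargs):
    """Send the request and return the `envelope` object from the response."""
    request = build_verify_request(claim, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)["envelope"]
```

With the server running locally, `verify_claim("E=mc^2", domain="physics")` should return a dict with the `verdict`, `confidence`, and `evidence` keys shown in the response shape.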

MCP over stdio

# List available tools
echo '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' | gtv-mcp

# Call verify_claim
echo '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"verify_claim","arguments":{"claim":"Is water H2O?"}}}' | gtv-mcp

The gtv-mcp binary exposes the same logic as the HTTP server over the MCP JSON-RPC protocol (via the official mcp SDK).
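
When driving gtv-mcp from a script, it can be easier to build the JSON-RPC messages programmatically than to quote them in the shell. A small sketch, assuming only the message shapes from the two commands above; the helper names are hypothetical.

```python
import json
from itertools import count

_ids = count(1)  # monotonically increasing JSON-RPC request ids


def jsonrpc_request(method, params=None):
    """Serialize one JSON-RPC 2.0 request as a single line."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message, separators=(",", ":"))


def verify_claim_call(claim):
    """Build the tools/call message for the verify_claim tool."""
    return jsonrpc_request(
        "tools/call",
        {"name": "verify_claim", "arguments": {"claim": claim}},
    )
```

Each returned string is one line that can be written to the gtv-mcp process's stdin.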

CLI Verifier (Offline)

Save an envelope to envelope.json, then verify offline:

# Verify with all checks enabled (no --allow-stub / --allow-unrooted relaxations)
gtv-verify envelope.json

# Allow stub signatures (development/testing)
gtv-verify --allow-stub envelope.json

# Trust the issuer without resolving its DID
gtv-verify --trust-issuer envelope.json

# Skip Rekor log-root trust check
gtv-verify --allow-unrooted envelope.json

Exit codes:

Code  Meaning
0     Envelope fully verified
2     Stub signature/proof detected (not cryptographically verifiable)
3     Signature verification failed
4     Rekor inclusion proof mismatch
5     Schema/JSON parse error
6     Rekor log-root key not trusted
7     Issuer identity verification failed
64    Usage error (e.g., missing envelope path)
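
In CI or agent code, the exit codes above can be mapped to names so failures are reported precisely. A sketch, assuming gtv-verify is on PATH; the enum member names and the `describe` / `run_verify` helpers are hypothetical, and only the numeric codes come from the table.

```python
import subprocess
from enum import IntEnum


class VerifyExit(IntEnum):
    OK = 0
    STUB_DETECTED = 2
    SIGNATURE_FAILED = 3
    REKOR_PROOF_MISMATCH = 4
    PARSE_ERROR = 5
    LOG_ROOT_UNTRUSTED = 6
    ISSUER_IDENTITY_FAILED = 7
    USAGE_ERROR = 64


def describe(code: int) -> str:
    """Return a readable name for an exit code, including undocumented ones."""
    try:
        return VerifyExit(code).name
    except ValueError:
        return f"UNKNOWN({code})"


def run_verify(path: str, *flags: str) -> str:
    """Run gtv-verify on an envelope file and return the named outcome."""
    proc = subprocess.run(["gtv-verify", *flags, path], check=False)
    return describe(proc.returncode)
```

For example, `run_verify("envelope.json", "--allow-stub")` returns `"OK"` only when the CLI exits with 0.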

Verdict Classes

  • VERIFIED: Evidence from one or more sources supports the claim.
  • UNVERIFIED: No evidence found supporting or refuting the claim.
  • CONTESTED: Evidence both supports and refutes; magnitude differs.
  • OPINION: Claim is subjective/normative; no factual verification performed.
  • CREATIVE: Claim requests fictional or imaginative content.
  • OUT_OF_SCOPE: Domain or content type outside this system's scope.
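
As an illustration only, the three evidence-driven classes can be expressed as a small rule over evidence stances. This is not the project's actual verdict engine: the `REFUTES` label is an assumption (only `SUPPORTS` appears in the sample output), and the list above does not say how refute-only evidence is classified, so the sketch refuses to guess.

```python
from enum import Enum


class Verdict(str, Enum):
    VERIFIED = "VERIFIED"
    UNVERIFIED = "UNVERIFIED"
    CONTESTED = "CONTESTED"
    OPINION = "OPINION"
    CREATIVE = "CREATIVE"
    OUT_OF_SCOPE = "OUT_OF_SCOPE"


def verdict_from_stances(stances):
    """Toy rule: map a list of stance labels to an evidence-driven verdict."""
    supports = "SUPPORTS" in stances
    refutes = "REFUTES" in stances  # hypothetical stance label
    if supports and refutes:
        return Verdict.CONTESTED
    if supports:
        return Verdict.VERIFIED
    if refutes:
        # The class list does not define this case; do not invent a verdict.
        raise NotImplementedError("refute-only evidence is not covered here")
    return Verdict.UNVERIFIED
```

OPINION, CREATIVE, and OUT_OF_SCOPE depend on the claim itself rather than on evidence, so they are outside this rule.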

Registered Adapters

All adapters are federated sources (we do not fabricate evidence). Each is assigned a trust tier.

Adapter           Source                      Type           Tier
arXiv             arXiv preprints             PREPRINT       TIER_2
INSPIRE-HEP       High-energy physics papers  PEER_REVIEWED  TIER_1
Crossref          Peer-reviewed metadata      PEER_REVIEWED  TIER_1
OpenAlex          Multidisciplinary research  PEER_REVIEWED  TIER_2
Semantic Scholar  Computer science / NLP      PEER_REVIEWED  TIER_2

Adapters fail loudly: on timeout or network error, they return an empty list (not fabricated results). The verdict engine handles missing evidence gracefully.

Trust Model

An envelope is trustworthy if:

  1. Hash integrity: The canonical JSON hash (RFC 8785) matches the signed-over payload.
  2. Signature: ECDSA-P256 signature verifies against the canonical hash using the issuer's Sigstore certificate.
  3. Transparency: The signature is included in the public Rekor transparency log (RFC 6962 Merkle tree proof).
  4. Time-stamping: At least one RFC 3161 TSR (primary or secondary) verifies over the canonical hash.
  5. Issuer identity: The certificate's identity (email/OIDC issuer) matches the issuer DID's DID document (federation registry lookup optional).
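
To make step 3 concrete, the sketch below verifies an inclusion proof for an RFC 6962-style Merkle tree: leaves are hashed with a 0x00 prefix, interior nodes with a 0x01 prefix, and the proof is folded from the leaf up to the root. It shows the general technique, not gtv's verifier, and it does not parse Rekor's entry format; the function names are hypothetical.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def verify_inclusion(leaf: bytes, index: int, tree_size: int,
                     proof: list, root: bytes) -> bool:
    """Recompute the root from a leaf hash and its audit path."""
    if not 0 <= index < tree_size:
        return False
    fn, sn = index, tree_size - 1
    r = leaf
    for p in proof:
        if sn == 0:
            return False  # proof is longer than the path to the root
        if fn & 1 or fn == sn:
            r = node_hash(p, r)  # sibling is on the left
            if not fn & 1:
                # Skip levels where this node has no right sibling.
                while fn and not fn & 1:
                    fn >>= 1
                    sn >>= 1
        else:
            r = node_hash(r, p)  # sibling is on the right
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root


# Three-leaf tree: root = H(H(l0, l1), l2)
l0, l1, l2 = (leaf_hash(x) for x in (b"a", b"b", b"c"))
root = node_hash(node_hash(l0, l1), l2)
assert verify_inclusion(l0, 0, 3, [l1, l2], root)
assert verify_inclusion(l2, 2, 3, [node_hash(l0, l1)], root)
assert not verify_inclusion(l1, 0, 3, [l1, l2], root)
```

A real check additionally requires that the root itself comes from a signed checkpoint of a trusted log, which is what exit code 6 of the CLI guards against.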

Evidence inside the envelope is not signed individually; only the summary and verdicts are signed. Verifiers must audit the evidence chain themselves.

Federation Registry

When offline-verifying an envelope, the CLI checks that the issuer's DID is in the trusted registry. The registry is a JSON file:

[
  {
    "did": "did:web:verify.groundtruth.example",
    "trust_tier": "TIER_1",
    "display_name": "Ground Truth Pilot",
    "expected_fulcio_oidc": "https://token.actions.githubusercontent.com",
    "expected_issuer_email": "github-actions@github.com",
    "notes": "GitHub Actions–based verifier."
  }
]

Set GTV_FEDERATION_REGISTRY=/path/to/registry.json to load the registry on startup. If the variable is unset, gtv falls back to a minimal default registry.
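
A registry file in this shape is straightforward to load and query. The sketch below is one possible reading of the lookup, assuming the field names from the example entry; `load_registry` and `issuer_matches` are hypothetical helpers, and the empty dict stands in for the default registry, whose contents are not documented here.

```python
import json
import os
from pathlib import Path


def load_registry(path=None):
    """Load registry entries keyed by DID from a JSON file."""
    path = path or os.environ.get("GTV_FEDERATION_REGISTRY")
    if not path:
        return {}  # placeholder for the built-in default registry
    entries = json.loads(Path(path).read_text(encoding="utf-8"))
    return {entry["did"]: entry for entry in entries}


def issuer_matches(registry, did, oidc_issuer, email):
    """True if the DID is registered and the certificate identity matches it."""
    entry = registry.get(did)
    return (
        entry is not None
        and entry["expected_fulcio_oidc"] == oidc_issuer
        and entry["expected_issuer_email"] == email
    )
```

An unknown DID, or a known DID presented with a different OIDC issuer or email, is rejected.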

Configuration

Env Var                       Default                             Purpose
GTV_ISSUER_DID                did:web:verify.groundtruth.example  This issuer's DID (used in signatures)
GTV_SIGSTORE_ENABLED          (unset)                             Enable real Sigstore keyless signing; otherwise emit stub signatures
GTV_TSA_ENABLED               (unset)                             Enable real RFC 3161 timestamping; otherwise stub tokens
GTV_REKOR_URL                 https://rekor.sigstore.dev          Rekor server URL (prod)
GTV_TSA_PRIMARY_URL           https://freetsa.org/tsr             Primary TSA
GTV_TSA_SECONDARY_URL         http://timestamp.digicert.com       Secondary TSA (best-effort)
GTV_FEDERATION_REGISTRY       (unset)                             Path to issuer registry JSON
GTV_DISABLE_DEFAULT_ADAPTERS  (unset)                             If set, skip loading built-in adapters
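
For tooling that needs to mirror these settings, the table translates directly into a typed settings object. This is a sketch rather than gtv's own configuration code: the `Settings` class is hypothetical, and treating any non-empty value as "enabled" is an assumption about how the flags are parsed.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    issuer_did: str
    sigstore_enabled: bool
    tsa_enabled: bool
    rekor_url: str
    tsa_primary_url: str
    tsa_secondary_url: str
    federation_registry: Optional[str]
    disable_default_adapters: bool


def _flag(env: Mapping[str, str], name: str) -> bool:
    # Assumption: any non-empty value turns the flag on.
    return bool(env.get(name))


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, applying the documented defaults."""
    return Settings(
        issuer_did=env.get("GTV_ISSUER_DID", "did:web:verify.groundtruth.example"),
        sigstore_enabled=_flag(env, "GTV_SIGSTORE_ENABLED"),
        tsa_enabled=_flag(env, "GTV_TSA_ENABLED"),
        rekor_url=env.get("GTV_REKOR_URL", "https://rekor.sigstore.dev"),
        tsa_primary_url=env.get("GTV_TSA_PRIMARY_URL", "https://freetsa.org/tsr"),
        tsa_secondary_url=env.get("GTV_TSA_SECONDARY_URL", "http://timestamp.digicert.com"),
        federation_registry=env.get("GTV_FEDERATION_REGISTRY") or None,
        disable_default_adapters=_flag(env, "GTV_DISABLE_DEFAULT_ADAPTERS"),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test.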

Observability

All request paths record Prometheus metrics:

  • gtv_verdict_total (counter): verdicts by class and confidence band.
  • gtv_adapter_call_duration_seconds (histogram): per-adapter latency.
  • gtv_adapter_call_status_total (counter): per-adapter success/timeout/error counts.

OpenTelemetry tracing is wired via the opentelemetry-instrumentation-fastapi package. Set OTEL_EXPORTER_OTLP_ENDPOINT to export.

Testing

pytest -q                    # Default (stubs only)
GTV_LIVE=1 pytest            # Hit live adapters (network required)
GTV_SIGSTORE_ENABLED=1 pytest # Real Sigstore signing
GTV_TSA_ENABLED=1 pytest     # Real RFC 3161 timestamps
GTV_E2E_LIVE=1 pytest        # Full end-to-end with all services

CI runs all tiers nightly (see .github/workflows/).

Design Principles

  1. Federate, don't invent. We aggregate from trusted public sources; we never fabricate evidence.
  2. Standards only. W3C VC 2.0, DIDs 1.0, PROV-O, Sigstore, RFC 3161, MCP. No proprietary formats.
  3. No blockchain. No cryptocurrency. No token. No consensus layer.
  4. Evidence first. If sources conflict, mark CONTESTED, not VERIFIED. Never suppress evidence.
  5. Fail loudly. A missing Rekor key, a TSA timeout, a DID that won't resolve: all are logged, not swallowed.

Layout

src/gtv/
  mcp/         — FastAPI + MCP stdio server
  adapters/    — five federated source modules
  verdict/     — deterministic rule-based engine
  anchor/      — canonical JSON, Sigstore, Rekor, RFC 3161
  federation/  — issuer registry + DID resolution
  models.py    — Pydantic schemas (Claim, Evidence, Envelope, ...)
  obs/         — Prometheus + OpenTelemetry
cli/
  verify.py    — offline envelope verifier (< 300 LOC)
.github/workflows/
  ci.yml       — every PR: ruff, mypy, pytest
  nightly.yml  — all test tiers (adapters, Sigstore, TSA, E2E)
  docker.yml   — build, scan, sign image (on tag)
spec/schemas/  — JSON Schema for Envelope (CC0)
tests/         — pytest suite with fixtures
docs/          — architecture, ADRs, build spec

License

Apache License 2.0. No CLA. All work is federated and open-source.

Download files

Source Distribution

gtv-2.3.4.tar.gz (658.2 kB)

Uploaded Source

Built Distribution

gtv-2.3.4-py3-none-any.whl (337.6 kB)

Uploaded Python 3

File details

Details for the file gtv-2.3.4.tar.gz.

File metadata

  • Download URL: gtv-2.3.4.tar.gz
  • Upload date:
  • Size: 658.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for gtv-2.3.4.tar.gz
Algorithm    Hash digest
SHA256       e8b2938713d22d328d55754ab292bba7705c431467f99940846c5e5f6da4c44a
MD5          10e1b9a9d1f4d8eede7d72c797f6874e
BLAKE2b-256  3dbd9ddbd8d48edf15288f0e5d4338b7a8b1a75b4fd4374f0bb0a30efa8669aa

Provenance

The following attestation bundles were made for gtv-2.3.4.tar.gz:

Publisher: release.yml on mark-hallam/gtv

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file gtv-2.3.4-py3-none-any.whl.

File metadata

  • Download URL: gtv-2.3.4-py3-none-any.whl
  • Upload date:
  • Size: 337.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for gtv-2.3.4-py3-none-any.whl
Algorithm    Hash digest
SHA256       aeb000ca0943846fded9e8e8443bf62379fb54b71010ea9bf7dd1469b17d704a
MD5          f5175c70d2e59188a0498467d336d4eb
BLAKE2b-256  6341abb708a68d7191bb930facd4bd367e8627e65d005550d18b54dd8a588067

Provenance

The following attestation bundles were made for gtv-2.3.4-py3-none-any.whl:

Publisher: release.yml on mark-hallam/gtv

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
