Trust-weighted hallucination detection for AI agents. Verify LLM outputs against multiple sources with contradiction awareness. Zero dependencies. Sub-2ms.

GroundCheck

Python 3.9+ · License: MIT · Zero dependencies

The Problem

Your AI agent says "you work at Amazon." Memory says "Microsoft." Most systems won't catch this — they just return the most similar embedding and hope for the best. GroundCheck catches it in <2ms with zero dependencies.

Install

pip install groundcheck

10-Second Demo

from groundcheck import GroundCheck, Memory

verifier = GroundCheck()

memories = [
    Memory(id="m1", text="User works at Microsoft", trust=0.9),
    Memory(id="m2", text="User lives in Seattle", trust=0.8),
]

result = verifier.verify("You work at Amazon and live in Seattle", memories)

print(result.passed)          # False
print(result.hallucinations)  # ["Amazon"]
print(result.corrected)       # "You work at Microsoft and live in Seattle"
print(result.confidence)      # 0.65

CLI

# Verify text against a memory file
groundcheck verify "You work at Amazon" --memories memories.json

# Extract facts from text
groundcheck extract "My name is Alice and I work at Google"

# Show version
groundcheck version

Memory files are JSON — either a list of objects or {"memories": [...]}:

[
  {"text": "User works at Microsoft", "trust": 0.9},
  {"text": "User lives in Seattle", "trust": 0.8}
]
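
Because both shapes are accepted, a script that prepares or inspects memory files has to normalize them first. The following is a minimal sketch using only the standard library. `load_memory_file` is a hypothetical helper, not part of GroundCheck; the generated `m1`, `m2`, ... ids are an assumption for entries that omit `id`, and the fallback trust of 1.0 follows the default listed in the API reference.

```python
import json
from typing import Any, Dict, List


def load_memory_file(raw: str) -> List[Dict[str, Any]]:
    """Normalize either accepted JSON shape into a flat list of memory dicts."""
    data = json.loads(raw)
    # Accept a bare list, or an object wrapping the list under "memories".
    entries = data["memories"] if isinstance(data, dict) else data
    normalized = []
    for index, entry in enumerate(entries, start=1):
        normalized.append({
            "id": entry.get("id", f"m{index}"),       # assumed id scheme
            "text": entry["text"],
            "trust": float(entry.get("trust", 1.0)),  # documented default trust
        })
    return normalized


bare = '[{"text": "User works at Microsoft", "trust": 0.9}]'
wrapped = '{"memories": [{"text": "User lives in Seattle", "trust": 0.8}]}'
print(load_memory_file(bare))
print(load_memory_file(wrapped))
```

The resulting dicts carry the same field names as the `Memory` class, so they could be passed on as `Memory(**entry)`, assuming the constructor accepts those keyword arguments as the demo suggests.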

What Makes This Different

|                 | Other systems                            | GroundCheck                                               |
|-----------------|------------------------------------------|-----------------------------------------------------------|
| Sources         | Single string or premise/hypothesis pair | Multiple memories with per-source trust scores            |
| Trust           | All sources treated equally              | Trust-weighted: high-trust memories override low-trust    |
| Contradictions  | Not detected                             | Cross-memory conflict detection with resolution           |
| Correction      | Flag only, no fix                        | Auto-rewrites hallucinations with grounded facts          |
| Temporal        | No awareness                             | most_recent vs most_trusted resolution                    |
| Dependencies    | Often torch, transformers, etc.          | Zero (stdlib only, neural optional)                       |
| Latency         | 500ms – 3,000ms+                         | Sub-2ms (regex mode)                                      |
| Extra LLM calls | Some require 3-5 per check               | Zero                                                      |

How It Works

Generated text + Retrieved memories (with trust scores)
    → Tier 1: Regex fact extraction (15+ named slots)
    → Tier 1.5: Knowledge-based inference (verb ontology + entity taxonomy)
    → Tier 2: Neural paraphrase matching (optional)
    → Detect contradictions across memories (dynamic slot tracking)
    → Build grounding map (fuzzy match claims to memories)
    → Check disclosure requirements (trust-weighted)
    → Calculate confidence score
    → Generate corrections (strict mode)
    → VerificationReport

Trust-Weighted Verification

GroundCheck doesn't treat all sources equally. Each memory has a trust score:

memories = [
    Memory(id="m1", text="User is named Alice", trust=0.9),   # High trust
    Memory(id="m2", text="User is named Bob", trust=0.3),     # Low trust
]

result = verifier.verify("Your name is Bob", memories)
print(result.requires_disclosure)  # True — trust gap > 0.3
print(result.contradiction_details[0].most_trusted_value)  # "alice"
print(result.contradiction_details[0].most_recent_value)   # depends on timestamps
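
The two resolution values can be read as two different ways of picking a winner among conflicting memories. The sketch below illustrates that idea only and is not GroundCheck's implementation: `MemorySketch` and `resolve_conflict` are hypothetical names, and the 0.3 disclosure threshold is an assumption taken from the comment in the example above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MemorySketch:
    """Hypothetical stand-in for a memory whose slot value is already extracted."""
    value: str
    trust: float
    timestamp: Optional[int] = None


def resolve_conflict(
    memories: List[MemorySketch], gap_threshold: float = 0.3
) -> Tuple[str, Optional[str], bool]:
    """Return (most_trusted_value, most_recent_value, requires_disclosure)."""
    most_trusted = max(memories, key=lambda m: m.trust)
    # Only memories with a timestamp can compete for "most recent".
    dated = [m for m in memories if m.timestamp is not None]
    most_recent = max(dated, key=lambda m: m.timestamp).value if dated else None
    trusts = [m.trust for m in memories]
    # Assumed rule: disclose when the trust spread exceeds the threshold.
    requires_disclosure = (max(trusts) - min(trusts)) > gap_threshold
    return most_trusted.value, most_recent, requires_disclosure


conflict = [
    MemorySketch("alice", trust=0.9, timestamp=100),
    MemorySketch("bob", trust=0.3, timestamp=200),
]
print(resolve_conflict(conflict))  # ('alice', 'bob', True)
```

Here the most trusted and the most recent values disagree, which is the case where a caller has to choose between the two policies.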

Verification Modes

  • strict: generates corrected text, replacing hallucinations with grounded facts
  • permissive: detects and reports, doesn't rewrite

result = verifier.verify("You live in Paris", memories, mode="strict")
print(result.corrected)  # Rewritten with grounded facts

result = verifier.verify("You live in Paris", memories, mode="permissive")
print(result.corrected)  # None — permissive doesn't rewrite

Three-Tier Extraction

Tier 1 — Regex (always active)

15+ named slots (name, employer, location, age, etc.) plus 9 universal pattern families:

| Pattern           | Example                                          |
|-------------------|--------------------------------------------------|
| Copular (X is Y)  | "The server is running Ubuntu 22.04"             |
| Non-copular verbs | "Tesla manufactures electric vehicles"           |
| Clause splitting  | "Bob is 30, lives in NYC, and works at Google"   |
| Decisions & plans | "We chose Postgres" / "They decided to use Rust" |
| Requirements      | "The app requires Node 18+"                      |
| Prescriptive      | "Always use HTTPS for API calls"                 |
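
To make the idea of a named slot concrete, here is a minimal sketch with three hand-written patterns. The patterns and the `extract_slots_sketch` function are illustrative assumptions; GroundCheck's actual regexes are not shown in this README and are certainly more thorough.

```python
import re
from typing import Dict

# Illustrative patterns only; not the library's real slot definitions.
SLOT_PATTERNS = {
    "employer": re.compile(r"\bworks? at ([A-Z][\w&-]*)"),
    "location": re.compile(r"\blives? in ([A-Z][\w-]*(?: [A-Z][\w-]*)*)"),
    "age": re.compile(r"\bis (\d{1,3})\b"),
}


def extract_slots_sketch(text: str) -> Dict[str, str]:
    """Return the first match for each slot, lowercased for comparison."""
    slots = {}
    for slot, pattern in SLOT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            slots[slot] = match.group(1).lower()
    return slots


print(extract_slots_sketch("Bob is 30, lives in NYC, and works at Google"))
# {'employer': 'google', 'location': 'nyc', 'age': '30'}
```

Once claims and memories are reduced to the same slot names, checking a claim becomes a dictionary comparison, which is consistent with the regex tier staying fast.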

Tier 1.5 — Knowledge Inference (always active)

Understands conversational language that regex misses:

from groundcheck import extract_knowledge_facts

facts = extract_knowledge_facts("Yeah we ended up going with Postgres after the whole MySQL disaster")
# → database: postgres (adoption), mysql (deprecation)

Powered by:

  • Verb ontology — 10 semantic categories (adoption, migration, deprecation, tentative…) with ~200 verb phrases
  • Entity taxonomy — 22 tech categories with ~500 known entities
  • Inference rules — clause decomposition → entity recognition → verb semantics → fact extraction

Combined benchmark (42 sentences, 65 slots): F1 = 83.2% (+44% over regex alone).
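
The chain of clause decomposition, entity recognition, and verb semantics can be sketched with two tiny lookup tables. Everything below is a toy under stated assumptions: `VERB_ONTOLOGY`, `ENTITY_TAXONOMY`, and `infer_facts_sketch` are hypothetical names with a handful of entries, whereas the real tables hold roughly 200 phrases and 500 entities.

```python
import re
from typing import List, Tuple

# Tiny illustrative tables; cue phrases map to a semantic category.
VERB_ONTOLOGY = {
    "going with": "adoption",
    "switched to": "migration",
    "disaster": "deprecation",
}
ENTITY_TAXONOMY = {"postgres": "database", "mysql": "database", "rust": "language"}


def infer_facts_sketch(text: str) -> List[Tuple[str, str, str]]:
    """Return (category, entity, relation) triples found in the text."""
    facts = []
    # Clause decomposition: split on a few connectives and punctuation.
    clauses = re.split(r"\b(?:after|but|because)\b|[,;.]", text.lower())
    for clause in clauses:
        # Entity recognition: keep words that appear in the taxonomy.
        words = re.findall(r"[a-z0-9+#]+", clause)
        entities = [word for word in words if word in ENTITY_TAXONOMY]
        # Verb semantics: the first cue phrase found decides the relation.
        relation = next(
            (rel for cue, rel in VERB_ONTOLOGY.items() if cue in clause), None
        )
        if relation:
            facts.extend((ENTITY_TAXONOMY[e], e, relation) for e in entities)
    return facts


print(infer_facts_sketch(
    "Yeah we ended up going with Postgres after the whole MySQL disaster"
))
# [('database', 'postgres', 'adoption'), ('database', 'mysql', 'deprecation')]
```

Splitting into clauses first is what keeps "going with" attached to Postgres and "disaster" attached to MySQL.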

Tier 2 — Neural (optional)

pip install "groundcheck[neural]"

from groundcheck import GroundCheck, Memory

verifier = GroundCheck(neural=True)   # Enable paraphrase matching

# Catches paraphrases regex can't:
memories = [Memory(id="m1", text="User works at Google")]
result = verifier.verify("Employed by Google", memories)  # ✓ passes

Models are loaded lazily on first use — no startup cost until you need them. Five matching strategies: exact → normalization → fuzzy → synonym → embedding. NLI-based contradiction refinement filters false positives.
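
The ordering of the matching strategies suggests a cascade that tries the cheapest comparison first and stops at the first hit. The sketch below shows that pattern with the standard library only. It is not GroundCheck's code: `match_strategy`, the `SYNONYMS` table, and the 0.85 fuzzy threshold are assumptions, and the embedding step is left out because it needs the optional neural dependencies.

```python
import difflib
import re
from typing import Optional

# Hypothetical synonym table, for illustration only.
SYNONYMS = {"employed by": "works at", "resides in": "lives in"}


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())


def match_strategy(claim: str, memory: str, fuzzy_threshold: float = 0.85) -> Optional[str]:
    """Return the name of the first strategy that matches, or None."""
    if claim == memory:
        return "exact"
    norm_claim, norm_memory = normalize(claim), normalize(memory)
    if norm_claim == norm_memory:
        return "normalization"
    ratio = difflib.SequenceMatcher(None, norm_claim, norm_memory).ratio()
    if ratio >= fuzzy_threshold:
        return "fuzzy"
    # Rewrite known phrases to a canonical form, then compare again.
    for phrase, canonical in SYNONYMS.items():
        norm_claim = norm_claim.replace(phrase, canonical)
    if norm_claim == norm_memory:
        return "synonym"
    # An embedding comparison would be the last resort; omitted here.
    return None


print(match_strategy("works at Google", "works at Google"))     # exact
print(match_strategy("Works at Google!", "works at google"))    # normalization
print(match_strategy("works at Gogle", "works at Google"))      # fuzzy
print(match_strategy("Employed by Google", "works at Google"))  # synonym
```

Returning the strategy name rather than a boolean makes it possible to treat a fuzzy or synonym hit with less confidence than an exact one.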

API Reference

GroundCheck

  • GroundCheck(neural=False) — constructor. neural=False (default) for zero-dependency sub-2ms mode. neural=True enables semantic matching (requires groundcheck[neural]).
  • verify(generated_text, retrieved_memories, mode="strict") → VerificationReport
  • extract_claims(text) → Dict[str, ExtractedFact]
  • find_support(claim, memories) → match info

extract_fact_slots(text) (standalone function)

Universal regex fact extractor — works on any domain text. Returns Dict[str, ExtractedFact] with dynamically discovered slot names.

extract_knowledge_facts(text) (standalone function)

Knowledge-based inference extractor using verb ontology and entity taxonomy. Returns List[KnowledgeFact] with inferred relationships.

VerificationReport

  • passed: bool — did verification pass?
  • corrected: Optional[str] — rewritten text (strict mode)
  • hallucinations: List[str] — hallucinated values
  • confidence: float — trust-weighted confidence (0.0-1.0)
  • contradiction_details: List[ContradictionDetail] — full conflict info
  • requires_disclosure: bool — must the response acknowledge conflicts?

Memory

  • id: str — unique identifier
  • text: str — memory content
  • trust: float — trust score (0.0-1.0, default 1.0)
  • timestamp: Optional[int] — when this was stored

ContradictionDetail

  • slot: str — which fact slot conflicts
  • values: List[str] — conflicting values
  • most_trusted_value — value from highest-trust memory
  • most_recent_value — value from most recent memory

MCP Server (Agent Integration)

GroundCheck ships with an MCP server that gives any AI agent persistent fact memory with contradiction detection. Works with VS Code Copilot, Claude Desktop, Cursor, and any MCP-compatible client.

pip install "groundcheck[mcp]"

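Add to your client config (VS Code: .vscode/mcp.json; Claude Desktop: claude_desktop_config.json; etc.). The example uses the top-level "servers" key that VS Code expects; Claude Desktop uses "mcpServers" as the top-level key instead: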

{
  "servers": {
    "groundcheck": {
      "command": "groundcheck-mcp",
      "args": ["--db", ".groundcheck/memory.db", "--namespace", "my-project"]
    }
  }
}

Five tools are exposed:

| Tool               | When to call              | What it does                                             |
|--------------------|---------------------------|----------------------------------------------------------|
| groundcheck_store  | User states a fact        | Stores with trust score, detects contradictions          |
| groundcheck_check  | Start of every turn       | Returns relevant memories, auto-learns from context      |
| groundcheck_verify | Before sending a response | Catches hallucinations, auto-corrects, scores confidence |
| groundcheck_list   | Inspecting memory state   | Lists all stored memories with trust scores              |
| groundcheck_delete | Removing outdated facts   | Deletes specific memories or clears thread/namespace     |

Memories are scoped by namespace so each project gets its own memory. User-level facts stored with namespace='global' are visible across all projects.
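
The visibility rule described above amounts to a simple filter. The following sketch assumes memories are plain dicts with a `namespace` key; `visible_memories` is a hypothetical function for illustration and says nothing about how the MCP server actually stores or queries its database.

```python
from typing import Dict, List


def visible_memories(store: List[Dict[str, str]], namespace: str) -> List[str]:
    """Return texts visible from a namespace: its own entries plus 'global' ones."""
    return [m["text"] for m in store if m["namespace"] in (namespace, "global")]


store = [
    {"namespace": "my-project", "text": "Project uses Postgres"},
    {"namespace": "other-project", "text": "Project uses MySQL"},
    {"namespace": "global", "text": "User is named Alice"},
]
print(visible_memories(store, "my-project"))
# ['Project uses Postgres', 'User is named Alice']
```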

Full MCP setup guide →

Development

git clone https://github.com/blockhead22/GroundCheck.git
cd GroundCheck
python -m venv .venv
.venv\Scripts\activate  # Windows; on macOS/Linux run: source .venv/bin/activate
pip install -e ".[dev]"
pytest

License

MIT
