LLM-agnostic deterministic assert layer for AI agent outputs

groundguard

Your LLM will hallucinate. groundguard catches it before it ships.

A drop-in verification layer that checks whether AI-generated text is actually supported by your source documents — before it reaches users.

pip install groundguard

The problem

RAG pipelines, document Q&A, and agentic workflows all share one failure mode: the model confidently states something that isn't in the source. You won't know until a lawyer, a clinician, or a customer finds it.

groundguard sits between your pipeline output and your application. It reads your source documents, checks the claim, and returns a structured verdict — so you can gate on it, log it, or retry.


Quickstart

from groundguard import verify
from groundguard.models.result import Source

sources = [
    Source(content="Q3 revenue was $5 million.", source_id="report.pdf"),
]

result = verify(
    claim="Revenue was $5M in Q3.",
    sources=sources,
    model="gpt-4o-mini",
)

assert result.is_valid          # True
print(result.status)            # "VERIFIED"
print(result.factual_consistency_score)  # 0.0–1.0

That's it. No pipeline changes, no new infrastructure. Pass in your source documents and the claim to check — groundguard does the rest.


What it returns

from dataclasses import dataclass
from typing import Literal

@dataclass
class AtomicClaimResult:
    claim: str
    status: Literal["VERIFIED", "CONTRADICTED", "UNVERIFIABLE", "PARSE_ERROR"]
    factual_consistency_score: float   # 0.0–1.0
    verification_method: str           # "tier2_lexical" | "tier3_llm"
    citation: Citation | None          # excerpt + char offsets; always set on VERIFIED
    is_valid: bool

@dataclass
class VerificationResult:
    is_valid: bool
    status: Literal["VERIFIED", "CONTRADICTED", "UNVERIFIABLE",
                    "PARSE_ERROR", "SKIPPED_DUE_TO_COST", "ERROR"]
    atomic_claims: list[AtomicClaimResult]
    factual_consistency_score: float
    sources_used: list[str]            # source_ids that supported the verdict
    rationale: str
    offending_claim: str | None        # first CONTRADICTED sentence
    total_cost_usd: float

Every VERIFIED result has a non-null citation with the exact excerpt and character offsets. CONTRADICTED results include the offending sentence. UNVERIFIABLE means the source documents don't have enough information to decide — not that the claim is false.
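Continuing from the Quickstart, a minimal sketch of gating on the structured verdict. It uses only the fields documented above; the Citation object's own fields aren't specified here, so it is printed whole:

if result.is_valid:
    for ac in result.atomic_claims:
        # Every VERIFIED claim carries a non-null citation (excerpt + char offsets).
        print(ac.claim, "->", ac.citation)
elif result.status == "CONTRADICTED":
    # First contradicted sentence, useful for logging or a targeted retry.
    print("Offending claim:", result.offending_claim)
else:
    # UNVERIFIABLE / PARSE_ERROR: not enough signal to decide either way.
    print(result.status, "-", result.rationale)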


Core API

Single claim

from groundguard import verify, averify

# Synchronous
result = verify(claim="...", sources=[...], model="gpt-4o-mini")

# Async — BM25 dispatched to executor, non-blocking
result = await averify(claim="...", sources=[...], model="gpt-4o-mini")

Full answer grounding (verify_answer)

Check whether a complete LLM response is supported by the retrieved documents:

from groundguard import verify_answer, averify_answer

result = verify_answer(
    output="The acquisition closed in Q2 and the FTC approved it unconditionally.",
    sources=retrieved_docs,
    model="gpt-4o-mini",
)
# result.is_grounded, result.score, result.status

Multi-claim analysis (verify_analysis)

Extract claims from a longer text and verify each one independently:

from groundguard import verify_analysis

result = verify_analysis(
    analysis="Revenue grew 30% YoY. Headcount declined by 12%.",
    sources=company_filings,
    model="gpt-4o-mini",
)
# supported / (supported + contradicted) — UNVERIFIABLE excluded from denominator
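Assuming the top-level factual_consistency_score follows that formula, a sketch of recomputing it from the per-claim statuses:

statuses = [ac.status for ac in result.atomic_claims]
supported = statuses.count("VERIFIED")
contradicted = statuses.count("CONTRADICTED")
decided = supported + contradicted
# e.g. 3 supported, 1 contradicted, 2 unverifiable -> 3 / 4 = 0.75
score = supported / decided if decided else 0.0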

Legal clause verification (verify_clause)

Decompose a contract clause into obligations, then verify each term against a source:

from groundguard import verify_clause
from groundguard.loaders.legal import TermRegistry

registry = TermRegistry()
registry.define("Effective Date", "January 1, 2025")

result = verify_clause(
    clause="Payment is due within 30 days of the Effective Date.",
    sources=contract_sources,
    term_registry=registry,
)

Uses STRICT_PROFILE by default (faithfulness threshold 0.97, majority vote on, full audit trail).

Batch verification

from groundguard import averify_batch
from groundguard.models.internal import ClaimInput

inputs = [
    ClaimInput(claim="Claim A", sources=[...]),
    ClaimInput(claim="Claim B", sources=[...], model="claude-3-haiku-20240307"),
]

results = await averify_batch(inputs=inputs, model="gpt-4o-mini", max_concurrency=5, max_spend=2.0)

Fail-contained: individual failures do not abort the batch. Items that hit the spend cap return status="SKIPPED_DUE_TO_COST".
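A sketch of triaging a finished batch, assuming averify_batch returns one VerificationResult per input, in input order:

passed = [r for r in results if r.is_valid]
skipped = [r for r in results if r.status == "SKIPPED_DUE_TO_COST"]
flagged = [r for r in results if r.status in ("CONTRADICTED", "ERROR")]
# Re-queue skipped items with a fresh budget instead of failing the whole job.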


Verification profiles

Tune strictness to your domain with a single parameter:

from groundguard import verify_answer, STRICT_PROFILE, GENERAL_PROFILE, RESEARCH_PROFILE

# Legal / compliance / clinical — 0.97 threshold, majority vote, full audit trail
result = verify_answer(output, sources, profile=STRICT_PROFILE, model="gpt-4o-mini")

# General RAG / product — 0.80 threshold, single call, no audit
result = verify_answer(output, sources, profile=GENERAL_PROFILE, model="gpt-4o-mini")

# Research / exploratory — permissive threshold, wider BM25 retrieval
result = verify_answer(output, sources, profile=RESEARCH_PROFILE, model="gpt-4o-mini")

Profile            Threshold   Majority vote   Audit trail   Top-k chunks
STRICT_PROFILE     0.97        Yes (3 calls)   Full          6
GENERAL_PROFILE    0.80        No              Off           3
RESEARCH_PROFILE   0.70        No              Off           4

Profiles are frozen dataclasses — create your own in one line, as in the sketch below.
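One way is dataclasses.replace on a bundled profile. The field name below is an assumption for illustration; check the profile dataclass for the actual attribute names:

import dataclasses
from groundguard import GENERAL_PROFILE

# faithfulness_threshold is a hypothetical field name, not confirmed API.
MY_PROFILE = dataclasses.replace(GENERAL_PROFILE, faithfulness_threshold=0.90)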


Production patterns

Assert and raise

from groundguard import assert_faithful, assert_grounded, GroundingError

# Raises GroundingError if output is not grounded
assert_faithful(llm_output, sources, model="gpt-4o-mini")

# Raises GroundingError if analysis contains unsupported claims
assert_grounded(analysis_text, sources, model="gpt-4o-mini")

Retry until grounded

from groundguard import verify_or_retry

verified_output = verify_or_retry(
    generator=lambda: chain.invoke({"query": "What was Q3 revenue?"}),
    sources=retrieved_docs,
    max_retries=3,
    model="gpt-4o-mini",
)

Spend cap

result = verify(
    claim="...",
    sources=[...],
    model="gpt-4o-mini",
    max_spend=0.50,   # soft USD cap — triggering call completes, then raises
)

How it works

Every verify() call runs a 4-tier pipeline with no extra latency when the answer is obvious:

Tier                What it does                                               Cost
0 — Classifier      Splits the claim into atomic sentences                     Zero
1 — Fuzzy gate      Confirms agent-cited evidence appears in the source        Zero
2 — BM25 router     Lexical similarity check — skips the LLM if score ≥ 0.85   Zero
3 — LLM evaluator   Sends top-k chunks + claim; Pydantic-validated verdict     LLM call

Near-verbatim quotes never hit the LLM. Paraphrases and inferred claims do. The router decides automatically.
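To see which tier decided, inspect verification_method on each atomic claim:

result = verify(claim="Q3 revenue was $5 million.", sources=sources, model="gpt-4o-mini")
for ac in result.atomic_claims:
    # "tier2_lexical" means BM25 short-circuited; "tier3_llm" means the LLM was consulted.
    print(ac.claim, ac.verification_method)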

Large-context models

result = verify(
    claim="...",
    sources=[Source(content=long_document, source_id="contract.pdf")],
    model="gemini-2.0-flash",
    auto_chunk=False,   # pass full source — no BM25 chunking
)

Use auto_chunk=False when your model can fit the full document. By default, BM25 retrieves isolated chunks by keyword relevance — a negating clause in a low-scoring chunk (e.g., "that acquisition was subsequently blocked by the FTC") may not reach Tier 3. auto_chunk=False eliminates this risk for large-context models.


Local LLMs (no cloud required)

ollama pull qwen3:14b   # ~9GB, fits in 15GB RAM
ollama serve

result = verify(
    claim="...",
    sources=[...],
    model="ollama/qwen3:14b",
    max_spend=0.0,   # local — free
)

Works out of the box — litellm handles the Ollama transport. Thinking-mode models (qwen3, DeepSeek-R1, etc.) are fully supported; <think> tags are stripped automatically.


Source accumulation

When multiple agents or retrieval steps contribute sources, deduplicate and track provenance before verifying:

from groundguard import verify_analysis
from groundguard.loaders.accumulator import SourceAccumulator

acc = SourceAccumulator()
acc.add(db_source, provenance="database_lookup", agent_id="agent_1")
acc.add(llm_source, provenance="llm_generated", is_llm_derived=True, agent_id="agent_2")

result = verify_analysis(agent_output, sources=acc.sources(), model="gpt-4o-mini")

acc.sources() returns a deduplicated list[Source] with provenance metadata. Inspect or filter before passing to any verification function.
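For example, to verify against non-LLM-derived sources only. How the provenance metadata is exposed on each Source is an assumption here (the attribute name is illustrative), so adapt to the actual shape:

# is_llm_derived as a Source attribute is an assumption, not confirmed API.
grounded_only = [s for s in acc.sources() if not getattr(s, "is_llm_derived", False)]
result = verify_analysis(agent_output, sources=grounded_only, model="gpt-4o-mini")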


Cost estimation

Estimate cost before running, not after:

from groundguard import estimate_verify_faithfulness_cost, estimate_verify_analysis_cost, CostEstimate

estimate: CostEstimate = estimate_verify_faithfulness_cost(
    output="The report shows revenue of $5M for Q3.",
    sources=retrieved_docs,
    model="gpt-4o-mini",
    return_breakdown=True,
)

print(f"Estimated: ${estimate.total_cost_usd:.4f}")

Use by domain

Legal / compliance — verify contract clause obligations against source agreements using verify_clause + TermRegistry. Pinned term definitions are injected as sources automatically.

Financial — verify_analysis extracts and checks every numerical claim in a generated report. Tier 2.5 catches numeric conflicts (unit mismatches, magnitude errors, off-by-one) before the LLM call.

Clinical / medical — use STRICT_PROFILE for 0.97 faithfulness threshold and majority vote. Every sentence result gets a VerificationAuditRecord with the raw LLM output, confidence, and grounding source ID.

RAG pipelines — use assert_faithful as a post-retrieval gate: if the generated answer isn't supported by the retrieved chunks, raise before the response is returned to the user.


What groundguard is not

  • A RAG pipeline or retrieval system (you provide the source documents)
  • A web scraper or URL fetcher (sources must be developer-provided text)
  • A general-purpose question-answering system
  • A model fine-tuning or evaluation framework

Optional extras

pip install groundguard[loaders]    # PDF and DOCX ingestion
pip install groundguard[langchain]  # LangChain callback integration
pip install groundguard[dev]        # pytest, coverage, asyncio test utilities

Ingest PDFs and DOCX

from groundguard.loaders.helpers import pdf_to_text, docx_to_text

text = pdf_to_text("contract.pdf")
sources = [Source(content=text, source_id="contract.pdf")]
result = verify_clause(clause="...", sources=sources)

LangChain integration

from groundguard.integrations.langchain import AgenticVerifierCallback

callback = AgenticVerifierCallback(model="gpt-4o-mini")
chain.invoke({"query": "What was Q3 revenue?"}, callbacks=[callback])
# Raises GroundingError if the chain output cannot be verified against source_documents

License

MIT
