Skip to main content

State-contracted verification and explainable retrieval gating for agentic RAG.

Project description

ContextGuard: State-Contracted Verification for Agentic RAG

CI Docs PyPI License: MIT

ContextGuard is a text-only verification and consistency engine. It treats multi-turn RAG and fact-checking like a compiler:

  • StateSpec = your constraint contract (entities, time, metric, units, source policy).
  • Planner = builds support + counter-evidence queries.
  • Gate = hard admission control (reject wrong-year/entity/source chunks with reason codes).
  • Judge = support/contradict scoring for each claim–evidence pair.
  • Aggregate = per-claim + overall verdicts with confidence.
  • Trace DAG = micrograd-style execution graph for full explainability.
  • Report = SUPPORTED / CONTRADICTED / INSUFFICIENT / MIXED + citations.

Why this exists

Multi-turn RAG fails because similarity ≠ relevance under constraints. Benchmarks like MTRAG/CORAL (multi-turn drift) and FEVER/SciFact (evidence-required verification) show strong systems still pull wrong-year/entity/source chunks and answer confidently. ContextGuard fixes this by making constraints first-class and rejecting ineligible evidence before generation.

What’s included (v0.1)

  • Core contracts: StateSpec, Claim, Chunk, Verdict, ReasonCode.
  • Merge engine: carryover + reset semantics with conflict detection.
  • Planner: coverage-first retrieval with mandatory counter-evidence queries.
  • Gate: hard constraint checks (entity, time, source policy), diversity control, noise filtering, reason codes.
  • Judges: rule-based, LLM-based, and NLI-ready interfaces for support/contradict scoring.
  • Aggregation: per-claim + overall verdict logic with confidence and coverage signals.
  • Reports: JSON/Markdown/HTML rendering, plus a facts-first context pack for safe RAG generation.
  • Trace DAG: micrograd-style execution graph; export to Graphviz DOT/SVG.
  • Storage: SQLite-backed state/fact/run store (zero-ops).
  • Hero demo: examples/05_trace_graphviz.py generates a report + DOT trace.
  • Resilience: Retry/budget and circuit-breaker wrappers for LLM providers/retrievers; optional async retriever support; dedup/rerank helpers.

Quick start (local)

Clone the repo and run the hero demo (uses only standard lib + pydantic).

cd contextguard
python examples/05_trace_graphviz.py

Outputs (in examples/output/):

  • report.md — verdict report with citations and stats.
  • trace.dot / trace.svg — Graphviz diagram of the full decision DAG.

Install (when published)

Standard install (runtime only):

pip install llm-contextguard

From source:

pip install -e .

Optional extras:

pip install llm-contextguard[demo]        # graphviz for DOT->SVG/PNG rendering
pip install llm-contextguard[nli]         # sentence-transformers for NLIJudge
pip install llm-contextguard[dev]         # ruff + mypy + pytest

Programmatic use

Minimal end-to-end flow (rule-based components):

from contextguard import (
    StateSpec, StateDelta, EntityRef, TimeConstraint,
    merge_state, plan_retrieval, gate_chunks,
    RuleBasedClaimSplitter, RuleBasedJudge,
    ClaimAggregator, OverallAggregator, build_report
)

# 1) Start state and merge user constraints
state = StateSpec(thread_id="t1")
delta = StateDelta(
    entities_add=[EntityRef(entity_id="AAPL")],
    time=TimeConstraint(year=2024),
    metric="revenue",
)
merge_result = merge_state(state, delta, turn_id=1)
state = merge_result.state

# 2) Split claims (rule-based or LLM)
claims = RuleBasedClaimSplitter().split("Apple 2024 revenue will be $400B.")

# 3) Plan retrieval (support + counter)
plan = plan_retrieval(claims, state, total_k=20)

# 4) Retrieve with your own retriever implementing `Retriever.search()`
#    Here, you would call your backend and get `Chunk` objects back.
#    chunks = my_retriever.search(...)

# 5) Gate evidence (hard constraints)
# gated = gate_chunks(chunks, state)

# 6) Judge + aggregate
# judge_results = RuleBasedJudge().score_batch(claims[0], accepted_chunks, state)
# claim_verdict = ClaimAggregator().aggregate(claims[0], judge_results)
# overall_label, overall_conf, warnings = OverallAggregator().aggregate([claim_verdict])

# 7) Build report
# report = build_report(thread_id="t1", state=state,
#                       claim_verdicts=[claim_verdict],
#                       overall_label=overall_label,
#                       overall_confidence=overall_conf)

Key concepts

  • StateSpec: persistent constraints (entities, time, metric, units, source policy). This is the “contract” that gates retrieval.
  • Planner: issues both support and counter-evidence queries to avoid confirmation bias.
  • Gate: rejects chunks that violate constraints; enforces diversity; emits reason codes.
  • Judge: scores claim–evidence pairs for support/contradiction; LLM or rule-based/NLI.
  • Aggregate: decides SUPPORTED / CONTRADICTED / INSUFFICIENT / MIXED with coverage-aware confidence.
  • Trace DAG: every step is recorded; exportable to Graphviz for “show me why this fact got in.”

Hero demo (recommended)

python examples/05_trace_graphviz.py

  • Simulates a 3-turn conversation:
    • “Compare Apple and Microsoft revenue”
    • “Now do 2024 projections”
    • “Only use primary sources”
  • Demonstrates constraint carryover, gating, counter-evidence, verdicts, and trace visualization.

Quick eval harness (smoke)

A tiny JSONL fixture is provided: tests/fixtures/eval.jsonl

Run baseline:

python -m contextguard.eval.harness --data tests/fixtures/eval.jsonl --k 5

Ablations:

  • Disable gating: --disable-gating
  • Disable counter-evidence: --disable-counter

Example (no gating, no counter):

python -m contextguard.eval.harness --data tests/fixtures/eval.jsonl --k 5 --disable-gating --disable-counter

Results (placeholder – fill with real numbers in CI):

  • verdict_accuracy: …
  • evidence_precision/recall: …
  • fever_score: …

Adapters & extensibility

  • Retrievers: implement Retriever.search(query, filters, k) -> List[Chunk] for any vector DB / search backend.
  • Provided adapters:
    • LangChainRetrieverAdapter — wrap any LangChain retriever; override doc_to_chunk or subclass to customize provenance/metadata and filter matching.
    • LlamaIndexRetrieverAdapter — wrap any LlamaIndex retriever/query engine; override node_to_chunk or subclass for richer metadata handling.
  • Judges: plug your own LLM via LLMJudge (structured JSON prompts) or an NLI model via NLIJudge.
  • LLM providers: OpenAIProvider implements the LLMProvider protocol; override build_messages or wrap with your own retry/guard layers. RetryingProvider decorates any provider with backoff + logging (strategy/decorator pattern).
  • Budgets: BudgetedProvider enforces prompt/output limits before calling the underlying provider (pair with RetryingProvider).
  • Generation (optional): LLMGenerator turns a ContextPack + user prompt into a guarded JSON answer using any LLMProvider. Override build_prompt / build_schema or implement the Generator protocol for domain-specific pipelines.
  • Stores: SQLite by default; S3Store is provided for S3-compatible buckets; add Postgres/Redis by implementing the Store protocol.
  • Async pipeline: async_run_verification runs plan → retrieve → gate → judge → aggregate with asyncio (wrapping sync retrievers/judges via threadpool).
  • Frameworks: LangChain/LlamaIndex adapters are provided; wrap your retriever to feed Chunk objects.

Docs

  • Built with MkDocs + mkdocstrings. To serve locally:
pip install llm-contextguard[docs]
mkdocs serve

How to integrate a retriever (metadata expectations)

  • Implement Retriever.search(query, filters, k) and return Chunk objects with provenance.source_id and provenance.source_type (PRIMARY/SECONDARY/TERTIARY). Default source policy rejects TERTIARY.
  • Fill structured metadata: chunk.entity_ids (list of canonical IDs), chunk.year (int), and metadata.doc_type if available. Gating relies on entity_ids, year, and source_type; aggregation gives higher weight to primary contradictions.
  • Translate filters: use CanonicalFilters.from_state_spec(state) to map to your backend (entity/year/source filters). Respect filters.allowed_source_types, filters.year, and filters.entity_ids.
  • Domain/profile strictness: GatingConfig.from_profile(...) and AggregationConfig.from_profile(...) tighten rules (e.g., finance prefers primary + >=2 sources; policy expects primary; enterprise is moderate). Pass the profile to planner/gate/aggregate if you want domain-tuned behavior.
  • Provenance timestamps: set provenance.retrieved_at (ISO, timezone-aware) and provenance.chunk_id if you have stable chunk IDs; they improve reproducibility and trace output.

What’s next (roadmap)

  • Eval harness on FEVER/SciFact (and multi-turn sets like MTRAG/CORAL).
  • Domain profiles (finance/news/policy/enterprise) with pre-tuned gating thresholds.
  • Confidence calibration and better rationale spans.
  • CI + release automation to PyPI/TestPyPI on tagged releases.

License

MIT — see LICENSE.

Contacts

Contributors welcome. Open issues/PRs with trace screenshots and repro steps.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llm_contextguard-0.1.0.tar.gz (178.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

llm_contextguard-0.1.0-py3-none-any.whl (101.5 kB view details)

Uploaded Python 3

File details

Details for the file llm_contextguard-0.1.0.tar.gz.

File metadata

  • Download URL: llm_contextguard-0.1.0.tar.gz
  • Upload date:
  • Size: 178.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for llm_contextguard-0.1.0.tar.gz
Algorithm Hash digest
SHA256 130c7114077120e39292dd4d0cf5120694d6a348c09cd5002935d89c053eb147
MD5 f1d29f53ab0660c20752d456e5e215a6
BLAKE2b-256 a36a85893a3da997b7e8565886818da53593a2e2b2a7260945ca9771cf2ae8b4

See more details on using hashes here.

File details

Details for the file llm_contextguard-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for llm_contextguard-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4eb71616cda9241faa60814b88cca3ca56adaa792dc4dc5c2996dbb9aaadad9c
MD5 d28ab822e2acf0cb2614bd7424c6e6c0
BLAKE2b-256 e98451678d7a4a7adc095256b3944cacd6c0bb1b6aeabced6337aa965837dc71

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page