State-contracted verification and explainable retrieval gating for agentic RAG.
ContextGuard: State-Contracted Verification for Agentic RAG
ContextGuard is a text-only verification and consistency engine. It treats multi-turn RAG and fact-checking like a compiler:
- StateSpec = your constraint contract (entities, time, metric, units, source policy).
- Planner = builds support + counter-evidence queries.
- Gate = hard admission control (reject wrong-year/entity/source chunks with reason codes).
- Judge = support/contradict scoring for each claim–evidence pair.
- Aggregate = per-claim + overall verdicts with confidence.
- Trace DAG = micrograd-style execution graph for full explainability.
- Report = SUPPORTED / CONTRADICTED / INSUFFICIENT / MIXED + citations.
Why this exists
Multi-turn RAG fails because similarity ≠ relevance under constraints. Benchmarks like MTRAG/CORAL (multi-turn drift) and FEVER/SciFact (evidence-required verification) show strong systems still pull wrong-year/entity/source chunks and answer confidently. ContextGuard fixes this by making constraints first-class and rejecting ineligible evidence before generation.
What’s included (v0.1)
- Core contracts: StateSpec, Claim, Chunk, Verdict, ReasonCode.
- Merge engine: carryover + reset semantics with conflict detection.
- Planner: coverage-first retrieval with mandatory counter-evidence queries.
- Gate: hard constraint checks (entity, time, source policy), diversity control, noise filtering, reason codes.
- Judges: rule-based, LLM-based, and NLI-ready interfaces for support/contradict scoring.
- Aggregation: per-claim + overall verdict logic with confidence and coverage signals.
- Reports: JSON/Markdown/HTML rendering, plus a facts-first context pack for safe RAG generation.
- Trace DAG: micrograd-style execution graph; export to Graphviz DOT/SVG.
- Storage: SQLite-backed state/fact/run store (zero-ops).
- Hero demo: examples/05_trace_graphviz.py generates a report + DOT trace.
- Resilience: retry/budget and circuit-breaker wrappers for LLM providers/retrievers; optional async retriever support; dedup/rerank helpers.
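To make the Trace DAG item concrete, here is a minimal sketch of recording pipeline steps as a graph and exporting it as Graphviz DOT. It assumes each node only stores a label and its parents; `TraceNode` and `to_dot` are hypothetical names for illustration, not ContextGuard's API.

```python
from dataclasses import dataclass, field


@dataclass
class TraceNode:
    """One recorded step; `parents` are the nodes whose output it consumed."""
    node_id: str
    label: str
    parents: list["TraceNode"] = field(default_factory=list)


def to_dot(roots: list[TraceNode]) -> str:
    """Walk back from the final node(s) and emit Graphviz DOT text."""
    lines = ["digraph trace {"]
    seen: set[str] = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node.node_id in seen:
            continue
        seen.add(node.node_id)
        lines.append(f'  {node.node_id} [label="{node.label}"];')
        for parent in node.parents:
            # Edges point from the input to the step that consumed it.
            lines.append(f"  {parent.node_id} -> {node.node_id};")
            stack.append(parent)
    lines.append("}")
    return "\n".join(lines)


# Hypothetical trace for a single claim and a single evidence chunk.
claim = TraceNode("claim", "claim: AAPL 2024 revenue")
chunk = TraceNode("chunk", "chunk: 10-K excerpt")
gate = TraceNode("gate", "gate: ACCEPTED", [chunk])
judge = TraceNode("judge", "judge: support=0.9", [claim, gate])
verdict = TraceNode("verdict", "verdict: SUPPORTED", [judge])

dot_text = to_dot([verdict])
print(dot_text)
```

The printed text can be saved to a `.dot` file and rendered with the Graphviz `dot` command if it is installed.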
Quick start (local)
Clone the repo and run the hero demo (uses only standard lib + pydantic).
cd contextguard
python examples/05_trace_graphviz.py
Outputs (in examples/output/):
- report.md: verdict report with citations and stats.
- trace.dot / trace.svg: Graphviz diagram of the full decision DAG.
Install (when published)
Standard install (runtime only):
pip install llm-contextguard
From source:
pip install -e .
Optional extras:
pip install llm-contextguard[demo] # graphviz for DOT->SVG/PNG rendering
pip install llm-contextguard[nli] # sentence-transformers for NLIJudge
pip install llm-contextguard[dev] # ruff + mypy + pytest
Programmatic use
Minimal end-to-end flow (rule-based components):
from contextguard import (
    StateSpec, StateDelta, EntityRef, TimeConstraint,
    merge_state, plan_retrieval, gate_chunks,
    RuleBasedClaimSplitter, RuleBasedJudge,
    ClaimAggregator, OverallAggregator, build_report,
)
# 1) Start state and merge user constraints
state = StateSpec(thread_id="t1")
delta = StateDelta(
    entities_add=[EntityRef(entity_id="AAPL")],
    time=TimeConstraint(year=2024),
    metric="revenue",
)
merge_result = merge_state(state, delta, turn_id=1)
state = merge_result.state
# 2) Split claims (rule-based or LLM)
claims = RuleBasedClaimSplitter().split("Apple 2024 revenue will be $400B.")
# 3) Plan retrieval (support + counter)
plan = plan_retrieval(claims, state, total_k=20)
# 4) Retrieve with your own retriever implementing `Retriever.search()`
# Here, you would call your backend and get `Chunk` objects back.
# chunks = my_retriever.search(...)
# 5) Gate evidence (hard constraints)
# gated = gate_chunks(chunks, state)
# 6) Judge + aggregate (`accepted_chunks` are the chunks in `gated` that passed the gate)
# judge_results = RuleBasedJudge().score_batch(claims[0], accepted_chunks, state)
# claim_verdict = ClaimAggregator().aggregate(claims[0], judge_results)
# overall_label, overall_conf, warnings = OverallAggregator().aggregate([claim_verdict])
# 7) Build report
# report = build_report(thread_id="t1", state=state,
#                       claim_verdicts=[claim_verdict],
#                       overall_label=overall_label,
#                       overall_confidence=overall_conf)
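Step 6 collapses per-chunk judge scores into one verdict per claim. The sketch below shows one way such a rule could look. It is not ContextGuard's aggregation logic: `JudgeScore`, `aggregate_claim`, and the 0.7 threshold are assumptions made for illustration, and "confidence" here is just the strongest relevant score.

```python
from dataclasses import dataclass


@dataclass
class JudgeScore:
    """Hypothetical per-chunk judge output, both scores in [0, 1]."""
    support: float
    contradict: float


def aggregate_claim(scores: list[JudgeScore], threshold: float = 0.7) -> tuple[str, float]:
    """Collapse judge scores into (label, confidence) using the strongest signals."""
    if not scores:
        return "INSUFFICIENT", 0.0
    best_support = max(s.support for s in scores)
    best_contradict = max(s.contradict for s in scores)
    supported = best_support >= threshold
    contradicted = best_contradict >= threshold
    if supported and contradicted:
        # Strong evidence on both sides: report the weaker of the two signals.
        return "MIXED", min(best_support, best_contradict)
    if supported:
        return "SUPPORTED", best_support
    if contradicted:
        return "CONTRADICTED", best_contradict
    # Evidence exists, but nothing crosses the threshold.
    return "INSUFFICIENT", max(best_support, best_contradict)


print(aggregate_claim([JudgeScore(0.9, 0.1), JudgeScore(0.2, 0.8)]))
```

A real aggregator would also weigh source type and coverage, as described under Key concepts.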
Key concepts
- StateSpec: persistent constraints (entities, time, metric, units, source policy). This is the “contract” that gates retrieval.
- Planner: issues both support and counter-evidence queries to avoid confirmation bias.
- Gate: rejects chunks that violate constraints; enforces diversity; emits reason codes.
- Judge: scores claim–evidence pairs for support/contradiction; LLM or rule-based/NLI.
- Aggregate: decides SUPPORTED / CONTRADICTED / INSUFFICIENT / MIXED with coverage-aware confidence.
- Trace DAG: every step is recorded; exportable to Graphviz for “show me why this fact got in.”
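The Gate concept can be illustrated with a small, self-contained sketch: each chunk is checked against the entity, year, and source constraints, and the first violation becomes its reason code. The `Reason` values, `ToyChunk`, and `gate_chunk` are stand-ins invented for this example; the library's `ReasonCode` and `gate_chunks` may behave differently.

```python
from dataclasses import dataclass
from enum import Enum


class Reason(Enum):
    """Hypothetical reason codes, not the library's ReasonCode values."""
    ACCEPTED = "accepted"
    WRONG_ENTITY = "wrong_entity"
    WRONG_YEAR = "wrong_year"
    SOURCE_NOT_ALLOWED = "source_not_allowed"


@dataclass
class ToyChunk:
    text: str
    entity_ids: list[str]
    year: int
    source_type: str  # "PRIMARY", "SECONDARY", or "TERTIARY"


def gate_chunk(chunk: ToyChunk, entity_ids: set[str], year: int,
               allowed_sources: set[str]) -> Reason:
    """Return the first violated constraint, or ACCEPTED if none is violated."""
    if not entity_ids & set(chunk.entity_ids):
        return Reason.WRONG_ENTITY
    if chunk.year != year:
        return Reason.WRONG_YEAR
    if chunk.source_type not in allowed_sources:
        return Reason.SOURCE_NOT_ALLOWED
    return Reason.ACCEPTED


chunks = [
    ToyChunk("Apple FY2024 revenue filing", ["AAPL"], 2024, "PRIMARY"),
    ToyChunk("Apple FY2023 revenue filing", ["AAPL"], 2023, "PRIMARY"),
    ToyChunk("Blog post on Apple revenue", ["AAPL"], 2024, "TERTIARY"),
    ToyChunk("Microsoft FY2024 revenue filing", ["MSFT"], 2024, "PRIMARY"),
]
decisions = [gate_chunk(c, {"AAPL"}, 2024, {"PRIMARY", "SECONDARY"}) for c in chunks]
for chunk, reason in zip(chunks, decisions):
    print(f"{reason.value:20} {chunk.text}")
```

Note that the 2023 filing would likely score as highly similar to a 2024 revenue query, which is exactly the case a hard gate is meant to catch.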
Hero demo (recommended)
python examples/05_trace_graphviz.py
- Simulates a 3-turn conversation:
  - “Compare Apple and Microsoft revenue”
  - “Now do 2024 projections”
  - “Only use primary sources”
- Demonstrates constraint carryover, gating, counter-evidence, verdicts, and trace visualization.
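Constraint carryover across the three turns can be pictured as follows. This is a toy model, assuming a delta simply overwrites the fields it mentions and leaves the rest untouched; `ToyState` and `merge` are hypothetical and much simpler than `StateSpec` and `merge_state` (no reset semantics or conflict detection).

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ToyState:
    """Hypothetical stand-in for the constraint contract."""
    entities: frozenset = frozenset()
    year: Optional[int] = None
    metric: Optional[str] = None
    allowed_sources: frozenset = frozenset({"PRIMARY", "SECONDARY"})


def merge(state: ToyState, **delta) -> ToyState:
    """Fields named in the delta are overwritten; all others carry over."""
    return replace(state, **delta)


# Turn 1: "Compare Apple and Microsoft revenue"
s1 = merge(ToyState(), entities=frozenset({"AAPL", "MSFT"}), metric="revenue")
# Turn 2: "Now do 2024 projections" (entities and metric carry over)
s2 = merge(s1, year=2024)
# Turn 3: "Only use primary sources" (everything else carries over)
s3 = merge(s2, allowed_sources=frozenset({"PRIMARY"}))

print(s3)
```

By turn 3 the state still names both companies and the revenue metric even though the user never repeated them.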
Quick eval harness (smoke)
A tiny JSONL fixture is provided: tests/fixtures/eval.jsonl
Run baseline:
python -m contextguard.eval.harness --data tests/fixtures/eval.jsonl --k 5
Ablations:
- Disable gating: --disable-gating
- Disable counter-evidence: --disable-counter
Example (no gating, no counter):
python -m contextguard.eval.harness --data tests/fixtures/eval.jsonl --k 5 --disable-gating --disable-counter
Results (placeholder – fill with real numbers in CI):
- verdict_accuracy: …
- evidence_precision/recall: …
- fever_score: …
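For readers unfamiliar with these metrics, the sketch below shows how they are commonly computed. It is not the harness's code: the record fields (`gold_label`, `pred_label`, `gold_evidence`, `pred_evidence`) are assumed names, evidence precision/recall are micro-averaged, and the FEVER-style score is simplified to one gold evidence set per claim (label correct and all gold evidence retrieved).

```python
def evaluate(examples: list[dict]) -> dict[str, float]:
    """Compute verdict accuracy, evidence precision/recall, and a FEVER-style score."""
    label_hits = strict_hits = true_pos = n_pred = n_gold = 0
    for ex in examples:
        gold_ev, pred_ev = set(ex["gold_evidence"]), set(ex["pred_evidence"])
        label_ok = ex["gold_label"] == ex["pred_label"]
        label_hits += label_ok
        # Strict credit: correct label AND every gold evidence item was retrieved.
        strict_hits += label_ok and gold_ev <= pred_ev
        true_pos += len(gold_ev & pred_ev)
        n_pred += len(pred_ev)
        n_gold += len(gold_ev)

    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0

    n = len(examples)
    return {
        "verdict_accuracy": ratio(label_hits, n),
        "evidence_precision": ratio(true_pos, n_pred),
        "evidence_recall": ratio(true_pos, n_gold),
        "fever_score": ratio(strict_hits, n),
    }


# Two made-up records, only to exercise the function.
toy = [
    {"gold_label": "SUPPORTED", "pred_label": "SUPPORTED",
     "gold_evidence": ["c1"], "pred_evidence": ["c1", "c2"]},
    {"gold_label": "CONTRADICTED", "pred_label": "CONTRADICTED",
     "gold_evidence": ["c3"], "pred_evidence": ["c4"]},
]
metrics = evaluate(toy)
print(metrics)
```

The second record gets the label right but misses the gold evidence, so it counts toward accuracy and not toward the FEVER-style score.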
Adapters & extensibility
- Retrievers: implement Retriever.search(query, filters, k) -> List[Chunk] for any vector DB / search backend.
- Provided adapters:
  - LangChainRetrieverAdapter: wrap any LangChain retriever; override doc_to_chunk or subclass to customize provenance/metadata and filter matching.
  - LlamaIndexRetrieverAdapter: wrap any LlamaIndex retriever/query engine; override node_to_chunk or subclass for richer metadata handling.
- Judges: plug your own LLM via LLMJudge (structured JSON prompts) or an NLI model via NLIJudge.
- LLM providers: OpenAIProvider implements the LLMProvider protocol; override build_messages or wrap with your own retry/guard layers. RetryingProvider decorates any provider with backoff + logging (strategy/decorator pattern).
- Budgets: BudgetedProvider enforces prompt/output limits before calling the underlying provider (pair with RetryingProvider).
- Generation (optional): LLMGenerator turns a ContextPack + user prompt into a guarded JSON answer using any LLMProvider. Override build_prompt/build_schema or implement the Generator protocol for domain-specific pipelines.
- Stores: SQLite by default; S3Store is provided for S3-compatible buckets; add Postgres/Redis by implementing the Store protocol.
- Async pipeline: async_run_verification runs plan → retrieve → gate → judge → aggregate with asyncio (wrapping sync retrievers/judges via threadpool).
- Frameworks: LangChain/LlamaIndex adapters are provided; wrap your retriever to feed Chunk objects.
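The decorator pattern mentioned for the provider wrappers can be sketched as below. This is an illustration under stated assumptions, not `RetryingProvider` itself: the `complete(prompt)` method name, `RetryingWrapper`, and `FlakyProvider` are hypothetical, and the backoff is plain exponential without jitter or logging.

```python
import time
from typing import Callable


class RetryingWrapper:
    """Wraps any object with a `complete(prompt)` method and retries on failure."""

    def __init__(self, inner, max_attempts: int = 3, base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep):
        if max_attempts < 1:
            raise ValueError("max_attempts must be >= 1")
        self.inner = inner
        self.max_attempts = max_attempts
        self.base_delay = base_delay
        self.sleep = sleep  # injectable so tests need not actually wait

    def complete(self, prompt: str) -> str:
        for attempt in range(self.max_attempts):
            try:
                return self.inner.complete(prompt)
            except Exception:
                if attempt == self.max_attempts - 1:
                    raise  # out of attempts: surface the last error
                self.sleep(self.base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
        raise AssertionError("unreachable")


class FlakyProvider:
    """Test double that fails a fixed number of times before succeeding."""

    def __init__(self, failures: int):
        self.failures = failures
        self.calls = 0

    def complete(self, prompt: str) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise TimeoutError("simulated timeout")
        return f"echo: {prompt}"


delays: list[float] = []
provider = RetryingWrapper(FlakyProvider(failures=2), sleep=delays.append)
answer = provider.complete("hello")
print(answer, delays)
```

Because the wrapper exposes the same method as the object it wraps, a budget check could be layered in the same way, either inside or outside the retry layer.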
Docs
- Built with MkDocs + mkdocstrings. To serve locally:
pip install llm-contextguard[docs]
mkdocs serve
How to integrate a retriever (metadata expectations)
- Implement Retriever.search(query, filters, k) and return Chunk objects with provenance.source_id and provenance.source_type (PRIMARY/SECONDARY/TERTIARY). Default source policy rejects TERTIARY.
- Fill structured metadata: chunk.entity_ids (list of canonical IDs), chunk.year (int), and metadata.doc_type if available. Gating relies on entity_ids, year, and source_type; aggregation gives higher weight to primary contradictions.
- Translate filters: use CanonicalFilters.from_state_spec(state) to map to your backend (entity/year/source filters). Respect filters.allowed_source_types, filters.year, and filters.entity_ids.
- Domain/profile strictness: GatingConfig.from_profile(...) and AggregationConfig.from_profile(...) tighten rules (e.g., finance prefers primary + >=2 sources; policy expects primary; enterprise is moderate). Pass the profile to planner/gate/aggregate if you want domain-tuned behavior.
- Provenance timestamps: set provenance.retrieved_at (ISO, timezone-aware) and provenance.chunk_id if you have stable chunk IDs; they improve reproducibility and trace output.
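The expectations above can be pictured with a toy in-memory retriever. The classes below are stand-ins written for this example: they mirror the field names mentioned in this section (`entity_ids`, `year`, `allowed_source_types`, `provenance.source_id`, `provenance.source_type`) but are not the library's `Chunk` or `CanonicalFilters` types, and the word-overlap ranking is only a placeholder for a real search backend.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DemoProvenance:
    source_id: str
    source_type: str  # "PRIMARY", "SECONDARY", or "TERTIARY"


@dataclass
class DemoChunk:
    text: str
    entity_ids: list[str]
    year: int
    provenance: DemoProvenance


@dataclass
class DemoFilters:
    entity_ids: list[str]
    year: Optional[int]
    allowed_source_types: list[str]


class InMemoryRetriever:
    """Applies the structured filters first, then ranks by word overlap."""

    def __init__(self, chunks: list[DemoChunk]):
        self.chunks = chunks

    def search(self, query: str, filters: DemoFilters, k: int) -> list[DemoChunk]:
        terms = set(query.lower().split())
        eligible = [c for c in self.chunks if self._matches(c, filters)]
        eligible.sort(key=lambda c: len(terms & set(c.text.lower().split())),
                      reverse=True)
        return eligible[:k]

    @staticmethod
    def _matches(chunk: DemoChunk, f: DemoFilters) -> bool:
        if f.entity_ids and not set(f.entity_ids) & set(chunk.entity_ids):
            return False
        if f.year is not None and chunk.year != f.year:
            return False
        return chunk.provenance.source_type in f.allowed_source_types


corpus = [
    DemoChunk("apple annual revenue 2024", ["AAPL"], 2024,
              DemoProvenance("10k-2024", "PRIMARY")),
    DemoChunk("apple annual revenue 2023", ["AAPL"], 2023,
              DemoProvenance("10k-2023", "PRIMARY")),
    DemoChunk("forum thread on apple revenue", ["AAPL"], 2024,
              DemoProvenance("forum-1", "TERTIARY")),
]
filters = DemoFilters(entity_ids=["AAPL"], year=2024,
                      allowed_source_types=["PRIMARY", "SECONDARY"])
hits = InMemoryRetriever(corpus).search("apple revenue", filters, k=5)
print([h.provenance.source_id for h in hits])
```

Pushing the filters into the backend query keeps ineligible chunks from consuming the top-k budget, and the gate still runs afterwards as the final check.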
What’s next (roadmap)
- Eval harness on FEVER/SciFact (and multi-turn sets like MTRAG/CORAL).
- Domain profiles (finance/news/policy/enterprise) with pre-tuned gating thresholds.
- Confidence calibration and better rationale spans.
- CI + release automation to PyPI/TestPyPI on tagged releases.
License
MIT — see LICENSE.
Contacts
Contributors welcome. Open issues/PRs with trace screenshots and repro steps.
File details
Details for the file llm_contextguard-0.1.0.tar.gz.
File metadata
- Download URL: llm_contextguard-0.1.0.tar.gz
- Upload date:
- Size: 178.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 130c7114077120e39292dd4d0cf5120694d6a348c09cd5002935d89c053eb147 |
| MD5 | f1d29f53ab0660c20752d456e5e215a6 |
| BLAKE2b-256 | a36a85893a3da997b7e8565886818da53593a2e2b2a7260945ca9771cf2ae8b4 |
File details
Details for the file llm_contextguard-0.1.0-py3-none-any.whl.
File metadata
- Download URL: llm_contextguard-0.1.0-py3-none-any.whl
- Upload date:
- Size: 101.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4eb71616cda9241faa60814b88cca3ca56adaa792dc4dc5c2996dbb9aaadad9c |
| MD5 | d28ab822e2acf0cb2614bd7424c6e6c0 |
| BLAKE2b-256 | e98451678d7a4a7adc095256b3944cacd6c0bb1b6aeabced6337aa965837dc71 |