State-contracted verification and explainable retrieval gating for agentic RAG.

ContextGuard: State-Contracted Verification for Agentic RAG

ContextGuard is a text-only verification and consistency engine. It treats multi-turn RAG and fact-checking like a compiler:

StateSpec = your constraint contract (entities, time, metric, units, source policy).
Planner = builds support + counter-evidence queries.
Gate = hard admission control (reject wrong-year/entity/source chunks with reason codes).
Judge = support/contradict scoring for each claim–evidence pair.
Aggregate = per-claim + overall verdicts with confidence.
Trace DAG = micrograd-style execution graph for full explainability.
Report = SUPPORTED / CONTRADICTED / INSUFFICIENT / MIXED + citations.
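To make the Gate stage concrete, here is a minimal sketch of hard admission control with reason codes. It does not use ContextGuard's own classes: SketchState, SketchChunk, Reason, and gate() are hypothetical stand-ins written for illustration, and the real StateSpec, Chunk, and ReasonCode types may be shaped differently.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Reason(Enum):
    # Hypothetical reason codes, not the library's ReasonCode enum.
    WRONG_ENTITY = "wrong_entity"
    WRONG_YEAR = "wrong_year"
    SOURCE_NOT_ALLOWED = "source_not_allowed"


@dataclass
class SketchChunk:
    text: str
    entity_ids: List[str]
    year: Optional[int]
    source_type: str  # "PRIMARY", "SECONDARY", or "TERTIARY"


@dataclass
class SketchState:
    entity_ids: List[str]
    year: Optional[int]
    # Assumed default policy: tertiary sources are not admitted.
    allowed_source_types: Tuple[str, ...] = ("PRIMARY", "SECONDARY")


def gate(chunks: List[SketchChunk], state: SketchState):
    """Split chunks into accepted ones and (chunk, reasons) rejections."""
    accepted, rejected = [], []
    for chunk in chunks:
        reasons = []
        if state.entity_ids and not set(chunk.entity_ids) & set(state.entity_ids):
            reasons.append(Reason.WRONG_ENTITY)
        if state.year is not None and chunk.year != state.year:
            reasons.append(Reason.WRONG_YEAR)
        if chunk.source_type not in state.allowed_source_types:
            reasons.append(Reason.SOURCE_NOT_ALLOWED)
        if reasons:
            rejected.append((chunk, reasons))
        else:
            accepted.append(chunk)
    return accepted, rejected


state = SketchState(entity_ids=["AAPL"], year=2024)
chunks = [
    SketchChunk("Apple FY2024 10-K revenue", ["AAPL"], 2024, "PRIMARY"),
    SketchChunk("Apple FY2023 revenue recap", ["AAPL"], 2023, "SECONDARY"),
    SketchChunk("Forum post about Microsoft", ["MSFT"], 2024, "TERTIARY"),
]
accepted, rejected = gate(chunks, state)
for chunk, reasons in rejected:
    print(chunk.text, [r.value for r in reasons])
```

The point of the sketch is that rejection is a hard decision made before any scoring, and every rejection carries an explanation that can later be shown in a trace or report.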

Why this exists

Multi-turn RAG fails because similarity ≠ relevance under constraints. Benchmarks such as MTRAG/CORAL (multi-turn drift) and FEVER/SciFact (evidence-required verification) show that even strong systems still retrieve wrong-year, wrong-entity, or wrong-source chunks and then answer confidently. ContextGuard addresses this by making constraints first-class and rejecting ineligible evidence before generation.

What’s included (v0.1)

Core contracts: StateSpec, Claim, Chunk, Verdict, ReasonCode.
Merge engine: carryover + reset semantics with conflict detection.
Planner: coverage-first retrieval with mandatory counter-evidence queries.
Gate: hard constraint checks (entity, time, source policy), diversity control, noise filtering, reason codes.
Judges: rule-based, LLM-based, and NLI-ready interfaces for support/contradict scoring.
Aggregation: per-claim + overall verdict logic with confidence and coverage signals.
Reports: JSON/Markdown/HTML rendering, plus a facts-first context pack for safe RAG generation.
Trace DAG: micrograd-style execution graph; export to Graphviz DOT/SVG.
Storage: SQLite-backed state/fact/run store (zero-ops).
Hero demo: examples/05_trace_graphviz.py generates a report + DOT trace.
Resilience: Retry/budget and circuit-breaker wrappers for LLM providers/retrievers; optional async retriever support; dedup/rerank helpers.
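As an illustration of the Trace DAG item, the following sketch shows one way a micrograd-style graph (each node remembers the nodes it was derived from) could be exported to Graphviz DOT. TraceNode and to_dot() are hypothetical names for this example and are not ContextGuard's trace API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TraceNode:
    # Hypothetical trace node: an id, a display label, and its parent nodes.
    node_id: str
    label: str
    parents: List["TraceNode"] = field(default_factory=list)


def to_dot(root: TraceNode) -> str:
    """Walk backwards from the final node and emit a DOT digraph."""
    lines = ["digraph trace {"]
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node.node_id in seen:
            continue
        seen.add(node.node_id)
        label = node.label.replace('"', '\\"')  # keep the DOT string valid
        lines.append(f'  "{node.node_id}" [label="{label}"];')
        for parent in node.parents:
            lines.append(f'  "{parent.node_id}" -> "{node.node_id}";')
            stack.append(parent)
    lines.append("}")
    return "\n".join(lines)


claim = TraceNode("claim", "Claim: Apple 2024 revenue")
chunk = TraceNode("chunk", "Chunk: FY2024 10-K")
gate = TraceNode("gate", "Gate: ACCEPT", parents=[chunk])
verdict = TraceNode("verdict", "Verdict: SUPPORTED", parents=[claim, gate])

dot = to_dot(verdict)
print(dot)
```

If the output is saved as trace.dot and Graphviz is installed, `dot -Tsvg trace.dot -o trace.svg` renders it.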

Quick start (local)

Clone the repo and run the hero demo (uses only standard lib + pydantic).

cd contextguard
python examples/05_trace_graphviz.py

Outputs (in examples/output/):

report.md — verdict report with citations and stats.
trace.dot / trace.svg — Graphviz diagram of the full decision DAG.

Install

Standard install (runtime only):

pip install llm-contextguard

From source:

pip install -e .

Optional extras:

pip install "llm-contextguard[demo]"        # graphviz for DOT->SVG/PNG rendering
pip install "llm-contextguard[nli]"         # sentence-transformers for NLIJudge
pip install "llm-contextguard[dev]"         # ruff + mypy + pytest

Programmatic use

Minimal end-to-end flow (rule-based components):

from contextguard import (
    StateSpec, StateDelta, EntityRef, TimeConstraint,
    merge_state, plan_retrieval, gate_chunks,
    RuleBasedClaimSplitter, RuleBasedJudge,
    ClaimAggregator, OverallAggregator, build_report
)

# 1) Start state and merge user constraints
state = StateSpec(thread_id="t1")
delta = StateDelta(
    entities_add=[EntityRef(entity_id="AAPL")],
    time=TimeConstraint(year=2024),
    metric="revenue",
)
merge_result = merge_state(state, delta, turn_id=1)
state = merge_result.state

# 2) Split claims (rule-based or LLM)
claims = RuleBasedClaimSplitter().split("Apple 2024 revenue will be $400B.")

# 3) Plan retrieval (support + counter)
plan = plan_retrieval(claims, state, total_k=20)

# 4) Retrieve with your own retriever implementing `Retriever.search()`
#    Here, you would call your backend and get `Chunk` objects back.
#    chunks = my_retriever.search(...)

# 5) Gate evidence (hard constraints)
# gated = gate_chunks(chunks, state)
# accepted_chunks = ...  # the chunks that passed the gate; how to extract them
#                        # depends on what gate_chunks returns (not shown here)

# 6) Judge + aggregate
# judge_results = RuleBasedJudge().score_batch(claims[0], accepted_chunks, state)
# claim_verdict = ClaimAggregator().aggregate(claims[0], judge_results)
# overall_label, overall_conf, warnings = OverallAggregator().aggregate([claim_verdict])

# 7) Build report
# report = build_report(thread_id="t1", state=state,
#                       claim_verdicts=[claim_verdict],
#                       overall_label=overall_label,
#                       overall_confidence=overall_conf)
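Step 3 in the flow above plans both support and counter-evidence queries. The sketch below shows the general idea under simple assumptions: the retrieval budget is split evenly, and counter-evidence queries are formed by appending negation-style terms. PlannedQuery and plan() are hypothetical; this is not how plan_retrieval is necessarily implemented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PlannedQuery:
    text: str
    kind: str  # "support" or "counter"
    k: int     # how many chunks to request for this query


def plan(claim_texts: List[str], constraints: List[str],
         total_k: int = 20) -> List[PlannedQuery]:
    """Emit one support and one counter query per claim, splitting total_k evenly."""
    if not claim_texts:
        return []
    per_query = max(1, total_k // (2 * len(claim_texts)))
    suffix = " ".join(constraints)
    queries = []
    for claim in claim_texts:
        base = f"{claim} {suffix}".strip()
        queries.append(PlannedQuery(base, "support", per_query))
        # Assumed heuristic: bias the counter query toward disagreement terms.
        queries.append(
            PlannedQuery(f"{base} disputed OR incorrect OR revised", "counter", per_query)
        )
    return queries


queries = plan(["Apple revenue will be $400B"], ["AAPL", "2024"], total_k=20)
for q in queries:
    print(q.kind, q.k, q.text)
```

Making the counter query mandatory means contradicting evidence gets a share of the budget even when the support query alone would fill it.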

Key concepts

StateSpec: persistent constraints (entities, time, metric, units, source policy). This is the “contract” that gates retrieval.
Planner: issues both support and counter-evidence queries to avoid confirmation bias.
Gate: rejects chunks that violate constraints; enforces diversity; emits reason codes.
Judge: scores each claim–evidence pair for support and contradiction; implementations can be rule-based, LLM-based, or NLI-based.
Aggregate: decides SUPPORTED / CONTRADICTED / INSUFFICIENT / MIXED with coverage-aware confidence.
Trace DAG: every step is recorded; exportable to Graphviz for “show me why this fact got in.”
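The Aggregate concept can be sketched as a small decision function. The thresholds, the min_sources coverage rule, and the aggregate() signature below are assumptions made for illustration; ContextGuard's ClaimAggregator may weigh evidence differently (for example by source type).

```python
from typing import List, Tuple


def aggregate(scores: List[Tuple[float, float]],
              threshold: float = 0.7,
              min_sources: int = 2) -> Tuple[str, float]:
    """Turn per-chunk (support, contradict) scores in [0, 1] into a verdict.

    Confidence is scaled down when fewer than min_sources chunks were judged,
    a simple stand-in for coverage-aware confidence.
    """
    if not scores:
        return "INSUFFICIENT", 0.0
    best_support = max(s for s, _ in scores)
    best_contra = max(c for _, c in scores)
    coverage = min(1.0, len(scores) / min_sources)
    supported = best_support >= threshold
    contradicted = best_contra >= threshold
    if supported and contradicted:
        return "MIXED", min(best_support, best_contra) * coverage
    if supported:
        return "SUPPORTED", best_support * coverage
    if contradicted:
        return "CONTRADICTED", best_contra * coverage
    return "INSUFFICIENT", max(best_support, best_contra) * coverage


print(aggregate([(0.9, 0.1), (0.8, 0.2)]))
print(aggregate([(0.9, 0.1)]))
print(aggregate([(0.9, 0.1), (0.2, 0.95)]))
print(aggregate([]))
```

A single strong source still yields SUPPORTED here, but with reduced confidence, which is one way to express "coverage-aware".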

Hero demo (recommended)

python examples/05_trace_graphviz.py

Simulates a 3-turn conversation:
- “Compare Apple and Microsoft revenue”
- “Now do 2024 projections”
- “Only use primary sources”
Demonstrates constraint carryover, gating, counter-evidence, verdicts, and trace visualization.
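The constraint carryover in this three-turn conversation can be sketched as follows. State, Delta, and merge() are simplified, hypothetical stand-ins for StateSpec, StateDelta, and merge_state, and the "latest value wins, but record a conflict" rule is an assumption rather than the library's documented behavior.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class State:
    entities: Tuple[str, ...] = ()
    year: Optional[int] = None
    metric: Optional[str] = None
    source_types: Tuple[str, ...] = ("PRIMARY", "SECONDARY")


@dataclass
class Delta:
    entities_add: Tuple[str, ...] = ()
    year: Optional[int] = None
    metric: Optional[str] = None
    source_types: Optional[Tuple[str, ...]] = None
    reset: bool = False  # start from an empty state instead of carrying over


def merge(state: State, delta: Delta) -> Tuple[State, List[str]]:
    """Carry over unset fields, apply the delta, and report overwritten values."""
    base = State() if delta.reset else state
    conflicts = []
    if delta.year is not None and base.year not in (None, delta.year):
        conflicts.append(f"year: {base.year} -> {delta.year}")
    if delta.metric is not None and base.metric not in (None, delta.metric):
        conflicts.append(f"metric: {base.metric} -> {delta.metric}")
    new_entities = tuple(e for e in delta.entities_add if e not in base.entities)
    merged = State(
        entities=base.entities + new_entities,
        year=delta.year if delta.year is not None else base.year,
        metric=delta.metric or base.metric,
        source_types=delta.source_types or base.source_types,
    )
    return merged, conflicts


state = State()
# Turn 1: "Compare Apple and Microsoft revenue"
state, c1 = merge(state, Delta(entities_add=("AAPL", "MSFT"), metric="revenue"))
# Turn 2: "Now do 2024 projections"
state, c2 = merge(state, Delta(year=2024))
# Turn 3: "Only use primary sources"
state, c3 = merge(state, Delta(source_types=("PRIMARY",)))
print(state)

# A later turn that changes the year is flagged as a conflict.
_, c4 = merge(state, Delta(year=2023))
print(c4)
```

After turn 3 the state still holds the entities and metric from turn 1 and the year from turn 2, which is what lets the gate reject, say, a 2023 Microsoft chunk from a secondary source.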

Quick eval harness (smoke)

A tiny JSONL fixture is provided: tests/fixtures/eval.jsonl

Run baseline:

python -m contextguard.eval.harness --data tests/fixtures/eval.jsonl --k 5

Ablations:

Disable gating: --disable-gating
Disable counter-evidence: --disable-counter

Example (no gating, no counter):

python -m contextguard.eval.harness --data tests/fixtures/eval.jsonl --k 5 --disable-gating --disable-counter

Results (placeholder – fill with real numbers in CI):

verdict_accuracy: …
evidence_precision/recall: …
fever_score: …
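For readers unfamiliar with the metrics named above, here is a rough sketch of how they could be computed from per-example records. The record fields (gold_label, pred_label, gold_evidence, pred_evidence) are a hypothetical schema, not the format of tests/fixtures/eval.jsonl, and the FEVER-style score is simplified to a single gold evidence set per example. The sample records are made up purely to exercise the function and are not results.

```python
from typing import Dict, List


def evaluate(records: List[Dict]) -> Dict[str, float]:
    """Compute verdict accuracy, evidence precision/recall, and a strict score."""
    n = len(records)
    correct = strict = 0
    true_pos = n_pred = n_gold = 0
    for r in records:
        gold_ev = set(r["gold_evidence"])
        pred_ev = set(r["pred_evidence"])
        label_ok = r["gold_label"] == r["pred_label"]
        correct += label_ok
        # FEVER-style strict credit: correct label AND all gold evidence retrieved.
        strict += label_ok and gold_ev <= pred_ev
        true_pos += len(gold_ev & pred_ev)
        n_pred += len(pred_ev)
        n_gold += len(gold_ev)
    return {
        "verdict_accuracy": correct / n if n else 0.0,
        "evidence_precision": true_pos / n_pred if n_pred else 0.0,
        "evidence_recall": true_pos / n_gold if n_gold else 0.0,
        "fever_score": strict / n if n else 0.0,
    }


# Made-up records for illustration only.
records = [
    {"gold_label": "SUPPORTED", "pred_label": "SUPPORTED",
     "gold_evidence": ["c1"], "pred_evidence": ["c1", "c2"]},
    {"gold_label": "CONTRADICTED", "pred_label": "CONTRADICTED",
     "gold_evidence": ["c3"], "pred_evidence": ["c4"]},
    {"gold_label": "INSUFFICIENT", "pred_label": "SUPPORTED",
     "gold_evidence": [], "pred_evidence": ["c5"]},
    {"gold_label": "SUPPORTED", "pred_label": "SUPPORTED",
     "gold_evidence": ["c6"], "pred_evidence": ["c6"]},
]
metrics = evaluate(records)
print(metrics)
```

The second record shows why the strict score is lower than accuracy: the label is right, but the cited evidence is not the gold evidence.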

Adapters & extensibility

Retrievers: implement Retriever.search(query, filters, k) -> List[Chunk] for any vector DB / search backend.
Provided adapters:
- LangChainRetrieverAdapter — wrap any LangChain retriever; override doc_to_chunk or subclass to customize provenance/metadata and filter matching.
- LlamaIndexRetrieverAdapter — wrap any LlamaIndex retriever/query engine; override node_to_chunk or subclass for richer metadata handling.
Judges: plug your own LLM via LLMJudge (structured JSON prompts) or an NLI model via NLIJudge.
LLM providers: OpenAIProvider implements the LLMProvider protocol; override build_messages or wrap with your own retry/guard layers. RetryingProvider decorates any provider with backoff + logging (strategy/decorator pattern).
Budgets: BudgetedProvider enforces prompt/output limits before calling the underlying provider (pair with RetryingProvider).
Generation (optional): LLMGenerator turns a ContextPack + user prompt into a guarded JSON answer using any LLMProvider. Override build_prompt / build_schema or implement the Generator protocol for domain-specific pipelines.
Stores: SQLite by default; S3Store is provided for S3-compatible buckets; add Postgres/Redis by implementing the Store protocol.
Async pipeline: async_run_verification runs plan → retrieve → gate → judge → aggregate with asyncio (wrapping sync retrievers/judges via threadpool).
Frameworks: LangChain/LlamaIndex adapters are provided; wrap your retriever to feed Chunk objects.
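The retry and budget layers described above follow the decorator pattern: each wrapper exposes the same method as the provider it wraps. The sketch below illustrates that pattern only. Provider, RetrySketch, BudgetSketch, FlakyProvider, and the complete() method are hypothetical names; the real LLMProvider protocol, RetryingProvider, and BudgetedProvider may have different signatures.

```python
import time
from typing import Callable, Protocol


class Provider(Protocol):
    # Hypothetical minimal provider interface for this sketch.
    def complete(self, prompt: str) -> str: ...


class RetrySketch:
    """Retry transient failures with exponential backoff."""

    def __init__(self, inner: Provider, max_attempts: int = 3,
                 base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep):
        self.inner = inner
        self.max_attempts = max_attempts
        self.base_delay = base_delay
        self.sleep = sleep  # injectable so demos and tests do not actually wait

    def complete(self, prompt: str) -> str:
        for attempt in range(self.max_attempts):
            try:
                return self.inner.complete(prompt)
            except ConnectionError:
                if attempt == self.max_attempts - 1:
                    raise
                self.sleep(self.base_delay * 2 ** attempt)
        raise RuntimeError("max_attempts must be at least 1")


class BudgetSketch:
    """Reject over-long prompts before the wrapped provider is called."""

    def __init__(self, inner: Provider, max_prompt_chars: int = 4000):
        self.inner = inner
        self.max_prompt_chars = max_prompt_chars

    def complete(self, prompt: str) -> str:
        if len(prompt) > self.max_prompt_chars:
            raise ValueError("prompt exceeds budget")
        return self.inner.complete(prompt)


class FlakyProvider:
    """Test double: fails a fixed number of times, then succeeds."""

    def __init__(self, failures: int):
        self.failures = failures

    def complete(self, prompt: str) -> str:
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("transient failure")
        return "ok"


delays = []
provider = BudgetSketch(
    RetrySketch(FlakyProvider(failures=2), sleep=delays.append),
    max_prompt_chars=100,
)
result = provider.complete("Is the claim supported?")
print(result, delays)
```

Placing the budget check outermost means an over-budget prompt fails immediately instead of being retried.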

Docs

Built with MkDocs + mkdocstrings. To serve locally:

pip install "llm-contextguard[docs]"
mkdocs serve

How to integrate a retriever (metadata expectations)

Implement Retriever.search(query, filters, k) and return Chunk objects with provenance.source_id and provenance.source_type (PRIMARY/SECONDARY/TERTIARY). Default source policy rejects TERTIARY.
Fill structured metadata: chunk.entity_ids (list of canonical IDs), chunk.year (int), and metadata.doc_type if available. Gating relies on entity_ids, year, and source_type; aggregation gives higher weight to primary contradictions.
Translate filters: use CanonicalFilters.from_state_spec(state) to map to your backend (entity/year/source filters). Respect filters.allowed_source_types, filters.year, and filters.entity_ids.
Domain/profile strictness: GatingConfig.from_profile(...) and AggregationConfig.from_profile(...) tighten rules (e.g., finance prefers primary + >=2 sources; policy expects primary; enterprise is moderate). Pass the profile to planner/gate/aggregate if you want domain-tuned behavior.
Provenance timestamps: set provenance.retrieved_at (ISO, timezone-aware) and provenance.chunk_id if you have stable chunk IDs; they improve reproducibility and trace output.
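The metadata expectations above can be illustrated with a toy in-memory retriever. All types here (DemoFilters, DemoProvenance, DemoChunk, InMemoryRetriever) are hypothetical stand-ins that mirror the field names mentioned in the text; they are not ContextGuard's Chunk, CanonicalFilters, or Retriever classes, and keyword overlap stands in for a real vector search.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


@dataclass
class DemoFilters:
    entity_ids: List[str]
    year: Optional[int]
    allowed_source_types: Tuple[str, ...] = ("PRIMARY", "SECONDARY")


@dataclass
class DemoProvenance:
    source_id: str
    source_type: str   # "PRIMARY", "SECONDARY", or "TERTIARY"
    retrieved_at: str  # ISO 8601, timezone-aware


@dataclass
class DemoChunk:
    text: str
    entity_ids: List[str]
    year: Optional[int]
    provenance: DemoProvenance


class InMemoryRetriever:
    def __init__(self, docs: List[Dict]):
        self.docs = docs

    def search(self, query: str, filters: DemoFilters, k: int) -> List[DemoChunk]:
        """Apply entity/year/source filters, then rank by keyword overlap."""
        terms = set(query.lower().split())
        hits = []
        for doc in self.docs:
            if filters.entity_ids and not set(doc["entity_ids"]) & set(filters.entity_ids):
                continue
            if filters.year is not None and doc["year"] != filters.year:
                continue
            if doc["source_type"] not in filters.allowed_source_types:
                continue
            overlap = len(terms & set(doc["text"].lower().split()))
            hits.append((overlap, doc))
        hits.sort(key=lambda h: h[0], reverse=True)
        now = datetime.now(timezone.utc).isoformat()
        return [
            DemoChunk(
                text=doc["text"],
                entity_ids=doc["entity_ids"],
                year=doc["year"],
                provenance=DemoProvenance(doc["id"], doc["source_type"], now),
            )
            for _, doc in hits[:k]
        ]


docs = [
    {"id": "10k-2024", "text": "Apple revenue fiscal 2024", "entity_ids": ["AAPL"],
     "year": 2024, "source_type": "PRIMARY"},
    {"id": "blog-2024", "text": "Apple revenue rumor", "entity_ids": ["AAPL"],
     "year": 2024, "source_type": "TERTIARY"},
    {"id": "10k-2023", "text": "Apple revenue fiscal 2023", "entity_ids": ["AAPL"],
     "year": 2023, "source_type": "PRIMARY"},
]
results = InMemoryRetriever(docs).search(
    "apple revenue", DemoFilters(entity_ids=["AAPL"], year=2024), k=5
)
print([c.provenance.source_id for c in results])
```

Filtering in the backend reduces wasted retrieval budget, while the gate remains the final check on whatever comes back.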

What’s next (roadmap)

Eval harness on FEVER/SciFact (and multi-turn sets like MTRAG/CORAL).
Domain profiles (finance/news/policy/enterprise) with pre-tuned gating thresholds.
Confidence calibration and better rationale spans.
CI + release automation to PyPI/TestPyPI on tagged releases.

License

MIT — see LICENSE.

Contacts

Contributors welcome. Open issues/PRs with trace screenshots and repro steps.

This version

0.1.0

Jan 5, 2026

Source Distribution

llm_contextguard-0.1.0.tar.gz (178.0 kB view details)

Uploaded Jan 5, 2026 Source

Built Distribution

llm_contextguard-0.1.0-py3-none-any.whl (101.5 kB view details)

Uploaded Jan 5, 2026 Python 3

File details

Details for the file llm_contextguard-0.1.0.tar.gz.

File metadata

Download URL: llm_contextguard-0.1.0.tar.gz
Upload date: Jan 5, 2026
Size: 178.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for llm_contextguard-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`130c7114077120e39292dd4d0cf5120694d6a348c09cd5002935d89c053eb147`
MD5	`f1d29f53ab0660c20752d456e5e215a6`
BLAKE2b-256	`a36a85893a3da997b7e8565886818da53593a2e2b2a7260945ca9771cf2ae8b4`

File details

Details for the file llm_contextguard-0.1.0-py3-none-any.whl.

File metadata

Download URL: llm_contextguard-0.1.0-py3-none-any.whl
Upload date: Jan 5, 2026
Size: 101.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for llm_contextguard-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4eb71616cda9241faa60814b88cca3ca56adaa792dc4dc5c2996dbb9aaadad9c`
MD5	`d28ab822e2acf0cb2614bd7424c6e6c0`
BLAKE2b-256	`e98451678d7a4a7adc095256b3944cacd6c0bb1b6aeabced6337aa965837dc71`
