Credence — epistemic enforcement layer that prevents LLMs from forgetting what they didn't know

Credence

Claude doesn't just forget what you told it. It forgets whether you were sure about it.

You're in a Claude Code session. You say:

"The rate limit is probably around 50 — I haven't confirmed it yet."

Fifteen turns of coding later, Claude writes:

RATE_LIMIT = 50

No warning. No flag. The uncertainty is gone. You ship it.
The API rejects every request at 2am. The real limit was 10.
Nobody lied. Claude just forgot you weren't sure.

This failure has a name. We measured it. We fixed it.

Credence gate blocking in real time


The Problem, Measured

Epistemic Qualifier Loss (EQL) — uncertainty markers ("I think", "unverified", "roughly", "the vendor claims") are stripped during context compression, causing downstream models to treat uncertain constraints as confirmed facts.

We ran 50 compression scenarios with three conditions:

| Condition | Qualifier Strip Rate | False Certainty Rate |
|---|---|---|
| Naive Haiku compression | 46% | 6% |
| LLMLingua-style scoring | 68% | 74% |
| Credence (faithfulness probe) | 0% | 0% |

The False Certainty Rate (FCR) — model states an uncertain value as confirmed fact — drops to zero. Deterministically. With zero extra API calls across all five enforcement layers.

Every engineering team using Claude Code today is producing ghost constraints they don't know about. Every sprint.
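Both rates are mechanical to score. A minimal sketch, assuming a toy hedge list and a scenario shape of (original turn, compressed summary, downstream output); this is illustrative, not Credence's actual 198-term probe:

```python
# Illustrative scoring of Qualifier Strip Rate and False Certainty Rate.
# HEDGE_TERMS is a tiny stand-in for the real 198-term frozenset.
HEDGE_TERMS = frozenset({"probably", "i think", "roughly", "unverified", "approximately"})

def has_hedge(text: str) -> bool:
    t = text.lower()
    return any(term in t for term in HEDGE_TERMS)

def score(scenarios):
    """Each scenario: (original_turn, compressed_summary, downstream_output)."""
    stripped = sum(
        1 for orig, summary, _ in scenarios
        if has_hedge(orig) and not has_hedge(summary)   # qualifier lost in compression
    )
    false_certain = sum(
        1 for orig, _, out in scenarios
        if has_hedge(orig) and not has_hedge(out)       # uncertain value stated as fact
    )
    n = len(scenarios)
    return stripped / n, false_certain / n

demo = [
    ("The rate limit is probably 50", "Rate limit is 50", "RATE_LIMIT = 50"),
    ("Timeout is 30s, confirmed", "Timeout is 30s", "TIMEOUT = 30"),
]
print(score(demo))  # -> (0.5, 0.5)
```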


Quick Start

pip install credence-guard
python quickstart.py          # all 5 enforcement layers, no API key needed

Or from source:

git clone https://github.com/Lakshmi-Chakradhar-Vijayarao/credence-ai
cd credence-ai && pip install -e ".[mcp]"
python quickstart.py

What Gets Blocked

Without Credence — Haiku strips "I think" from the context summary. Downstream model writes:

RATE_LIMIT = 50
ALGORITHM  = "RS256"
TOKEN_EXPIRY = 3600

With Credence — Generation-Time Scanner annotates inline before code reaches you:

RATE_LIMIT   = 50    # ⚠⚠ CREDENCE[HIGH RISK, conf=0.15]: rate limit unconfirmed — vendor docs pending
ALGORITHM    = "RS256"  # ⚠  CREDENCE[unverified, conf=0.28]: signing algo — per vendor call
TOKEN_EXPIRY = 3600  # ⚠  CREDENCE[unverified, conf=0.31]: auth token expiry not verified
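The tier assignment itself is threshold logic over the confidence score. A sketch with illustrative cutoffs (the scanner's real thresholds may differ):

```python
# Illustrative three-tier annotation, as applied by a generation-time scanner.
# The 0.2 / 0.35 cutoffs are assumptions, not Credence's actual values.
def tier(conf: float) -> str:
    if conf < 0.2:
        return "⚠⚠ HIGH RISK"
    if conf < 0.35:
        return "⚠ UNVERIFIED"
    return "CHECK"

def annotate(line: str, conf: float, note: str) -> str:
    """Append an inline warning comment to a generated code line."""
    return f"{line}    # {tier(conf)} CREDENCE[conf={conf:.2f}]: {note}"

print(annotate("RATE_LIMIT = 50", 0.15, "rate limit unconfirmed"))
```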

Rust gate (3.4ms) blocks the write entirely when unverified constraints overlap the tool action:

╔══════════════════════════════════════════════════════════════╗
║  CREDENCE GATE — TOOL BLOCKED                                ║
╚══════════════════════════════════════════════════════════════╝

  Tool:   Edit
  ⚠ [LOW, conf=0.28] auth token expires in 3600s — unconfirmed
    Overlap terms: token, expires, auth

  Use credence_verify(<id>, <confirmed_value>) to resolve.

Once verified, the constraint clears. The gate unblocks. Claude writes the code.

uncertain → registered → enforced → verified → released
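The gate's overlap test can be sketched as term-set intersection between tool arguments and each unverified constraint; the tokenisation, stopword list, and two-term threshold below are assumptions, not the Rust gate's actual logic:

```python
# Illustrative overlap check: block a tool call when its arguments share
# terms with a registered, unverified constraint.
import re

STOPWORDS = {"the", "is", "in", "a", "an", "to"}

def terms(text: str) -> set[str]:
    return {w for w in re.findall(r"[a-z0-9_]+", text.lower()) if w not in STOPWORDS}

def gate(tool_args: str, unverified: list[str]) -> tuple[bool, set[str]]:
    """Return (blocked, overlapping terms)."""
    arg_terms = terms(tool_args)
    for constraint in unverified:
        overlap = arg_terms & terms(constraint)
        if len(overlap) >= 2:          # require more than one shared term
            return True, overlap
    return False, set()

blocked, overlap = gate(
    "TOKEN_EXPIRY = 3600  # auth token expires",
    ["auth token expires in 3600s"],
)
print(blocked, sorted(overlap))  # -> True ['auth', 'expires', 'token']
```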

How It Works

Five checkpoints. Four are fully deterministic — no model cooperation required.

User states uncertain claim
        │
        ▼
┌─────────────────────────────────────────────┐
│  REGISTRY  (SQLite, ~0.37ms)                │
│  Stores uncertain constraints with          │
│  per-type confidence decay                  │
│  Cross-session. Zero API calls.             │
└──────────────────────┬──────────────────────┘
                       │
    ▼ before compression
┌──────────────────────────────────────────────┐
│  CP1 — Faithfulness Probe  (P99=0.026ms)     │  DETERMINISTIC
│  198-term frozenset. Scans user turns only.  │
│  Uncertainty found → block Haiku → KEEP      │
└──────────────────────────────────────────────┘
                       │
    ▼ before generation
┌──────────────────────────────────────────────┐
│  CP2 — Truth Buffer + Consistency Enforcer   │  PROBABILISTIC
│  Injects all unverified constraints into     │  (model must comply)
│  every system prompt. When query overlaps    │
│  registered constraint → imperative block:   │
│  "YOU MUST express uncertainty."             │
└──────────────────────────────────────────────┘
                       │
    ▼ after generation
┌──────────────────────────────────────────────┐
│  CP3 — Generation-Time Scanner (P99=0.036ms) │  DETERMINISTIC
│  Catches numeric + string literals in code   │
│  and prose. Three tiers: ⚠⚠ HIGH RISK /     │
│  ⚠ UNVERIFIED / CHECK based on conf score.  │
└──────────────────────────────────────────────┘
                       │
    ▼ at tool execution
┌──────────────────────────────────────────────┐
│  CP4 — Rust Gate (P50=3.4ms)                 │  DETERMINISTIC
│  Native PreToolUse hook. Blocks Write/Edit/  │
│  Bash when arguments overlap an unverified   │
│  constraint. 98× faster than Python hook.   │
└──────────────────────────────────────────────┘
                       │
    ▼ across sessions
┌──────────────────────────────────────────────┐
│  CP5 — Cross-Session Memory (P99=1.03ms)     │  DETERMINISTIC
│  New sessions inherit uncertainty status,    │
│  not just values. Epistemic debt survives    │
│  restarts.                                   │
└──────────────────────────────────────────────┘
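CP1's decision rule can be sketched in a few lines: if a user turn carries an uncertainty marker, keep it verbatim rather than letting the compressor summarise it. The marker list here is a tiny illustrative subset of the 198-term frozenset:

```python
# Illustrative faithfulness probe: hedged user turns bypass compression.
UNCERTAINTY_MARKERS = frozenset({
    "i think", "probably", "roughly", "unverified",
    "haven't confirmed", "the vendor claims",
})

def probe(turn: str) -> bool:
    """True if the turn carries an explicit uncertainty marker."""
    t = turn.lower()
    return any(m in t for m in UNCERTAINTY_MARKERS)

def compress(turns, summarise):
    """Keep hedged turns verbatim; summarise everything else."""
    return [t if probe(t) else summarise(t) for t in turns]

kept = compress(
    ["The rate limit is probably 50", "Use port 8080"],
    summarise=lambda t: f"[summary] {t[:12]}…",
)
print(kept)
```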

Total in-session overhead (P99): 1.1ms in-process + 3.4ms gate = ~4.5ms. Zero extra API calls.
LLM call overhead: ~0.09% of typical Claude Opus latency (3,000–8,000ms).

Ghost Constraints

The faithfulness probe catches explicit hedges: "I think", "approximately", "probably" — 198 markers.

But what about:

"The Stripe rate limit is 50 req/min."

No hedging. Stated as fact. Actually from a sales call, never confirmed. The probe sees nothing.

This is a ghost constraint — implicitly uncertain, no surface markers. The Ghost Detector (opt-in, one Opus call per constraint at registration) classifies whether a stated fact is an established truth or a vendor claim stated as fact.

Ghost Gauntlet — n=10 sessions, all Opus 4.7

Credence (Ghost Detector active)  BothRate = 1.000
Naive sliding window              BothRate = 0.200
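The classification step can be sketched as one prompt plus a label parse. The prompt wording and labels below are illustrative, not the actual Ghost Detector prompt, and the model call itself is omitted:

```python
# Illustrative ghost-classification: one model call per registered constraint
# asks whether a flatly stated fact is established or an unverified claim.
def ghost_prompt(statement: str) -> str:
    return (
        "Classify the statement below. Reply with exactly one label:\n"
        "ESTABLISHED  - well-known, verifiable fact\n"
        "GHOST        - specific claim that needs confirmation "
        "(e.g. a number from a sales call)\n\n"
        f"Statement: {statement}"
    )

def parse_label(reply: str) -> bool:
    """True if the model flagged the statement as a ghost constraint."""
    return reply.strip().upper().startswith("GHOST")

# Feed ghost_prompt(...) to your LLM client and pass its reply to parse_label.
print(parse_label("GHOST"))  # -> True
```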

Install in Claude Code

pip install "credence-guard[mcp]"

Add to .claude/settings.json:

{
  "mcpServers": {
    "credence": {
      "command": "python3",
      "args": ["-m", "credence.mcp_server"],
      "env": { "ANTHROPIC_API_KEY": "your-key-here" }
    }
  },
  "hooks": {
    "PreToolUse": [{
      "matcher": "Write|Edit|Bash|NotebookEdit",
      "hooks": [{ "type": "command", "command": "credence-gate" }]
    }]
  }
}

Build the Rust gate (optional, for CP4 enforcement):

cd credence_gate && cargo build --release
# binary: credence_gate/target/release/credence-gate

Validated Results (Latest)

| Experiment | Credence | Naive / Baseline |
|---|---|---|
| Compression faithfulness — Haiku FCR (n=50) | 0% | 6% |
| Compression faithfulness — LLMLingua FCR (n=50) | 0% | 74% |
| E6: Long-session constraint recall (n=23 trials) | 100% | 19.6% (naive window) |
| E7: Multi-hop 3-step reasoning chain | 3/3 hops | 0/3 (naive) |
| E8: Real debugging session recall | 1.000 | 0.522 (naive) |
| Ghost Gauntlet BothRate (n=10 sessions) | 1.000 | 0.200 (naive) |
| Cross-session FCR (n=20 callbacks) | 0% | 40% (no memory) |
| Rust gate latency | 3.4ms | 331ms (Python) |
| Total in-process overhead (P99 sum) | 1.1ms | n/a |

Reproducing the Results

No API key — runs in seconds:

python quickstart.py                     # live demo, no API needed
python tests/tests.py                    # 178 unit tests
python tests/test_claims.py              # validates all claimed numbers offline
python -m evals.adversarial_tests        # 5 adversarial robustness tests
python -m evals.latency_report --n 1000  # P50/P95/P99 for all 5 checkpoints
python -m evals.calibration_curve        # ECE + ghost candidate analysis

Dataset & Training:

# Build the 5,000-triple DPO training dataset (requires ANTHROPIC_API_KEY, ~$5)
python -m data.build_training_dataset --n 5000 --out data/epistemic_compression_training.json

# Verify dataset quality
python -m data.build_training_dataset --verify --sample 500

# Training (requires GPU — Kaggle T4 free tier)
# See training/dpo_finetune.py and kaggle_kv_cache/run_kv_experiment.py

Training Status

Phase 3 DPO fine-tuning completed on Kaggle T4 (Phi-2, 5,000 triples). Best checkpoint: epoch 2. Full epoch curve and three-point comparison:

| Condition | FCR | EQLR | Notes |
|---|---|---|---|
| Base Phi-2 (pre-DPO) | 31.2% | 53.3% | Generation-level baseline |
| DPO fine-tuned (epoch 2) | 19.1% | 62.1% | −12.1pp, 39% relative reduction |
| DPO epoch 3 | 22.1% | 58.8% | Regressed — overfit, use epoch 2 |
| Probe (deterministic) | 0% | 100% | Mechanical guarantee, zero API calls |

Epoch 3 regression is expected DPO behavior (lambda=0.3, model drifted too far from reference). Use epoch_2/ adapter. Pull results: kaggle kernels output chakradharvijayarao/credence-phase-3-dpo-epistemic-fine-tuning -p /tmp/dpo_out/

  • Adapter (epoch 2 best): evals/dpo_epoch_results.json (after pull)
  • Earlier negative result (archived): models/credence-phi-2-dpo

With API key — core evidence (~$7 total):

python -m evals.compression_faithfulness --n 50   # headline: 46%→0% EQLR, 74%→0% FCR  (~$3)
python -m evals.ghost_gauntlet                     # BothRate 0.200→1.000                (~$2)
python -m evals.experiments --exp E6               # long-session recall 100% vs 19.6%  (~$0.50)
python -m evals.experiments --exp E7               # 3-hop chain: 3/3 vs 0/3            (~$0.20)
python -m evals.experiments --exp E8               # debugging session recall            (~$0.30)

All results already saved in evals/*.json — no API key needed to read them.


As a Python Library

from credence import ContextManager, CredenceRegistry

registry = CredenceRegistry()
cm = ContextManager(registry=registry, session_id="my-session")

# Uncertain constraint gets registered and enforced automatically
result = cm.chat("The rate limit is probably 50 req/min — I haven't confirmed it")

# Next session inherits the uncertainty
from credence import CredenceMemory
memory = CredenceMemory(registry)
memory.snapshot("my-session", project="my-api-project")
# New session will see: "UNVERIFIED: rate limit is probably 50 req/min"

As an MCP Server (10 tools)

# credence_chat        — full enforcement turn
# credence_register    — register an uncertain constraint
# credence_verify      — mark a constraint as confirmed
# credence_gate        — pre-execution agentic gate
# credence_inspect     — BLOCK/VERIFY/PRESERVE/PROCEED recommendation
# credence_scan        — scan any model output for unverified literals
# credence_trajectory  — certainty trajectory for a constraint over time
# credence_memory_snapshot / credence_memory_recall — cross-session
# ... 14 more

Project Structure

credence/
  context_manager.py    All 5 enforcement layers
  registry.py           SQLite constraint store + confidence decay
  confidence_proxy.py   J-score (zero API, zero latency)
  memory.py             Cross-session epistemic persistence
  mcp_server.py         FastMCP server — 10 tools
  pipeline_monitor.py   Multi-agent handoff interception

evals/                  12 validation studies
  compression_faithfulness.py   Primary result (n=50)
  ghost_gauntlet.py             Ghost constraint benchmark
  gauntlet.py                   50-scenario breadth benchmark
  experiments.py                E1–E9 ablation experiments
  eql_bench.py                  EQL-Bench v1 dataset (52 scenarios, 8 domains)
  latency_report.py             P50/P95/P99 for all 5 checkpoints
  calibration_curve.py          ECE + ghost candidate analysis

credence_gate/          Rust PreToolUse hook — 3.4ms
tests/                  178 unit tests
docs/                   Technical report, architecture, vision
quickstart.py           First-run demo (no API key)

Documentation

| What you want | Where |
|---|---|
| Full methodology + related work | docs/TECHNICAL_REPORT.md |
| Layer-by-layer design decisions | docs/ARCHITECTURE.md |
| Honest scope and limitations | docs/SUBMISSION.md |
| Research vision | docs/VISION.md |

Contributing

See CONTRIBUTING.md. All offline tests pass without an API key — you can contribute to the probe, registry, GTS, and Rust gate without spending anything.


Built By

Lakshmi Chakradhar Vijayarao — Independent Researcher

LinkedIn · X / Twitter · lakshmichakradhar.v@gmail.com


MIT License — see LICENSE
