Compress LLMs while auditing whether they still know truth vs myths. SVD compression + false-belief detection in one toolkit.
Knowledge Fidelity
Compress an LLM while auditing whether it still knows truth vs popular myths.
The first toolkit that uses the same factual probes for both structural importance scoring (SVD compression) and behavioral false-belief detection (confidence cartography). One call to compress and audit:
from knowledge_fidelity import compress_and_audit
report = compress_and_audit("Qwen/Qwen2.5-7B-Instruct", ratio=0.7)
print(f"Retention: {report['retention']:.0%} | "
f"False-belief signal: rho={report['rho_after']:.3f}")
# Retention: 100% | False-belief signal: rho=0.725
Why This Exists
LLM compression is everywhere. Knowledge auditing is rare. Nobody checks both at once.
When you quantize or prune a model, you run HellaSwag and call it a day. But benchmarks don't tell you whether the model now thinks the Berenstain Bears are spelled "Berenstein" or that vaccines cause autism. Knowledge Fidelity does.
Two sensors, one toolkit:
| Sensor | What it measures | How |
|---|---|---|
| Structural (SVD) | Which weights encode facts | Gradient importance on factual probes |
| Behavioral (Confidence) | Whether the model believes truth vs myths | Teacher-forced probability on true/false pairs |
The key insight: the same set of factual probes drives both. Compress with awareness of what matters, then verify nothing broke.
Early Results (v0.1)
All results below are from the unified toolkit run on Apple Silicon (M3 Ultra, CPU).
Multi-Seed CF90 Validation (70% rank, 3 seeds)
| Metric | Qwen2.5-0.5B | Qwen2.5-7B-Instruct |
|---|---|---|
| Retention | 95% ± 0% | 100% ± 0% |
| rho before | 0.821 | 0.746 |
| rho after | 0.720 | 0.725 |
| rho drop | 0.101 ± 0.000 | 0.021 ± 0.000 |
| Matrices compressed | 72 | 84 |
| Layers frozen | 18/24 | 21/28 |
The 7B model loses only 0.021 rho under CF90 — nearly perfect fidelity at scale.
Joint Ablation: Compression Ratio vs Confidence (Qwen2.5-0.5B)
| Ratio | Default rho | Mandela rho | Medical rho |
|---|---|---|---|
| 50% | 0.821 → 0.761 | 0.257 → 0.714 | 0.100 → 0.700 |
| 60% | 0.821 → 0.714 | 0.257 → 0.771 | 0.100 → 0.900 |
| 70% | 0.821 → 0.720 | 0.257 → 0.771 | 0.100 → 0.100 |
| 80% | 0.821 → 0.690 | 0.257 → 0.257 | 0.100 → 0.600 |
| 90% | 0.821 → 0.821 | 0.257 → 0.371 | 0.100 → 0.100 |
| 100% | 0.821 → 0.821 | 0.257 → 0.257 | 0.100 → 0.100 |
Joint Ablation: Compression Ratio vs Confidence (Qwen2.5-7B-Instruct)
| Ratio | Default rho | Mandela rho | Medical rho |
|---|---|---|---|
| 50% | 0.746 → 0.689 | 0.829 → 0.771 | −0.700 → 0.600 |
| 70% | 0.746 → 0.725 | 0.829 → 0.943 | −0.700 → −0.600 |
| 90% | 0.746 → 0.713 | 0.829 → 0.943 | −0.700 → −0.900 |
| 100% | 0.746 → 0.746 | 0.829 → 0.829 | −0.700 → −0.700 |
SVD as a Denoiser
A surprising finding at 7B scale: SVD compression can improve the Mandela effect signal. At 70% and 90% rank, Mandela rho increases from 0.829 to 0.943 — the compressed model discriminates true from false memories better than the original.
This is consistent with the interpretation that truncated SVD strips noise from attention projections while preserving the principal signal directions that encode factual knowledge. On small probe sets (6 Mandela, 5 medical), removing noise can sharpen the true/false separation. The effect is weaker at 0.5B where the baseline Mandela signal is already noisy (rho=0.257).
This has practical implications: moderate CF90 compression may serve as a denoising regularizer for factual knowledge, not just a lossy compression step.
Scale-Dependent Findings
| Finding | 0.5B | 7B |
|---|---|---|
| Mandela baseline rho | 0.257 (weak) | 0.829 (strong) |
| CF90 rho drop | 0.101 (moderate) | 0.021 (minimal) |
| CF90 retention | 95% | 100% |
| SVD denoising on Mandela | Mixed | +0.114 rho |
The Mandela effect signal strengthens dramatically with scale (3.2× from 0.5B to 7B), and CF90 compression becomes safer at larger scales.
Prior Results (from Component Projects)
These findings come from the standalone intelligent-svd and confidence-cartography projects that this toolkit unifies:
| Finding | Result |
|---|---|
| Confidence correlates with human false-belief prevalence | rho=0.652, p=0.016 (Pythia 160M–12B) |
| Out-of-domain medical claims | 88% accuracy at 6.9B |
| Targeted resampling at low-confidence tokens | Outperforms uniform best-of-N |
| CF90 + INT8 stacking | 72–77% retention (Qwen-0.5B, Llama-7B) |
| Importance-guided SVD at 50% rank | 3× better retention than standard SVD |
Compression Safety Guide
| Layer Type | Safe to Compress | Notes |
|---|---|---|
| Q, K, O projections | Yes at 70% rank | Main target |
| V projection | 90–95% only | Marginal gains, high risk below 90% |
| MLP layers | Never | Destroys model at any compression level |
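If you are building your own compression loop rather than calling compress_qko, the guide translates roughly into a module-selection rule like the sketch below (plain PyTorch; the function name and ratios are illustrative, not part of the package API):

import torch

def compression_plan(model, qko_ratio=0.7, v_ratio=0.9):
    """Sketch: map each attention projection to the rank ratio the safety guide allows."""
    plan = {}
    for name, module in model.named_modules():
        if not isinstance(module, torch.nn.Linear):
            continue
        if name.endswith(("q_proj", "k_proj", "o_proj")):
            plan[name] = qko_ratio   # main target: safe at 70% rank
        elif name.endswith("v_proj"):
            plan[name] = v_ratio     # marginal gains, keep at 90-95%
        # MLP linears are deliberately omitted: never compress them
    return plan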
Install
pip install knowledge-fidelity # Core (SVD + probes)
pip install "knowledge-fidelity[cartography]" # + confidence analysis + plots
pip install "knowledge-fidelity[full]" # Everything including MLX
Or from source:
git clone https://github.com/SolomonB14D3/knowledge-fidelity
cd knowledge-fidelity
pip install -e ".[full]"
Quick Start
One-Call Compress + Audit
from knowledge_fidelity import compress_and_audit
report = compress_and_audit(
"Qwen/Qwen2.5-7B-Instruct",
ratio=0.7, # Keep 70% of singular values
freeze_ratio=0.75, # Freeze bottom 75% of layers
)
print(report["summary"])
# Compressed Qwen/Qwen2.5-7B-Instruct at 70% rank | 84 matrices | 21/28 frozen | Retention: 100% | rho: 0.746 -> 0.725
Step-by-Step (More Control)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from knowledge_fidelity.svd import compress_qko, freeze_layers
from knowledge_fidelity import audit_model
# Load
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct", torch_dtype=torch.float32)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
# Compress
compress_qko(model, ratio=0.7) # SVD on Q, K, O projections
freeze_layers(model, ratio=0.75) # Freeze bottom 75%
# Audit
audit = audit_model(model, tokenizer)
print(f"rho={audit['rho']:.3f}, {audit['n_positive_delta']}/{audit['n_probes']} probes positive")
# Fine-tune gently: 1 epoch, lr=1e-5
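The gentle fine-tuning step itself is left to you. A minimal sketch with a plain PyTorch loop, continuing from the model and tokenizer above (the calibration texts here are illustrative; in practice you would reuse the factual probes):

# Illustrative calibration texts
texts = [
    "The capital of France is Paris.",
    "Water boils at 100 degrees Celsius at sea level.",
]
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-5
)

model.train()
for text in texts:                      # one pass = one gentle epoch
    batch = tokenizer(text, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
model.eval()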
Importance-Guided Compression (for Aggressive Ratios)
When compressing below 70%, standard SVD loses facts. The importance-guided variant uses gradient information to decide which singular values to keep:
from knowledge_fidelity.svd import compress_qko_importance, compute_importance
importance = compute_importance(model, tokenizer) # Uses shared probes
compress_qko_importance(model, importance, ratio=0.5) # 3x better at 50%
Confidence Analysis Only
from knowledge_fidelity.cartography import analyze_confidence
# Teacher-forced: how confident is the model on each token?
record = analyze_confidence(
"The capital of France is Paris.",
model_name="EleutherAI/pythia-1.4b",
)
print(f"Mean confidence: {record.mean_top1_prob:.3f}")
print(f"Min confidence at: '{record.min_confidence_token}' "
f"(prob={record.min_confidence_value:.3f})")
Custom Probes
from knowledge_fidelity import compress_and_audit, load_probes
# Use domain-specific probes
medical_probes = load_probes("data/probes/medical_claims.json")
report = compress_and_audit("my-model", probes=medical_probes)
# Or inline
custom = [
{"text": "TCP uses a three-way handshake.",
"false": "TCP uses a two-way handshake.",
"domain": "networking", "id": "tcp_handshake"},
]
report = compress_and_audit("my-model", probes=custom)
Built-In Probe Sets
| Set | Count | Purpose |
|---|---|---|
| get_default_probes() | 20 | Geography, science, history, biology |
| get_mandela_probes() | 6 | Popular false memories (Berenstain Bears, Vader quote, etc.) |
| get_medical_probes() | 5 | Common medical misconceptions |
| get_all_probes() | 31 | All of the above |
Community contributions welcome — add probes for your domain and submit a PR.
How It Works
The CF90 Pipeline (Structural Sensor)
- Compress Q, K, O attention projections at 70% rank via truncated SVD
- Freeze 75% of layers from the bottom up
- Fine-tune gently (1 epoch, lr=1e-5)
SVD removes noise from attention weight matrices while preserving signal directions important for factual knowledge. Freezing prevents catastrophic forgetting.
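In isolation, the rank truncation in step 1 amounts to keeping only the leading singular values of each projection matrix. A self-contained sketch of that math (not the package internals, which may factor the weights differently):

import torch

def truncate_rank(weight: torch.Tensor, ratio: float = 0.7) -> torch.Tensor:
    """Reconstruct `weight` from its top `ratio` fraction of singular values."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    k = max(1, int(ratio * S.numel()))
    return U[:, :k] @ torch.diag(S[:k]) @ Vh[:k, :]

W = torch.randn(1024, 1024)
W_70 = truncate_rank(W, ratio=0.7)   # same shape, rank cut to ~70%

Keeping the two thin factors rather than the full reconstruction is what turns the rank cut into an actual size reduction; writing the reconstruction back in place is enough for the denoising effect discussed above.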
Confidence Cartography (Behavioral Sensor)
For each token in a text, measure the probability the model assigns to it under teacher forcing. True statements tend to receive higher confidence than their false counterparts, and the ratio of true-to-false confidence is a behavioral signal for whether the model "believes" a fact.
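In code, this amounts to reading next-token probabilities off the logits for the tokens that actually occur. A minimal sketch of the measurement (roughly what analyze_confidence, shown earlier, computes under the hood, though the real implementation may differ):

import torch

def token_confidences(model, tokenizer, text):
    """Probability the model assigns to each token of `text`, given the true prefix."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    probs = logits[:, :-1].softmax(dim=-1)                          # next-token distributions
    return probs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)   # prob of each actual token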
The Unification
Both use the same probes:
- SVD importance scoring runs forward+backward on probe texts to compute gradient magnitudes — which weights matter for encoding these facts
- Confidence auditing runs a forward pass on true vs false versions of the same probes — does the model assign higher probability to truth?
Compress with knowledge of what matters. Verify nothing was lost. Same probes, both sides.
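A sketch of the gradient side of that pairing: accumulate absolute gradients over the probe texts as a per-parameter importance score (illustrative only; compute_importance's actual scoring may differ):

import torch

def gradient_importance(model, tokenizer, probe_texts):
    """Accumulate |gradient| per parameter over the probe texts."""
    scores = {n: torch.zeros_like(p) for n, p in model.named_parameters() if p.requires_grad}
    for text in probe_texts:
        batch = tokenizer(text, return_tensors="pt")
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                scores[n] += p.grad.abs()
        model.zero_grad()
    return scores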
Experiments
# Quick demo (~5 min on Qwen-0.5B, ~8 min on 7B)
python examples/quick_demo.py
python examples/quick_demo.py --model Qwen/Qwen2.5-7B-Instruct
# Joint ablation: compression ratio vs confidence preservation
python experiments/joint_ablation.py --model Qwen/Qwen2.5-7B-Instruct
# Multi-seed CF90 validation
python experiments/run_cf90_multiseed.py --model Qwen/Qwen2.5-7B-Instruct --seeds 3
Deployment
# Export to GGUF for llama.cpp / Ollama
python deployment/export_gguf.py --input compressed_model/ --output model.gguf --quantize q4_k_m
# Benchmark with vLLM
python deployment/vllm_benchmark.py --baseline Qwen/Qwen2.5-7B-Instruct --compressed ./compressed_model
See deployment/mlx_recipe.md for Apple Silicon inference with MLX.
Platform Notes (Apple Silicon)
- Use CPU for compression and fine-tuning (MPS has matmul errors with some architectures and NaN gradients with frozen layers)
- Use MLX for fast inference after compression
- Set HF_HOME to external storage for large models
Model Compatibility
Works on any HuggingFace causal LM with model.model.layers[i].self_attn.{q,k,o}_proj (standard for Qwen, Llama, Mistral) or model.transformer.h (GPT-2 style).
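Before running the pipeline on a new architecture, you can sanity-check the layout with a throwaway snippet like this (not part of the package API):

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
attn = model.model.layers[0].self_attn
print(all(hasattr(attn, f"{p}_proj") for p in ("q", "k", "o")))   # True => compress_qko can target it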
Validated on:
- Qwen2.5: 0.5B, 1.5B, 7B, 32B
- Llama 2: 7B
- Should work on Mistral, Phi, Gemma (same layer layout) — PRs with test results welcome
Built On
This toolkit unifies two standalone research projects:
- Intelligent SVD — CF90 compression method and safety rules
- Confidence Cartography — False-belief detection via teacher-forced confidence
Both remain available as independent repos. Knowledge Fidelity combines their core ideas into a single pipeline with a shared probe system.
Citation
@software{knowledge_fidelity,
author = {Bryan Sanchez},
title = {Knowledge Fidelity: Compress LLMs While Auditing What They Still Know},
year = {2026},
url = {https://github.com/SolomonB14D3/knowledge-fidelity}
}
License
MIT