Spectral diagnostics for trust in LLMs

Spectral Trust Framework

A Graph Signal Processing (GSP) framework for measuring the trustworthiness of LLM internal representations.

spectral_trust constructs dynamic graphs from attention patterns and applies spectral analysis (eigenvalues, Dirichlet energy) to detect hallucinations, quantify uncertainty, and map the "smoothness" of reasoning flows.

What is it?

By treating the transformer's attention mechanism as a graph and the hidden states as signals on that graph, we can compute well-defined spectral metrics:

  • Dirichlet Energy: How much the signal varies across connected tokens (proxy for conflict/uncertainty).
  • Smoothness Index: Normalized energy indicating how well the representation aligns with the attention structure.
  • Fiedler Value: Algebraic connectivity of the attention graph.
  • HFER (High-Frequency Energy Ratio): Energy concentration in high-frequency spectral components.

Features:

  • Plug-and-Play: Works out of the box with Llama-3, Mistral, Qwen, Gemma, and Phi.
  • Directed Topology (v0.2.0): Support for Directed Laplacian spectral radius and imaginary components.
  • Spectral Velocity (v0.2.0): Cross-layer differential diagnostics to isolate the "Topological Shockwave."
  • Sparse Solver Optimization: High-performance $O(kN^2)$ solver for large-scale exhaustive sweeps.
  • Interactive Visualization: New --plots CLI flag and 2x2 diagnostic dashboards.
  • Offline Ready: --offline mode to use cached models without internet access.
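
As a rough illustration of the four metrics listed above, the following NumPy sketch computes them on a toy attention matrix. It is not the package's implementation: the symmetrization step, the combinatorial Laplacian $L = D - W$, the normalization used for the smoothness index, and the 50% frequency cutoff for HFER are all assumptions made for this example, and every helper name is hypothetical.

```python
import numpy as np

def symmetrize(attention):
    """Turn a row-stochastic attention matrix into an undirected weight matrix."""
    return 0.5 * (attention + attention.T)

def laplacian(weights):
    """Combinatorial Laplacian L = D - W (assumed form for this sketch)."""
    return np.diag(weights.sum(axis=1)) - weights

def dirichlet_energy(lap, signal):
    """trace(X^T L X): signal variation summed over edges and feature dimensions."""
    return float(np.trace(signal.T @ lap @ signal))

def smoothness_index(lap, signal):
    """Energy divided by the squared signal norm (an assumed normalization)."""
    return dirichlet_energy(lap, signal) / float(np.sum(signal ** 2))

def fiedler_value(lap):
    """Second-smallest Laplacian eigenvalue, i.e. the algebraic connectivity."""
    return float(np.linalg.eigvalsh(lap)[1])

def hfer(lap, signal, cutoff=0.5):
    """Share of signal energy carried by the upper part of the graph spectrum."""
    _, eigvecs = np.linalg.eigh(lap)        # columns ordered by ascending eigenvalue
    coeffs = eigvecs.T @ signal             # graph Fourier transform of the signal
    energy = np.sum(coeffs ** 2, axis=1)    # energy per graph frequency
    start = int(np.ceil(cutoff * len(energy)))
    return float(energy[start:].sum() / energy.sum())

# Toy inputs: random softmax attention over 6 tokens, random 4-dim hidden states.
rng = np.random.default_rng(0)
n_tokens, hidden_dim = 6, 4
scores = rng.normal(size=(n_tokens, n_tokens))
attention = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
hidden = rng.normal(size=(n_tokens, hidden_dim))

L = laplacian(symmetrize(attention))
print(f"Dirichlet energy: {dirichlet_energy(L, hidden):.4f}")
print(f"Smoothness index: {smoothness_index(L, hidden):.4f}")
print(f"Fiedler value:    {fiedler_value(L):.4f}")
print(f"HFER:             {hfer(L, hidden):.4f}")
```

Under these definitions a signal that is constant across tokens has zero Dirichlet energy, which is why low energy is read as agreement between the hidden states and the attention structure.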

Structure

  • src/spectral_trust/: Core package source code.
  • notebooks/: Tutorials and demos.
  • experiments/: Reproduction scripts for paper findings (Super Scar, etc.).
  • examples/: Minimal usage examples.

Installation

pip install spectral_trust
# OR install from source
pip install -e .

Usage

Automated Diagnosis (New!)

Run a full diagnostic report on your model to detect known pathologies (such as the "Super Scar"):

gsp-cli diagnose --model microsoft/phi-4 --verbose

The diagnose command:

  • Scans for structural anomalies (graph disconnection).
  • Probes with adversarial inputs (Active vs Passive).
  • Reports signature matches (e.g., "Synthetic Scar Detected").

Single-Shot Analysis

Analyze a single sentence (uses CUDA if available):

gsp-cli analyze --text "The capital of France is Paris." --model llama-3.1-8b

Offline Mode (no internet required):

gsp-cli analyze --text "Refactoring is fun." --model llama-3.2-1b --offline

Python API

from spectral_trust import GSPDiagnosticsFramework, GSPConfig

# local_files_only=True restricts loading to locally cached model files.
config = GSPConfig(model_name="llama-3.2-1b", device="cuda", local_files_only=True)
with GSPDiagnosticsFramework(config) as framework:
    framework.instrumenter.load_model("meta-llama/Llama-3.2-1B")
    results = framework.analyze_text("The capital of France is Paris.")

    # layer_diagnostics holds one entry per layer; [-1] is the final layer.
    print(f"Smoothness: {results['layer_diagnostics'][-1].smoothness_index:.4f}")

Compare Two Texts

Compare the spectral properties of two different inputs side-by-side:

python -m spectral_trust.cli compare \
  --text1 "Total confidence: The capital of France is Paris." \
  --text2 "Low confidence: I think the capital might be Paris." \
  --model llama-3.2-1b

This will generate a comparison plot overlaying the metrics for both texts.

Multi-Run Analysis (Stochastic)

Run the analysis multiple times (useful with sampling enabled) to see metric stability:

python -m spectral_trust.cli analyze \
  --text "The capital of France is Paris." \
  --runs 5 \
  --temperature 0.7
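
One simple way to judge stability across runs is to summarize each metric by its mean and spread. The sketch below uses only the standard library; the five smoothness values are made-up placeholders, not output from spectral_trust, and the `summarize` helper is hypothetical.

```python
from statistics import mean, stdev

# Placeholder final-layer smoothness values from five runs (illustrative only).
smoothness_per_run = [0.412, 0.398, 0.421, 0.405, 0.417]

def summarize(values):
    """Return the mean, sample standard deviation, and coefficient of variation."""
    mu = mean(values)
    sigma = stdev(values)
    return {"mean": mu, "std": sigma, "cv": sigma / mu}

summary = summarize(smoothness_per_run)
print(f"mean={summary['mean']:.4f} std={summary['std']:.4f} cv={summary['cv']:.2%}")
```

A small coefficient of variation suggests the metric is insensitive to sampling noise at the chosen temperature.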

Advanced GSP Options

For rigorous spectral graph analysis, you may want to exclude self-attention loops (the diagonal) to match standard spectral graph theory (where $A_{ii}=0$).

  • Default: Self-loops kept. Faithful to Transformer mechanics. Fiedler values $\approx 1.0$.
  • --remove_self_loops: Self-loops removed. Faithful to Graph Signal Processing theory. Fiedler values $\approx 2.0$ (for connected graphs). Better for measuring pure token-to-token mixing.

gsp-cli analyze --text "..." --remove_self_loops
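
To make the effect of the diagonal concrete, here is a minimal NumPy sketch of self-loop removal. It assumes the diagonal is zeroed and each row renormalized, and it measures connectivity with the symmetric normalized Laplacian $I - D^{-1/2} W D^{-1/2}$. Both choices are assumptions for this example, as are the helper names; the package may use a different Laplacian, so the values printed here need not match the figures quoted above.

```python
import numpy as np

def remove_self_loops(attention):
    """Zero the diagonal and renormalize each row to sum to 1.

    Assumes every row keeps at least one non-zero off-diagonal weight.
    """
    stripped = attention.copy()
    np.fill_diagonal(stripped, 0.0)
    return stripped / stripped.sum(axis=1, keepdims=True)

def normalized_fiedler(attention):
    """Second-smallest eigenvalue of I - D^{-1/2} W D^{-1/2}, with W symmetrized."""
    weights = 0.5 * (attention + attention.T)
    inv_sqrt_deg = 1.0 / np.sqrt(weights.sum(axis=1))
    lap = np.eye(len(weights)) - inv_sqrt_deg[:, None] * weights * inv_sqrt_deg[None, :]
    return float(np.linalg.eigvalsh(lap)[1])

# Toy dense attention over 6 tokens (random softmax rows, no causal mask).
rng = np.random.default_rng(0)
scores = rng.normal(size=(6, 6))
attention = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
stripped = remove_self_loops(attention)

print(f"with self-loops:    {normalized_fiedler(attention):.4f}")
print(f"without self-loops: {normalized_fiedler(stripped):.4f}")
```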

Scientific Validation

This framework implements the methodologies described in [Noël, 2026].

Case Study: The Phi-4 "Super Scar"

We used spectral_trust to discover a critical vulnerability in the Phi-4 model:

  • Pathology: Complete structural attention collapse (Fiedler $\to$ 0.0) when processing "Heavy Agent" passive sentences.
  • Cause: Interaction between passive voice syntax and high-complexity noun phrases.
  • Reproduction:
    python experiments/reproduce_super_scar.py
    
    (Generates comparative plots for Phi vs. Qwen/Llama baselines)
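
Why a Fiedler value near 0 indicates structural collapse can be seen on a toy graph: the second-smallest Laplacian eigenvalue is exactly 0 when the graph splits into disconnected components. The sketch below is a generic spectral graph theory illustration, not the reproduction script; the two-cluster graph and the helper names are hypothetical.

```python
import numpy as np

def fiedler_value(weights):
    """Second-smallest eigenvalue of the combinatorial Laplacian L = D - W."""
    lap = np.diag(weights.sum(axis=1)) - weights
    return float(np.linalg.eigvalsh(lap)[1])

def two_cluster_graph(bridge):
    """Two fully connected 3-token clusters joined by one edge of weight `bridge`."""
    weights = np.zeros((6, 6))
    weights[:3, :3] = 1.0
    weights[3:, 3:] = 1.0
    np.fill_diagonal(weights, 0.0)
    weights[2, 3] = weights[3, 2] = bridge
    return weights

bridges = (1.0, 0.1, 0.001, 0.0)
values = [fiedler_value(two_cluster_graph(b)) for b in bridges]
for b, v in zip(bridges, values):
    print(f"bridge weight {b:<6} Fiedler value {v:.6f}")
```

As the bridge weight shrinks, the Fiedler value falls toward 0, and it reaches 0 (up to floating-point error) once the two clusters are fully disconnected.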

spectral_trust also provides the reference implementation for measuring:

  • Fiedler Drop: The loss of algebraic connectivity in hallucinating models.
  • Energy Spikes: High-frequency noise indicating semantic conflict.

Model Compatibility & Benchmarks

| Model Family | Status | Tested Version | Precision |
|---|---|---|---|
| Llama-3 | ✅ Passed | meta-llama/Llama-3.2-1B | FP16 |
| Phi-3 | ✅ Passed | microsoft/Phi-3-mini-4k-instruct | BF16 |
| Inference Time | ⚡ Fast | ~45ms / 128 tokens | Exact Eig |

Research Tools included

  • examples/detect_hallucination.py: Differential spectral analysis of counter-factuals.
  • examples/ablation_study.py: Causal intervention via head masking to verify structural load-bearing.
  • benchmarks/: Latency and precision scaling scripts.

License

MIT

License

This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). For commercial use, enterprise licensing, or closed-source integration (such as cloud deployment without open-sourcing your backend), please contact the author to arrange a commercial license.
