NerfProbe
Scientifically grounded LLM degradation detection for developers.
Installation
```bash
pip install nerfprobe
```
Quick Start
CLI
```bash
# Run core probes on a model
nerfprobe run gpt-5.2 --tier core

# Run specific probes
nerfprobe run gpt-5.2 --probe math --probe style --probe code

# Use a different provider
nerfprobe run claude-opus-4.5 --provider anthropic

# Custom endpoint (vLLM, Ollama, local)
nerfprobe run my-model --base-url http://localhost:8000/v1

# Output formats
nerfprobe run gpt-5.2 --format json > results.json
nerfprobe run gpt-5.2 --format markdown
```
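When the CLI is driven from a script, it can help to assemble the argument list programmatically. The following is a minimal sketch that uses only the flags shown above (`--tier`, `--probe`, `--provider`, `--base-url`, `--format`); the helper name `build_run_command` is hypothetical and not part of nerfprobe.

```python
import shlex
from typing import List, Optional, Sequence


def build_run_command(
    model: str,
    tier: Optional[str] = None,
    probes: Sequence[str] = (),
    provider: Optional[str] = None,
    base_url: Optional[str] = None,
    output_format: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `nerfprobe run` (hypothetical helper).

    Only flags that appear in the CLI examples above are emitted.
    """
    argv = ["nerfprobe", "run", model]
    if tier is not None:
        argv += ["--tier", tier]
    # --probe is repeated once per probe, as in the example above.
    for name in probes:
        argv += ["--probe", name]
    if provider is not None:
        argv += ["--provider", provider]
    if base_url is not None:
        argv += ["--base-url", base_url]
    if output_format is not None:
        argv += ["--format", output_format]
    return argv


cmd = build_run_command(
    "gpt-5.2", probes=["math", "style", "code"], output_format="json"
)
# Print a shell-safe rendering; pass `cmd` itself to subprocess.run to execute.
print(shlex.join(cmd))
```

Returning a list rather than a string keeps the command safe to hand to `subprocess.run` without shell quoting concerns.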
Model Registry
```bash
# List known models (10 SOTA as of Dec 2025)
nerfprobe list-models

# Research an unknown model
nerfprobe research qwen3:8b --provider alibaba
# -> Prints a prompt to paste into any LLM

# Parse the research response
nerfprobe research qwen3:8b --provider alibaba --parse '{"context_window": 32768}'
```
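The `--parse` argument above is a JSON object passed as a single shell word, so quoting matters. As a rough sketch, the command line can be generated from a Python dict; `build_parse_command` is a hypothetical helper, and `context_window` is the only key this README shows, so any other keys would be assumptions.

```python
import json
import shlex


def build_parse_command(model: str, provider: str, facts: dict) -> str:
    """Render a `nerfprobe research ... --parse` command line (hypothetical helper)."""
    # json.dumps produces valid JSON; shlex.join quotes it as one shell word.
    payload = json.dumps(facts)
    argv = [
        "nerfprobe", "research", model,
        "--provider", provider,
        "--parse", payload,
    ]
    return shlex.join(argv)


command = build_parse_command("qwen3:8b", "alibaba", {"context_window": 32768})
print(command)
```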
Python API
```python
import asyncio

from nerfprobe import run_probes, OpenAIGateway


async def main():
    gateway = OpenAIGateway(api_key="...")

    # Run core tier
    results = await run_probes("gpt-5.2", gateway, tier="core")

    for r in results:
        print(r.summary())
        # math_probe: PASS (1.00) in 234ms
        # style_probe: PASS (0.87) in 189ms
        # timing_probe: PASS (1.00) in 156ms
        # code_probe: PASS (1.00) in 312ms

    await gateway.close()


asyncio.run(main())
```
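If only the printed summaries are available (for example, in captured logs), they can be parsed back into structured values. This sketch assumes the line format matches the sample output above exactly; that format is inferred from the example comments, not from a documented contract, and the result objects may well expose structured fields that would be preferable. `ProbeSummary` and `parse_summary` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the sample lines, e.g. "math_probe: PASS (1.00) in 234ms".
SUMMARY_RE = re.compile(
    r"^(?P<name>\w+): (?P<status>[A-Z]+) "
    r"\((?P<score>\d+(?:\.\d+)?)\) in (?P<ms>\d+)ms$"
)


class ProbeSummary(NamedTuple):
    name: str
    status: str
    score: float
    latency_ms: int


def parse_summary(line: str) -> Optional[ProbeSummary]:
    """Parse one summary line; return None if it does not match the assumed format."""
    m = SUMMARY_RE.match(line.strip())
    if m is None:
        return None
    return ProbeSummary(m["name"], m["status"], float(m["score"]), int(m["ms"]))


lines = [
    "math_probe: PASS (1.00) in 234ms",
    "style_probe: PASS (0.87) in 189ms",
    "timing_probe: PASS (1.00) in 156ms",
    "code_probe: PASS (1.00) in 312ms",
]
parsed = [p for p in map(parse_summary, lines) if p is not None]
slowest = max(parsed, key=lambda p: p.latency_ms)
print(f"slowest probe: {slowest.name} ({slowest.latency_ms}ms)")
```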
Probes
| Tier | Probes | Description |
|---|---|---|
| core | math, style, timing, code | Essential degradation signals |
| advanced | fingerprint, context, routing, repetition, constraint, logic, cot | Research-backed detection |
| optional | calibration, zeroprint, multilingual | Requires logprobs or multi-call |
| all | All 14 probes | Comprehensive testing |
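The tier table can be expressed as a small lookup, which is handy for validating probe names before a run. This is an illustrative sketch built from the table alone, not nerfprobe's internal registry; it assumes `all` is the union of the three named tiers, which is consistent with the stated count of 14. Whether `advanced` also runs the core probes is not stated here, so the sketch does not assume it.

```python
from typing import Dict, List

# Probe names copied from the table above.
TIERS: Dict[str, List[str]] = {
    "core": ["math", "style", "timing", "code"],
    "advanced": [
        "fingerprint", "context", "routing", "repetition",
        "constraint", "logic", "cot",
    ],
    "optional": ["calibration", "zeroprint", "multilingual"],
}


def probes_for_tier(tier: str) -> List[str]:
    """Return the probe names listed for a tier (hypothetical helper)."""
    if tier == "all":
        # Assumption: "all" is the union of the three named tiers.
        return [name for names in TIERS.values() for name in names]
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    return list(TIERS[tier])


print(len(probes_for_tier("all")))
```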
Gateways
| Gateway | Providers |
|---|---|
| `OpenAIGateway` | OpenAI, OpenRouter, vLLM, Ollama, Together, Fireworks |
| `AnthropicGateway` | Claude models |
| `GoogleGateway` | Gemini models |
| `BedrockGateway` | AWS Bedrock (Claude, Titan) |
| `DashScopeGateway` | Alibaba Qwen models |
| `ZhipuGateway` | GLM models |
| `OllamaGateway` | Local Ollama models |
Environment Variables
```bash
# API keys (or use --api-key flag)
export OPENAI_API_KEY="..."
export ANTHROPIC_API_KEY="..."
export GOOGLE_API_KEY="..."
export OPENROUTER_API_KEY="..."
```
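When using the Python API rather than the CLI, the key has to be looked up by the calling code. A minimal sketch of that lookup follows, with an explicit key taking precedence over the environment, mirroring the `--api-key` flag. The variable names come from the list above; the lowercase provider labels and the helper `resolve_api_key` are assumptions made for illustration.

```python
import os
from typing import Mapping, Optional

# Environment variable names from the list above; the provider labels
# used as keys are hypothetical.
ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def resolve_api_key(
    provider: str,
    explicit: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return an explicit key if given, else read the provider's variable."""
    if explicit:
        return explicit
    env = os.environ if environ is None else environ
    var = ENV_VARS[provider]
    key = env.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set and no explicit key was given")
    return key


# Demonstration with a fake environment so no real key is needed.
demo_env = {"ANTHROPIC_API_KEY": "demo-key"}
print(resolve_api_key("anthropic", environ=demo_env))
```

The resolved key could then be passed to a gateway constructor, as with `OpenAIGateway(api_key=...)` in the Python API example.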
Research Basis
The probes are grounded in published research. The papers behind the core probes and several advanced probes are listed below by arXiv ID:
- MathProbe: 2504.04823 - Quantization Hurts Reasoning
- StyleProbe: 2403.06408 - Perturbation Lens
- TimingProbe: 2502.20589 - LLMs Have Rhythm
- CodeProbe: 2512.08213 - Package Hallucinations
- FingerprintProbe: 2407.15847 - LLMmap
- ContextProbe: 2512.12008 - KV Cache Compression
- RoutingProbe: 2406.18665 - RouteLLM
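To look up a paper quickly, the identifiers can be turned into links. This sketch assumes the numbers above are arXiv identifiers, which their `YYMM.NNNNN` shape suggests, and uses the standard `https://arxiv.org/abs/<id>` URL form; the `PAPERS` mapping and `arxiv_url` helper are illustrative only.

```python
# Probe-to-paper identifiers copied from the list above.
PAPERS = {
    "MathProbe": "2504.04823",
    "StyleProbe": "2403.06408",
    "TimingProbe": "2502.20589",
    "CodeProbe": "2512.08213",
    "FingerprintProbe": "2407.15847",
    "ContextProbe": "2512.12008",
    "RoutingProbe": "2406.18665",
}


def arxiv_url(probe: str) -> str:
    """Return the arXiv abstract URL for a probe's paper (assumes arXiv IDs)."""
    return f"https://arxiv.org/abs/{PAPERS[probe]}"


for probe in PAPERS:
    print(f"{probe}: {arxiv_url(probe)}")
```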
Dependencies
- `nerfprobe-core` - Probe implementations
- `httpx` - HTTP client
- `typer` - CLI framework
- `rich` - Terminal output
License
Apache-2.0