NerfProbe

Scientifically-grounded LLM degradation detection for developers.

Installation

pip install nerfprobe

Quick Start

CLI

# Run core probes on a model
nerfprobe run gpt-5.2 --tier core

# Run specific probes
nerfprobe run gpt-5.2 --probe math --probe style --probe code

# Use a different provider
nerfprobe run claude-opus-4.5 --provider anthropic

# Custom endpoint (vLLM, Ollama, local)
nerfprobe run my-model --base-url http://localhost:8000/v1

# Output formats
nerfprobe run gpt-5.2 --format json > results.json
nerfprobe run gpt-5.2 --format markdown
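
When the CLI is driven from a script, it can help to assemble the argument list in one place. The sketch below is a hypothetical helper (`build_run_command` is not part of nerfprobe) that uses only the flags shown above. It builds the argv list and leaves execution to the caller. It does not check whether particular flag combinations are accepted by the CLI.

```python
from typing import List, Optional, Sequence


def build_run_command(
    model: str,
    tier: Optional[str] = None,
    probes: Sequence[str] = (),
    provider: Optional[str] = None,
    base_url: Optional[str] = None,
    output_format: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for `nerfprobe run` (hypothetical helper)."""
    cmd = ["nerfprobe", "run", model]
    if tier:
        cmd += ["--tier", tier]
    # --probe is repeated once per probe name, as in the examples above
    for name in probes:
        cmd += ["--probe", name]
    if provider:
        cmd += ["--provider", provider]
    if base_url:
        cmd += ["--base-url", base_url]
    if output_format:
        cmd += ["--format", output_format]
    return cmd


if __name__ == "__main__":
    print(build_run_command("gpt-5.2", probes=["math", "style", "code"]))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with values such as the base URL.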

Model Registry

# List known models (10 SOTA as of Dec 2025)
nerfprobe list-models

# Research an unknown model
nerfprobe research qwen3:8b --provider alibaba
# -> Outputs prompt to paste into any LLM

# Parse research response
nerfprobe research qwen3:8b --provider alibaba --parse '{"context_window": 32768}'

Python API

import asyncio
from nerfprobe import run_probes, OpenAIGateway

async def main():
    gateway = OpenAIGateway(api_key="...")
    try:
        # Run the core tier
        results = await run_probes("gpt-5.2", gateway, tier="core")

        for r in results:
            print(r.summary())
            # Example output:
            # math_probe: PASS (1.00) in 234ms
            # style_probe: PASS (0.87) in 189ms
            # timing_probe: PASS (1.00) in 156ms
            # code_probe: PASS (1.00) in 312ms
    finally:
        # Close the gateway even if a probe raises
        await gateway.close()

asyncio.run(main())
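
If the summaries are captured as text (for example from logs), they can be turned back into structured records. The following is a minimal sketch, assuming each summary line has exactly the shape shown in the example output above (`name: STATUS (score) in Nms`). `ProbeSummary` and `parse_summary` are hypothetical names for illustration, not part of nerfprobe, and any status other than `PASS` is an assumption.

```python
import re
from typing import NamedTuple, Optional

# Matches lines such as "math_probe: PASS (1.00) in 234ms"
SUMMARY_RE = re.compile(
    r"^(?P<name>\w+): (?P<status>[A-Z]+) "
    r"\((?P<score>\d+(?:\.\d+)?)\) in (?P<ms>\d+)ms$"
)


class ProbeSummary(NamedTuple):
    name: str
    status: str
    score: float
    latency_ms: int


def parse_summary(line: str) -> Optional[ProbeSummary]:
    """Parse one summary line; return None if the line does not match."""
    match = SUMMARY_RE.match(line.strip())
    if match is None:
        return None
    return ProbeSummary(
        name=match["name"],
        status=match["status"],
        score=float(match["score"]),
        latency_ms=int(match["ms"]),
    )


if __name__ == "__main__":
    print(parse_summary("style_probe: PASS (0.87) in 189ms"))
```

Returning `None` for unmatched lines keeps the parser tolerant of other log output mixed in with the summaries.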

Probes

| Tier | Probes | Description |
|------|--------|-------------|
| core | math, style, timing, code | Essential degradation signals |
| advanced | fingerprint, context, routing, repetition, constraint, logic, cot | Research-backed detection |
| optional | calibration, zeroprint, multilingual | Requires logprobs or multi-call |
| all | All 14 probes | Comprehensive testing |
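
The tier table can be mirrored as a small lookup, which is handy for validating probe names before invoking the tool. This is only a sketch that transcribes the table above. `TIERS` and `probes_for_tier` are hypothetical names, and how nerfprobe resolves tiers internally (for instance, whether tiers are cumulative) is not stated here.

```python
from typing import Dict, List

# Probe names per tier, transcribed from the table above
TIERS: Dict[str, List[str]] = {
    "core": ["math", "style", "timing", "code"],
    "advanced": [
        "fingerprint", "context", "routing", "repetition",
        "constraint", "logic", "cot",
    ],
    "optional": ["calibration", "zeroprint", "multilingual"],
}


def probes_for_tier(tier: str) -> List[str]:
    """Return the probe names listed for a tier; "all" is the union of every tier."""
    if tier == "all":
        return [name for names in TIERS.values() for name in names]
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    return list(TIERS[tier])


if __name__ == "__main__":
    print(len(probes_for_tier("all")))
```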

Gateways

| Gateway | Providers |
|---------|-----------|
| OpenAIGateway | OpenAI, OpenRouter, vLLM, Ollama, Together, Fireworks |
| AnthropicGateway | Claude models |
| GoogleGateway | Gemini models |
| BedrockGateway | AWS Bedrock (Claude, Titan) |
| DashScopeGateway | Alibaba Qwen models |
| ZhipuGateway | GLM models |
| OllamaGateway | Local Ollama models |

Environment Variables

# API keys (or use --api-key flag)
export OPENAI_API_KEY="..."
export ANTHROPIC_API_KEY="..."
export GOOGLE_API_KEY="..."
export OPENROUTER_API_KEY="..."
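
Before a run, a quick check can confirm which of these variables are actually set. The sketch below uses only the standard library and the four variable names listed above. `configured_keys` is a hypothetical helper, not part of nerfprobe, and it reports only presence, never the key values.

```python
import os
from typing import Dict, Mapping

# Variable names taken from the list above
API_KEY_VARS = (
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "GOOGLE_API_KEY",
    "OPENROUTER_API_KEY",
)


def configured_keys(env: Mapping[str, str] = os.environ) -> Dict[str, bool]:
    """Map each variable name to True if it is set to a non-blank value."""
    return {name: bool(env.get(name, "").strip()) for name in API_KEY_VARS}


if __name__ == "__main__":
    for name, present in configured_keys().items():
        print(f"{name}: {'set' if present else 'missing'}")
```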

Research Basis

All probes are grounded in peer-reviewed research:

Dependencies

  • nerfprobe-core - Probe implementations
  • httpx - HTTP client
  • typer - CLI framework
  • rich - Terminal output

License

Apache-2.0
