NerfProbe

Scientifically-grounded LLM degradation detection for developers.

Installation

pip install nerfprobe

Quick Start

CLI

# Run core probes on a model
nerfprobe run gpt-5.2 --tier core

# Run specific probes
nerfprobe run gpt-5.2 --probe math --probe style --probe code

# Use a different provider
nerfprobe run claude-opus-4.5 --provider anthropic

# Custom endpoint (vLLM, Ollama, local)
nerfprobe run my-model --base-url http://localhost:8000/v1

# Output formats
nerfprobe run gpt-5.2 --format json > results.json
nerfprobe run gpt-5.2 --format markdown

Model Registry

# List known models (10 state-of-the-art models as of Dec 2025)
nerfprobe list-models

# Research an unknown model
nerfprobe research qwen3:8b --provider alibaba
# -> Outputs a prompt to paste into any LLM

# Parse the research response
nerfprobe research qwen3:8b --provider alibaba --parse '{"context_window": 32768}'
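
The --parse argument is a JSON string, which is easy to misquote by hand. As a minimal sketch, the command can be assembled from Python so that the payload is always valid JSON. The helper name build_research_command is hypothetical and not part of nerfprobe; only the research subcommand and the --provider and --parse flags shown above are used.

```python
import json
import shlex
from typing import Optional


def build_research_command(model: str, provider: str,
                           facts: Optional[dict] = None) -> list:
    """Build an argv list for `nerfprobe research` (hypothetical helper)."""
    argv = ["nerfprobe", "research", model, "--provider", provider]
    if facts is not None:
        # json.dumps guarantees a valid JSON payload with double-quoted keys.
        argv += ["--parse", json.dumps(facts)]
    return argv


cmd = build_research_command("qwen3:8b", "alibaba", {"context_window": 32768})
# shlex.join adds the shell quoting needed to paste the command into a terminal.
print(shlex.join(cmd))
```

The argv list can also be passed directly to subprocess.run, which avoids shell quoting altogether.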

Python API

import asyncio
from nerfprobe import run_probes, OpenAIGateway

async def main():
    gateway = OpenAIGateway(api_key="...")  # replace "..." with your API key

    try:
        # Run the core tier (math, style, timing, code)
        results = await run_probes("gpt-5.2", gateway, tier="core")

        for r in results:
            print(r.summary())
            # Example output:
            # math_probe: PASS (1.00) in 234ms
            # style_probe: PASS (0.87) in 189ms
            # timing_probe: PASS (1.00) in 156ms
            # code_probe: PASS (1.00) in 312ms
    finally:
        # Close the gateway even if a probe raises
        await gateway.close()

asyncio.run(main())
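
If the summaries need to be compared or logged, the printed lines can be parsed back into fields. The sketch below assumes the summary format is exactly the one shown in the sample comments above (name, status, score in parentheses, latency in ms); that format is inferred from the example, not from a documented contract, and ProbeSummary and parse_summary are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the sample output, e.g. "math_probe: PASS (1.00) in 234ms".
SUMMARY_RE = re.compile(
    r"^(?P<name>\w+): (?P<status>[A-Z]+) "
    r"\((?P<score>\d+(?:\.\d+)?)\) in (?P<ms>\d+)ms$"
)


class ProbeSummary(NamedTuple):
    name: str
    status: str
    score: float
    latency_ms: int


def parse_summary(line: str) -> Optional[ProbeSummary]:
    """Return the parsed fields, or None if the line does not match."""
    m = SUMMARY_RE.match(line.strip())
    if m is None:
        return None
    return ProbeSummary(m["name"], m["status"], float(m["score"]), int(m["ms"]))


lines = [
    "math_probe: PASS (1.00) in 234ms",
    "style_probe: PASS (0.87) in 189ms",
]
parsed = [p for p in map(parse_summary, lines) if p is not None]
# Pick the probe with the lowest score as the first one to investigate.
weakest = min(parsed, key=lambda p: p.score)
print(weakest.name, weakest.score)
```

If the result objects expose structured attributes, reading those directly would be more robust than parsing text; this README only shows summary().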

Probes

| Tier | Probes | Description |
|------|--------|-------------|
| core | math, style, timing, code | Essential degradation signals |
| advanced | fingerprint, context, routing, repetition, constraint, logic, cot | Research-backed detection |
| optional | calibration, zeroprint, multilingual | Requires logprobs or multi-call |
| all | All 14 probes | Comprehensive testing |
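
As an illustration of how the tiers relate, the table can be written as a plain mapping and expanded into the repeated --probe flags shown in the CLI section. The dict and helper names below are hypothetical and are not nerfprobe's internal registry; only the probe names come from the table.

```python
# Tier contents copied from the table above (illustrative, not the library's registry).
TIERS = {
    "core": ["math", "style", "timing", "code"],
    "advanced": ["fingerprint", "context", "routing", "repetition",
                 "constraint", "logic", "cot"],
    "optional": ["calibration", "zeroprint", "multilingual"],
}


def probes_for(tier: str) -> list:
    """Return the probe names in a tier; 'all' is the union of every tier."""
    if tier == "all":
        return [name for names in TIERS.values() for name in names]
    return list(TIERS[tier])


def probe_flags(tier: str) -> list:
    """Expand a tier into repeated --probe flags for `nerfprobe run`."""
    flags = []
    for name in probes_for(tier):
        flags += ["--probe", name]
    return flags


print(len(probes_for("all")))
print(" ".join(["nerfprobe", "run", "gpt-5.2"] + probe_flags("core")))
```

Expanding a tier this way is handy when one probe has to be dropped from an otherwise standard run.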

Gateways

| Gateway | Providers |
|---------|-----------|
| OpenAIGateway | OpenAI, OpenRouter, vLLM, Ollama, Together, Fireworks |
| AnthropicGateway | Claude models |
| GoogleGateway | Gemini models |
| BedrockGateway | AWS Bedrock (Claude, Titan) |
| DashScopeGateway | Alibaba Qwen models |
| ZhipuGateway | GLM models |
| OllamaGateway | Local Ollama models |

Environment Variables

# API keys (or use --api-key flag)
export OPENAI_API_KEY="..."
export ANTHROPIC_API_KEY="..."
export GOOGLE_API_KEY="..."
export OPENROUTER_API_KEY="..."
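
When calling the Python API from a script, the same variables can be read with a small helper. This is a sketch under stated assumptions: the variable names come from the list above, the provider labels used as dict keys are illustrative, and the precedence (explicit key first, environment second) mirrors the --api-key comment but is not confirmed behavior of nerfprobe. resolve_api_key is a hypothetical name.

```python
import os
from typing import Mapping, Optional

# Environment variable names from the list above; the keys are illustrative labels.
ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def resolve_api_key(provider: str, explicit: Optional[str] = None,
                    env: Mapping[str, str] = os.environ) -> str:
    """Prefer an explicit key, then fall back to the provider's variable."""
    if explicit:
        return explicit
    var = ENV_VARS[provider]
    key = env.get(var, "")
    if not key:
        # Fail early with the variable name so the fix is obvious.
        raise RuntimeError(f"{var} is not set and no explicit key was given")
    return key


# Passing a dict as env keeps the example independent of the real environment.
print(resolve_api_key("openai", env={"OPENAI_API_KEY": "sk-test"}))
```

The returned string could then be passed as the api_key argument shown in the Python API example.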

Research Basis

All probes are grounded in peer-reviewed research.

Dependencies

  • nerfprobe-core - Probe implementations
  • httpx - HTTP client
  • typer - CLI framework
  • rich - Terminal output

License

Apache-2.0
