Shared probe and scorer implementations for LLM degradation detection

NerfProbe Core

Shared probe and scorer implementations for scientifically-grounded LLM degradation detection.

Installation

pip install nerfprobe-core

Overview

nerfprobe-core provides the detection logic used by:

Probes

14 probes across 3 tiers, each grounded in peer-reviewed research:

Core Tier

Probe        Detection                         Research
MathProbe    Arithmetic reasoning degradation  2504.04823
StyleProbe   Vocabulary collapse (TTR)         2403.06408
TimingProbe  Latency fingerprinting            2502.20589
CodeProbe    Syntax collapse                   2512.08213
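
The TTR (Type-Token Ratio) signal listed for StyleProbe can be illustrated with a small standalone sketch. This is not nerfprobe-core's implementation: the function name `type_token_ratio` and the tokenization regex are assumptions made for illustration only.

```python
import re


def type_token_ratio(text: str) -> float:
    """Return unique tokens / total tokens, or 0.0 for empty text.

    Hypothetical helper: the lowercase word-level tokenization below is an
    assumption, not the rule used by nerfprobe-core's TTRScorer.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


varied = "The quick brown fox jumps over a lazy dog"
collapsed = "the the the the cat cat cat cat"

print(type_token_ratio(varied))     # 1.0  (9 tokens, all distinct)
print(type_token_ratio(collapsed))  # 0.25 (8 tokens, 2 distinct)
```

A ratio that falls well below a model's usual range on comparable prompts is the kind of vocabulary collapse such a probe would look for. Raw TTR also drops as text gets longer, so comparisons are only meaningful at similar output lengths.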

Advanced Tier

Probe                Detection                Research
FingerprintProbe     Framework detection      2407.15847
ContextProbe         KV cache compression     2512.12008
RoutingProbe         Model routing detection  2406.18665
RepetitionProbe      Phrase looping           2403.06408
ConstraintProbe      Instruction adherence    2409.11055
LogicProbe           Reasoning drift          2504.04823
ChainOfThoughtProbe  CoT integrity            2504.04823
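
Phrase looping of the kind RepetitionProbe targets is commonly measured by counting repeated n-grams. The sketch below shows one way to do that. The name `ngram_repetition_rate`, the whitespace tokenization, and the default `n=3` are all assumptions, and the package's own scorer may work differently.

```python
from collections import Counter


def ngram_repetition_rate(text: str, n: int = 3) -> float:
    """Fraction of n-grams that repeat an n-gram seen earlier in the text.

    Hypothetical helper for illustration; not part of nerfprobe-core.
    """
    tokens = text.lower().split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as a repeat.
    repeated = sum(count - 1 for count in counts.values())
    return repeated / len(ngrams)


looping = "a b c a b c a b c"
normal = "one two three four five"

print(ngram_repetition_rate(normal))         # 0.0
print(ngram_repetition_rate(looping) > 0.5)  # True (4 of 7 trigrams repeat)
```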

Optional Tier

Probe              Detection                 Research
CalibrationProbe   Confidence calibration    2511.07585
ZeroPrintProbe     Mode collapse             2407.01235
MultilingualProbe  Cross-language asymmetry  EMNLP.935

Scorers

10 scoring implementations:

  • MathScorer - Expected answer matching
  • TTRScorer - Type-Token Ratio calculation
  • CodeScorer - Python syntax validation
  • RepetitionScorer - N-gram repetition detection
  • ConstraintScorer - Word count and forbidden word checks
  • LogicScorer - Answer + reasoning validation
  • ChainOfThoughtScorer - Step counting & circular detection
  • CalibrationScorer - Confidence extraction
  • EntropyScorer - Shannon entropy calculation
  • MultilingualScorer - Cross-language consistency
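
Two of the techniques named above, Python syntax validation and Shannon entropy, can be sketched with the standard library alone. The function names below are hypothetical and nothing here reflects the actual CodeScorer or EntropyScorer classes. The sketch only shows the underlying calculations.

```python
import ast
import math
from collections import Counter


def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as Python (hypothetical helper)."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        return False
    return True


def shannon_entropy(tokens: list[str]) -> float:
    """Shannon entropy in bits of the token frequency distribution."""
    if not tokens:
        return 0.0
    total = len(tokens)
    counts = Counter(tokens)
    # H = sum over tokens of p * log2(1 / p), with p = count / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


print(is_valid_python("def f():\n    return 1\n"))  # True
print(is_valid_python("def f(:\n    return 1\n"))   # False
print(shannon_entropy(["a", "b", "c", "d"]))        # 2.0
print(shannon_entropy(["a", "a", "a", "a"]))        # 0.0
```

Low entropy across many sampled outputs means the outputs are concentrated on a few tokens. High entropy means they are spread more evenly.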

Model Registry

Ships with registry entries for 10 state-of-the-art models (as of Dec 2025), each carrying probe-relevant fields:

  • context_window - For ContextProbe
  • knowledge_cutoff - For TemporalProbe

from nerfprobe_core import get_model_info, RESEARCH_PROMPT

# Known model
info = get_model_info("gpt-5.2")
print(f"Context: {info.context_window:,}")

# Unknown model - get research prompt
prompt = RESEARCH_PROMPT.format(model_name="new-model", provider="provider")

Usage

from nerfprobe_core import ModelTarget
from nerfprobe_core.probes import MathProbe
from nerfprobe_core.probes.config import MathProbeConfig


async def main(gateway):
    # `gateway` is the LLM gateway the probe sends requests through;
    # constructing it is not shown here. `await` needs an async context,
    # so the probe run is wrapped in a coroutine.

    # Configure probe
    config = MathProbeConfig(
        prompt="What is 15 * 12 + 8 * 9?",
        expected_answer="252",
    )

    # Run probe
    target = ModelTarget(provider_id="openai", model_name="gpt-5.2")
    probe = MathProbe(config)
    result = await probe.run(target, gateway)

    print(result.summary())  # math_probe: PASS (1.00) in 234ms

Dependencies

  • pydantic>=2.0.0
  • pyyaml>=6.0.0

License

Apache-2.0

Download files

Source Distribution

nerfprobe_core-0.1.0.tar.gz (61.8 kB)

Uploaded Source

Built Distribution

nerfprobe_core-0.1.0-py3-none-any.whl (47.8 kB)

Uploaded Python 3

File details

Details for the file nerfprobe_core-0.1.0.tar.gz.

File metadata

  • Download URL: nerfprobe_core-0.1.0.tar.gz
  • Size: 61.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nerfprobe_core-0.1.0.tar.gz
Algorithm Hash digest
SHA256 44b8a3a8b601914d005d7abaff35f3cc9b4ecf3e5f18368d0be8eee4e3a17639
MD5 23d468f28d2063f66d8d4d5b106f4bdb
BLAKE2b-256 8391b882afb0cdcdbded040d6b7b29b866a04299eb61b868302ec3652775faa1

Provenance

The following attestation bundles were made for nerfprobe_core-0.1.0.tar.gz:

Publisher: release.yml on nerfstatus/nerfprobe-core

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file nerfprobe_core-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: nerfprobe_core-0.1.0-py3-none-any.whl
  • Size: 47.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nerfprobe_core-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 f27461824d79919e9583dade5dab50c6005f24699c2549cacdb62ab41c835bef
MD5 4b617afb583f4b92734436f85ada4763
BLAKE2b-256 83ace9229b8fa2dd360de81656f8353facbaddde41cdef3458765636331ba1ef

Provenance

The following attestation bundles were made for nerfprobe_core-0.1.0-py3-none-any.whl:

Publisher: release.yml on nerfstatus/nerfprobe-core