NerfProbe Core

Shared probe and scorer implementations for scientifically grounded LLM degradation detection.

Installation

pip install nerfprobe-core

Overview

nerfprobe-core provides the detection logic (probes, scorers, and a model registry) shared across the NerfProbe project.

Probes

14 probes across 3 tiers, each grounded in peer-reviewed research:

Core Tier

| Probe | Detection | Research |
| --- | --- | --- |
| MathProbe | Arithmetic reasoning degradation | 2504.04823 |
| StyleProbe | Vocabulary collapse (TTR) | 2403.06408 |
| TimingProbe | Latency fingerprinting | 2502.20589 |
| CodeProbe | Syntax collapse | 2512.08213 |

Advanced Tier

| Probe | Detection | Research |
| --- | --- | --- |
| FingerprintProbe | Framework detection | 2407.15847 |
| ContextProbe | KV cache compression | 2512.12008 |
| RoutingProbe | Model routing detection | 2406.18665 |
| RepetitionProbe | Phrase looping | 2403.06408 |
| ConstraintProbe | Instruction adherence | 2409.11055 |
| LogicProbe | Reasoning drift | 2504.04823 |
| ChainOfThoughtProbe | CoT integrity | 2504.04823 |

Optional Tier

| Probe | Detection | Research |
| --- | --- | --- |
| CalibrationProbe | Confidence calibration | 2511.07585 |
| ZeroPrintProbe | Mode collapse | 2407.01235 |
| MultilingualProbe | Cross-language asymmetry | EMNLP.935 |

Scorers

10 scoring implementations:

  • MathScorer - Expected answer matching
  • TTRScorer - Type-Token Ratio calculation
  • CodeScorer - Python syntax validation
  • RepetitionScorer - N-gram repetition detection
  • ConstraintScorer - Word count and forbidden word checks
  • LogicScorer - Answer + reasoning validation
  • ChainOfThoughtScorer - Step counting & circular detection
  • CalibrationScorer - Confidence extraction
  • EntropyScorer - Shannon entropy calculation
  • MultilingualScorer - Cross-language consistency
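Several of the scorers listed above rest on simple text statistics. The sketch below illustrates three of them (type-token ratio, n-gram repetition, and Shannon entropy) using only the standard library. It is not the nerfprobe-core implementation: the function names, the regex tokenizer, and the default n-gram size are assumptions made for illustration.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase word tokenizer; a simplification assumed for this sketch."""
    return re.findall(r"[a-z0-9']+", text.lower())


def type_token_ratio(text: str) -> float:
    """Unique tokens divided by total tokens; 0.0 for empty input."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def ngram_repetition_rate(text: str, n: int = 3) -> float:
    """Fraction of n-grams that duplicate an n-gram seen elsewhere in the text."""
    tokens = tokenize(text)
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


def shannon_entropy(text: str) -> float:
    """Shannon entropy, in bits, of the token frequency distribution."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    total = len(tokens)
    counts = Counter(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A low type-token ratio or low entropy suggests a narrow vocabulary, and a high repetition rate suggests phrase looping. Any pass/fail thresholds would need to be calibrated against a known-good baseline for the model under test.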

Model Registry

Ships with entries for 10 state-of-the-art models (as of Dec 2025), each carrying probe-relevant fields:

  • context_window - For ContextProbe
  • knowledge_cutoff - For TemporalProbe

from nerfprobe_core import get_model_info, RESEARCH_PROMPT

# Known model
info = get_model_info("gpt-5.2")
print(f"Context: {info.context_window:,}")

# Unknown model - get research prompt
prompt = RESEARCH_PROMPT.format(model_name="new-model", provider="provider")

Usage

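from nerfprobe_core import ModelTarget
from nerfprobe_core.probes import MathProbe
from nerfprobe_core.probes.config import MathProbeConfig


async def run_math_probe(gateway):
    # `gateway` is the LLM gateway the probe sends its prompt through;
    # constructing one is not covered here.

    # Configure probe
    config = MathProbeConfig(
        prompt="What is 15 * 12 + 8 * 9?",
        expected_answer="252",
    )

    # Run probe (probe.run is a coroutine, so it must be awaited)
    target = ModelTarget(provider_id="openai", model_name="gpt-5.2")
    probe = MathProbe(config)
    result = await probe.run(target, gateway)

    print(result.summary())  # math_probe: PASS (1.00) in 234ms

# Run with: asyncio.run(run_math_probe(gateway))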
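The usage example above passes an expected_answer of "252", which is checked by expected-answer matching. The sketch below shows one plausible way such a check could work: take the last number in the model's response and compare it numerically to the expected value. This is an illustration only, not MathScorer's actual logic. The helper names and the "last number wins" rule are assumptions.

```python
import re
from typing import Optional


def extract_final_number(response: str) -> Optional[str]:
    """Return the last number in the response with thousands separators removed."""
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", response)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def matches_expected(response: str, expected_answer: str) -> bool:
    """Compare the last number in the response to the expected answer numerically."""
    found = extract_final_number(response)
    if found is None:
        return False
    try:
        return float(found) == float(expected_answer)
    except ValueError:
        # The expected answer (or the extracted text) was not a valid number.
        return False
```

Using the last number tolerates responses that show intermediate steps (for example "180 + 72 = 252"), at the cost of misreading answers that end with an unrelated figure.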

Dependencies

  • pydantic>=2.0.0
  • pyyaml>=6.0.0

License

Apache-2.0
