Skip to main content

Shared probe and scorer implementations for LLM degradation detection

Project description

NerfProbe Core

Scientifically-grounded LLM degradation detection.

nerfprobe-core provides the essential detection logic, scorers, and probe definitions used by the NerfProbe CLI and NerfStatus. It is designed for developers who need rigorous, research-backed instruments to measure model quality, consistency, and alignment.

Installation

pip install nerfprobe-core

Features

  • 17 Research-Backed Probes: Detection instruments grounded in specific academic papers on model collapse and quantization artifacts.
  • 3-Tier Architecture: Organized into Core (Essential), Advanced (Structural), and Optional (Experimental) tiers.
  • Universal Scoring: Reusable scorers (JSON schema validation, TTR, Fact verification) independent of the probe execution.
  • Type-Safe: Fully typed with Python 3.11+, leveraging Pydantic for configuration and results.

Probes

Core Tier (Essential Signals)

Probe Detection Target Paper
MathProbe Arithmetic reasoning degradation 2504.04823
StyleProbe Vocabulary collapse (Type-Token Ratio) 2403.06408
TimingProbe Latency fingerprinting & TTFT degradation 2502.20589
CodeProbe Syntax collapse in generated code 2512.08213
FactProbe Factual recall and hallucination checks N/A

Advanced Tier (Structural Integrity)

Probe Detection Target Paper
JsonProbe JSON schema adherence & structure 2402.16775
ConsistencyProbe Fact permanence & self-contradiction 2504.04823
FingerprintProbe Underlying framework/model identity detection 2407.15847
ContextProbe Key-Value cache compression artifacts 2512.12008
RoutingProbe MoE routing path detection 2406.18665
RepetitionProbe Loop detection & phrase repetition 2403.06408
ConstraintProbe Negative constraint adherence 2409.11055
LogicProbe Reasoning step validity 2504.04823
ChainOfThoughtProbe CoT step integrity 2504.04823

Optional Tier (Experimental)

Probe Detection Target Paper
CalibrationProbe Confidence score calibration 2511.07585
ZeroPrintProbe Mode collapse via entropy measurement 2407.01235
MultilingualProbe Cross-language performance asymmetry EMNLP.935

Usage

Basic Probe Execution

import asyncio
from nerfprobe_core import ModelTarget
from nerfprobe_core.probes import MathProbe
from nerfprobe_core.probes.config import MathProbeConfig

# 1. Configure the probe
config = MathProbeConfig(
    prompt="Calculate 15 * 12 + 8.",
    expected_answer="188",
)

# 2. Define the target
target = ModelTarget(provider_id="openai", model_name="gpt-4o")

# 3. Instantiate and run (requires an LLM Gateway)
probe = MathProbe(config)
# result = await probe.run(target, gateway)

# print(result.summary())
# > math_probe: PASS (1.00) in 234ms

Using Scorers Directly

You can use the scoring logic without the full probe infrastructure:

from nerfprobe_core.scorers import JsonScorer

scorer = JsonScorer(strict=True)
valid_json = '{"name": "NerfProbe"}'
invalid_json = '```json{"name": "NerfProbe"}```'

score, metadata = scorer.score(valid_json)
print(f"Score: {score}") # 1.0

score, metadata = scorer.score(invalid_json)
print(f"Score: {score}") # 0.0 (Strict mode rejects markdown blocks)

Contributing

We welcome contributions! Please see CONTRIBUTING.md for details on how to set up the development environment, run tests, and submit PRs.

License

Apache-2.0

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nerfprobe_core-0.2.1.tar.gz (85.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

nerfprobe_core-0.2.1-py3-none-any.whl (57.7 kB view details)

Uploaded Python 3

File details

Details for the file nerfprobe_core-0.2.1.tar.gz.

File metadata

  • Download URL: nerfprobe_core-0.2.1.tar.gz
  • Upload date:
  • Size: 85.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nerfprobe_core-0.2.1.tar.gz
Algorithm Hash digest
SHA256 505bc553bf2d958b590a34ec71641158073dee9772e11c2b76c63edfe5de5f08
MD5 2ff6f46db14474cee04428a61c17ab21
BLAKE2b-256 cca6ee9ce1a2805f8282f48eb9e6acee2f9fa0ff3f1e7753c4a8acc9c723b5fc

See more details on using hashes here.

Provenance

The following attestation bundles were made for nerfprobe_core-0.2.1.tar.gz:

Publisher: release.yml on nerfstatus/nerfprobe-core

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file nerfprobe_core-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: nerfprobe_core-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 57.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nerfprobe_core-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 5fe2ef1de4e07241d4ec47cf063d67b2684e6ee03682220d180123fc92ca074e
MD5 860c4f8fa110fa6fe41370356c3f4f44
BLAKE2b-256 433a7c04d7d3c748caf5921bc7ff4af130b23ad16f96bcf774d60407c57a4f3b

See more details on using hashes here.

Provenance

The following attestation bundles were made for nerfprobe_core-0.2.1-py3-none-any.whl:

Publisher: release.yml on nerfstatus/nerfprobe-core

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page