Inspect AI `Scorer` adapter for whatifd. Phase 4B.2 of the v0.1 plan.

whatifd-inspect-ai

Install

pip install whatifd-inspect-ai

Pulls in whatifd and inspect-ai>=0.3.216,<0.4. The range follows the usual library pinning pattern of a lower bound plus a minor-version cap, because Inspect AI is pre-1.0 and can ship breaking changes in a minor bump.
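
As a rough illustration of which versions that range admits, the sketch below checks a few version strings against the two bounds. It is a simplification that only handles plain dotted release numbers; pip's real PEP 440 matching also deals with pre-release and post-release tags. The helper names are made up for this example.

```python
def parse_release(version: str) -> tuple[int, ...]:
    """Turn a plain dotted release such as '0.3.216' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def in_pinned_range(version: str, lower: str = "0.3.216", cap: str = "0.4") -> bool:
    """Return True when lower <= version < cap, comparing release tuples."""
    return parse_release(lower) <= parse_release(version) < parse_release(cap)


# Versions below the lower bound or at/after the cap are rejected.
for candidate in ["0.3.215", "0.3.216", "0.3.300", "0.4.0"]:
    print(candidate, in_pinned_range(candidate))
```

Only the two middle candidates fall inside the range; a 0.4.x release would require a new adapter release that raises the cap.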

Usage

from inspect_ai.scorer import Score, Target
from inspect_ai.solver import TaskState
from whatifd_inspect_ai import InspectAIScorer
from whatifd.contract import ScoreCase


def score_fn(case: ScoreCase) -> Score:
    """Wire the user's Inspect AI scorer into the (ScoreCase) -> Score
    callable shape this adapter expects. Typical pattern: build a
    TaskState from the case, run the Inspect AI scorer, return Score."""
    state = TaskState(
        model="anthropic/claude-opus-4-7",
        sample_id=case.trace_id,
        epoch=0,
        input=case.input.user_message,
        messages=[],
        output=...,  # ModelOutput from case.replayed_output.text
    )
    target = Target(case.original_output.text)
    return my_inspect_scorer(state, target)


scorer = InspectAIScorer(
    score_fn=score_fn,
    judge_provider="anthropic",
    judge_model_id="claude-opus-4-7",
    rubric_id="faithfulness-v1",
    rubric_text="Score 0-1 by faithfulness to the original output...",
    scoring_parameters={"temperature": 0.0, "max_tokens": 256},
)

# Plug into the whatifd pipeline alongside a TraceSource.

Cardinal alignment

  • #5 Sensitive at the boundary: JudgeResult.rationale is wrapped at _project_score. Inspect AI's Score.explanation carries free text from the judge model; it MUST be wrapped before any whatifd-core code sees it.
  • #1 failures-as-data: when the wrapped score_fn returns None or raises, the adapter surfaces a JudgeResult(score=None) with structured rationale. The pipeline converts that into a FailureRecord. A non-numeric Score.value (e.g., a categorical label) projects to score=None instead of crashing on float().
  • #10 statistical claims: the adapter is metric-agnostic; choosing a statistically sound metric is the user's responsibility when defining the Inspect AI scorer. Methodology (judge model, rubric hash, scoring parameters) flows through cache_key_components.
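
The boundary rules in the first two bullets can be pictured with a small stand-alone sketch. Sensitive, JudgeResult, project_score, and safe_judge below are illustrative stand-ins, not the classes or functions that whatifd or this adapter actually export; only the behaviour described in the bullets is taken from the package. A SimpleNamespace plays the role of an Inspect AI Score so the example runs without any dependency.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class Sensitive(Generic[T]):
    """Hypothetical wrapper that keeps its payload out of repr() and logs."""

    def __init__(self, value: T) -> None:
        self._value = value

    def reveal(self) -> T:
        """Explicit, greppable access to the wrapped value."""
        return self._value

    def __repr__(self) -> str:
        return "Sensitive(<redacted>)"


@dataclass(frozen=True)
class JudgeResult:
    """Hypothetical stand-in for the adapter's result type."""

    score: Optional[float]
    rationale: Sensitive[str]


def project_score(value: Any, explanation: Optional[str]) -> JudgeResult:
    """Wrap the free-text explanation and project the value to a float or None."""
    rationale = Sensitive(explanation or "")
    if isinstance(value, (int, float)):
        return JudgeResult(score=float(value), rationale=rationale)
    # Categorical labels and other non-numeric values become score=None
    # instead of raising inside float().
    return JudgeResult(score=None, rationale=rationale)


def safe_judge(score_fn: Callable[[Any], Any], case: Any) -> JudgeResult:
    """Turn a None return or an exception from score_fn into data, not a crash."""
    try:
        result = score_fn(case)
    except Exception as exc:
        return JudgeResult(None, Sensitive(f"score_fn raised {type(exc).__name__}"))
    if result is None:
        return JudgeResult(None, Sensitive("score_fn returned None"))
    return project_score(result.value, result.explanation)


# A categorical label projects to score=None, and the rationale stays redacted.
categorical = SimpleNamespace(value="C", explanation="judge free text")
print(safe_judge(lambda case: categorical, case=None))
```

In this sketch the pipeline would see a JudgeResult in every case, so converting a score of None into a failure record stays a data-handling step rather than exception handling.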

Why no recorded-smoke test in this package

Unlike Langfuse (which has a hosted ingestion API replayed via pytest-recording cassettes), Inspect AI is a local evaluation framework: its scorers run in-process against a model provider (Anthropic, OpenAI, etc.). There is no "Inspect AI host" to record HTTP cassettes against. The real-network surface is the model provider behind Inspect, which Phase 9B's real-adapter smoke test covers via the integration suite. This package ships mocked-only conformance tests; cardinal #5 still applies (Sensitive[str] at the boundary), and the conformance harness pins it.

Contributor setup

This package lives in the parent whatifd monorepo as a uv workspace member. From the repo root:

uv sync --all-extras --dev --group workspace

The --group workspace flag pulls the in-tree whatifd-inspect-ai editable install via PEP 735 dependency groups (uv-native). Without it, uv sync --all-extras --dev installs the rest of the dev environment but leaves this package out, and pytest packages/whatifd-inspect-ai/tests/ fails with ModuleNotFoundError: whatifd_inspect_ai.

Plain pip install ".[dev]" will NOT work for the workspace package — pip ignores PEP 735 groups (deliberate; the workspace dep can't be resolved from PyPI because it isn't published yet). Use uv for development setup; pip-only consumers install the published whatifd-inspect-ai from PyPI once it lands.

Stability

Pre-1.0; the adapter follows whatifd's v0.1 stability contract. The Inspect AI minor-version cap (<0.4) reserves the next minor for a coordinated migration if Inspect AI changes the Scorer / Score shape.


Download files

Source Distribution: whatifd_inspect_ai-0.1.0.tar.gz (10.8 kB)

Built Distribution: whatifd_inspect_ai-0.1.0-py3-none-any.whl (8.3 kB, Python 3)

Provenance

The following attestation bundles were made for whatifd_inspect_ai-0.1.0.tar.gz:

Publisher: release.yml on victoralfred/whatifd