Inspect AI `Scorer` adapter for whatifd.

whatifd-inspect-ai

Inspect AI Scorer adapter for whatifd. Phase 4B.2 of the v0.1 plan.

Install

pip install whatifd-inspect-ai

Installs whatifd and inspect-ai>=0.3.216,<0.4. This is standard library pinning: a lower bound plus a minor-version cap, because Inspect AI is pre-1.0 and may ship breaking changes in minor-version bumps.
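To make the specifier concrete, here is a minimal sketch of which release numbers the range >=0.3.216,<0.4 admits. It is a simplified tuple comparison written for illustration (the helper names `parse_release` and `in_pinned_range` are hypothetical). pip and uv apply the full PEP 440 rules, which also handle pre-releases and other suffixes that this sketch ignores.

```python
def parse_release(version: str) -> tuple[int, ...]:
    """Split a plain dotted release such as '0.3.216' into integers."""
    return tuple(int(part) for part in version.split("."))


def in_pinned_range(version: str) -> bool:
    """Return True if a plain release satisfies >=0.3.216,<0.4."""
    release = parse_release(version)
    # Tuple comparison is lexicographic, so (0, 3, 300) < (0, 4) holds
    # while (0, 4, 0) < (0, 4) does not.
    return (0, 3, 216) <= release < (0, 4)


for candidate in ("0.3.215", "0.3.216", "0.3.300", "0.4.0"):
    print(candidate, in_pinned_range(candidate))
```

Only the two middle candidates fall inside the range: the first is below the lower bound and the last hits the minor-version cap.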

Usage

from inspect_ai.scorer import Score, Target
from inspect_ai.solver import TaskState
from whatifd_inspect_ai import InspectAIScorer
from whatifd.contract import ScoreCase


def score_fn(case: ScoreCase) -> Score:
    """Wire the user's Inspect AI scorer into the (ScoreCase) -> Score
    callable shape this adapter expects. Typical pattern: build a
    TaskState from the case, run the Inspect AI scorer, return Score."""
    state = TaskState(
        model="anthropic/claude-opus-4-7",
        sample_id=case.trace_id,
        epoch=0,
        input=case.input.user_message,
        messages=[],
        output=...,  # ModelOutput from case.replayed_output.text
    )
    target = Target(case.original_output.text)
    # my_inspect_scorer stands for your own Inspect AI scorer; it is not defined in this snippet.
    return my_inspect_scorer(state, target)


scorer = InspectAIScorer(
    score_fn=score_fn,
    judge_provider="anthropic",
    judge_model_id="claude-opus-4-7",
    rubric_id="faithfulness-v1",
    rubric_text="Score 0-1 by faithfulness to the original output...",
    scoring_parameters={"temperature": 0.0, "max_tokens": 256},
)

# Plug into the whatifd pipeline alongside a TraceSource.
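The methodology fields passed to the constructor above (judge model, rubric, scoring parameters) are what make two scoring runs comparable. As an illustration only, the sketch below shows one way such fields could be reduced to a deterministic key. The function name `methodology_key` and the key layout are assumptions made for this example, not the adapter's actual cache-key implementation.

```python
import hashlib
import json


def methodology_key(provider, model_id, rubric_text, scoring_parameters):
    """Hypothetical helper: derive a stable key from the judging methodology."""
    rubric_hash = hashlib.sha256(rubric_text.encode("utf-8")).hexdigest()
    payload = {
        "judge": f"{provider}/{model_id}",
        "rubric_sha256": rubric_hash,
        "params": scoring_parameters,
    }
    # sort_keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


rubric = "Score 0-1 by faithfulness to the original output..."
key_a = methodology_key("anthropic", "claude-opus-4-7", rubric,
                        {"temperature": 0.0, "max_tokens": 256})
key_b = methodology_key("anthropic", "claude-opus-4-7", rubric,
                        {"max_tokens": 256, "temperature": 0.0})
key_c = methodology_key("anthropic", "claude-opus-4-7", rubric,
                        {"temperature": 0.7, "max_tokens": 256})
print(key_a == key_b, key_a == key_c)
```

Reordering the parameters leaves the key unchanged, while changing any methodology value (here the temperature) produces a different key.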

Cardinal alignment

  • #5 Sensitive at the boundary: JudgeResult.rationale is wrapped in Sensitive[str] at _project_score. Inspect AI's Score.explanation carries free text from the judge model; it MUST be wrapped before any whatifd-core code sees it.
  • #1 failures-as-data: when the wrapped score_fn returns None or raises, the adapter surfaces a JudgeResult(score=None) with a structured rationale, and the pipeline converts that into a FailureRecord. A non-numeric Score.value (e.g., a categorical label) projects to score=None instead of crashing on float().
  • #10 statistical claims: the adapter is metric-agnostic; the choice of metric is the user's responsibility when defining the Inspect AI scorer. Methodology (judge model, rubric hash, scoring parameters) flows through cache_key_components.
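The first two points can be sketched as a single projection step. The code below is an illustration under stated assumptions, not the adapter's `_project_score`: `FakeScore`, `Redacted`, `ProjectedResult`, and `project` are hypothetical stand-ins for Inspect AI's `Score`, whatifd's `Sensitive[str]`, `JudgeResult`, and the real projection, respectively.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class FakeScore:
    """Stand-in for Inspect AI's Score: a value plus a free-text explanation."""
    value: Any
    explanation: Optional[str] = None


class Redacted:
    """Stand-in for Sensitive[str]: holds text without exposing it via repr/str."""

    def __init__(self, text: str) -> None:
        self._text = text

    def reveal(self) -> str:
        return self._text

    def __repr__(self) -> str:
        return "Redacted(<hidden>)"


@dataclass(frozen=True)
class ProjectedResult:
    """Stand-in for JudgeResult: a numeric score or None, plus wrapped rationale."""
    score: Optional[float]
    rationale: Redacted


def project(score_fn: Callable[[Any], Optional[FakeScore]], case: Any) -> ProjectedResult:
    try:
        raw = score_fn(case)
    except Exception as exc:  # a failure becomes data instead of propagating
        return ProjectedResult(None, Redacted(f"score_fn raised {type(exc).__name__}"))
    if raw is None:
        return ProjectedResult(None, Redacted("score_fn returned None"))
    rationale = Redacted(raw.explanation or "")  # wrap before anything else sees it
    if isinstance(raw.value, (int, float)):
        return ProjectedResult(float(raw.value), rationale)
    return ProjectedResult(None, rationale)  # e.g. a categorical label such as "C"


print(project(lambda case: FakeScore(0.5, "judge free text"), case=None))
print(project(lambda case: FakeScore("C", "categorical"), case=None))
```

In this sketch every failure path returns an ordinary result object, so a caller can turn it into a failure record without a try/except of its own.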

Why no recorded-smoke test in this package

Unlike Langfuse, which has a hosted ingestion API that can be replayed via pytest-recording cassettes, Inspect AI is a local evaluation framework: its scorers run in-process against a model provider (Anthropic, OpenAI, etc.). There is no "Inspect AI host" to record HTTP cassettes against. The real-network surface is the model provider behind Inspect, which Phase 9B's real-adapter smoke test covers via the integration suite. This package therefore ships mocked-only conformance tests; cardinal #5 still applies (Sensitive[str] at the boundary), and the conformance harness pins it.
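Mocked-only testing here means the scoring callable is replaced with a stub, so no model provider is contacted. A minimal sketch using the standard library's `unittest.mock`: `run_case` is a hypothetical stand-in for whatever invokes `score_fn`, and the returned dict stands in for an Inspect AI `Score`.

```python
from unittest.mock import Mock


def run_case(score_fn, case):
    """Hypothetical driver: call the injected scorer once for a case."""
    return score_fn(case)


# The stub returns a canned result; nothing touches the network.
stub_score_fn = Mock(return_value={"value": 1.0, "explanation": "stubbed"})
result = run_case(stub_score_fn, case="trace-123")

# Raises AssertionError if the stub was not called exactly once with this case.
stub_score_fn.assert_called_once_with("trace-123")
print(result["value"])
```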

Contributor setup

This package lives in the parent whatifd monorepo as a uv workspace member. From the repo root:

uv sync --all-extras --dev --group workspace

The --group workspace flag pulls in the in-tree whatifd-inspect-ai editable install via a PEP 735 dependency group (supported natively by uv). Without it, uv sync --all-extras --dev installs the rest of the dev environment but leaves this package out, and pytest packages/whatifd-inspect-ai/tests/ fails with ModuleNotFoundError: No module named 'whatifd_inspect_ai'.
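A quick way to confirm that the workspace package is installed before running the tests is to ask Python's import system directly. This sketch uses only `importlib.util.find_spec` from the standard library; the helper name `is_installed` is illustrative.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if not is_installed("whatifd_inspect_ai"):
    print("whatifd_inspect_ai not found; run: "
          "uv sync --all-extras --dev --group workspace")
```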

Plain pip install ".[dev]" will NOT work for the workspace package: installing extras does not pull in PEP 735 dependency groups. This is deliberate; the workspace dependency can't be resolved from PyPI because it isn't published yet. Use uv for development setup; pip-only consumers install the published whatifd-inspect-ai from PyPI once it lands.

Stability

Pre-1.0; the adapter follows whatifd's v0.1 stability contract. The Inspect AI minor-version cap (<0.4) reserves the next minor for a coordinated migration if Inspect AI changes the Scorer / Score shape.
