Opt-in lint for Inspect AI tasks: warn when a verifiable task uses a model-graded scorer where a deterministic alternative is available.
inspect-build-time-contract
Opt-in lint for Inspect AI tasks: warn when a task you've declared verifiable uses a model-graded scorer where a deterministic alternative is available.
The Inspect AI scorer documentation recommends "deterministic where possible, LLM where necessary." This package makes that recommendation mechanically checkable for tasks that opt in.
Install
pip install inspect-build-time-contract
Usage
from inspect_ai import Task, task
from inspect_ai.scorer import match, model_graded_qa
from inspect_build_time_contract import verifiable_task

# Deterministic scorer on a verifiable task: silent.
@verifiable_task
def my_factoid_eval():
    return Task(dataset=..., scorer=match())

# Model-graded scorer on a verifiable task: WARNING at task load.
@verifiable_task
def my_judged_eval():
    return Task(dataset=..., scorer=model_graded_qa())
# WARNING:inspect_build_time_contract:Task 'my_judged_eval' is decorated with
# @verifiable_task but its scorer is classified as 'model_graded'.
# Consider a deterministic alternative ... or use Inspect's @task directly.

# Task with no claim about verifiability: use Inspect's @task as normal.
@task
def my_genuinely_subjective_eval():
    return Task(dataset=..., scorer=model_graded_qa())
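To make the "warning at task load" behavior concrete, here is a minimal sketch of how a decorator of this kind could work. It is not the package's actual implementation: `StubTask`, `StubScorer`, and `verifiable_task_sketch` are hypothetical stand-ins so the example runs without Inspect installed, and reading the scorer's name from a `name` attribute is an assumption.

```python
import functools
import logging
from dataclasses import dataclass
from typing import Any, Callable

logger = logging.getLogger("verifiable_task_sketch")

# Scorer names taken from this README; how the real package resolves a
# scorer's name from an Inspect scorer object is not shown here.
_MODEL_GRADED = {"model_graded_qa", "model_graded_fact"}


@dataclass
class StubScorer:
    """Hypothetical stand-in for an Inspect scorer; only its name is used."""
    name: str


@dataclass
class StubTask:
    """Hypothetical stand-in for inspect_ai.Task."""
    scorer: Any


def verifiable_task_sketch(fn: Callable[..., StubTask]) -> Callable[..., StubTask]:
    """Build the task as usual, then lint its scorer before returning it."""

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> StubTask:
        task = fn(*args, **kwargs)
        scorer_name = getattr(task.scorer, "name", None)
        if scorer_name in _MODEL_GRADED:
            logger.warning(
                "Task %r is decorated with @verifiable_task but its scorer "
                "is classified as 'model_graded'.",
                fn.__name__,
            )
        return task  # The task itself is returned unchanged.

    return wrapper


@verifiable_task_sketch
def my_judged_eval() -> StubTask:
    return StubTask(scorer=StubScorer("model_graded_qa"))


@verifiable_task_sketch
def my_factoid_eval() -> StubTask:
    return StubTask(scorer=StubScorer("match"))
```

In this sketch the check runs when the task function is called to build the task, and the task is passed through untouched, which mirrors the "lint, not enforcement" intent described in this README.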
CI mode
Set INSPECT_BUILD_TIME_CONTRACT_STRICT=1 to escalate warnings to a RuntimeError:
INSPECT_BUILD_TIME_CONTRACT_STRICT=1 inspect eval my_eval.py
# Warnings now raise; CI fails on contract violations.
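The escalation from warning to error can be sketched as a single reporting function that consults the environment variable. This is an illustrative assumption about the mechanism, not the package's code: `report_violation` is a hypothetical name, and whether the real package accepts values other than `1` is not documented here.

```python
import logging
import os

logger = logging.getLogger("strict_mode_sketch")

STRICT_ENV_VAR = "INSPECT_BUILD_TIME_CONTRACT_STRICT"


def report_violation(message: str) -> None:
    """Log a warning by default; raise RuntimeError when strict mode is on."""
    # Assumption: only the exact value "1" enables strict mode.
    if os.environ.get(STRICT_ENV_VAR) == "1":
        raise RuntimeError(message)
    logger.warning(message)
```

Reading the variable at report time rather than at import time means a CI job can set it in the shell without any code changes.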
Scorer taxonomy
| Class | Inspect built-ins |
|---|---|
| deterministic | match, includes, pattern, exact, f1, answer, choice, math |
| model_graded | model_graded_qa, model_graded_fact |
| unknown | Any custom or third-party scorer the package doesn't recognize |
Custom scorers are classified as "unknown" and trigger the warning. To suppress it, either use Inspect's @task directly (you've opted out of the verifiable contract) or fork the package and add your scorer to DETERMINISTIC_BUILTINS / MODEL_GRADED_BUILTINS.
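The taxonomy amounts to a lookup by scorer name with "unknown" as the fallback. The sketch below assumes scorers are identified by plain strings and that the two constants are sets of names; the real package may store or resolve them differently, and `classify_scorer_name` and `should_warn` are hypothetical helpers.

```python
# Names copied from the taxonomy table; the set representation is an assumption.
DETERMINISTIC_BUILTINS = frozenset(
    {"match", "includes", "pattern", "exact", "f1", "answer", "choice", "math"}
)
MODEL_GRADED_BUILTINS = frozenset({"model_graded_qa", "model_graded_fact"})


def classify_scorer_name(name: str) -> str:
    """Return 'deterministic', 'model_graded', or 'unknown' for a scorer name."""
    if name in DETERMINISTIC_BUILTINS:
        return "deterministic"
    if name in MODEL_GRADED_BUILTINS:
        return "model_graded"
    return "unknown"


def should_warn(name: str) -> bool:
    """Anything not known to be deterministic triggers the warning."""
    return classify_scorer_name(name) != "deterministic"
```

Under these assumptions, a fork that adds a custom scorer's name to `DETERMINISTIC_BUILTINS` would move it out of the "unknown" class and silence the warning.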
What this is not
- It does not force any task to use a deterministic scorer.
- It does not override any existing Inspect API. @task continues to work exactly as before.
- It does not run at eval time. It's a pre-flight check at task load.
Why this exists
I built Jig around the idea that an LLM-eval framework should make "declare your deterministic check at build time" a first-class concept. A pre-registered N=50 study on BIRD-SQL (results) found that a Sonnet 4.6 LLM-as-judge had a 40% false-approval rate against the deterministic execution-based scorer; a Haiku 4.5 judge had a 10% false-approval rate. Even when a deterministic check is readily available, choosing a model-graded scorer carries a measurable accuracy cost.
This extension is a small experiment in surfacing that choice at task-definition time inside Inspect AI specifically. There's an upstream issue proposing the taxonomy + lint as in-core features at UKGovernmentBEIS/inspect_ai. If that lands, this package will be deprecated in favor of in-core support.
Compatibility
Tested against inspect-ai==0.3.212. Should work with any 0.3.200+ version. Requires Python 3.10+.
Development
git clone https://github.com/smledbetter/inspect-build-time-contract
cd inspect-build-time-contract
uv venv --python 3.11 .venv
uv pip install --python .venv/bin/python -e ".[dev]"
.venv/bin/python -m pytest tests/
License
MIT — see LICENSE.