A claim-support / faithfulness scorer for Inspect AI — does the transcript actually substantiate the claimed answer?
Project description
inspect-claim-support
A claim-support (faithfulness / groundedness) scorer for Inspect AI, packaged as a standalone extension.
claim_support assesses whether a claimed answer is actually substantiated by
the conversation transcript — not whether it is correct in absolute terms. It is
a model-graded scorer with a rubric that maps SUPPORTED / PARTIAL / UNSUPPORTED
onto Inspect's CORRECT / PARTIAL / INCORRECT, and returns NOANSWER on a grader
parse failure.
Why it earns its place: absence isn't support
The rubric refuses to let absence of evidence pass as support. A negative claim like "I made no network calls" only scores SUPPORTED if the transcript is actually capable of showing that class of event. If the transcript cannot expose the relevant events, the claim is PARTIAL or UNSUPPORTED — never SUPPORTED. This surfaces overclaims instead of laundering them through a plausible rationale.
The scorer assesses support against the Inspect transcript only (transcript-visible events), not against actual runtime truth in the environment.
Install
pip install inspect-claim-support
Use
from inspect_ai import Task
from inspect_claim_support import claim_support
task = Task(
dataset=...,
solver=...,
scorer=claim_support(), # optionally: claim_support(model="openai/gpt-4o")
)
Once installed, the scorer is also resolvable by its namespaced registry name
inspect_claim_support/claim_support via Inspect's setuptools entry point.
Parameters
template— grading template (defaults to a SUPPORTED / PARTIAL / UNSUPPORTED rubric with the absence-isn't-support boundary built in).model— model to use for grading (defaults to the model being evaluated).
Origin & credit
This scorer originated as
UKGovernmentBEIS/inspect_ai#4166
(addressing issue #4143). The Inspect maintainers judged that it better fits an
external package than Inspect core, so it is distributed here. The implementation
uses only Inspect's public API (the internal chat_history helper is
reimplemented locally for transcript rendering).
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file inspect_claim_support-0.1.0.tar.gz.
File metadata
- Download URL: inspect_claim_support-0.1.0.tar.gz
- Upload date:
- Size: 5.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
82b40692e96f11e3b72f49eb72dfa14dbd528f2bdad4fb81f8d5438209d115cf
|
|
| MD5 |
d3833b18b839d712047c44769cc4d261
|
|
| BLAKE2b-256 |
279fdf3893e8218ae4e5d56b4ac2844f8c2bafc226adce7584bd29f04cf541eb
|
File details
Details for the file inspect_claim_support-0.1.0-py3-none-any.whl.
File metadata
- Download URL: inspect_claim_support-0.1.0-py3-none-any.whl
- Upload date:
- Size: 5.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
3642ec21f899dd94425bfa0a191186ff1df3102546cdc048eea98d2b39221f2e
|
|
| MD5 |
8c928672177e07ce064653844138c1fd
|
|
| BLAKE2b-256 |
b7d41023024d7d657f48c0f542fb9b59753dcf71b67a8dfb3c9f7b559d05962f
|