A claim-support / faithfulness scorer for Inspect AI — does the transcript actually substantiate the claimed answer?

inspect-claim-support

A claim-support (faithfulness / groundedness) scorer for Inspect AI, packaged as a standalone extension.

claim_support assesses whether a claimed answer is actually substantiated by the conversation transcript, not whether it is correct in absolute terms. It is a model-graded scorer whose rubric maps SUPPORTED / PARTIAL / UNSUPPORTED onto Inspect's CORRECT / PARTIAL / INCORRECT, and it returns NOANSWER when the grader's output cannot be parsed.
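
The verdict mapping can be sketched as follows. This is a minimal illustration, not the package's implementation: the "VERDICT: <label>" output convention and the helper name verdict_to_value are assumptions made for the example, and the single-letter values are redefined locally to mirror the constants in inspect_ai.scorer so the sketch runs without Inspect installed.

```python
import re

# Local copies mirroring inspect_ai.scorer's CORRECT / PARTIAL / INCORRECT /
# NOANSWER values, so this sketch runs standalone.
CORRECT, PARTIAL, INCORRECT, NOANSWER = "C", "P", "I", "N"

# Rubric label -> Inspect score value, as described above.
VERDICT_TO_VALUE = {
    "SUPPORTED": CORRECT,
    "PARTIAL": PARTIAL,
    "UNSUPPORTED": INCORRECT,
}

# ASSUMPTION: the grader states its verdict as "VERDICT: <label>".
# The package's real output format may differ.
VERDICT_PATTERN = re.compile(
    r"VERDICT:\s*(UNSUPPORTED|SUPPORTED|PARTIAL)\b", re.IGNORECASE
)


def verdict_to_value(grader_output: str) -> str:
    """Map the grader's final verdict to a score value (hypothetical helper)."""
    matches = VERDICT_PATTERN.findall(grader_output)
    if not matches:
        # Parse failure: report "no answer" rather than guessing a grade.
        return NOANSWER
    # Use the last verdict in case the grader restates it while reasoning.
    return VERDICT_TO_VALUE[matches[-1].upper()]
```

Returning NOANSWER on a parse failure keeps an unreadable grader response from being counted as either a pass or a fail.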

Why it earns its place: absence isn't support

The rubric refuses to let absence of evidence pass as support. A negative claim like "I made no network calls" only scores SUPPORTED if the transcript is actually capable of showing that class of event. If the transcript cannot expose the relevant events, the claim is PARTIAL or UNSUPPORTED — never SUPPORTED. This surfaces overclaims instead of laundering them through a plausible rationale.

The scorer assesses support against the Inspect transcript only (transcript-visible events), not against actual runtime truth in the environment.
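
The boundary itself lives in the grading rubric, so the model grader applies it, not code. As a rough programmatic illustration of the same rule, the sketch below caps a negative claim's verdict when the transcript could not have shown the relevant class of event. The function name, the event-class labels, and the choice to downgrade to PARTIAL (the rubric also allows UNSUPPORTED) are all assumptions made for the example.

```python
def cap_negative_claim(
    proposed: str,
    claim_event_class: str,
    visible_event_classes: set[str],
) -> str:
    """Downgrade SUPPORTED when the transcript cannot expose the event class.

    Hypothetical helper: `proposed` is the verdict a grader would otherwise
    give, `claim_event_class` is the kind of event the negative claim is
    about (e.g. "network_call"), and `visible_event_classes` lists the kinds
    of events the transcript is able to record.
    """
    if proposed == "SUPPORTED" and claim_event_class not in visible_event_classes:
        # Silence in the transcript proves nothing if the event could never
        # have appeared there, so the claim cannot be fully supported.
        return "PARTIAL"
    return proposed


# "I made no network calls" against a transcript that records only messages
# and tool calls: the absence of network events is not evidence.
print(cap_negative_claim("SUPPORTED", "network_call", {"message", "tool_call"}))
```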

Install

pip install inspect-claim-support

Use

from inspect_ai import Task
from inspect_claim_support import claim_support

task = Task(
    dataset=...,
    solver=...,
    scorer=claim_support(),   # optionally: claim_support(model="openai/gpt-4o")
)

Once installed, the scorer is also resolvable by its namespaced registry name inspect_claim_support/claim_support via Inspect's setuptools entry point.

Parameters

  • template — grading template (defaults to a SUPPORTED / PARTIAL / UNSUPPORTED rubric with the absence-isn't-support boundary built in).
  • model — model to use for grading (defaults to the model being evaluated).

Origin & credit

This scorer originated as UKGovernmentBEIS/inspect_ai#4166 (addressing issue #4143). The Inspect maintainers judged that it better fits an external package than Inspect core, so it is distributed here. The implementation uses only Inspect's public API (the internal chat_history helper is reimplemented locally for transcript rendering).
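
For readers curious what a local transcript renderer might look like, here is a minimal sketch. It is not the package's code: the Message dataclass stands in for Inspect's chat message types, and the "role: content" layout and the decision to skip system messages are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for an Inspect chat message."""

    role: str  # e.g. "system", "user", "assistant", "tool"
    content: str


def render_transcript(messages: list[Message]) -> str:
    """Render messages as plain text for inclusion in a grading prompt."""
    lines = []
    for message in messages:
        # ASSUMPTION: system prompts are omitted so the grader sees only the
        # conversation itself.
        if message.role == "system":
            continue
        lines.append(f"{message.role}: {message.content.strip()}")
    # Blank line between turns keeps multi-line content readable.
    return "\n\n".join(lines)


transcript = render_transcript(
    [
        Message("system", "You are a helpful assistant."),
        Message("user", "Did you make any network calls?"),
        Message("assistant", "I made no network calls. "),
    ]
)
print(transcript)
```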

License

MIT
