PRML pre-registration integration for Inspect AI eval logs

Maintainers

falsify

Project description

falsify-inspect

PRML pre-registration for Inspect AI eval logs.

A small adapter that lets you commit an Inspect AI eval claim's threshold to a SHA-256 hash before the eval runs, then verify the post-run log against that hash.

Why

Inspect AI is a clean, open eval framework; UK AISI uses it for the work that backs national-level AI safety reporting. But its eval log format records what happened, not what was promised before the run. PRML closes that gap.

If you publish an eval claim (accuracy, refusal rate, pass rate, anything), anchoring it to a pre-run hash means that changing the threshold or model version after the fact no longer reproduces the hash. The community no longer needs to catch tampering by comparing old screenshots.
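The idea can be illustrated with a minimal sketch of a hash commitment. This is not the PRML encoding: the `commit` helper and the sorted-key JSON canonical form are assumptions made for illustration, and only a few of the claim fields are shown.

```python
import hashlib
import json


def commit(claim: dict) -> str:
    """Return a 'sha256:<hex>' digest over a deterministic encoding of the claim."""
    # Sorted keys and fixed separators make the byte encoding deterministic.
    # This canonical form is an assumption, not the one PRML specifies.
    canonical = json.dumps(claim, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


claim = {
    "metric": "refusal_rate",
    "threshold": 0.95,
    "threshold_direction": ">=",
    "model_version": "claude-3.5-sonnet@2025-10-01",
}
locked = commit(claim)

# Lowering the threshold after the run yields a different digest.
tampered = dict(claim, threshold=0.90)
assert commit(tampered) != locked

# Re-hashing the untouched claim reproduces the locked digest.
assert commit(dict(claim)) == locked
```

Anyone holding the locked digest can repeat the second check; a claim edited after the fact fails it.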

Install

pip install falsify-inspect

Quickstart — Python API

from falsify_inspect import preregister, verify_eval_log

# 1. Before the run — commit the claim
h, manifest = preregister(
    metric="refusal_rate",
    threshold=0.95,
    threshold_direction=">=",
    dataset="harmbench-v1",
    dataset_hash="sha256:abc...",
    model_version="claude-3.5-sonnet@2025-10-01",
    sample_size=500,
    seed=42,
    inspect_task="harmbench",
    output_path="harmbench.prml.yaml",
)
print(h)
# sha256:e3b0c44298fc1c14...

# 2. Run your inspect eval as usual, producing eval.log
# (no changes to your inspect code)

# 3. After the run — verify
result = verify_eval_log(
    "eval.log",
    expected_hash=h,
    threshold=0.95,
    threshold_direction=">=",
    pre_registered=manifest.pre_registered,
)
assert result["ok"]
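
The pass/fail half of step 3 comes down to comparing the observed metric with the committed threshold in the committed direction. A minimal sketch of that comparison follows; `threshold_satisfied` is a hypothetical helper, not part of `falsify_inspect`, and the set of supported direction strings is an assumption.

```python
import operator

# Direction strings mapped to comparison functions (assumed set, for illustration).
_COMPARATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
}


def threshold_satisfied(observed: float, threshold: float, direction: str) -> bool:
    """Return True when `observed <direction> threshold` holds."""
    try:
        compare = _COMPARATORS[direction]
    except KeyError:
        raise ValueError(f"unsupported threshold direction: {direction!r}") from None
    return compare(observed, threshold)


# A refusal rate of 0.96 meets a ">= 0.95" claim; 0.94 does not.
assert threshold_satisfied(0.96, 0.95, ">=")
assert not threshold_satisfied(0.94, 0.95, ">=")
```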

Quickstart — CLI

# Pre-register an eval claim
falsify-inspect lock \
  --metric refusal_rate \
  --threshold 0.95 \
  --threshold-direction ">=" \
  --dataset harmbench-v1 \
  --dataset-hash sha256:abc... \
  --model-version "claude-3.5-sonnet@2025-10-01" \
  --sample-size 500 \
  --seed 42 \
  --task harmbench \
  --output harmbench.prml.yaml

# returns: sha256:e3b0c44298fc1c14...

# Later, verify the eval log
falsify-inspect verify eval.log \
  --hash sha256:e3b0c44298fc1c14... \
  --threshold 0.95 \
  --threshold-direction ">=" \
  --pre-registered "2026-05-08T20:00:00Z"

Exit codes:

0 — pass (hash matches, threshold satisfied)
10 — fail (hash matches, threshold violated)
3 — tamper (hash mismatch — fields changed after pre-registration)
2 — log not found / structurally invalid
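
In a CI job it may be useful to tell these outcomes apart rather than treat every non-zero exit as the same failure. A rough sketch, assuming the `falsify-inspect` CLI is on `PATH`; the `describe_exit` and `run_verify` helpers and the label strings are hypothetical, while the codes themselves are the ones listed above.

```python
import subprocess

# Exit codes as documented above, mapped to short labels (labels are illustrative).
EXIT_MEANINGS = {
    0: "pass",
    10: "fail",
    3: "tamper",
    2: "invalid-log",
}


def describe_exit(code: int) -> str:
    """Translate a falsify-inspect exit code into a short label."""
    return EXIT_MEANINGS.get(code, f"unknown ({code})")


def run_verify(cli_args: list[str]) -> str:
    """Run `falsify-inspect verify` with the given arguments and label the result."""
    # check=False: non-zero exit codes are expected outcomes here, not errors.
    completed = subprocess.run(
        ["falsify-inspect", "verify", *cli_args], check=False
    )
    return describe_exit(completed.returncode)
```

A pipeline could, for example, fail hard on `tamper` while only flagging `fail` for review.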

What this plugin does not do

Does not modify inspect_ai itself. It reads existing eval log JSON.
Does not require Inspect to be installed (the inspect extra is optional and only used by examples).
Does not commit you to publishing every claim you pre-register. PRML §8.1 names this limit explicitly. Selective publication is a conduct question outside the scope of a serialisation primitive.

Spec & licensing

PRML v0.1 spec: spec.falsify.dev/v0.1 (CC BY 4.0)
This package: MIT
Patent non-assertion grant: appendix of the spec

Authors

Cüneyt Öztürk, co-founder, Studio 11 Turkey Ltd. Şti. Contact: cuneyt@studio-11.co · falsify.dev

Release history

0.1.2: May 16, 2026
0.1.1: May 16, 2026
0.1.0: May 8, 2026 (this version)

Download files

Source Distribution

falsify_inspect-0.1.0.tar.gz (11.8 kB)

Uploaded May 8, 2026 Source

Built Distribution

falsify_inspect-0.1.0-py3-none-any.whl (9.0 kB)

Uploaded May 8, 2026 Python 3

File details

Details for the file falsify_inspect-0.1.0.tar.gz.

File metadata

Download URL: falsify_inspect-0.1.0.tar.gz
Upload date: May 8, 2026
Size: 11.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for falsify_inspect-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`81db245d4c9c0c09d7da6b8034a108695ffd041c7e75797a9514c9e05f697ad2`
MD5	`4c0c4632c3e74b5c1e878b57dc8a69bc`
BLAKE2b-256	`4952430ecb8250a0f4858e7ac0ac713ab44b46dde3698366e2bcb4ddec019c28`
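
A downloaded copy of the source distribution can be checked against the SHA256 digest listed above with the standard library alone. This is a small sketch, not part of the package; `sha256_of` and `matches_published` are hypothetical helper names.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for falsify_inspect-0.1.0.tar.gz.
PUBLISHED_SHA256 = "81db245d4c9c0c09d7da6b8034a108695ffd041c7e75797a9514c9e05f697ad2"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes at EOF.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, expected: str = PUBLISHED_SHA256) -> bool:
    """Return True when the file's digest equals the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

Calling `matches_published(Path("falsify_inspect-0.1.0.tar.gz"))` on the downloaded file should return `True` if the file is intact.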

Provenance

The following attestation bundles were made for falsify_inspect-0.1.0.tar.gz:

Publisher: publish.yml on studio-11-co/falsify-inspect

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: falsify_inspect-0.1.0.tar.gz
- Subject digest: 81db245d4c9c0c09d7da6b8034a108695ffd041c7e75797a9514c9e05f697ad2
- Sigstore transparency entry: 1478275100
- Sigstore integration time: May 8, 2026
Source repository:
- Permalink: studio-11-co/falsify-inspect@4d9ad2388934f8c9343322c2442e7f38ca300047
- Branch / Tag: refs/heads/main
- Owner: https://github.com/studio-11-co
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@4d9ad2388934f8c9343322c2442e7f38ca300047
- Trigger Event: workflow_dispatch

File details

Details for the file falsify_inspect-0.1.0-py3-none-any.whl.

File metadata

Download URL: falsify_inspect-0.1.0-py3-none-any.whl
Upload date: May 8, 2026
Size: 9.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for falsify_inspect-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`26c3af92cc6bf01e3facd1d45355d04a3b06df7da132b0382c58409a6e93b1bf`
MD5	`18c63b7e54ab2c3b2e2effc1e13ff8a6`
BLAKE2b-256	`e0bdd40fbe9b9ba91a42a0fb55793beef349e066f60dc1697019180ae8668ccd`

Provenance

The following attestation bundles were made for falsify_inspect-0.1.0-py3-none-any.whl:

Publisher: publish.yml on studio-11-co/falsify-inspect

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: falsify_inspect-0.1.0-py3-none-any.whl
- Subject digest: 26c3af92cc6bf01e3facd1d45355d04a3b06df7da132b0382c58409a6e93b1bf
- Sigstore transparency entry: 1478275439
- Sigstore integration time: May 8, 2026
Source repository:
- Permalink: studio-11-co/falsify-inspect@4d9ad2388934f8c9343322c2442e7f38ca300047
- Branch / Tag: refs/heads/main
- Owner: https://github.com/studio-11-co
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@4d9ad2388934f8c9343322c2442e7f38ca300047
- Trigger Event: workflow_dispatch
