PRML pre-registration integration for Giskard scenario results

falsify-giskard

PRML pre-registration for Giskard scenario results.

Commit a Giskard eval claim (a metric and a threshold) to a SHA-256 hash before the run, then verify the realised ScenarioResult against that commitment.

Why

Giskard runs a scenario of checks and reports pass/fail plus per-check metrics. But the report records what happened, not what was promised before the run. Once the claim is pre-registered, quietly relaxing a threshold or swapping a model after seeing the results breaks the hash, which makes a passing scenario tamper-evident.

This is the Giskard counterpart to falsify-inspect and uses the same PRML v0.1 manifest format.
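To make the tamper-evidence idea concrete, here is a minimal sketch of hashing a claim and detecting a later edit. It is not the package's implementation: the canonical JSON encoding, the `claim_hash` helper, and the `model` field are assumptions made for illustration, and the real PRML v0.1 manifest format may differ.

```python
import hashlib
import json


def claim_hash(claim: dict) -> str:
    """Hash a claim through a canonical JSON encoding (assumed, not PRML's)."""
    canonical = json.dumps(claim, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical claim fields; "model" is illustrative only.
claim = {
    "metric": "pass_rate",
    "threshold": 0.9,
    "threshold_direction": ">=",
    "model": "model-a",
}
committed = claim_hash(claim)  # recorded before the run

# Edits made after seeing results no longer reproduce the committed hash.
relaxed = {**claim, "threshold": 0.8}
swapped = {**claim, "model": "model-b"}

print(claim_hash(dict(claim)) == committed)  # unchanged claim
print(claim_hash(relaxed) == committed)      # relaxed threshold
print(claim_hash(swapped) == committed)      # swapped model
```

Only the unchanged claim reproduces the committed hash; any change to a committed field yields a different digest.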

Install

pip install falsify-giskard

Quickstart

from falsify_giskard import preregister, verify_scenario_result

# 1. Before the run — commit the claim
h, manifest = preregister(
    metric="pass_rate",            # or the name of a Giskard Metric (e.g. "groundedness")
    threshold=0.9,
    threshold_direction=">=",
    dataset="support-qa-v1",
    dataset_hash="sha256:abc...",
    seed=42,
    giskard_scenario="grounded-answers",
    output_path="grounded.prml.yaml",
)
print(h)  # sha256:...

# 2. Run your Giskard scenario as usual (await needs an async context,
#    for example inside an async function or a notebook cell)
result = await scenario.run()

# 3. After the run — verify
verdict = verify_scenario_result(result, "grounded.prml.yaml")
assert verdict["status"] == "PASS"   # PASS / FAIL / TAMPERED
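Because step 2 uses `await`, the run has to happen inside a coroutine when the code lives in a plain script. The sketch below shows one way to do that with `asyncio.run`. `FakeScenario` and the shape of its result are stand-ins invented for this example; they are not Giskard classes.

```python
import asyncio


class FakeScenario:
    """Stand-in for a Giskard scenario; only the async run() shape matters here."""

    async def run(self) -> dict:
        # Hypothetical result payload, not Giskard's ScenarioResult.
        return {"passed": 9, "total": 10}


async def main() -> dict:
    scenario = FakeScenario()
    # Same call shape as step 2 of the quickstart.
    return await scenario.run()


# asyncio.run creates an event loop, runs the coroutine, and closes the loop.
result = asyncio.run(main())
print(result)
```

In a real script, the pre-registration call would come before `asyncio.run(main())` and the verification call after it, so the claim is committed before any result exists.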

Metrics

  • metric="pass_rate" verifies the fraction of (non-skipped) checks that passed.
  • metric="<name>" verifies the value of a Giskard Metric with that name (for example a semantic_similarity or groundedness score).
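The two metric modes above can be sketched as follows. The `CheckOutcome` record and both helper functions are hypothetical and exist only to illustrate the arithmetic; Giskard's actual result objects are not modelled here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CheckOutcome:
    """Hypothetical per-check record, not a Giskard type."""
    name: str
    status: str  # "passed", "failed", or "skipped"
    metric_name: Optional[str] = None
    metric_value: Optional[float] = None


def pass_rate(checks: List[CheckOutcome]) -> float:
    """Fraction of non-skipped checks that passed."""
    counted = [c for c in checks if c.status != "skipped"]
    if not counted:
        raise ValueError("no non-skipped checks to score")
    return sum(c.status == "passed" for c in counted) / len(counted)


def named_metric(checks: List[CheckOutcome], name: str) -> float:
    """Value of the first recorded metric with the given name."""
    for c in checks:
        if c.metric_name == name and c.metric_value is not None:
            return c.metric_value
    raise KeyError(name)


checks = [
    CheckOutcome("cites-source", "passed", "groundedness", 0.93),
    CheckOutcome("polite-tone", "passed"),
    CheckOutcome("no-pii", "failed"),
    CheckOutcome("latency", "skipped"),
]

print(pass_rate(checks))                     # 2 of the 3 counted checks passed
print(named_metric(checks, "groundedness"))  # 0.93
```

Skipped checks are dropped from both the numerator and the denominator, so a skipped check neither helps nor hurts the rate.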

Verdicts

  • PASS — the manifest hash matches and the observed metric satisfies the committed threshold.
  • FAIL — the hash matches but the observed metric misses the threshold.
  • TAMPERED — the manifest file no longer matches its committed hash (it was altered after commit).
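The three verdicts amount to a hash check followed by a threshold comparison. Here is a minimal sketch of that decision order, assuming the manifest is hashed as raw bytes and that the committed hash is supplied by the caller. The `decide` function and the comparator table are illustrative; only `">="` appears in the quickstart, so the other directions are assumptions.

```python
import hashlib
import operator

# Assumed set of threshold directions; the quickstart only shows ">=".
COMPARATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
}


def decide(manifest_bytes: bytes, committed_hash: str,
           observed: float, threshold: float, direction: str) -> str:
    """Return PASS, FAIL, or TAMPERED; the hash check always comes first."""
    actual = "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()
    if actual != committed_hash:
        return "TAMPERED"
    return "PASS" if COMPARATORS[direction](observed, threshold) else "FAIL"


manifest = b"metric: pass_rate\nthreshold: 0.9\nthreshold_direction: '>='\n"
committed = "sha256:" + hashlib.sha256(manifest).hexdigest()
edited = manifest.replace(b"0.9", b"0.8")  # threshold relaxed after the fact

print(decide(manifest, committed, 0.95, 0.9, ">="))  # PASS
print(decide(manifest, committed, 0.85, 0.9, ">="))  # FAIL
print(decide(edited, committed, 0.85, 0.8, ">="))    # TAMPERED
```

Checking the hash before the threshold means an altered manifest can never produce a PASS, even if the observed metric would satisfy the relaxed value.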

For durable tamper-evidence, commit the returned hash to git or to the public registry at registry.falsify.dev right after pre-registration, so the claim is anchored somewhere immutable.

License

MIT. The PRML specification is CC BY 4.0.
