DFAH (Decision-Faithfulness Assessment Harness) determinism + faithfulness eval harness for Evidentia — dev-time AI-output quality gates

evidentia-eval

Dev-time AI-output quality eval harness for Evidentia.

Hosts the DFAH (Decision-Faithfulness Assessment Harness): the auditor-defensible numerical proof layer that validates that LLM-driven artifact production is deterministic, replay-equivalent, and faithful to its source policy clauses.

Why this package exists (v0.10.5 P9 extraction)

The DFAH harness was originally bundled into evidentia-ai (the risk-statement generator + control explainer package). That conflated two very different deployment surfaces:

  • evidentia-ai — PRODUCTION runtime. Needed in air-gap installs to actually generate risk statements.
  • evidentia-eval — DEVELOPMENT-time evaluation. NOT needed in air-gap installs; only fires when a CI pipeline runs a determinism / faithfulness gate before tagging a release.

Extracting the eval harness lets air-gap installs of evidentia-ai skip the optional sentence-transformers stack entirely (it now lives behind evidentia-eval[faithfulness-semantic] instead of evidentia-ai[eval-faithfulness]).

Quick start

# Stdlib Jaccard baseline (no extra needed; <10 MB install)
pip install evidentia-eval

# Optional semantic-similarity faithfulness (~250 MB extra
# for sentence-transformers + numpy + model cache on first use)
pip install 'evidentia-eval[faithfulness-semantic]'
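Because the semantic path is optional, calling code may want to check for it before choosing a scorer. A minimal sketch, assuming the extra provides the top-level module sentence_transformers (the import name of the sentence-transformers distribution). The helper names has_module and pick_faithfulness_backend are hypothetical and not part of evidentia-eval:

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


def pick_faithfulness_backend() -> str:
    # Hypothetical helper: prefer the semantic scorer only when the optional
    # dependency is importable; otherwise use the stdlib Jaccard baseline.
    return "semantic" if has_module("sentence_transformers") else "jaccard"


if __name__ == "__main__":
    print(pick_faithfulness_backend())
```

Probing with find_spec avoids loading the heavy dependency just to learn whether it is present.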

CLI verbs:

# Smoke test against a deterministic stub generator (no LLM
# tokens burned)
evidentia eval stub-smoke

# Real-LLM determinism gate against the risk-statement generator
evidentia eval risk-determinism --gap-report gaps.json \
    --system-context ctx.yaml \
    --fail-on-determinism-rate-below 0.95

# Verify a previously-signed eval bundle
evidentia eval verify path/to/eval-output.json

The CLI verbs live in evidentia.cli.eval (the meta-package); this package contributes the underlying library.
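The idea behind a determinism gate such as --fail-on-determinism-rate-below 0.95 can be sketched in a few lines. This is an illustration under stated assumptions, not the package's implementation: it assumes normalization means stripping ASCII punctuation and collapsing whitespace, and that the determinism rate is the share of runs whose normalized hash equals the most common hash. All function names below are hypothetical.

```python
import hashlib
import string
from collections import Counter

# Translation table that deletes ASCII punctuation (an assumed normalization).
_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> str:
    """Drop ASCII punctuation and collapse runs of whitespace to one space."""
    return " ".join(text.translate(_PUNCT).split())


def sha256_hex(text: str) -> str:
    """SHA-256 hex digest of the normalized text."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def modal_rate(outputs: list[str]) -> float:
    """Fraction of outputs whose normalized hash matches the modal hash."""
    if not outputs:
        raise ValueError("need at least one output")
    counts = Counter(sha256_hex(o) for o in outputs)
    return counts.most_common(1)[0][1] / len(outputs)


def gate(outputs: list[str], threshold: float = 0.95) -> int:
    """Return a process-style exit code: 0 on pass, 1 on fail."""
    return 0 if modal_rate(outputs) >= threshold else 1


if __name__ == "__main__":
    runs = ["Risk: high.", "Risk:  high", "Risk: low."]
    print(modal_rate(runs), gate(runs))
```

Comparing hashes of normalized text rather than raw strings keeps cosmetic differences such as a trailing period from counting as nondeterminism.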

Public API

Symbol                        Purpose
DFAHarness                    Owns the run loop + audit emit
EvalResult                    Top-level harness output (JSON-serializable, Sigstore-signable)
EvalSample                    One prompt's inputs (immutable; audit-trail-stable)
DeterminismResult             Per-prompt determinism outcome
ReplayResult                  Per-prompt replay-equivalence outcome
FaithfulnessResult            Per-claim faithfulness outcome
PromptFaithfulnessResult      Aggregated per-prompt faithfulness
faithfulness_score            Stdlib Jaccard token-overlap baseline
faithfulness_score_semantic   Sentence-transformers path (optional extra)
determinism_score             Computes the modal-output pass rate
replay_equivalent             Binary replay-equivalence check
extract_claims                Atomic-claim extraction from generated artifacts
normalize_for_determinism     Canonical whitespace + punctuation normalization
hash_output                   SHA-256 hex of normalized output
sign_eval_result              Sigstore-sign an EvalResult JSON
verify_eval_result            Verify a previously-signed eval bundle
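To make the Jaccard token-overlap baseline concrete, here is a minimal standard-library sketch. It is not the signature or behaviour of faithfulness_score: the tokenizer (lowercased alphanumeric runs), the function names, and the choice to score two empty inputs as 0.0 are all assumptions made for illustration.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens (an assumed tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(claim: str, source: str) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| over the two token sets."""
    a, b = tokens(claim), tokens(source)
    union = a | b
    if not union:
        # Assumption: with no tokens on either side there is no evidence
        # of faithfulness, so score 0.0 rather than 1.0.
        return 0.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    claim = "Access reviews occur quarterly"
    clause = "Access reviews must occur quarterly"
    print(jaccard(claim, clause))  # 4 shared tokens / 5 total tokens = 0.8
```

Token overlap is cheap and dependency-free, but it cannot recognize paraphrases; that gap is the usual motivation for an embedding-based scorer such as the optional semantic path.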

Backward-compat shim

For external scripts that still import from evidentia_ai.eval, evidentia-ai ships a deprecation shim that re-exports from evidentia_eval. The shim warns once at import time and is scheduled for removal in v0.12.0.
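The general shape of such a shim can be sketched as follows. The module names demo_old_eval and demo_new are placeholders so the example runs without either package installed; the actual shim's contents are not shown in this README, so treat this purely as an illustration of the pattern.

```python
import sys
import types
import warnings

# Body of a hypothetical shim module: warn, then re-export the new package.
SHIM_SOURCE = """
import warnings
warnings.warn(
    "demo_old_eval is deprecated; import from demo_new instead",
    DeprecationWarning,
    stacklevel=2,
)
from demo_new import *  # re-export the public names
"""


def build_demo_shim() -> types.ModuleType:
    # Stand-in for the new package so the sketch is self-contained.
    new_mod = types.ModuleType("demo_new")
    new_mod.__all__ = ["replay_equivalent"]
    new_mod.replay_equivalent = lambda a, b: a == b
    sys.modules["demo_new"] = new_mod

    # Executing the module body once mirrors a first import. Later imports
    # are served from sys.modules, so the warning is not emitted again.
    shim = types.ModuleType("demo_old_eval")
    exec(SHIM_SOURCE, shim.__dict__)
    sys.modules["demo_old_eval"] = shim
    return shim


if __name__ == "__main__":
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        shim = build_demo_shim()
    print(len(caught), shim.replay_equivalent("a", "a"))
```

Because Python caches imported modules, a module-level warnings.warn call is a simple way to get a warning exactly once per process.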

License

Apache-2.0. See the workspace root LICENSE file.
