DFAH (Decision-Faithfulness Assessment Harness) determinism + faithfulness eval harness for Evidentia — dev-time AI-output quality gates

evidentia-eval

Dev-time AI-output quality eval harness for Evidentia.

Hosts the DFAH (Decision-Faithfulness Assessment Harness), the auditor-defensible numerical proof layer that validates that LLM-driven artifact production is deterministic, replay-equivalent, and faithful to its source policy clauses.

Why this package exists (v0.10.5 P9 extraction)

The DFAH harness was originally bundled into evidentia-ai (the risk-statement generator + control explainer package). That conflated two very different deployment surfaces:

  • evidentia-ai — PRODUCTION runtime. Needed in air-gap installs to actually generate risk statements.
  • evidentia-eval — DEVELOPMENT-time evaluation. NOT needed in air-gap installs; only fires when a CI pipeline runs a determinism / faithfulness gate before tagging a release.

Extracting the eval harness lets air-gap installs of evidentia-ai skip the optional sentence-transformers stack entirely (it now lives behind evidentia-eval[faithfulness-semantic] instead of evidentia-ai[eval-faithfulness]).
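As an illustration of what this split makes possible on the caller side, the sketch below picks a scoring path depending on whether the optional extra is installed. It is not taken from evidentia-eval: has_semantic_extra and pick_scorer are hypothetical names, and the only fact assumed is that the sentence-transformers distribution is imported as sentence_transformers.

```python
import importlib.util


def has_semantic_extra() -> bool:
    """True when the sentence-transformers package is importable."""
    # find_spec returns None for a missing top-level module instead of raising.
    return importlib.util.find_spec("sentence_transformers") is not None


def pick_scorer() -> str:
    """Name the scoring path to use (hypothetical labels, for illustration)."""
    return "semantic" if has_semantic_extra() else "jaccard"


print(pick_scorer())
```

Probing with find_spec avoids importing the heavy dependency just to learn whether it is present.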

Quick start

# Stdlib Jaccard baseline (no extra needed; <10 MB install)
pip install evidentia-eval

# Optional semantic-similarity faithfulness (~250 MB extra
# for sentence-transformers + numpy + model cache on first use)
pip install 'evidentia-eval[faithfulness-semantic]'

CLI verbs:

# Smoke test against a deterministic stub generator (no LLM
# tokens burned)
evidentia eval stub-smoke

# Real-LLM determinism gate against the risk-statement generator
evidentia eval risk-determinism --gap-report gaps.json \
    --system-context ctx.yaml \
    --fail-on-determinism-rate-below 0.95

# Verify a previously-signed eval bundle
evidentia eval verify path/to/eval-output.json

The CLI verbs live in evidentia.cli.eval (the meta-package); this package contributes the underlying library.
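The --fail-on-determinism-rate-below flag suggests a gate of roughly the following shape: normalize each run's output, hash it, measure how often the most common hash occurs, and fail when that rate is under the threshold. This is a minimal sketch under stated assumptions, not the package's implementation. The names normalize, output_hash, modal_rate, and gate are hypothetical, the normalization here only collapses whitespace, and the real CLI may aggregate the rate differently across prompts.

```python
import hashlib
import re
from collections import Counter


def normalize(text: str) -> str:
    """Collapse whitespace runs and trim (simplified stand-in normalization)."""
    return re.sub(r"\s+", " ", text).strip()


def output_hash(text: str) -> str:
    """SHA-256 hex digest of the normalized output."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def modal_rate(outputs: list[str]) -> float:
    """Fraction of runs whose normalized output equals the most common one."""
    if not outputs:
        raise ValueError("need at least one output")
    counts = Counter(output_hash(o) for o in outputs)
    return counts.most_common(1)[0][1] / len(outputs)


def gate(outputs: list[str], threshold: float = 0.95) -> int:
    """Return an exit code: 0 if the rate meets the threshold, otherwise 1."""
    return 0 if modal_rate(outputs) >= threshold else 1
```

Returning an exit code rather than raising keeps the function easy to wire into a CI step, where a non-zero status blocks the release tag.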

Public API

| Symbol | Purpose |
| --- | --- |
| DFAHarness | Owns the run loop + audit emit |
| EvalResult | Top-level harness output (JSON-serializable, Sigstore-signable) |
| EvalSample | One prompt's inputs (immutable; audit-trail-stable) |
| DeterminismResult | Per-prompt determinism outcome |
| ReplayResult | Per-prompt replay-equivalence outcome |
| FaithfulnessResult | Per-claim faithfulness outcome |
| PromptFaithfulnessResult | Aggregated per-prompt faithfulness |
| faithfulness_score | Stdlib Jaccard token-overlap baseline |
| faithfulness_score_semantic | Sentence-transformers path (optional extra) |
| determinism_score | Computes the modal-output pass rate |
| replay_equivalent | Binary replay-equivalence check |
| extract_claims | Atomic-claim extraction from generated artifacts |
| normalize_for_determinism | Canonical whitespace + punctuation normalization |
| hash_output | SHA-256 hex of normalized output |
| sign_eval_result | Sigstore-sign an EvalResult JSON |
| verify_eval_result | Verify a previously-signed eval bundle |
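To make the "Jaccard token-overlap baseline" concrete, here is a minimal stdlib sketch of the general technique. It does not reproduce the signature or tokenization of faithfulness_score; the helper names tokens and jaccard, and the lowercase alphanumeric tokenizer, are assumptions for illustration.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric word tokens, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(claim: str, source: str) -> float:
    """Size of the token intersection over the token union; 0.0 if both are empty."""
    a, b = tokens(claim), tokens(source)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# 3 shared tokens out of 6 distinct tokens overall
print(jaccard("Backups are encrypted", "All backups are encrypted at rest"))  # 0.5
```

A token-overlap score like this needs no model download, which fits the small base install, but it cannot recognize paraphrases; that gap is what an embedding-based path is meant to cover.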

Backward-compat shim

For external scripts that still use "from evidentia_ai.eval import ...", evidentia-ai ships a deprecation shim that re-exports the same names from evidentia_eval. The shim emits a warning once, at import time, and is scheduled for removal in v0.12.0.
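The general pattern behind such a shim can be sketched as follows. To stay runnable without either package installed, the example writes two throwaway modules to a temporary directory; old_home and new_home are hypothetical stand-ins for the old and new import paths, and the stub hash_output body is a placeholder, not the shipped code.

```python
import importlib
import sys
import tempfile
import warnings
from pathlib import Path

tmp = Path(tempfile.mkdtemp())

# Stand-in for the package that now owns the symbol.
(tmp / "new_home.py").write_text(
    "def hash_output(text):\n    return 'stub'\n", encoding="utf-8"
)

# The shim: warn at module level, then re-export from the new location.
(tmp / "old_home.py").write_text(
    "import warnings\n"
    "warnings.warn(\n"
    "    'old_home is deprecated; import from new_home instead',\n"
    "    DeprecationWarning,\n"
    "    stacklevel=2,\n"
    ")\n"
    "from new_home import hash_output  # re-export\n",
    encoding="utf-8",
)
sys.path.insert(0, str(tmp))

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    import old_home  # first import runs the module body, so it warns
    importlib.import_module("old_home")  # served from sys.modules, no new warning

deprecations = [w for w in caught if issubclass(w.category, DeprecationWarning)]
print(len(deprecations))
```

Because Python caches imported modules in sys.modules, a module-level warning fires only on the first import in a process, which gives the warn-once behavior without extra bookkeeping.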

License

Apache-2.0. See the workspace root LICENSE file.
