VerifyIndex

A multi-dimensional metric for evaluating the evidential quality of Large Language Model responses.

VerifyIndex decomposes the factuality of an LLM response into six dimensions: five sub-dimensions (Verifiability, Evidence coverage, Retrieval precision, Inferential support, and Fidelity) and a composite, Yield. This makes it possible to diagnose failure modes that single-percentage metrics conflate.

Designed for both research evaluation and enterprise deployment governance, VerifyIndex integrates with the R-BED (Risk-Based Evaluability Design) framework for operational use in regulated industries.

Installation

pip install verifyindex

For the full retrieval-and-classifier pipeline (recommended for production use):

pip install "verifyindex[ml,retrieval]"

Quick Start

from verifyindex import VerifyIndex

vi = VerifyIndex()

result = vi.score(
    response="Marie Curie was born in Warsaw in 1867 and won two Nobel Prizes in Physics and Chemistry.",
    knowledge_source="wikipedia",
)

print(result.summary())

Output:

VerifyIndex score: 0.784
  Verifiability (V):        1.000
  Evidence coverage (E):    1.000
  Retrieval precision (R):  0.887
  Inferential support (I):  1.000
  Fidelity (F):             0.500
  Total atomic claims:      4

The composite score Y (Yield) is the geometric mean of V, E, R, I, and F. The full profile is available for diagnostic use.

The VerifyIndex Profile

Each response produces a six-dimensional profile:

Dim  Name                 Measures
V    Verifiability        Fraction of claims that are checkable against sources
E    Evidence Coverage    Fraction of verifiable claims with retrievable evidence
R    Retrieval Precision  Quality of retrieved evidence for each claim
I    Inferential Support  Fraction of claims entailed by their evidence
F    Fidelity             Fraction of entailed claims that faithfully represent the evidence
Y    Yield (composite)    Geometric mean of V, E, R, I, F

Two responses can have identical Y scores but very different profiles. VerifyIndex exposes this for downstream decisions.
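
To illustrate how identical Y scores can hide different profiles, the following sketch computes the geometric mean directly for two made-up profiles. The `yield_score` helper and the profile values are illustrative assumptions and are not part of the verifyindex API.

```python
import math

def yield_score(profile):
    """Geometric mean of the five sub-dimension scores (hypothetical helper)."""
    dims = ["V", "E", "R", "I", "F"]
    product = math.prod(profile[d] for d in dims)
    return product ** (1 / len(dims))

# Made-up profiles for illustration only.
uniform = {"V": 0.8, "E": 0.8, "R": 0.8, "I": 0.8, "F": 0.8}
# Perfect on four dimensions, weak on Fidelity; 0.32768 is 0.8 to the fifth power.
skewed = {"V": 1.0, "E": 1.0, "R": 1.0, "I": 1.0, "F": 0.32768}

print(round(yield_score(uniform), 3))
print(round(yield_score(skewed), 3))
```

Both profiles yield a composite of about 0.8, yet the second one fails almost entirely on Fidelity, which only the per-dimension profile reveals.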

Enterprise Deployment: R-BED Integration

For regulated deployments using the R-BED governance framework:

from verifyindex import VerifyIndex
from verifyindex.rbed import rbed_evidence_report

vi = VerifyIndex()
# response_text holds the LLM response to evaluate (defined elsewhere)
result = vi.score(response=response_text, knowledge_source="internal_kb")

# Produce structured evidence for R-BED sub-dimensions
evidence = rbed_evidence_report(
    profile=result.profile,
    thresholds={"V": 0.85, "E": 0.80, "R": 0.75, "I": 0.85, "F": 0.90, "Y": 0.75},
)

for key, finding in evidence.findings.items():
    status = "PASS" if finding["passed"] else "FAIL"
    print(f"{key}: {finding['score']:.3f} ({status}) — R-BED Vertex {finding['vertex']}")

Each VerifyIndex sub-dimension maps to specific R-BED sub-dimensions. See Section 6 of the paper or Chapters 6-8 of the R-BED book for the full mapping.
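
The pass/fail logic in the loop above can be pictured as a per-dimension threshold comparison. The sketch below is a stand-alone illustration and does not call `rbed_evidence_report`; the `check_thresholds` helper, the assumption that a score passes when it is greater than or equal to its threshold, and the sample profile are all hypothetical. The R-BED vertex mapping is left out because it is defined in the paper and the book.

```python
def check_thresholds(profile, thresholds):
    """Compare each score with its threshold (hypothetical helper; assumes >= passes)."""
    return {
        dim: {"score": profile[dim], "passed": profile[dim] >= limit}
        for dim, limit in thresholds.items()
    }

# Thresholds copied from the R-BED example; the profile values are made up.
thresholds = {"V": 0.85, "E": 0.80, "R": 0.75, "I": 0.85, "F": 0.90, "Y": 0.75}
profile = {"V": 0.95, "E": 0.90, "R": 0.80, "I": 0.90, "F": 0.60, "Y": 0.82}

findings = check_thresholds(profile, thresholds)
failed = [dim for dim, finding in findings.items() if not finding["passed"]]
print(failed)  # ['F']
```

In this made-up case the composite Y clears its threshold while Fidelity does not, which is why checking each dimension separately matters.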

Current Status

VerifyIndex 0.1.0 is an alpha release providing the package structure, interfaces, and reference stub implementations. Production-grade classifiers for Verifiability and Fidelity are in development and will be released in v0.2.0.

For an early-stage integration, plug in your own retrieval and classifier implementations by subclassing Retriever and passing model identifiers to the VerifyIndex constructor.
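
The package's actual `Retriever` interface is not documented in this description, so the sketch below defines its own stand-in base class to show the general subclassing pattern. The `retrieve` method name, its signature, and the `KeywordRetriever` class are hypothetical; check the package source for the real abstract methods before adapting it.

```python
from abc import ABC, abstractmethod

class Retriever(ABC):
    """Stand-in base class (hypothetical; not imported from verifyindex)."""

    @abstractmethod
    def retrieve(self, claim: str, top_k: int = 3) -> list[str]:
        """Return up to top_k evidence passages for a claim."""

class KeywordRetriever(Retriever):
    """Toy retriever that ranks passages by word overlap with the claim."""

    def __init__(self, passages: list[str]):
        self.passages = passages

    def retrieve(self, claim: str, top_k: int = 3) -> list[str]:
        claim_words = set(claim.lower().split())
        # Score each passage by the number of distinct words shared with the claim.
        scored = [(len(claim_words & set(p.lower().split())), p) for p in self.passages]
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
        return [passage for _, passage in ranked[:top_k]]

retriever = KeywordRetriever([
    "Marie Curie was born in Warsaw in 1867.",
    "The Eiffel Tower is in Paris.",
])
print(retriever.retrieve("Marie Curie was born in Warsaw", top_k=1))
```

A real integration would replace the word-overlap scoring with a search index or embedding lookup over the chosen knowledge source.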

Citation

If you use VerifyIndex in your research or product, please cite:

@article{srivastava2026verifyindex,
  title={VerifyIndex: A Multi-Dimensional Metric for Evaluating the Evidential Quality of Large Language Model Responses},
  author={Srivastava, Vishal and Sah, Tanmay},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2026}
}

And, if applicable to your governance context, the R-BED framework:

@book{srivastava2026rbed,
  title={The AI Evaluability Crisis: How to Build Evaluable AI Systems Using R-BED},
  author={Srivastava, Vishal and Sah, Tanmay},
  year={2026},
  publisher={EvaluabilityAI}
}

License

MIT

Download files

Source Distribution

verifyindex-0.1.0.tar.gz (14.3 kB view details)

Uploaded Source

Built Distribution

verifyindex-0.1.0-py3-none-any.whl (14.0 kB view details)

Uploaded Python 3

File details

Details for the file verifyindex-0.1.0.tar.gz.

File metadata

  • Download URL: verifyindex-0.1.0.tar.gz
  • Upload date:
  • Size: 14.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for verifyindex-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       833de5056ca110909011db862419c6c3b80a5e3c8da2c4ce5947f251320955fd
MD5          61dc8f9ec6c29e7d2f50bd7d47e29049
BLAKE2b-256  220a178f2ee346d10e508e458adc516261b67d7a73239b1a5550e223bf565991
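
One way to use these digests is to recompute the SHA256 of a downloaded file and compare it with the published value. The sketch below uses only Python's standard `hashlib`; the `sha256_of` helper and the local file path are assumptions.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Published SHA256 for verifyindex-0.1.0.tar.gz, copied from the listing above.
EXPECTED = "833de5056ca110909011db862419c6c3b80a5e3c8da2c4ce5947f251320955fd"

# Assumes the source distribution was downloaded to the current directory.
sdist = Path("verifyindex-0.1.0.tar.gz")
if sdist.exists():
    print("match" if sha256_of(sdist) == EXPECTED else "MISMATCH")
```

Reading in chunks keeps memory use constant regardless of file size.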

Provenance

The following attestation bundles were made for verifyindex-0.1.0.tar.gz:

Publisher: publish.yml on vsrivas7/verifyindex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file verifyindex-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: verifyindex-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 14.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for verifyindex-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       c6d3fd469b309cdff9617bf2b8b678af6c4feba9234b3c01ef3b64f66f26c608
MD5          69f7854fc91e52d5e70f18fad0e38189
BLAKE2b-256  9c389e2a964e72dfb18b59c607e15c7f2811a276d36efdcac61318a0d6bce5da

Provenance

The following attestation bundles were made for verifyindex-0.1.0-py3-none-any.whl:

Publisher: publish.yml on vsrivas7/verifyindex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
