
Heuristic quality metrics for RAG retrieval and grounded answers. Python port of @mukundakatta/rag-quality-kit.


rag-quality-kit

License: MIT

Heuristic quality metrics for RAG retrieval and grounded answers. Zero runtime dependencies, pure-Python.

Python port of @mukundakatta/rag-quality-kit. The JS sibling has the original heuristics; this README sticks to the Python API.

Install

pip install rag-quality-kit

Usage

from rag_quality_kit import score, missing_evidence

question = "Who wrote Hamlet and when was it first performed?"
contexts = [
    {"id": "doc-1", "text": "Hamlet is a tragedy by William Shakespeare, written around 1600."},
    {"id": "doc-2", "text": "Records suggest Hamlet was first performed in 1602."},
]
answer = "Hamlet was written by Shakespeare and first performed in 1602."

r = score(question, contexts, answer)
r.groundedness         # 0..1 -- answer terms that appear in any context
r.context_relevance    # 0..1 -- question terms covered by the contexts
r.answer_relevance     # 0..1 -- question terms covered by the answer
r.conciseness          # 0..1 -- 1.0 while the answer is up to ~2x the question's term count, decays to 0 at 10x
r.overall              # unweighted mean of the four

missing_evidence(answer, contexts)   # -> list[str] of answer terms not in any context
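To make the "answer terms that appear in any context" idea concrete, here is a minimal, self-contained sketch of a token-overlap score and its "missing terms" counterpart. It is not the package's implementation: the helper names (terms, overlap_fraction, uncovered_terms) and the tokenization rule (lowercased runs of letters and digits, no stop-word or length filtering) are assumptions made for illustration, so its numbers may differ from what score and missing_evidence return.

```python
import re


def terms(text):
    """Split text into a set of lowercase word tokens.

    Assumption of this sketch: a term is any run of ASCII letters or digits.
    The real package may tokenize or filter terms differently.
    """
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_fraction(source_text, context_texts):
    """Fraction of terms in source_text found in at least one context."""
    wanted = terms(source_text)
    if not wanted:
        return 0.0  # assumption: an empty source scores 0 rather than raising
    available = set().union(*(terms(t) for t in context_texts))
    return len(wanted & available) / len(wanted)


def uncovered_terms(source_text, context_texts):
    """Sorted list of terms in source_text that no context contains."""
    available = set().union(*(terms(t) for t in context_texts))
    return sorted(terms(source_text) - available)


context_texts = [
    "Hamlet is a tragedy by William Shakespeare, written around 1600.",
    "Records suggest Hamlet was first performed in 1602.",
]
answer = "Hamlet was written by Shakespeare and first performed in 1602."

print(overlap_fraction(answer, context_texts))
print(uncovered_terms(answer, context_texts))
```

With this naive tokenizer, the only answer term absent from both contexts is the word "and", which shows why a real implementation would likely filter short or common words before counting.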

Metrics

Metric             Range  Behavior
groundedness       0..1   Fraction of answer terms found in any context.
context_relevance  0..1   Fraction of (longer) question terms covered by the contexts. Mirrors the JS retrievalCoverage.
answer_relevance   0..1   Fraction of question terms that the answer addresses.
conciseness        0..1   1.0 when the answer is up to ~2x the question's term count; linearly decays to 0 at 10x.
overall            0..1   Unweighted mean of the four.

All metrics are heuristic and token-overlap based -- fast, deterministic, no LLM calls. For evaluation-grade scoring, layer an LLM judge on top.
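The conciseness rule (1.0 up to roughly 2x the question's term count, then a linear decay to 0 at 10x) can be read as a piecewise-linear function of the answer-to-question length ratio. The sketch below is one plausible reading of that description, not the package's code: the function name, the parameter names, and the handling of an empty question are assumptions.

```python
def conciseness_sketch(question_terms, answer_terms, flat_until=2.0, zero_at=10.0):
    """Piecewise-linear length penalty (illustrative only).

    question_terms / answer_terms are term counts. flat_until and zero_at
    are hypothetical parameter names for the 2x and 10x thresholds.
    """
    if question_terms <= 0:
        return 0.0  # assumption: no question terms means no meaningful ratio
    ratio = answer_terms / question_terms
    if ratio <= flat_until:
        return 1.0  # answer is at most ~2x the question: full score
    if ratio >= zero_at:
        return 0.0  # answer is 10x the question or longer: zero
    # Linear interpolation between (flat_until, 1.0) and (zero_at, 0.0).
    return 1.0 - (ratio - flat_until) / (zero_at - flat_until)


for answer_len in (10, 20, 40, 60, 100):
    print(answer_len, conciseness_sketch(10, answer_len))
```

Under these assumptions an answer six times the question's length lands halfway down the slope, since (6 - 2) / (10 - 2) = 0.5.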

API differences from the JS sibling

  • Python signature is score(question, contexts, answer) (positional) instead of scoreRag({ query, answer, contexts }).
  • Returns a QualityResult dataclass instead of a plain object.
  • Metric names: context_relevance replaces retrievalCoverage; groundedness is unchanged. The Python port adds two heuristics, answer_relevance and conciseness. The aggregate is overall (was score) and now averages all four.
  • Drops the citationCoverage metric -- it's heavily citation-format dependent and best owned by the calling app. Use missing_evidence(answer, contexts) for an analogous signal.
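As a rough picture of the result shape described in the list above, the following sketch models a dataclass with the four metric fields and an unweighted-mean aggregate. The class name QualityResultSketch is hypothetical, and whether the real QualityResult is frozen or exposes overall as a stored field or a property is not stated here, so treat those choices as assumptions.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass(frozen=True)
class QualityResultSketch:
    """Illustrative stand-in for the package's QualityResult (not the real class)."""

    groundedness: float
    context_relevance: float
    answer_relevance: float
    conciseness: float

    @property
    def overall(self):
        # Unweighted mean of the four metrics, as the README describes.
        return fmean(
            (
                self.groundedness,
                self.context_relevance,
                self.answer_relevance,
                self.conciseness,
            )
        )


result = QualityResultSketch(
    groundedness=1.0,
    context_relevance=0.5,
    answer_relevance=0.5,
    conciseness=1.0,
)
print(result.overall)
```

Because the mean is unweighted, a weak score on any one heuristic pulls overall down by the same amount, regardless of which metric it is.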

See the JS sibling for the original heuristics and broader design notes.
