RAG evaluation with exactly 6 metrics. 3 variables, 6 relationships, nothing more.

trieval

RAG evaluation with exactly 6 metrics. Nothing more.

Every RAG system has 3 variables: Q (Question), C (Context), A (Answer).

3 variables → 6 ordered pairs (each variable judged against each of the other two) → 6 metrics.

The 6 Metrics

#  Metric             Notation  Evaluates
1  Context Relevance  C|Q       Is the retrieved context relevant to the question?
2  Faithfulness       A|C       Does the answer stick to the context?
3  Answer Relevance   A|Q       Does the answer solve the user's question?
4  Context Support    C|A       Does the context fully support the answer?
5  Answerability      Q|C       Can this question be answered with this context?
6  Self-Containment   Q|A       Can someone understand the question from the answer?
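
The count of six follows from taking every ordered pair of the three variables. The sketch below enumerates those pairs with the standard library and looks each one up in the table above. Reading `X|Y` as "X judged against Y" is an interpretation of the notation, and the `METRICS` dictionary is illustrative only; it is not part of the trieval API.

```python
from itertools import permutations

# Metric names keyed by notation, copied from the table above.
# This dictionary is illustrative and is not a trieval object.
METRICS = {
    "C|Q": "Context Relevance",
    "A|C": "Faithfulness",
    "A|Q": "Answer Relevance",
    "C|A": "Context Support",
    "Q|C": "Answerability",
    "Q|A": "Self-Containment",
}

# Every ordered pair of two distinct variables: 3 * 2 = 6 pairs.
pairs = [f"{x}|{y}" for x, y in permutations("QCA", 2)]

for notation in pairs:
    print(notation, "->", METRICS[notation])
```

Each of the six generated notations matches exactly one row of the table, so no pair is left without a metric.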

Install

pip install trieval

Quick Start

import asyncio

from trieval import Evaluator


async def main() -> None:
    evaluator = Evaluator(model="openai:gpt-4o-mini")
    # evaluate() is awaited, so the call has to live inside an async function.
    result = await evaluator.evaluate(
        question="What is photosynthesis?",
        context="Photosynthesis is the process by which plants convert sunlight into energy.",
        answer="Photosynthesis is how plants make food from sunlight.",
    )

    print(result.overall_score)       # 0.0–1.0
    print(result.retrieval_score)     # average of C|Q and Q|C
    print(result.generation_score)    # average of A|C and A|Q
    print(result.diagnose())          # ["All metrics healthy"] or failure categories


asyncio.run(main())
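
To make the two aggregate scores concrete, here is a minimal sketch of the averaging described in the comments above. The score values are made up, the function names are hypothetical stand-ins for the result attributes, and the formula for `overall_score` is not stated in this README, so it is left out.

```python
from statistics import mean

# Hypothetical per-metric scores in [0.0, 1.0]; not real trieval output.
scores = {
    "C|Q": 1.0, "Q|C": 0.5,   # retrieval-side metrics
    "A|C": 0.75, "A|Q": 0.25,  # generation-side metrics
    "C|A": 0.5, "Q|A": 0.5,
}


def retrieval_score(s: dict[str, float]) -> float:
    """Average of Context Relevance (C|Q) and Answerability (Q|C)."""
    return mean([s["C|Q"], s["Q|C"]])


def generation_score(s: dict[str, float]) -> float:
    """Average of Faithfulness (A|C) and Answer Relevance (A|Q)."""
    return mean([s["A|C"], s["A|Q"]])


print(retrieval_score(scores))   # 0.75
print(generation_score(scores))  # 0.5
```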

Failure Diagnosis

When your RAG system fails, it's always one of these:

  • Retrieval issues — C|Q and Q|C scores are low (wrong context retrieved)
  • Generation issues — A|C and A|Q scores are low (bad answer generation)
  • End-to-end mismatch — A|C is fine but C|A is low (faithful but unsupported)
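
The three categories can be read as simple rules over the metric scores. The sketch below is one possible reading, not the library's implementation: the `0.5` cutoff for "low", the requirement that both scores in a pair be low, and the standalone `diagnose` function are all assumptions made for illustration.

```python
def diagnose(scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Map metric scores to failure categories (illustrative rules only)."""

    def low(key: str) -> bool:
        # Assumption: a score counts as "low" when it is below the threshold.
        return scores[key] < threshold

    failures = []
    if low("C|Q") and low("Q|C"):
        failures.append("Retrieval issues")
    if low("A|C") and low("A|Q"):
        failures.append("Generation issues")
    if not low("A|C") and low("C|A"):
        failures.append("End-to-end mismatch")
    return failures or ["All metrics healthy"]


healthy = dict.fromkeys(["C|Q", "A|C", "A|Q", "C|A", "Q|C", "Q|A"], 0.9)
print(diagnose(healthy))                                  # ['All metrics healthy']
print(diagnose({**healthy, "C|Q": 0.1, "Q|C": 0.2}))      # ['Retrieval issues']
print(diagnose({**healthy, "C|A": 0.2}))                  # ['End-to-end mismatch']
```

A system can land in more than one category at once under these rules, in which case the list contains each matching label.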

Architecture

Built with pydantic-ai (LLM-based metric agents) and LangGraph (evaluation workflow orchestration).

RAGInput(Q, C, A)
    ↓
Evaluator.evaluate()
    ↓
LangGraph: evaluate_metrics → diagnose_failures
    ↓
EvaluationResult (scores + diagnosis)

Each metric is a pydantic-ai Agent with a tailored system prompt. All 6 run concurrently via asyncio.gather inside the LangGraph evaluation node.
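
The concurrency pattern described above can be sketched with the standard library alone. In this sketch `score_metric` is a hypothetical stand-in for a pydantic-ai agent call (it sleeps briefly and returns a fixed score), and `evaluate_metrics` mirrors the name of the evaluation node in the diagram without using LangGraph itself.

```python
import asyncio

METRIC_NAMES = ["C|Q", "A|C", "A|Q", "C|A", "Q|C", "Q|A"]


async def score_metric(
    name: str, question: str, context: str, answer: str
) -> tuple[str, float]:
    # Stand-in for an LLM-backed metric agent; a real agent would send a
    # prompt built from the inputs and parse a score from the response.
    await asyncio.sleep(0.01)
    return name, 1.0


async def evaluate_metrics(question: str, context: str, answer: str) -> dict[str, float]:
    # Launch all six metric coroutines at once and wait for every result.
    results = await asyncio.gather(
        *(score_metric(name, question, context, answer) for name in METRIC_NAMES)
    )
    return dict(results)


if __name__ == "__main__":
    print(asyncio.run(evaluate_metrics("question", "context", "answer")))
```

Because the six calls overlap, the total wait is roughly that of the slowest metric instead of the sum of all six, which matters when each call is a network round trip to an LLM.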

Development

uv sync --group dev
pytest                                    # run tests
pytest --cov=trieval --cov-branch         # with coverage
ruff check trieval/ tests/                # lint
ruff format trieval/ tests/               # format
mypy trieval/                             # type check

Documentation

  • API Reference — Full API for Evaluator, EvaluationResult, MetricResult, RAGInput, and individual metric functions
  • Changelog — Version history

License

MIT

