# trieval
RAG evaluation with exactly 6 metrics. Nothing more.
Every RAG system has 3 variables: **Q** (Question), **C** (Context), **A** (Answer).

3 variables → 6 ordered pairwise relationships → 6 metrics.
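The count follows from ordered pairs: 3 variables taken 2 at a time give 3 × 2 = 6 directed relationships. A quick way to enumerate them with the standard library:

```python
from itertools import permutations

# Q, C, A taken two at a time, order mattering, yield the 6 "X|Y" relationships.
variables = ["Q", "C", "A"]
relationships = [f"{x}|{y}" for x, y in permutations(variables, 2)]

print(relationships)
# ['Q|C', 'Q|A', 'C|Q', 'C|A', 'A|Q', 'A|C']
```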
## The 6 Metrics
| # | Metric | Notation | Evaluates |
|---|---|---|---|
| 1 | Context Relevance | C\|Q | Is the retrieved context relevant to the question? |
| 2 | Faithfulness | A\|C | Does the answer stick to the context? |
| 3 | Answer Relevance | A\|Q | Does the answer solve the user's question? |
| 4 | Context Support | C\|A | Does the context fully support the answer? |
| 5 | Answerability | Q\|C | Can this question be answered with this context? |
| 6 | Self-Containment | Q\|A | Can someone understand the question from the answer? |
## Install

```shell
pip install trieval
```
## Quick Start

```python
import asyncio

from trieval import Evaluator

evaluator = Evaluator(model="openai:gpt-4o-mini")

async def main() -> None:
    result = await evaluator.evaluate(
        question="What is photosynthesis?",
        context="Photosynthesis is the process by which plants convert sunlight into energy.",
        answer="Photosynthesis is how plants make food from sunlight.",
    )
    print(result.overall_score)     # 0.0–1.0
    print(result.retrieval_score)   # avg of C|Q and Q|C
    print(result.generation_score)  # avg of A|C and A|Q
    print(result.diagnose())        # ["All metrics healthy"] or failure categories

asyncio.run(main())
```
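The grouped scores are simple means over metric pairs. A quick illustration with invented per-metric values (the C|Q + Q|C and A|C + A|Q groupings follow the fields above; treating `overall_score` as the plain mean of all six is an assumption):

```python
# Hypothetical per-metric scores; the values are made up for illustration.
scores = {"C|Q": 0.9, "A|C": 0.8, "A|Q": 0.85, "C|A": 0.7, "Q|C": 0.95, "Q|A": 0.6}

retrieval_score = (scores["C|Q"] + scores["Q|C"]) / 2    # how well retrieval did
generation_score = (scores["A|C"] + scores["A|Q"]) / 2   # how well generation did
overall_score = sum(scores.values()) / len(scores)        # assumed: mean of all 6

print(retrieval_score, generation_score, overall_score)
```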
## Failure Diagnosis
When your RAG system fails, it's always one of these:
- Retrieval issues — C|Q and Q|C scores are low (wrong context retrieved)
- Generation issues — A|C and A|Q scores are low (bad answer generation)
- End-to-end mismatch — A|C is fine but C|A is low (faithful but unsupported)
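The three categories above can be read as threshold rules over the six scores. A minimal sketch, assuming a 0.5 cutoff (the threshold and the `diagnose` helper are hypothetical; only the category logic comes from the list above):

```python
def diagnose(scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Map low metric scores to the three failure categories described above."""
    issues = []
    if scores["C|Q"] < threshold and scores["Q|C"] < threshold:
        issues.append("Retrieval issues")        # wrong context retrieved
    if scores["A|C"] < threshold and scores["A|Q"] < threshold:
        issues.append("Generation issues")       # bad answer generation
    if scores["A|C"] >= threshold and scores["C|A"] < threshold:
        issues.append("End-to-end mismatch")     # faithful but unsupported
    return issues or ["All metrics healthy"]

print(diagnose({"C|Q": 0.2, "Q|C": 0.3, "A|C": 0.8,
                "A|Q": 0.9, "C|A": 0.9, "Q|A": 0.7}))
# ['Retrieval issues']
```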
## Architecture
Built with pydantic-ai (LLM-based metric agents) and LangGraph (evaluation workflow orchestration).
```text
RAGInput(Q, C, A)
        ↓
Evaluator.evaluate()
        ↓
LangGraph: evaluate_metrics → diagnose_failures
        ↓
EvaluationResult (scores + diagnosis)
```
Each metric is a pydantic-ai Agent with a tailored system prompt. All 6 run concurrently via asyncio.gather inside the LangGraph evaluation node.
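The concurrency pattern described above can be sketched as six independent coroutines gathered in one node. The judge here is a stub standing in for a pydantic-ai Agent call; `score_metric` and its placeholder score are hypothetical:

```python
import asyncio

METRICS = ["C|Q", "A|C", "A|Q", "C|A", "Q|C", "Q|A"]

async def score_metric(name: str) -> tuple[str, float]:
    # Stand-in for an LLM-judge call; trieval would invoke a pydantic-ai
    # Agent with a metric-specific system prompt here.
    await asyncio.sleep(0)
    return name, 1.0  # placeholder score

async def evaluate_metrics() -> dict[str, float]:
    # All 6 metric coroutines run concurrently, as inside the LangGraph node.
    results = await asyncio.gather(*(score_metric(m) for m in METRICS))
    return dict(results)

scores = asyncio.run(evaluate_metrics())
print(scores)
```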
## Development

```shell
uv sync --group dev

pytest                              # run tests
pytest --cov=trieval --cov-branch   # with coverage
ruff check trieval/ tests/          # lint
ruff format trieval/ tests/         # format
mypy trieval/                       # type check
```
## Documentation

- **API Reference** — Full API for `Evaluator`, `EvaluationResult`, `MetricResult`, `RAGInput`, and individual metric functions
- **Changelog** — Version history
## License
MIT