Counterfactual evaluation toolkit for RAG systems

EvalKit

EvalKit is a lightweight evaluation framework for RAG (Retrieval-Augmented Generation) systems based on counterfactual retrieval analysis.

It answers a simple but powerful question:

Which retrieved chunks actually mattered for generating the final answer?


🧠 Why EvalKit?

Most RAG evaluation tools focus on:

  • retrieval relevance
  • answer similarity
  • final response quality

But they miss a critical question:

If I remove a retrieved chunk, does the answer still hold?

EvalKit answers it with counterfactual evaluation.


🔬 Core Idea

For each retrieved context chunk:

  1. Remove the chunk
  2. Ask an LLM judge whether the answer can still be supported by the remaining chunks
  3. If YES → chunk is redundant
  4. If NO → chunk is required
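The four steps above amount to a leave-one-out loop over the retrieved chunks. The sketch below only illustrates that loop and is not EvalKit's actual API. `classify_chunks`, the `Judge` signature, and `toy_judge` are hypothetical names, and `toy_judge` is a plain keyword check that stands in for a real LLM judge so the example runs offline.

```python
from typing import Callable, Dict, List

# Hypothetical judge signature: (question, answer, remaining_contexts) -> bool.
# True means the judge answered YES ("the answer is still possible").
# A real judge would format these into a prompt and call an LLM.
Judge = Callable[[str, str, List[str]], bool]


def classify_chunks(question: str, answer: str, chunks: List[str],
                    judge: Judge) -> Dict[str, List[int]]:
    """Leave-one-out pass: drop each chunk in turn and ask the judge."""
    result: Dict[str, List[int]] = {"required": [], "redundant": []}
    for i in range(len(chunks)):
        # 1. Remove chunk i to form the counterfactual context set
        remaining = chunks[:i] + chunks[i + 1:]
        # 2. Ask the judge whether the answer is still supported
        still_possible = judge(question, answer, remaining)
        # 3./4. YES -> redundant, NO -> required
        result["redundant" if still_possible else "required"].append(i)
    return result


def toy_judge(question: str, answer: str, contexts: List[str]) -> bool:
    """Stand-in for an LLM: YES only if some chunk states the key fact."""
    return any("capital of France" in c and answer in c for c in contexts)


chunks = [
    "Paris is the capital of France.",
    "France is in Western Europe.",
    "The Eiffel Tower is in Paris.",
]
result = classify_chunks("What is the capital of France?", "Paris", chunks, toy_judge)
print(result)  # {'required': [0], 'redundant': [1, 2]}
```

In this sketch each chunk is removed on its own. If two chunks state the same fact, both will look redundant, even though the answer needs at least one of them.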

This produces:

  • Retrieval Efficiency Score
  • Required Contexts
  • Redundant Contexts
  • Importance Scores per chunk
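The description does not define how these outputs are computed. The sketch below is one plausible aggregation under stated assumptions, not EvalKit's formulas. It assumes the judge is queried several times per removed chunk. A chunk's importance score is taken to be the fraction of trials in which removing it made the answer impossible, and a chunk counts as required when that score reaches a threshold. The Retrieval Efficiency Score is taken to be the share of retrieved chunks that are required. `summarize`, `CounterfactualReport`, and `threshold` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CounterfactualReport:
    efficiency: float        # assumed: share of retrieved chunks that are required
    required: List[int]      # indices of chunks the answer depends on
    redundant: List[int]     # indices of chunks the answer survives without
    importance: List[float]  # assumed: per-chunk fraction of "NO" verdicts


def summarize(verdicts: List[List[bool]], threshold: float = 0.5) -> CounterfactualReport:
    """Aggregate repeated judge verdicts into the four outputs.

    verdicts[i][t] is True if, in trial t, the judge said the answer was
    still possible with chunk i removed (i.e. chunk i looked redundant).
    """
    importance = [
        sum(not v for v in trials) / len(trials) if trials else 0.0
        for trials in verdicts
    ]
    required = [i for i, s in enumerate(importance) if s >= threshold]
    redundant = [i for i, s in enumerate(importance) if s < threshold]
    efficiency = len(required) / len(verdicts) if verdicts else 0.0
    return CounterfactualReport(efficiency, required, redundant, importance)


# Three chunks, four judge trials each (illustrative verdicts, not real data)
report = summarize([
    [False, False, False, True],  # mostly "NO"  -> required
    [True, True, True, True],     # always "YES" -> redundant
    [True, False, True, True],    # mostly "YES" -> redundant
])
print(report)
```

Repeating the judge call is one way to smooth out LLM nondeterminism. With a single trial per chunk, every importance score collapses to 0.0 or 1.0.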

⚡ Installation

pip install evalkit-rag

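📊 How EvalKit Compares

| Tool     | Focus                         | Missing insight   |
|----------|-------------------------------|-------------------|
| Ragas    | retrieval and answer scoring  | context necessity |
| DeepEval | LLM eval benchmarks           | counterfactuals   |
| EvalKit  | context necessity (the WHY)   |                   |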

Download files

Source Distribution

evalkit_rag-0.1.0.tar.gz (4.5 kB)

Uploaded Source

Built Distribution

evalkit_rag-0.1.0-py3-none-any.whl (6.1 kB)

Uploaded Python 3

File details

Details for the file evalkit_rag-0.1.0.tar.gz.

File metadata

  • Download URL: evalkit_rag-0.1.0.tar.gz
  • Upload date:
  • Size: 4.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for evalkit_rag-0.1.0.tar.gz
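| Algorithm   | Hash digest                                                      |
|-------------|------------------------------------------------------------------|
| SHA256      | 7d3717b8422122c6e385f57a1d93934013ca0c9f06c3f5309eb000bff1f66e35 |
| MD5         | 6cc32fdffbf49ab6b7ec3c4837d5537d                                 |
| BLAKE2b-256 | 5315cecbb56020e555ff6ea433fb0c41d9f71dd575f1b164018e07698700ccaf |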
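A downloaded file can be checked against the published SHA256 digest with Python's standard `hashlib`. The sketch below is a generic verification helper and is not part of EvalKit. `sha256_of` and `verify` are hypothetical names, and the usage line assumes the sdist was saved to the current directory under its original file name.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for evalkit_rag-0.1.0.tar.gz
EXPECTED_SDIST_SHA256 = "7d3717b8422122c6e385f57a1d93934013ca0c9f06c3f5309eb000bff1f66e35"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Stream the file through SHA-256 so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Compare the local digest (lowercase hex) with the published one."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage (assumes the sdist is in the current directory):
# verify(Path("evalkit_rag-0.1.0.tar.gz"), EXPECTED_SDIST_SHA256)
```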

File details

Details for the file evalkit_rag-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: evalkit_rag-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 6.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for evalkit_rag-0.1.0-py3-none-any.whl
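| Algorithm   | Hash digest                                                      |
|-------------|------------------------------------------------------------------|
| SHA256      | 026f79769eb57eca9c31df821f0a521fe2af1556e29e282fcfb0374b8260f779 |
| MD5         | 295e2df66d3fbff1ca2f45fde3645d23                                 |
| BLAKE2b-256 | 10fd0fd1765cd1b05c05993cabefed9728b931cc0efef92f6d89b232190952ea |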
