CI/CD-integrated RAG evaluation pipeline — quality gate for AI chatbots using Ragas + Groq LLM judge
rag-eval
A CI/CD-integrated evaluation pipeline for RAG systems.
rag-eval acts as a quality gate for your RAG applications. It evaluates Pull Requests and can block merges if the output quality drops below defined thresholds.
How it works
When a pull request is opened, the GitHub Action:
- Installs the rag-eval package.
- Loads a golden evaluation dataset (from Hugging Face or a local file).
- Runs the dataset through your Mock RAG pipeline.
- Evaluates the outputs using Ragas metrics.
- Checks scores against your defined thresholds in eval_config.yaml.
- Pushes metrics to Grafana for trend tracking.
- Posts a summary comment on the Pull Request.
- Fails the CI job if any metric drops below the threshold.
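The final two steps (threshold check and CI failure) can be sketched as follows. This is a minimal illustration, not the package's actual implementation: the function names `find_failures` and `gate`, the metric key names, and the choice to treat a missing metric as a failure are all assumptions made for the example.

```python
# Hypothetical threshold table mirroring the defaults listed in this README.
DEFAULT_THRESHOLDS = {
    "faithfulness": 0.75,
    "context_relevance": 0.70,
    "answer_correctness": 0.65,
    "token_efficiency": 0.50,
}


def find_failures(scores, thresholds):
    """Return {metric: (score, minimum)} for every metric below its minimum.

    A metric missing from `scores` counts as a failure (an assumption), so a
    broken evaluation run cannot pass the gate silently.
    """
    failures = {}
    for metric, minimum in thresholds.items():
        score = scores.get(metric)
        if score is None or score < minimum:
            failures[metric] = (score, minimum)
    return failures


def gate(scores, thresholds=DEFAULT_THRESHOLDS):
    """Print one line per failing metric and return an exit code (0 = pass)."""
    failures = find_failures(scores, thresholds)
    for metric, (score, minimum) in failures.items():
        shown = "missing" if score is None else f"{score:.2f}"
        print(f"FAIL {metric}: {shown} (minimum {minimum:.2f})")
    return 1 if failures else 0
```

A CI entry point would pass the result to `sys.exit(...)`, since a non-zero exit code is what makes a GitHub Actions step fail.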
Evaluation Metrics
| Metric | What It Measures | Default Threshold |
|---|---|---|
| Faithfulness | Answers are grounded in retrieved context | ≥ 0.75 |
| Context Relevance | Retrieved context quality | ≥ 0.70 |
| Answer Correctness | Accuracy vs ground truth | ≥ 0.65 |
| Token Efficiency | correctness / log(1 + tokens) | ≥ 0.50 |
The default LLM Judge is groq/llama-3.3-70b-versatile via LiteLLM.
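The Token Efficiency formula from the table can be written out directly. The README does not state the logarithm base or how tokens are counted, so the sketch below exposes the base as a parameter (defaulting to the natural log as an assumption); the function name `token_efficiency` is hypothetical.

```python
import math


def token_efficiency(correctness, tokens, base=math.e):
    """Compute correctness / log(1 + tokens).

    `base` is an assumption: the README gives the formula but not the
    logarithm base, so it is left configurable here.
    """
    if tokens <= 0:
        # log(1 + 0) is 0, which would make the ratio undefined.
        raise ValueError("tokens must be a positive count")
    return correctness / math.log(1 + tokens, base)
```

The base matters when comparing against the 0.50 default: with the natural log, a fully correct 100-token answer scores roughly 0.22, while base 10 gives roughly 0.50 for the same answer.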
Quick Start
# Install
pip install rag-eval-gate
# Set API key
export GROQ_API_KEY="your_api_key"
# Run evaluation
rag-eval run
# View report
rag-eval report
Try the Hallucination Demo 🚨
Want to see rag-eval catch a hallucinating AI in real time? The terminal demo deliberately forces the mock RAG pipeline to hallucinate an answer about "RLHF", so you can watch the quality gate flag it and fail the run:
# Make sure GROQ_API_KEY is exported, then run:
python examples/demo.py
GitHub Actions Setup
Add this workflow to .github/workflows/rag_eval.yml:
name: RAG Evaluation
on: [pull_request]
jobs:
  eval:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.11" }
      - run: pip install rag-eval-gate
      - run: rag-eval run --config eval_config.yaml
        env:
          GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
Ensure you set GROQ_API_KEY in your GitHub repository secrets.
Configuration
You can customize the passing thresholds and the dataset source in eval_config.yaml:
thresholds:
  faithfulness_min: 0.75
  context_relevance_min: 0.70
  answer_correctness_min: 0.65
  token_efficiency_min: 0.50
dataset:
  hf_repo: "manikbodamwad/rag-eval-golden"
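To illustrate how the `*_min` keys relate to metric names, here is a small sketch that validates a parsed config and strips the suffix. The config is written as a Python dict so the example needs no YAML library (a loader such as PyYAML's `yaml.safe_load` would produce the same structure); `parse_thresholds` is a hypothetical helper, and the validation rules are assumptions rather than the package's documented behavior.

```python
# Parsed form of the eval_config.yaml shown above.
config = {
    "thresholds": {
        "faithfulness_min": 0.75,
        "context_relevance_min": 0.70,
        "answer_correctness_min": 0.65,
        "token_efficiency_min": 0.50,
    },
    "dataset": {"hf_repo": "manikbodamwad/rag-eval-golden"},
}


def parse_thresholds(config):
    """Map '<metric>_min' keys to {metric: minimum}, validating each entry."""
    thresholds = {}
    for key, value in config.get("thresholds", {}).items():
        if not key.endswith("_min"):
            raise ValueError(f"unexpected threshold key: {key!r}")
        # bool is a subclass of int, so reject it explicitly.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or value < 0:
            raise ValueError(f"{key} must be a non-negative number, got {value!r}")
        thresholds[key[: -len("_min")]] = float(value)
    return thresholds
```

Failing fast on a misspelled key or a non-numeric value keeps a typo in the config from silently disabling one of the checks.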
Local Development
git clone https://github.com/manikbodamwad/rag-eval
cd rag-eval
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env
# Run local evaluation
rag-eval run
# View formatted report
rag-eval report
# Run unit tests
python -m pytest tests/
Golden Dataset
The default golden dataset is published as manikbodamwad/rag-eval-golden on Hugging Face. To use your own dataset, create a JSONL file (one JSON object per line) with the following schema:
{"question": "What is X?", "ground_truth": "X is ...", "reference_context": "The passage that answers this..."}
Then specify the local path or your own HF repo in eval_config.yaml.
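Before pointing the config at a custom file, it can help to check that every line matches the schema. The sketch below is one way to do that with the standard library only; `parse_golden_jsonl` is a hypothetical helper, and treating empty fields as errors is an assumption, not a documented rule of the package.

```python
import json

# Field names taken from the schema shown above.
REQUIRED_FIELDS = ("question", "ground_truth", "reference_context")


def parse_golden_jsonl(text):
    """Parse JSONL text into a list of dicts, checking the required fields."""
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError(f"line {line_no}: expected a JSON object")
        missing = [name for name in REQUIRED_FIELDS if not record.get(name)]
        if missing:
            raise ValueError(f"line {line_no}: missing or empty fields {missing}")
        records.append(record)
    return records
```

For a local file, read it with `open(path, encoding="utf-8")` and pass the contents to the function; the reported line number points at the offending record.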
License
MIT License.