CI/CD-integrated RAG evaluation pipeline — quality gate for AI chatbots using Ragas + Groq LLM judge

Maintainers

manik24

Project description

rag-eval

A CI/CD-integrated evaluation pipeline for RAG systems.

rag-eval acts as a quality gate for your RAG applications. It evaluates pull requests and can block merges when output quality drops below the thresholds you define.

How it works

When a pull request is opened, the GitHub Action:

1. Installs the rag-eval-gate package, which provides the rag-eval CLI.
2. Loads a golden evaluation dataset (from Hugging Face or a local file).
3. Runs the dataset through your mock RAG pipeline.
4. Evaluates the outputs using Ragas metrics.
5. Checks the scores against the thresholds defined in eval_config.yaml.
6. Pushes metrics to Grafana for trend tracking.
7. Posts a summary comment on the pull request.
8. Fails the CI job if any metric drops below its threshold.
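
The threshold check and the CI failure in the last steps can be sketched in a few lines. This is a minimal illustration, not rag-eval's actual implementation: the helper names `failing_metrics` and `gate_exit_code` and the layout of the `scores` dictionary are assumptions. Only the `*_min` keys and their default values come from eval_config.yaml.

```python
# Threshold keys mirror eval_config.yaml; values are the documented defaults.
THRESHOLDS = {
    "faithfulness_min": 0.75,
    "context_relevance_min": 0.70,
    "answer_correctness_min": 0.65,
    "token_efficiency_min": 0.50,
}


def failing_metrics(scores, thresholds=THRESHOLDS):
    """Return {metric: (score, minimum)} for every metric below its minimum.

    Hypothetical helper: `scores` maps metric names (without the `_min`
    suffix) to floats. A metric missing from `scores` counts as failing.
    """
    failures = {}
    for key, minimum in thresholds.items():
        metric = key.removesuffix("_min")  # requires Python 3.9+
        score = scores.get(metric)
        if score is None or score < minimum:
            failures[metric] = (score, minimum)
    return failures


def gate_exit_code(scores):
    """Print each failure and return 1 if the gate should fail, else 0."""
    failures = failing_metrics(scores)
    for metric, (score, minimum) in failures.items():
        print(f"FAIL {metric}: {score} < {minimum}")
    return 1 if failures else 0


example = {
    "faithfulness": 0.81,
    "context_relevance": 0.66,  # below 0.70, so the gate fails
    "answer_correctness": 0.70,
    "token_efficiency": 0.55,
}
exit_code = gate_exit_code(example)  # a CLI would pass this to sys.exit()
```

Returning a non-zero exit code is what makes a CI step fail, so a gate like this needs no CI-specific logic of its own.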

Evaluation Metrics

| Metric | What It Measures | Default Threshold |
| --- | --- | --- |
| Faithfulness | Answers are grounded in retrieved context | ≥ 0.75 |
| Context Relevance | Retrieved context quality | ≥ 0.70 |
| Answer Correctness | Accuracy vs ground truth | ≥ 0.65 |
| Token Efficiency | `correctness / log(1 + tokens)` | ≥ 0.50 |
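
The Token Efficiency formula rewards correct answers that use few tokens. The sketch below works through it and rearranges it to show how many tokens a given threshold allows. The table does not state the logarithm base or exactly which tokens are counted, so the `base` parameter and both function names are assumptions made for illustration.

```python
import math


def token_efficiency(correctness, tokens, base=math.e):
    """correctness / log(1 + tokens); the log base is an explicit assumption."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return correctness / math.log(1 + tokens, base)


def max_tokens_at_threshold(correctness, threshold, base=math.e):
    """Largest token count t with correctness / log(1 + t) >= threshold.

    Rearranging the inequality gives t <= base ** (correctness / threshold) - 1.
    """
    return base ** (correctness / threshold) - 1


# Token budget for a perfect correctness score at the default 0.50 threshold:
budget_ln = max_tokens_at_threshold(1.0, 0.50)              # e**2 - 1, about 6.4
budget_log10 = max_tokens_at_threshold(1.0, 0.50, base=10)  # 10**2 - 1 = 99
```

The budget changes by more than an order of magnitude between the two bases, so it is worth checking the package source before tuning token_efficiency_min.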

The default LLM judge is groq/llama-3.3-70b-versatile, called through LiteLLM.

Quick Start

# Install
pip install rag-eval-gate

# Set API key
export GROQ_API_KEY="your_api_key"

# Run evaluation
rag-eval run

# View report
rag-eval report

Try the Hallucination Demo 🚨

Want to see rag-eval catch a hallucinating AI in real time? The terminal demo deliberately forces the mock RAG pipeline to hallucinate an answer about "RLHF", then shows the quality gate failing the run:

# Make sure GROQ_API_KEY is exported, then run:
python examples/demo.py

GitHub Actions Setup

Add this workflow to .github/workflows/rag_eval.yml:

name: RAG Evaluation
on: [pull_request]

jobs:
  eval:
    runs-on: ubuntu-latest
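    permissions:
      contents: read        # needed by actions/checkout once permissions are set explicitly
      pull-requests: write  # lets the job post the summary comment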
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.11" }
      - run: pip install rag-eval-gate
      - run: rag-eval run --config eval_config.yaml
        env:
          GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}

Ensure you set GROQ_API_KEY in your GitHub repository secrets.

Configuration

You can customize the passing thresholds and the dataset source in eval_config.yaml:

thresholds:
  faithfulness_min: 0.75
  context_relevance_min: 0.70
  answer_correctness_min: 0.65
  token_efficiency_min: 0.50

dataset:
  hf_repo: "manikbodamwad/rag-eval-golden"

Local Development

git clone https://github.com/manikbodamwad/rag-eval
cd rag-eval
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

cp .env.example .env

# Run local evaluation
rag-eval run

# View formatted report
rag-eval report

# Run unit tests
python -m pytest tests/

Golden Dataset

The default test set is published as manikbodamwad/rag-eval-golden on Hugging Face. To use your own dataset, create a JSONL file with one record per line in the following schema:

{"question": "What is X?", "ground_truth": "X is ...", "reference_context": "The passage that answers this..."}

Then specify the local path or your own HF repo in eval_config.yaml.
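
Before pointing the config at a custom file, it can help to check that every line matches the schema above. The following is a minimal sketch using only the standard library. `load_golden_records` is a hypothetical helper, not part of rag-eval, and it checks only that the three documented fields are present.

```python
import json

# The three fields shown in the schema above.
REQUIRED_KEYS = {"question", "ground_truth", "reference_context"}


def load_golden_records(lines):
    """Parse JSONL lines and check each record has the documented fields.

    `lines` is any iterable of strings, for example an open file handle.
    """
    records = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError(f"line {number}: expected a JSON object")
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            raise ValueError(f"line {number}: missing {sorted(missing)}")
        records.append(record)
    return records


sample = [
    '{"question": "What is X?", "ground_truth": "X is ...", '
    '"reference_context": "The passage that answers this..."}',
]
records = load_golden_records(sample)
```

To validate a file on disk, pass an open handle instead of a list, for example `load_golden_records(open("golden.jsonl", encoding="utf-8"))`, where `golden.jsonl` is a placeholder file name.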

License

MIT License.

Release history

0.2.1 (Jun 15, 2026)
0.2.0 (Jun 15, 2026)
0.1.0 (Jun 15, 2026), this version

Download files

Source Distribution

rag_eval_gate-0.1.0.tar.gz (20.6 kB)

Uploaded Jun 15, 2026 Source

Built Distribution

rag_eval_gate-0.1.0-py3-none-any.whl (17.0 kB)

Uploaded Jun 15, 2026 Python 3

File details

Details for the file rag_eval_gate-0.1.0.tar.gz.

File metadata

Download URL: rag_eval_gate-0.1.0.tar.gz
Upload date: Jun 15, 2026
Size: 20.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for rag_eval_gate-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`a2d7211c656c7de7dacecea82c7bf02d9b7680eaf0082f9459a1952f520368f9`
MD5	`ddc93bca53cf50476c305d846bda714c`
BLAKE2b-256	`687932873846f45ef29d93d47de3fb8aa271a2901f1cac46979a240b899eae65`

Provenance

The following attestation bundles were made for rag_eval_gate-0.1.0.tar.gz:

Publisher: publish.yml on ManikBodamwad/RAG-EVAL

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: rag_eval_gate-0.1.0.tar.gz
- Subject digest: a2d7211c656c7de7dacecea82c7bf02d9b7680eaf0082f9459a1952f520368f9
- Sigstore transparency entry: 1823798123
- Sigstore integration time: Jun 15, 2026
Source repository:
- Permalink: ManikBodamwad/RAG-EVAL@e1ac0e6ebd4c630b7484b4675b3db9b07b0ac277
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/ManikBodamwad
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@e1ac0e6ebd4c630b7484b4675b3db9b07b0ac277
- Trigger Event: push

File details

Details for the file rag_eval_gate-0.1.0-py3-none-any.whl.

File metadata

Download URL: rag_eval_gate-0.1.0-py3-none-any.whl
Upload date: Jun 15, 2026
Size: 17.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for rag_eval_gate-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`cbc0d2a4238ba885ebe37d0daee27c3ccb93e39998bb12aac1377a4d81c6fb84`
MD5	`21c9a22645c1cd54c39fc6652e3755fa`
BLAKE2b-256	`00aa669b6fa582c838d6b9017a7a8cdf45dac6085064dd4e3bbe052ad7dfb9f0`

Provenance

The following attestation bundles were made for rag_eval_gate-0.1.0-py3-none-any.whl:

Publisher: publish.yml on ManikBodamwad/RAG-EVAL

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: rag_eval_gate-0.1.0-py3-none-any.whl
- Subject digest: cbc0d2a4238ba885ebe37d0daee27c3ccb93e39998bb12aac1377a4d81c6fb84
- Sigstore transparency entry: 1823798356
- Sigstore integration time: Jun 15, 2026
Source repository:
- Permalink: ManikBodamwad/RAG-EVAL@e1ac0e6ebd4c630b7484b4675b3db9b07b0ac277
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/ManikBodamwad
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@e1ac0e6ebd4c630b7484b4675b3db9b07b0ac277
- Trigger Event: push
