
A Multiagent Framework for Generating Multimodal Multihop QA Datasets for RAG Evaluation


MiRAGE: Multimodal Multihop RAG Evaluation Dataset Generator


MiRAGE is a multi-agent framework for generating high-quality, multimodal, multihop question-answer datasets for evaluating Retrieval-Augmented Generation (RAG) systems. It automatically extracts domain expertise, builds complete context through iterative retrieval, and generates verified QA pairs from technical documents.

[Figure: MiRAGE framework architecture]

Key Features

  • Multi-hop Context Completion: Iteratively expands incomplete chunks with relevant context across documents
  • Domain and Expert Role Detection: Automatic domain identification using BERTopic + LLM
  • Multi-stage QA Pipeline: Generate, Select, Verify, Correct for quality assurance
  • Multimodal Support: Handles text, tables, figures, and images in documents
  • Cross-Document Retrieval: Unified FAISS index enables retrieval across all documents
  • Hierarchical Deduplication: Two-stage clustering with LLM-based merging
  • Multiple Backend Support: Gemini, OpenAI, and local Ollama models
  • Optimized Evaluation: 3-5x faster metric computation via a harmonized RAGAS implementation
  • Fully Parallelized: Thread and process pools for maximum throughput


Installation

From PyPI (Recommended)

pip install mirage-benchmark

From Source (Development)

# Clone the repository
git clone https://github.com/ChandanKSahu/MiRAGE.git
cd MiRAGE

# Install in development mode
pip install -e .

# Or install with all optional dependencies
pip install -e ".[all]"

With Optional Dependencies

# GPU support (CUDA-enabled embeddings and FAISS)
pip install mirage-benchmark[gpu]

# PDF processing (Docling for PDF to Markdown conversion)
pip install mirage-benchmark[pdf]

# Evaluation metrics (RAGAS and LangChain)
pip install mirage-benchmark[eval]

# Development tools (testing, linting)
pip install mirage-benchmark[dev]

# All dependencies
pip install mirage-benchmark[all]

Quick Start

1. Clone and Install

git clone https://github.com/ChandanKSahu/MiRAGE.git
cd MiRAGE
pip install -e .

2. Add Your Documents

Place your PDF, HTML, or other documents in the data/documents/ folder:

# The folder structure should look like:
data/
└── documents/
    ├── document1.pdf
    ├── document2.pdf
    └── ...

A sample dataset (data/FinanceAnnualReports.zip) is included for testing.

3. Configure API Keys

Create your configuration file:

cp config.yaml.example config.yaml

Edit config.yaml to add your API keys:

backend:
  active: GEMINI  # Options: GEMINI, OPENAI, OLLAMA
  
  gemini:
    api_key_path: ~/.config/gemini/api_key.txt
    # Or use environment variable: export GEMINI_API_KEY="your-key"
    
  openai:
    api_key_path: ~/.config/openai/api_key.txt
    # Or use environment variable: export OPENAI_API_KEY="your-key"
    
  ollama:
    base_url: http://localhost:11434
    # No API key needed for local Ollama

paths:
  input_pdf_dir: data/documents
  output_dir: output/my_dataset
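
Before moving on, it can be worth a quick check that the file parses and that a key is actually available. Below is a minimal sketch using plain PyYAML; it is not MiRAGE's built-in preflight (step 4), which is more thorough:

import os
import yaml

# Parse config.yaml and report the active backend (illustrative check only;
# MiRAGE's own --preflight performs more thorough validation).
with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

backend = cfg["backend"]["active"]
print(f"Active backend: {backend}")

# For GEMINI/OPENAI, warn when neither the env var nor the key file exists.
if backend in ("GEMINI", "OPENAI"):
    env_var = f"{backend}_API_KEY"
    key_path = os.path.expanduser(cfg["backend"][backend.lower()]["api_key_path"])
    if not os.environ.get(env_var) and not os.path.exists(key_path):
        print(f"Warning: no {env_var} set and no key file at {key_path}")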

4. Run Preflight Checks

python run_mirage.py --preflight

5. Generate QA Dataset

python run_mirage.py

Project Structure

MiRAGE/
├── src/mirage/                 # Main package
│   ├── __init__.py            # Package exports
│   ├── cli.py                 # Command line interface
│   ├── core/                  # Core functionality
│   │   ├── llm.py            # LLM/VLM API interfaces
│   │   ├── prompts.py        # Prompt templates
│   │   └── config.py         # Configuration management
│   ├── embeddings/            # Embedding models
│   │   ├── models.py         # Embedding model classes
│   │   ├── rerankers_multimodal.py
│   │   └── rerankers_text.py
│   ├── pipeline/              # Processing pipeline
│   │   ├── pdf_processor.py  # PDF to Markdown
│   │   ├── chunker.py        # Semantic chunking
│   │   ├── context.py        # Multi-hop retrieval
│   │   ├── qa_generator.py   # QA generation
│   │   ├── domain.py         # Domain extraction
│   │   └── deduplication.py  # Deduplication
│   ├── evaluation/            # Metrics
│   │   ├── metrics.py
│   │   └── metrics_optimized.py
│   └── utils/                 # Utilities
│       ├── preflight.py      # Preflight checks
│       ├── stats.py          # Dataset statistics
│       └── ablation.py       # Ablation studies
├── data/                      # Your documents go here
│   └── documents/            # Input PDFs/HTMLs
├── output/                    # Generated results
├── assets/                    # Documentation images
├── config.yaml.example        # Example configuration
├── run_mirage.py             # Main entry point
├── setup.py                   # Package setup
└── README.md

Configuration

MiRAGE uses a YAML configuration file. Key sections:

Section         Description
--------------  ---------------------------------------------------
backend         LLM/VLM provider settings (Gemini, OpenAI, Ollama)
paths           Input documents and output directory
qa_generation   Target QA pairs and type (multihop/multimodal/text)
embedding       Embedding model and batch size
retrieval       Multi-hop retrieval parameters
deduplication   Similarity thresholds for deduplication
evaluation      Metrics and evaluation settings

See config.yaml.example for full documentation.

API Keys Setup

Google Gemini

# Option 1: Environment variable
export GEMINI_API_KEY="your-gemini-api-key"

# Option 2: File (create the directory first)
mkdir -p ~/.config/gemini
echo "your-gemini-api-key" > ~/.config/gemini/api_key.txt

OpenAI

# Option 1: Environment variable
export OPENAI_API_KEY="your-openai-api-key"

# Option 2: File
mkdir -p ~/.config/openai
echo "your-openai-api-key" > ~/.config/openai/api_key.txt

Ollama (Local)

No API key needed. Just install and start Ollama:

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull models
ollama pull llama3
ollama pull llava

# Ollama runs on http://localhost:11434 by default
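
To confirm the server is up and the models are present, you can hit Ollama's /api/tags endpoint, which lists locally pulled models:

import json
import urllib.request

# List the models available on the local Ollama server (stdlib only).
with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    tags = json.load(resp)

for model in tags.get("models", []):
    print(model["name"])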

Pipeline Overview

The MiRAGE framework operates through a multi-stage pipeline:

+------------------------------------------------------------------+
|  STEP 1: Document Processing                                      |
|  PDF/HTML -> Markdown -> Semantic Chunks                          |
+--------------------------------+---------------------------------+
                                 |
                                 v
+------------------------------------------------------------------+
|  STEP 2: Embedding and Indexing                                   |
|  Embed all chunks -> Build unified FAISS index                    |
+--------------------------------+---------------------------------+
                                 |
                                 v
+------------------------------------------------------------------+
|  STEP 3: Domain and Expert Extraction                             |
|  BERTopic analysis -> LLM domain/role identification              |
+--------------------------------+---------------------------------+
                                 |
                                 v
+------------------------------------------------------------------+
|  STEP 4: QA Generation (per chunk, parallel)                      |
|  +--------------------------------------------------------------+ |
|  | 4.1 Verify chunk completeness                                | |
|  | 4.2 Multi-hop retrieval for incomplete chunks                | |
|  | 4.3 Generate QA pairs from complete context                  | |
|  | 4.4 Select high-quality pairs                                | |
|  | 4.5 Verify correctness and context necessity                 | |
|  | 4.6 Correct failed pairs (optional)                          | |
|  +--------------------------------------------------------------+ |
+--------------------------------+---------------------------------+
                                 |
                                 v
+------------------------------------------------------------------+
|  STEP 5: Hierarchical Deduplication                               |
|  Question clustering -> Answer sub-clustering -> LLM merging      |
+--------------------------------+---------------------------------+
                                 |
                                 v
+------------------------------------------------------------------+
|  STEP 6: Evaluation                                               |
|  RAGAS metrics + Custom metrics (faithfulness, relevancy, etc)    |
+------------------------------------------------------------------+
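
The unified index built in Step 2 is what enables cross-document retrieval, and hence the multi-hop expansion in Step 4: chunks from every document share one vector space. A minimal sketch of that idea with sentence-transformers and FAISS follows; the model name and flat index type are illustrative assumptions, not MiRAGE's actual configuration:

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

# Chunks from different documents go into the same index, so a single query
# can retrieve across document boundaries (sketch only).
chunks = [
    {"doc": "report_a.pdf", "text": "Revenue grew 12% year over year."},
    {"doc": "report_b.pdf", "text": "IE4 motors at 75kW must reach 96.0% efficiency at 50Hz."},
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model, not MiRAGE's default
emb = model.encode([c["text"] for c in chunks], normalize_embeddings=True)

index = faiss.IndexFlatIP(emb.shape[1])  # inner product = cosine on normalized vectors
index.add(np.asarray(emb, dtype="float32"))

query = model.encode(["What efficiency must an IE4 motor achieve?"],
                     normalize_embeddings=True)
scores, ids = index.search(np.asarray(query, dtype="float32"), 2)
for score, i in zip(scores[0], ids[0]):
    print(f"{chunks[i]['doc']}: {score:.3f}")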

Usage

Full Pipeline

# Using the entry script
python run_mirage.py

# Or using the CLI (after pip install -e .)
mirage --config config.yaml

Individual Components

# Preflight checks only
python run_mirage.py --preflight

# With custom config
python run_mirage.py --config my_config.yaml

# Skip preflight checks
python run_mirage.py --skip-preflight

# Verbose output
python run_mirage.py --verbose

Programmatic Usage

from mirage.core import call_llm_simple, setup_logging
from mirage.pipeline import build_complete_context, generate_qa_for_chunk
from mirage.embeddings import get_best_embedding_model

# Setup logging
setup_logging()

# Load embedding model
embedder = get_best_embedding_model()

# Generate QA pairs for a single semantic chunk
# ('chunk' here is one entry from the pipeline's chunks.json)
qa_pairs = generate_qa_for_chunk(chunk, domain="Finance", expert="Financial Analyst")
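
The same call scales to a whole corpus by looping over the pipeline's chunk file. A sketch, assuming chunks.json is a JSON list of chunk records and that generate_qa_for_chunk returns a list of QA dicts (as the plural qa_pairs suggests):

import json

from mirage.pipeline import generate_qa_for_chunk

# Load the chunks produced by the pipeline (assumed to be a JSON list).
with open("output/my_dataset/chunks.json") as f:
    chunks = json.load(f)

dataset = []
for chunk in chunks:
    dataset.extend(generate_qa_for_chunk(chunk, domain="Finance",
                                         expert="Financial Analyst"))

with open("output/my_dataset/qa_dataset.json", "w") as f:
    json.dump(dataset, f, indent=2)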

Output Format

Sample Generated Question-Answer Pair


QA Dataset Structure (qa_deduplicated.json)

[
  {
    "chunk_id": 1,
    "question": "What efficiency must a 75kW IE4 motor achieve?",
    "answer": "A 75kW IE4 motor must achieve 96.0% efficiency at 50Hz...",
    "context_chunks": [...],
    "hop_count": 2,
    "relevance_score": "9",
    "difficulty_score": "7",
    "expert_persona": "Motor Design Engineer",
    "domain": "Electrical Engineering"
  }
]
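
Note that the score fields are stored as strings, so cast before filtering. For example, to keep only multi-hop pairs with high relevance:

import json

# Filter the deduplicated dataset to multi-hop, high-relevance pairs.
with open("output/my_dataset/qa_deduplicated.json") as f:
    qa_pairs = json.load(f)

multihop = [
    qa for qa in qa_pairs
    if qa["hop_count"] >= 2 and int(qa["relevance_score"]) >= 8
]
print(f"{len(multihop)} / {len(qa_pairs)} pairs are multi-hop with relevance >= 8")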

Output Directory Structure

output/my_dataset/
├── markdown/              # Converted markdown files
├── chunks.json           # Semantic chunks
├── embeddings/           # FAISS index and embeddings
├── qa_dataset.json       # Raw QA pairs
├── qa_deduplicated.json  # Final deduplicated QA pairs
└── evaluation_report.json # Metrics and statistics

Evaluation Metrics

Metric               Description
-------------------  ---------------------------------------
Faithfulness         Answer grounded in context
Answer Relevancy     Answer addresses the question
Context Precision    Retrieved chunks are relevant
Context Recall       Context contains reference info
Multi-hop Reasoning  Quality of multi-step reasoning
Visual Dependency    Requires image to answer
Context Necessity    Requires context (anti-parametric bias)
Domain Coverage      Corpus coverage
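
MiRAGE ships its own harmonized, faster implementation of the RAGAS metrics. For orientation, computing the first four with stock RAGAS looks roughly like this (RAGAS 0.1-style API; requires the [eval] extra and an LLM key, OpenAI by default):

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (answer_relevancy, context_precision,
                           context_recall, faithfulness)

# One-row example dataset in the column format RAGAS expects.
data = Dataset.from_dict({
    "question": ["What efficiency must a 75kW IE4 motor achieve?"],
    "answer": ["A 75kW IE4 motor must achieve 96.0% efficiency at 50Hz."],
    "contexts": [["IE4 tables specify 96.0% efficiency for 75kW motors at 50Hz."]],
    "ground_truth": ["96.0% efficiency at 50Hz"],
})

result = evaluate(data, metrics=[faithfulness, answer_relevancy,
                                 context_precision, context_recall])
print(result)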

Hyperparameter Guide

Parameter                      Default  Description
-----------------------------  -------  ------------------------------
max_depth                      10       Maximum retrieval iterations
max_breadth                    5        Search queries per iteration
chunks_per_search              2        Chunks retrieved per query
qa_max_workers                 6        Parallel workers for QA gen
question_similarity_threshold  0.75     Question clustering threshold

Recommended Settings

Use Case            max_depth  max_breadth  chunks_per_search
------------------  ---------  -----------  -----------------
Quick Testing       2          2            1
Balanced (Default)  10         5            2
Thorough            20         10           3
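
As a sketch, a quick-testing run would set the parameters above in config.yaml. The section names below follow the Configuration table earlier but are assumptions; check config.yaml.example for the authoritative layout:

# Illustrative quick-testing settings; key placement is assumed, see
# config.yaml.example for the authoritative section names.
retrieval:
  max_depth: 2
  max_breadth: 2
  chunks_per_search: 1

qa_generation:
  qa_max_workers: 6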

Contributing

Contributions are welcome! Please see our Contributing Guide for details.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Citation

If you use MiRAGE in your research, please cite:

@software{mirage2024,
  title = {MiRAGE: A Multiagent Framework for Generating Multimodal Multihop QA Datasets for RAG Evaluation},
  author = {MiRAGE Authors},
  year = {2024},
  url = {https://github.com/ChandanKSahu/MiRAGE}
}

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.




Download files

Download the file for your platform.

Source Distribution

mirage_benchmark-1.0.0.tar.gz (1.6 MB)

Built Distribution


mirage_benchmark-1.0.0-py3-none-any.whl (149.9 kB)

File details

Details for the file mirage_benchmark-1.0.0.tar.gz.

File metadata

  • Download URL: mirage_benchmark-1.0.0.tar.gz
  • Size: 1.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mirage_benchmark-1.0.0.tar.gz

Algorithm    Hash digest
-----------  ----------------------------------------------------------------
SHA256       8730b9607ddeeb1b41062c1bbcce2ed2847fafb0c6995d05ed344f3cd40eb9bc
MD5          5b5625ffeb5f7d56672d603d018ca26f
BLAKE2b-256  02ce2f9649af8a8ae449f5571e8e6afe086982ae0e297fb2c97ae771683a8ea5


Provenance

The following attestation bundles were made for mirage_benchmark-1.0.0.tar.gz:

Publisher: publish-pypi.yml on ChandanKSahu/MiRAGE

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mirage_benchmark-1.0.0-py3-none-any.whl.


File hashes

Hashes for mirage_benchmark-1.0.0-py3-none-any.whl

Algorithm    Hash digest
-----------  ----------------------------------------------------------------
SHA256       02ffc1b1a8895ae8a98946069698bd3e4cac41abc99d3ed2c0a1b9bb3694ecf4
MD5          259015a578a8a7c5d8ed86fdc9de741f
BLAKE2b-256  4cf77cbc08da327ac14df0de83ffc62382fca67da31c34eda09ea4ba8d7fb286


Provenance

The following attestation bundles were made for mirage_benchmark-1.0.0-py3-none-any.whl:

Publisher: publish-pypi.yml on ChandanKSahu/MiRAGE

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
