A Multiagent Framework for Generating Multimodal Multihop QA Datasets for RAG Evaluation

Maintainers

chpk

Project description

MiRAGE: A Multiagent Framework for Generating Multimodal Multihop Question-Answer Dataset for RAG Evaluation

Requires Python 3.9+. License: Apache 2.0. Distributed on PyPI as mirage-benchmark.

MiRAGE is a multi-agent framework for generating high-quality, multimodal, multihop question-answer datasets for evaluating Retrieval-Augmented Generation (RAG) systems.

Multiagent Architecture

[Figure: MiRAGE framework architecture]

Sample QA Pair

[Figure: sample generated QA pair]

Interactive Process Flow

Explore the step-by-step multihop QA generation process:

View Interactive Visualization

Key Features

Multi-hop Context Completion: Iteratively expands incomplete chunks with relevant context
Domain and Expert Role Detection: Automatic domain identification using BERTopic + LLM
Multi-stage QA Pipeline: Generate, Select, Verify, Correct for quality assurance
Multimodal Support: Handles text, tables, figures, and images
Multiple Backend Support: Gemini, OpenAI, and local Ollama models
Fully Parallelized: Thread and process pools for maximum throughput
Token Usage Tracking: Automatic tracking of input/output tokens across all LLM calls
Checkpoint & Resume: Interrupt and resume long-running pipelines without losing progress

Table of Contents

Installation
Quick Start
Usage
API Keys Setup
Configuration
Command Line Options
Output Format
Project Structure
Contributing
License

Installation

From PyPI

pip install mirage-benchmark

From Source

git clone https://github.com/ChandanKSahu/MiRAGE.git
cd MiRAGE
pip install -e .

With Optional Dependencies

pip install mirage-benchmark[eval]  # Evaluation metrics (ragas, langchain)
pip install mirage-benchmark[all]   # All optional dependencies

Note: As of v1.2.7, all core dependencies (PDF processing, embeddings, OCR, visualization) are included in the base install. Only evaluation metrics (ragas, langchain) are optional.
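To confirm from Python which release ended up in the environment, the standard library's importlib.metadata can look up a distribution by its PyPI name. This is a minimal sketch; the helper name installed_version is illustrative and not part of MiRAGE.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "mirage-benchmark") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version() or "mirage-benchmark is not installed")
```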

GPU Support (FAISS-GPU)

For GPU-accelerated similarity search, install FAISS-GPU via conda:

# Create conda environment (recommended)
conda create -n mirage python=3.11
conda activate mirage

# Install FAISS-GPU
conda install -c pytorch faiss-gpu

# Then install MiRAGE
pip install mirage-benchmark
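After installing, it can be useful to check whether FAISS actually sees a GPU. The sketch below relies on faiss.get_num_gpus(), and guards against builds that do not expose it; the helper name faiss_gpu_count is an illustration, not a MiRAGE function.

```python
def faiss_gpu_count() -> int:
    """Return the number of GPUs FAISS can see, or 0 if FAISS or GPU support is missing."""
    try:
        import faiss  # provided by the faiss-cpu or faiss-gpu package
    except ImportError:
        return 0
    # Some CPU-only builds may not expose get_num_gpus; treat that as zero GPUs.
    get_num_gpus = getattr(faiss, "get_num_gpus", None)
    return int(get_num_gpus()) if callable(get_num_gpus) else 0


if __name__ == "__main__":
    print(f"GPUs visible to FAISS: {faiss_gpu_count()}")
```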

Quick Start

Step 1: Install the Package

pip install mirage-benchmark

Step 2: Set Up API Key

Choose one of the following backends:

Option A: Google Gemini (Recommended)

export GEMINI_API_KEY="your-gemini-api-key"

Option B: OpenAI

export OPENAI_API_KEY="your-openai-api-key"

Option C: Local Ollama (No API key needed)

# Install and start Ollama
ollama serve
ollama pull llama3

Step 3: Prepare Your Data

Place your documents in a folder:

mkdir -p data/my_documents
cp /path/to/your/*.pdf data/my_documents/

Step 4: Run MiRAGE

After pip installation, use the run_mirage command:

# Basic usage with Gemini (default backend) - API key from environment
export GEMINI_API_KEY="your-gemini-key"
run_mirage --input data/my_documents --output output/my_dataset --num-qa-pairs 1

# Using Gemini with API key as argument
run_mirage -i data/my_documents -o output/my_dataset --backend gemini --api-key YOUR_GEMINI_KEY

# Using OpenAI
run_mirage -i data/my_documents -o output/my_dataset --backend openai --api-key YOUR_OPENAI_KEY

# Using local Ollama (no API key needed)
run_mirage -i data/my_documents -o output/my_dataset --backend ollama

Note: When using --api-key, always specify --backend to indicate which service the key is for.

Step 5: Check Results

ls output/my_dataset/
# qa_multihop_pass.json  - Generated QA pairs (always created)
# chunks.json            - Semantic chunks (always created)
# multihop_visualization.html - Interactive visualization (always created)
# embeddings/            - FAISS index and embeddings

# Optional outputs (if --deduplication and --evaluation flags used):
# qa_deduplicated.json   - Deduplicated QA pairs (with --deduplication)
# evaluation_report.json - Quality metrics (with --evaluation)
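The same check can be scripted. The sketch below only tests for the file names listed above; the helper name missing_outputs is hypothetical.

```python
from pathlib import Path

# File names as listed in Step 5.
CORE_OUTPUTS = ["qa_multihop_pass.json", "chunks.json", "multihop_visualization.html"]
OPTIONAL_OUTPUTS = ["qa_deduplicated.json", "evaluation_report.json"]


def missing_outputs(output_dir: str, names=CORE_OUTPUTS) -> list:
    """Return the expected file names that are absent from output_dir."""
    root = Path(output_dir)
    return [name for name in names if not (root / name).is_file()]


if __name__ == "__main__":
    print("Missing core outputs:", missing_outputs("output/my_dataset"))
```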

Quick Test

# Verify installation
run_mirage --version

# Run preflight checks
run_mirage --preflight

# Generate 1 QA pair for testing
run_mirage --input data/sample --output results/test --num-qa-pairs 1

Usage

Basic Usage (QA Generation Only)

By default, MiRAGE runs the core pipeline: document processing, chunking, embedding, and QA generation/verification. Deduplication and evaluation are OFF by default.

# Default: Generates QA pairs without deduplication or evaluation
run_mirage --input <INPUT_DIR> --output <OUTPUT_DIR> --num-qa-pairs 100

With Deduplication

To merge similar QA pairs and remove duplicates:

run_mirage -i data/documents -o output/results --num-qa-pairs 100 --deduplication

With Evaluation Metrics

To compute quality metrics (faithfulness, relevancy, etc.):

run_mirage -i data/documents -o output/results --num-qa-pairs 100 --evaluation

Full Pipeline (Deduplication + Evaluation)

run_mirage -i data/documents -o output/results --num-qa-pairs 100 --deduplication --evaluation

With All Options

run_mirage \
    --input data/documents \
    --output output/results \
    --backend gemini \
    --api-key YOUR_GEMINI_KEY \
    --num-qa-pairs 100 \
    --max-workers 4 \
    --max-depth 2 \
    --embedding-model auto \
    --reranker-model gemini_vlm \
    --deduplication \
    --evaluation \
    --verbose
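When the CLI is driven from a script, building the argument list in Python avoids shell-quoting mistakes. The sketch below uses only flags documented in this README; the function name build_run_mirage_command is an assumption made for illustration.

```python
import shlex


def build_run_mirage_command(input_dir, output_dir, num_qa_pairs=10,
                             backend="gemini", deduplication=False,
                             evaluation=False):
    """Assemble a run_mirage argument list from documented flags only."""
    cmd = ["run_mirage", "--input", str(input_dir), "--output", str(output_dir),
           "--backend", backend, "--num-qa-pairs", str(num_qa_pairs)]
    if deduplication:
        cmd.append("--deduplication")
    if evaluation:
        cmd.append("--evaluation")
    return cmd


cmd = build_run_mirage_command("data/documents", "output/results",
                               num_qa_pairs=100, deduplication=True)
print(shlex.join(cmd))
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```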

Auto-Selected Reranker

The reranker is automatically selected based on your backend/API keys:

Gemini backend/key -> Uses Gemini VLM reranker (fast, API-based, uses same model as VLM config)
OpenAI backend -> Uses Gemini VLM if Gemini key available, else MonoVLM
No API keys -> Falls back to MonoVLM (local model, slower)

You can override with --reranker-model flag (options: gemini_vlm, monovlm, text_embedding).
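The selection rules above can be written as a small decision function. This is a sketch of the documented behaviour, not MiRAGE's actual code; auto_reranker and its parameters are hypothetical names.

```python
def auto_reranker(backend: str, has_gemini_key: bool) -> str:
    """Mirror the documented rules: Gemini backend or key -> gemini_vlm, else monovlm."""
    if backend == "gemini" or has_gemini_key:
        return "gemini_vlm"
    # OpenAI without a Gemini key, or no API keys at all, falls back to the local model.
    return "monovlm"


print(auto_reranker("openai", has_gemini_key=False))  # monovlm
```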

Backend Options:

gemini (default) - Requires GEMINI_API_KEY or --api-key
openai - Requires OPENAI_API_KEY or --api-key
ollama - No API key needed (runs locally)

Pipeline Steps:

Step	Description	Default
1. Document Processing	PDF/HTML to Markdown	Mandatory
2. Chunking	Semantic chunking	Mandatory
3. Embedding	FAISS index creation	Mandatory
4. Domain Detection	Expert persona extraction	Mandatory
5. QA Generation	Multi-hop QA with verification	Mandatory
6. Deduplication	Merge similar QA pairs	OFF (use `--deduplication`)
7. Evaluation	Quality metrics	OFF (use `--evaluation`)

Run Preflight Checks

Before running the full pipeline, verify your setup:

run_mirage --preflight

Using Sample Dataset

# Prepare sample data (if you have it)
mkdir -p data/sample
cp /path/to/your/documents/*.pdf data/sample/

# Run on sample
run_mirage -i data/sample -o output/sample_results --num-qa-pairs 10

API Keys Setup

Google Gemini

Get API key from: https://makersuite.google.com/app/apikey
Set environment variable:

export GEMINI_API_KEY="your-key-here"

Or create a file:

mkdir -p ~/.config/gemini
echo "your-key-here" > ~/.config/gemini/api_key.txt
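A script that needs the same key can follow a similar lookup. The sketch below checks the environment variable first and then the key file; that precedence and the helper name load_api_key are assumptions, not a description of MiRAGE's internals.

```python
import os
from pathlib import Path
from typing import Optional


def load_api_key(env_var: str, key_file: str) -> Optional[str]:
    """Prefer the environment variable; fall back to a one-line key file."""
    value = os.environ.get(env_var, "").strip()
    if value:
        return value
    path = Path(key_file).expanduser()
    if path.is_file():
        return path.read_text(encoding="utf-8").strip() or None
    return None


if __name__ == "__main__":
    key = load_api_key("GEMINI_API_KEY", "~/.config/gemini/api_key.txt")
    print("Gemini key found" if key else "No Gemini key configured")
```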

OpenAI

Get API key from: https://platform.openai.com/api-keys
Set environment variable:

export OPENAI_API_KEY="your-key-here"

Ollama (Local - Free)

No API key needed! Just install Ollama:

# Install
curl -fsSL https://ollama.com/install.sh | sh

# Start server
ollama serve

# Pull models
ollama pull llama3      # For text
ollama pull llava       # For vision

Configuration

Using config.yaml

Copy the example config and customize:

cp config.yaml.example config.yaml

Edit config.yaml:

backend:
  active: GEMINI  # GEMINI, OPENAI, or OLLAMA
  
  gemini:
    api_key_path: ~/.config/gemini/api_key.txt
    llm_model: gemini-2.0-flash
    vlm_model: gemini-2.0-flash
    
  openai:
    api_key_path: ~/.config/openai/api_key.txt
    llm_model: gpt-4o
    vlm_model: gpt-4o
    
  ollama:
    base_url: http://localhost:11434
    llm_model: llama3
    vlm_model: llava

paths:
  input_pdf_dir: data/documents
  output_dir: output/results

qa_generation:
  target_qa_pairs: 100
  max_workers: 4

Then run:

run_mirage --config config.yaml --input data/documents --output output/results

Note: When installed via pip, MiRAGE can still use a custom config.yaml file. Place it in your working directory or pass its path with --config.
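Once config.yaml has been parsed into a dictionary (for example with PyYAML's yaml.safe_load), the settings for the active backend can be picked out as sketched below. The helper active_backend_settings is hypothetical, and the lowercase mapping from GEMINI to the gemini block is an assumption based on the example file.

```python
def active_backend_settings(config: dict) -> dict:
    """Return the settings block for the backend named in backend.active."""
    backend = config["backend"]
    name = str(backend["active"]).lower()  # "GEMINI" -> "gemini"
    if name not in backend:
        raise KeyError(f"no settings block for active backend {name!r}")
    return backend[name]


# A trimmed copy of the example configuration, already parsed into a dict.
config = {
    "backend": {
        "active": "GEMINI",
        "gemini": {"llm_model": "gemini-2.0-flash", "vlm_model": "gemini-2.0-flash"},
        "ollama": {"base_url": "http://localhost:11434", "llm_model": "llama3"},
    }
}
print(active_backend_settings(config)["llm_model"])  # gemini-2.0-flash
```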

Cost Optimization

MiRAGE uses LLM/VLM APIs extensively. Two operations consume the most tokens:

1. Document Processing (PDF/HTML -> Markdown -> Chunks)

Cost: High (processes every page with VLM for image/table extraction)

Recommendation:

Process documents only once, on a curated set of relevant files
Pass --skip-pdf-processing and --skip-chunking on subsequent runs
Pre-filter documents to remove irrelevant content before running MiRAGE

# First run: Process and chunk documents
run_mirage -i data/documents -o output/results --num-qa-pairs 100

# Subsequent runs: Skip processing, only generate QA
run_mirage -i data/documents -o output/results --skip-pdf-processing --skip-chunking --num-qa-pairs 100
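A wrapper script could choose the skip flags automatically by looking for artifacts from an earlier run. The sketch below assumes the markdown/ directory and chunks.json sit in the output directory, as listed under Generated Files; skip_flags is a hypothetical helper, and whether MiRAGE reuses those exact files is not confirmed here.

```python
from pathlib import Path


def skip_flags(output_dir: str) -> list:
    """Suggest skip flags when earlier artifacts already exist (assumed layout)."""
    root = Path(output_dir)
    flags = []
    markdown_dir = root / "markdown"
    if markdown_dir.is_dir() and any(markdown_dir.iterdir()):
        flags.append("--skip-pdf-processing")
    if (root / "chunks.json").is_file():
        flags.append("--skip-chunking")
    return flags


if __name__ == "__main__":
    print(skip_flags("output/results"))
```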

2. Multi-hop Context Building

Cost: High (recursive LLM calls to expand context at each depth level)

Recommendation:

Default is now max_depth: 2 (previously 5)
Higher depths exponentially increase token usage with diminishing returns
Depth 2 captures most meaningful cross-document relationships

# config.yaml
context:
  max_depth: 2  # Recommended: 2 (current default; previously 5)

Use print_token_stats() or check the pipeline summary to monitor actual token consumption.
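To see why depth dominates cost, suppose, purely for illustration, that every chunk pulls in a fixed number of additional chunks at each depth level. The branching factor of 3 below is an assumed value, not a measured MiRAGE statistic, and expansion_calls is a hypothetical helper.

```python
def expansion_calls(max_depth: int, branching: int) -> int:
    """Chunks visited when every chunk expands into `branching` more per level."""
    return sum(branching ** level for level in range(1, max_depth + 1))


print(expansion_calls(2, 3))  # 12
print(expansion_calls(5, 3))  # 363
```

Under this simple model, going from depth 2 to depth 5 multiplies the number of expansions by roughly 30.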

Command Line Options

Option	Short	Description	Default
`--input`	`-i`	Input directory with documents	Required
`--output`	`-o`	Output directory for results	Required
`--api-key`	`-k`	API key for LLM backend	From env
`--backend`	`-b`	Backend: gemini, openai, ollama	gemini
`--model`		Model name	Auto
`--config`	`-c`	Config file path	config.yaml
`--init-config`		Generate a config.yaml in current directory	-
`--num-qa-pairs`		Target QA pairs to generate	10
`--max-depth`		Maximum depth for multi-hop retrieval	2
`--embedding-model`		Embedding model: `auto`, `qwen3_vl`, `nomic`, `bge_m3`	auto
`--reranker-model`		Reranker model: `gemini_vlm`, `monovlm`, `text_embedding`	auto (based on backend)
`--max-workers`		Parallel workers	4
`--preflight`		Run preflight checks only	-
`--skip-preflight`		Skip preflight checks	-
`--skip-pdf-processing`		Skip PDF conversion	-
`--skip-chunking`		Skip chunking step	-
`--deduplication`		Merge similar QA pairs and remove duplicates	OFF
`--evaluation`		Compute quality metrics	OFF
`--verbose`	`-v`	Verbose output	-
`--version`		Show version	-
`--help`	`-h`	Show help	-

Multihop QA Visualization

Explore an interactive visualization of the multihop QA generation process, showing how context chunks are linked through keywords to generate complex questions:

View Interactive Multihop QA Visualization

The visualization demonstrates:

Context chunk retrieval and keyword extraction
Keyword chain relationships across chunks
Iterative retrieval depth progression
Final question-answer generation with highlighted concepts

Output Format

Generated Files

output/my_dataset/
├── markdown/              # Converted markdown files
├── chunks.json           # Semantic chunks
├── qa_dataset.json       # Raw QA pairs
├── qa_deduplicated.json  # Final deduplicated QA pairs
├── evaluation_report.json # Quality metrics
└── run_config.json       # Run configuration

QA Dataset Structure

{
  "chunk_id": 1,
  "question": "What is the company's revenue growth?",
  "answer": "The company achieved 15% revenue growth...",
  "context_chunks": [...],
  "hop_count": 2,
  "relevance_score": "9",
  "difficulty_score": "7",
  "expert_persona": "Financial Analyst",
  "domain": "Finance"
}
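Records in this shape are straightforward to filter after a run. The sketch below assumes the output file is a JSON list of such records, and it converts the string scores to integers; load_qa_pairs is a hypothetical helper, not part of the MiRAGE API.

```python
import json


def load_qa_pairs(path: str, min_hops: int = 2) -> list:
    """Load QA records and keep those with at least min_hops hops."""
    with open(path, encoding="utf-8") as fh:
        records = json.load(fh)  # assumed: a JSON list of records
    kept = []
    for rec in records:
        if int(rec.get("hop_count", 0)) >= min_hops:
            # Scores are strings in the sample record, so convert explicitly.
            rec["relevance_score"] = int(rec["relevance_score"])
            rec["difficulty_score"] = int(rec["difficulty_score"])
            kept.append(rec)
    return kept
```

For example, load_qa_pairs("output/my_dataset/qa_deduplicated.json", min_hops=2) would return only the multihop records.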

Project Structure

MiRAGE/
├── src/mirage/                    # Main package
│   ├── __init__.py               # Package initialization
│   ├── main.py                   # Pipeline orchestration
│   ├── cli.py                    # Command-line interface
│   ├── core/                     # Core functionality
│   │   ├── config.py             # Configuration management
│   │   ├── llm.py                # LLM/VLM API interfaces + token tracking
│   │   └── prompts.py            # Prompt templates
│   ├── embeddings/               # Embedding models
│   │   ├── models.py             # Embedding model selection
│   │   ├── rerankers_multimodal.py  # VLM-based reranking
│   │   └── rerankers_text.py     # Text-based reranking
│   ├── pipeline/                 # Processing pipeline
│   │   ├── pdf_processor.py      # PDF to Markdown conversion
│   │   ├── chunker.py            # Semantic chunking
│   │   ├── context.py            # Multi-hop context retrieval
│   │   ├── qa_generator.py       # QA generation and verification
│   │   ├── domain.py             # Domain/expert extraction
│   │   └── deduplication.py      # QA deduplication
│   ├── evaluation/               # Evaluation metrics
│   │   ├── metrics.py            # Standard RAGAS metrics
│   │   └── metrics_optimized.py  # Optimized metrics (faster)
│   └── utils/                    # Utilities
│       ├── preflight.py          # System checks
│       ├── stats.py              # Dataset statistics
│       ├── ablation.py           # Ablation studies
│       ├── checkpoint.py         # Checkpoint/resume support
│       ├── llm_cache.py          # LLM response caching
│       ├── visualize_multihop.py # Multihop QA visualization
│       └── visualize_pipeline.py # Pipeline flow visualization
├── data/documents/               # Input documents folder
├── output/                       # Generated results
├── assets/                       # Documentation images
├── config.yaml.example           # Example configuration
├── run_mirage.py                 # Main entry point script
├── setup.py                      # Package installation
├── pyproject.toml                # Package configuration
├── requirements.txt              # Dependencies
├── README.md                     # This file
├── CONTRIBUTING.md               # Contribution guidelines
└── LICENSE                       # Apache 2.0 License

Python API

For programmatic access, you can import and use MiRAGE modules directly:

# Import the main pipeline
from mirage import run_pipeline
# Or import specific components
from mirage.core.llm import call_llm_simple, call_vlm_interweaved
from mirage.pipeline.context import build_complete_context
from mirage.pipeline.qa_generator import generate_qa, verify_qa
from mirage.pipeline.domain import fetch_domain_and_role
from mirage.embeddings.models import NomicVLEmbed, get_best_embedding_model
from mirage.utils.preflight import run_preflight_checks

# Example: Run preflight checks
success, results = run_preflight_checks()

# Example: Call LLM
response = call_llm_simple("What is 2+2?")

# Example: Use embedding model
embedder = NomicVLEmbed()
embedding = embedder.encode("Sample text")

# Example: Track token usage
from mirage.core.llm import get_token_stats, print_token_stats, reset_token_stats

# After running LLM calls, check token usage
stats = get_token_stats()
print(f"Input tokens: {stats['total_input_tokens']}")
print(f"Output tokens: {stats['total_output_tokens']}")

# Print formatted summary
print_token_stats()

# Reset counters for a new run
reset_token_stats()

See the module docstrings for detailed API documentation.
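The token counts can be turned into a rough cost estimate. The sketch below uses the total_input_tokens and total_output_tokens keys shown above; the prices and the example counts are placeholders, and estimate_cost is a hypothetical helper rather than a MiRAGE function.

```python
def estimate_cost(stats: dict, usd_per_million_input: float,
                  usd_per_million_output: float) -> float:
    """Convert token counts into an approximate dollar cost."""
    input_cost = stats["total_input_tokens"] / 1_000_000 * usd_per_million_input
    output_cost = stats["total_output_tokens"] / 1_000_000 * usd_per_million_output
    return input_cost + output_cost


# Placeholder counts and prices; substitute get_token_stats() and your provider's rates.
example = {"total_input_tokens": 2_000_000, "total_output_tokens": 500_000}
print(estimate_cost(example, usd_per_million_input=0.10, usd_per_million_output=0.40))
```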

Examples

Generate QA from PDFs

# Using Gemini
export GEMINI_API_KEY="your-key"
run_mirage -i data/pdfs -o output/qa_dataset --num-qa-pairs 100

# Using OpenAI  
export OPENAI_API_KEY="your-key"
run_mirage -i data/pdfs -o output/qa_dataset --backend openai --num-qa-pairs 100

# Using Ollama (local, free)
run_mirage -i data/pdfs -o output/qa_dataset --backend ollama --num-qa-pairs 100

Generate More QA Pairs

run_mirage -i data/documents -o output/large_dataset --num-qa-pairs 500

Use More Workers

run_mirage -i data/documents -o output/fast_run --max-workers 8 --num-qa-pairs 100

Skip Already Processed Steps

# If you already have markdown files
run_mirage -i data/documents -o output/results --skip-pdf-processing --num-qa-pairs 100

# If you already have chunks
run_mirage -i data/documents -o output/results --skip-chunking --num-qa-pairs 100

Custom Models

# Use specific embedding model
run_mirage -i data/documents -o output/results \
  --embedding-model nomic --num-qa-pairs 100

# Use specific reranker
run_mirage -i data/documents -o output/results \
  --reranker-model monovlm --num-qa-pairs 100

# Custom multi-hop depth
run_mirage -i data/documents -o output/results \
  --max-depth 3 --num-qa-pairs 100

Troubleshooting

Command Not Found

If run_mirage command is not found after pip installation:

# Check if package is installed
pip show mirage-benchmark

# Reinstall if needed
pip install --upgrade mirage-benchmark

# Verify installation
run_mirage --version

API Key Issues

# Check if API key is set
echo $GEMINI_API_KEY  # or $OPENAI_API_KEY

# Set it if missing
export GEMINI_API_KEY="your-key"

Preflight Check Failures

# Run verbose preflight
run_mirage --preflight --verbose

Import Errors (Development)

If you're developing from source and encounter import errors:

# Reinstall in editable mode
pip install -e .

# Or run directly with PYTHONPATH
PYTHONPATH=src python run_mirage.py --help

Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Submit a pull request

See CONTRIBUTING.md for details.

Citation

@misc{sahu2026miragemultiagentframeworkgenerating,
      title={MiRAGE: A Multiagent Framework for Generating Multimodal Multihop Question-Answer Dataset for RAG Evaluation}, 
      author={Chandan Kumar Sahu and Premith Kumar Chilukuri and Matthew Hetrich},
      year={2026},
      eprint={2601.15487},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2601.15487}, 
}

License

Apache License 2.0 - see LICENSE

Release history

2.0.0 (Feb 9, 2026)
1.4.0 (Feb 9, 2026)
1.3.1 (Feb 9, 2026)
1.3.0 (Feb 9, 2026)
1.2.7 (Feb 8, 2026), this version
1.2.6 (Jan 29, 2026)
1.2.5 (Jan 28, 2026)
1.2.4 (Jan 28, 2026)
1.2.3 (Jan 28, 2026)
1.2.2 (Jan 28, 2026)
1.2.1 (Jan 28, 2026)
1.2.0 (Jan 21, 2026)
1.0.6 (Jan 14, 2026)
1.0.5 (Jan 6, 2026)
1.0.4 (Jan 6, 2026)
1.0.3 (Jan 6, 2026)
1.0.2 (Jan 6, 2026)
1.0.0 (Jan 6, 2026)

Download files

Source Distribution

mirage_benchmark-1.2.7.tar.gz (1.7 MB), uploaded Feb 8, 2026

Built Distribution

mirage_benchmark-1.2.7-py3-none-any.whl (226.7 kB), uploaded Feb 8, 2026, Python 3

File details

Details for the file mirage_benchmark-1.2.7.tar.gz.

File metadata

Download URL: mirage_benchmark-1.2.7.tar.gz
Upload date: Feb 8, 2026
Size: 1.7 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mirage_benchmark-1.2.7.tar.gz
Algorithm	Hash digest
SHA256	`b4798da252fe0f30dff1fd5333913cc614ca662ee31b65538b08aef506449746`
MD5	`6cba22895f76e057f4b7c5896e052f65`
BLAKE2b-256	`1138962e4797abfc8a893554d5e9c78f6355df57d7d5dc8256f2e1ad9ac2cada`
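A downloaded file can be checked against the SHA256 digest above with the standard library's hashlib. This is a minimal sketch; sha256_of is an illustrative helper name, and the file path in the comment assumes the sdist was saved to the current directory.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


EXPECTED = "b4798da252fe0f30dff1fd5333913cc614ca662ee31b65538b08aef506449746"
# After downloading the sdist into the current directory:
# assert sha256_of("mirage_benchmark-1.2.7.tar.gz") == EXPECTED
```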

Provenance

The following attestation bundles were made for mirage_benchmark-1.2.7.tar.gz:

Publisher: publish-pypi.yml on ChandanKSahu/MiRAGE

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mirage_benchmark-1.2.7.tar.gz
- Subject digest: b4798da252fe0f30dff1fd5333913cc614ca662ee31b65538b08aef506449746
- Sigstore transparency entry: 928460904
- Sigstore integration time: Feb 8, 2026
Source repository:
- Permalink: ChandanKSahu/MiRAGE@cb390c1daac675a8a35f64955e9ab1936127c038
- Branch / Tag: refs/tags/v1.2.7
- Owner: https://github.com/ChandanKSahu
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@cb390c1daac675a8a35f64955e9ab1936127c038
- Trigger Event: release

File details

Details for the file mirage_benchmark-1.2.7-py3-none-any.whl.

File metadata

Download URL: mirage_benchmark-1.2.7-py3-none-any.whl
Upload date: Feb 8, 2026
Size: 226.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mirage_benchmark-1.2.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b876596a6db153a6bfe90567a549db58704f275af71eff8faa0d49ca2bb9a32c`
MD5	`451cf4d61d30d9a9443508964751a02b`
BLAKE2b-256	`5d7af33d98df810e2c9142a2dd53a282b016f8dad6104cc8a928642a0ebcab35`

Provenance

The following attestation bundles were made for mirage_benchmark-1.2.7-py3-none-any.whl:

Publisher: publish-pypi.yml on ChandanKSahu/MiRAGE

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mirage_benchmark-1.2.7-py3-none-any.whl
- Subject digest: b876596a6db153a6bfe90567a549db58704f275af71eff8faa0d49ca2bb9a32c
- Sigstore transparency entry: 928460907
- Sigstore integration time: Feb 8, 2026
Source repository:
- Permalink: ChandanKSahu/MiRAGE@cb390c1daac675a8a35f64955e9ab1936127c038
- Branch / Tag: refs/tags/v1.2.7
- Owner: https://github.com/ChandanKSahu
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@cb390c1daac675a8a35f64955e9ab1936127c038
- Trigger Event: release
