
A CLI tool that automates RAG hyperparameter optimization using Bayesian search and synthetic data generation


AutoRAG-Optim

PyPI version Python 3.10+ License: MIT

Stop guessing your RAG configuration. Let AutoRAG find the optimal one for your data.

AutoRAG-Optim is a CLI tool that automatically discovers the best RAG (Retrieval-Augmented Generation) hyperparameters for your specific database. Instead of manually testing hundreds of parameter combinations, run one command and get a production-ready configuration optimized for your data.

Why AutoRAG?

Most teams waste weeks manually tuning RAG settings (chunk sizes, embedding models, retrieval counts) without knowing what actually works best for their data. AutoRAG solves this by:

  • Generating synthetic test data from your documents (no manual labeling needed)
  • Intelligently searching the configuration space (20-30 experiments instead of 1000+)
  • Evaluating with real metrics (accuracy, faithfulness, relevancy, recall)
  • Running entirely locally (ChromaDB for vectors; no Pinecone API key required)

Typical results: 30-40% cost reduction and 20-35% accuracy improvement over default settings.

Features

| Feature | Description |
|---|---|
| Smart Optimization | Bayesian or Grid Search to find optimal parameters in 20-30 experiments |
| Two-Phase Architecture | Expensive indexing params tested separately from fast query params |
| 5 Tunable Parameters | chunk_size, chunk_overlap, embedding_model, top_k, temperature |
| Synthetic Q&A Generation | Auto-generate test questions from your documents using an LLM |
| RAGAS-like Evaluation | Measure accuracy, faithfulness, relevancy, and context recall |
| Local Vector Store | ChromaDB runs locally; no external API keys needed |
| Multi-Database Support | Supabase Storage, MongoDB, PostgreSQL |
| Multi-LLM Support | Groq, OpenAI, OpenRouter |
| Rich CLI Output | Terminal output with progress bars, tables, and HTML reports |

Installation

pip install autorag-optim

For RAGAS evaluation (optional):

pip install autorag-optim[ragas]

Quick Start

1. Create Configuration

Create a config.yaml file:

database:
  type: supabase
  url: https://your-project.supabase.co
  key: your-supabase-anon-key
  bucket: pdf
  folder: pdf

llm:
  provider: groq
  model: null  # Uses default: llama-3.3-70b-versatile

api_keys:
  groq: your-groq-api-key

rag:
  chunk_size: [256, 512, 1024]
  chunk_overlap: [50, 100]
  embedding_model:
    - all-MiniLM-L6-v2
  top_k: [3, 5, 10]
  temperature: [0.3, 0.7]

optimization:
  strategy: bayesian    # or: grid
  num_experiments: 20
  test_questions: 50

evaluation:
  method: custom        # or: ragas
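To get a feel for how large this search space is, the following sketch enumerates every combination of the `rag` values from the example config using only the standard library. The `search_space` dict is an illustrative copy of the YAML values; it is not an object that AutoRAG-Optim exposes.

```python
from itertools import product

# Illustrative copy of the `rag` section of the example config.yaml.
search_space = {
    "chunk_size": [256, 512, 1024],
    "chunk_overlap": [50, 100],
    "embedding_model": ["all-MiniLM-L6-v2"],
    "top_k": [3, 5, 10],
    "temperature": [0.3, 0.7],
}

# The Cartesian product of all value lists gives every possible configuration.
names = list(search_space)
all_configs = [dict(zip(names, values)) for values in product(*search_space.values())]

print(len(all_configs))  # 3 * 2 * 1 * 3 * 2 = 36
print(all_configs[0])
```

With 36 possible combinations and `num_experiments: 20`, a run evaluates only part of the full grid, which is where the choice of search strategy starts to matter.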

2. Run Optimization

autorag optimize --config config.yaml

3. View Results

autorag results --show-report

Configuration Options

Optimization Strategy

| Strategy | Description | Best For |
|---|---|---|
| bayesian | Intelligent search using Optuna TPE sampler | Default choice; finds good configs with fewer experiments |
| grid | Systematic search with stratified sampling | Guaranteed coverage of search space |

Evaluation Method

| Method | Description | Notes |
|---|---|---|
| custom | Built-in token-optimized evaluator | Works with any LLM, fast, no extra dependencies |
| ragas | Official RAGAS library metrics | Requires pip install ragas, uses OpenAI-compatible API |

LLM Providers

| Provider | Default Model | Notes |
|---|---|---|
| groq | llama-3.3-70b-versatile | Fast inference, generous free tier |
| openai | gpt-4o-mini | High quality, production-ready |
| openrouter | meta-llama/llama-3.3-70b-instruct | Access to 100+ models |
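The effect of `model: null` can be pictured as a simple lookup against the defaults in the provider table. The sketch below is only an illustration of that fallback; `DEFAULT_MODELS` and `resolve_model` are hypothetical names, not part of the AutoRAG-Optim API.

```python
# Defaults copied from the provider table; names here are hypothetical.
DEFAULT_MODELS = {
    "groq": "llama-3.3-70b-versatile",
    "openai": "gpt-4o-mini",
    "openrouter": "meta-llama/llama-3.3-70b-instruct",
}


def resolve_model(provider, model=None):
    """Return the explicitly configured model, else the provider's default."""
    if provider not in DEFAULT_MODELS:
        raise ValueError(f"Unknown provider: {provider!r}")
    return model or DEFAULT_MODELS[provider]


print(resolve_model("groq"))              # falls back to the default
print(resolve_model("openai", "gpt-4o"))  # an explicit model wins
```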

Database Connectors

| Type | Description | Config Fields |
|---|---|---|
| supabase | Supabase Storage bucket | url, key, bucket, folder |
| mongodb | MongoDB collection | connection_string, database, collection |
| postgresql | PostgreSQL table | host, port, database, table, user, password |
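Before launching a long run, it can help to check that the `database` section lists every field its connector type expects. The sketch below encodes the table above as a dictionary and reports what is missing; `REQUIRED_FIELDS` and `missing_fields` are hypothetical helpers, and the tool's own validation may behave differently.

```python
# Field names taken from the connector table; the helper itself is hypothetical.
REQUIRED_FIELDS = {
    "supabase": {"url", "key", "bucket", "folder"},
    "mongodb": {"connection_string", "database", "collection"},
    "postgresql": {"host", "port", "database", "table", "user", "password"},
}


def missing_fields(database_config):
    """Return the sorted list of fields absent from a `database` config dict."""
    db_type = database_config.get("type")
    if db_type not in REQUIRED_FIELDS:
        raise ValueError(f"Unsupported database type: {db_type!r}")
    return sorted(REQUIRED_FIELDS[db_type] - set(database_config))


# Placeholder values for illustration only.
config = {
    "type": "mongodb",
    "connection_string": "mongodb://localhost:27017",
    "database": "docs",
}
print(missing_fields(config))  # ['collection']
```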

Estimated API Calls & Runtime

Understanding the cost before running optimization:

Formula

LLM Calls ≈ Q&A Generation + (Experiments × Questions × Calls per Question)

Where:
- Q&A Generation = ceil(test_questions / 2)  [~1 call per 2 questions]
- Calls per Question = 1 (RAG query) + 3 (evaluation) = 4 calls
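The formula translates directly into a few lines of Python. The function name `estimate_llm_calls` is made up for this sketch; the arithmetic follows the definitions above and can be used to budget a run before starting it.

```python
import math


def estimate_llm_calls(test_questions, num_experiments, calls_per_question=4):
    """Estimate total LLM calls: Q&A generation plus per-experiment evaluation."""
    qa_generation = math.ceil(test_questions / 2)  # ~1 call per 2 questions
    return qa_generation + num_experiments * test_questions * calls_per_question


for questions, experiments in [(20, 10), (50, 20), (50, 30), (100, 20)]:
    print(questions, experiments, estimate_llm_calls(questions, experiments))
# 20 10 810
# 50 20 4025
# 50 30 6025
# 100 20 8050
```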

Estimates by Configuration

| Questions | Experiments | LLM Calls | Est. Time* |
|---|---|---|---|
| 20 | 10 | ~810 | 15-30 min |
| 50 | 20 | ~4,025 | 45-60 min |
| 50 | 30 | ~6,025 | 60-90 min |
| 100 | 20 | ~8,050 | 100-150 min |

*Time varies based on LLM provider rate limits and response times. Groq is typically fastest.

Cost Saving Tips

  • Start with fewer experiments (10-15) to validate your setup
  • Use the bayesian strategy: it finds good configs with 30-40% fewer experiments than grid search
  • Reduce test_questions for initial exploration (20-30 is enough to rank configs)

How It Works

┌─────────────────────────────────────────────────────────────────┐
│  1. CONNECT                                                     │
│     Fetch documents from your database (Supabase/Mongo/PG)      │
├─────────────────────────────────────────────────────────────────┤
│  2. GENERATE                                                    │
│     Create synthetic Q&A pairs from your documents using LLM    │
├─────────────────────────────────────────────────────────────────┤
│  3. OPTIMIZE (Two-Phase)                                        │
│     ┌─────────────────────────────────────────────────────┐     │
│     │ OUTER LOOP: Indexing params (expensive)             │     │
│     │   → chunk_size, chunk_overlap, embedding_model      │     │
│     │   → Requires re-indexing documents                  │     │
│     └─────────────────────────────────────────────────────┘     │
│     ┌─────────────────────────────────────────────────────┐     │
│     │ INNER LOOP: Query params (fast)                     │     │
│     │   → top_k, temperature                              │     │
│     │   → Same index, just different retrieval settings   │     │
│     └─────────────────────────────────────────────────────┘     │
├─────────────────────────────────────────────────────────────────┤
│  4. EVALUATE                                                    │
│     Score each config: relevancy, faithfulness, similarity,     │
│     context recall → weighted aggregate score                   │
├─────────────────────────────────────────────────────────────────┤
│  5. REPORT                                                      │
│     Terminal table + JSON + HTML report with best config        │
└─────────────────────────────────────────────────────────────────┘
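The two-phase idea in step 3 can be sketched as a pair of nested loops: build each index once, then reuse it for every query-time setting. The code below is a simplified illustration, not the tool's implementation. It walks the space exhaustively rather than with Bayesian search, and `two_phase_search`, `fake_build_index`, and `fake_evaluate` are hypothetical stand-ins with a toy scoring function.

```python
from itertools import product


def two_phase_search(indexing_space, query_space, build_index, evaluate):
    """Build each index once (outer loop), then sweep query params on it (inner loop)."""
    results = []
    for indexing_values in product(*indexing_space.values()):
        indexing_params = dict(zip(indexing_space, indexing_values))
        index = build_index(**indexing_params)  # expensive: re-index documents
        for query_values in product(*query_space.values()):
            query_params = dict(zip(query_space, query_values))
            score = evaluate(index, **query_params)  # cheap: same index reused
            results.append((score, {**indexing_params, **query_params}))
    return max(results, key=lambda item: item[0])


build_calls = []  # records how often the expensive step runs


def fake_build_index(chunk_size, chunk_overlap, embedding_model):
    build_calls.append((chunk_size, chunk_overlap, embedding_model))
    return {"chunk_size": chunk_size, "chunk_overlap": chunk_overlap}


def fake_evaluate(index, top_k, temperature):
    # Toy score that peaks at chunk_size=512, chunk_overlap=50, top_k=5, low temperature.
    return -(abs(index["chunk_size"] - 512) + abs(index["chunk_overlap"] - 50)
             + abs(top_k - 5) + temperature)


indexing_space = {
    "chunk_size": [256, 512, 1024],
    "chunk_overlap": [50, 100],
    "embedding_model": ["all-MiniLM-L6-v2"],
}
query_space = {"top_k": [3, 5, 10], "temperature": [0.3, 0.7]}

best_score, best_config = two_phase_search(
    indexing_space, query_space, fake_build_index, fake_evaluate
)
print(len(build_calls))  # 6 index builds cover 36 evaluated configurations
print(best_config)
```

Separating the loops this way means the cost of re-indexing is paid once per indexing configuration instead of once per experiment.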

CLI Commands

| Command | Description |
|---|---|
| autorag optimize | Run RAG optimization on your database |
| autorag results | Display optimization results |
| autorag status | Check optimization progress (async mode) |

autorag optimize --help

Options:
  -c, --config PATH   Path to config file (default: config.yaml)
  --async             Run optimization in background

Evaluation Metrics

| Metric | What It Measures |
|---|---|
| Answer Relevancy | Is the answer relevant to the question asked? |
| Faithfulness | Is the answer grounded in the retrieved context? |
| Answer Similarity | How similar is the generated answer to ground truth? |
| Context Recall | Does the retrieved context contain the required information? |
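These four metrics are combined into a single weighted aggregate score per configuration. The sketch below shows one way such an aggregate could be computed. The equal weights, the 0-1 metric scale, and the names `WEIGHTS` and `aggregate_score` are all assumptions made for illustration; the weights the tool actually uses are not stated here.

```python
# Equal weights are an assumption for this sketch, not the tool's documented values.
WEIGHTS = {
    "answer_relevancy": 0.25,
    "faithfulness": 0.25,
    "answer_similarity": 0.25,
    "context_recall": 0.25,
}


def aggregate_score(metrics, weights=WEIGHTS):
    """Weighted average of per-metric scores (each assumed to lie in [0, 1])."""
    total_weight = sum(weights.values())
    return sum(metrics[name] * weight for name, weight in weights.items()) / total_weight


# Made-up metric values for one hypothetical configuration.
example = {
    "answer_relevancy": 0.8,
    "faithfulness": 0.9,
    "answer_similarity": 0.7,
    "context_recall": 0.6,
}
print(round(aggregate_score(example), 3))  # 0.75
```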

Development

# Clone repository
git clone https://github.com/vatsalpjain/autorag-optim.git
cd autorag-optim

# Install with dev dependencies
uv sync --extra dev

# Run CLI
uv run autorag --help

# Run tests
uv run pytest tests/ -v

Requirements

  • Python 3.10+
  • LLM API key (Groq, OpenAI, or OpenRouter)
  • Database (Supabase, MongoDB, or PostgreSQL)
  • No Pinecone required; uses local ChromaDB

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

License

MIT License - see LICENSE for details.
