RapidFire AI

Rapid experimentation for easier, faster, and more impactful AI customization. Built for agentic RAG, context engineering, fine-tuning, and post-training of LLMs and other DL models. Delivers 16-24x higher throughput without extra resources.

Overview

RapidFire AI is a new experiment execution framework that transforms your AI customization experimentation from slow, sequential processes into rapid, intelligent workflows with hyperparallelized execution, dynamic real-time experiment control, and automatic system optimization.

[Figure: Usage workflow of RapidFire AI]

RapidFire AI's adaptive execution engine allows interruptible, shard-based scheduling so you can compare many configurations concurrently, even on a single GPU (for self-hosted models) or a CPU-only machine (for closed model APIs), with dynamic real-time control over runs. A toy sketch of this scheduling pattern follows the feature list below.

  • Hyperparallelized Execution: Runs many configurations simultaneously, one data shard at a time, for higher throughput and side-by-side comparison of results.
  • Interactive Control (IC Ops): Stop, Resume, Clone-Modify, and optionally warm start runs in real-time from the dashboard.
  • Automatic Optimization: Intelligent single and multi-GPU orchestration to optimize utilization with minimal overhead for self-hosted models; intelligent token spend and rate limit apportioning for closed model APIs.
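
The following toy sketch illustrates the shard-at-a-time scheduling pattern described above. It is an illustration of the general pattern only, not RapidFire AI's actual scheduler: each config trains on one data shard, then yields the GPU slot to the next runnable config, which is what makes stopping and resuming runs cheap.

from collections import deque

# Toy illustration of interruptible, shard-at-a-time scheduling; NOT
# RapidFire AI's implementation, just the general pattern it uses.
configs = deque(["cfg_a", "cfg_b", "cfg_c"])     # configs sharing one GPU slot
shards = {c: list(range(4)) for c in configs}    # 4 data shards per config
stopped = set()                                  # configs stopped via IC Ops

while configs:
    cfg = configs.popleft()
    if cfg in stopped or not shards[cfg]:
        continue                                 # dormant or finished
    shard = shards[cfg].pop(0)
    print(f"training {cfg} on shard {shard}")    # train_one_shard(cfg, shard)
    if shards[cfg]:
        configs.append(cfg)                      # re-queue for its next shard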

[Figure: Shard-based concurrent execution (1 GPU)]

For additional context, see the overview: RapidFire AI Overview

Getting Started

Prerequisites

  • Python 3.12.x

Install and Get Started

# Ensure that python3 resolves to python3.12 if needed
python3 --version  # must be 3.12.x

python3 -m venv .venv
source .venv/bin/activate

pip install rapidfireai

rapidfireai --version
# Verify it prints the following:
# RapidFire AI 0.12.3

# Replace YOUR_TOKEN with your actual Hugging Face token
# https://huggingface.co/docs/hub/en/security-tokens
hf auth login --token YOUR_TOKEN

# Due to current issue: https://github.com/huggingface/xet-core/issues/527
pip uninstall -y hf-xet

# For Fine-tuning/Post-Training: Install specific dependencies and initialize rapidfireai
rapidfireai init
# [OR]
# For RAG/Context Engineering Evals: Install specific dependencies and initialize rapidfireai for evals
rapidfireai init --evals

# Start the rapidfireai server
# For Google Colab, run:
#   export RF_TRACKING_BACKEND=tensorboard
#   rapidfireai start --colab
# For a standalone run:

# For Fine-tuning/Post-Training only: this also starts the dashboard metrics server
rapidfireai start
# It should print about 50 lines, including the following:
# ...
# RapidFire Frontend is ready
# Open your browser and navigate to: http://0.0.0.0:3000
# ...
# Press Ctrl+C to stop all services

# Open an example notebook from ./tutorial_notebooks and start experiment

Troubleshooting

For a quick system diagnostics report (Python env, relevant packages, GPU/CUDA, and key environment variables), run:

rapidfireai doctor

If you encounter port conflicts, you can kill existing processes:

lsof -t -i:5002 | xargs kill -9  # mlflow
lsof -t -i:8081 | xargs kill -9  # dispatcher
lsof -t -i:3000 | xargs kill -9  # frontend server

Documentation

Browse or reference the full documentation, example use case tutorials, all API details, dashboard details, and more in the RapidFire AI Documentation.

Key Features

MLflow Integration

Full MLflow support for experiment tracking and metrics visualization. A named RapidFire AI experiment corresponds to an MLflow experiment for comprehensive governance.
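
For example, since each named RapidFire AI experiment is an MLflow experiment, you can query its runs with the standard MLflow client. A minimal sketch, assuming the bundled MLflow server on port 5002 (per the troubleshooting section) and a hypothetical experiment name:

import mlflow

# Tracking server launched by `rapidfireai start`; port 5002 comes from the
# troubleshooting section. Adjust if your setup differs.
mlflow.set_tracking_uri("http://localhost:5002")

# "my-finetune-exp" is a hypothetical experiment name.
runs = mlflow.search_runs(experiment_names=["my-finetune-exp"])
print(runs[["run_id", "status"]])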

Interactive Control Operations (IC Ops)

First-of-its-kind dynamic real-time control over runs in flight. Can be invoked through the dashboard:

  • Stop active runs; puts them in a dormant state
  • Resume stopped runs; makes them active again
  • Clone and modify existing runs, with or without warm starting from parent's weights
  • Delete unwanted or failed runs

Multi-GPU Support

The Scheduler automatically handles multiple GPUs on the machine and divides resources across all running configs for optimal resource utilization.

Search and AutoML Support

Built-in procedures for searching over configuration knob combinations, including Grid Search and Random Search. Easy to integrate with AutoML procedures. Native support for some popular AutoML procedures and customized automation of IC Ops coming soon.
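
As a rough sketch of what a grid search over knobs can look like (loosely based on the tutorial notebooks; the exact class names, modules, and signatures here are assumptions, so check the documentation for the real API):

# Hedged sketch only: module paths, class names, and signatures below are
# assumptions based on the tutorial notebooks, not a verified API reference.
from rapidfireai import Experiment
from rapidfireai.automl import RFGridSearch, RFModelConfig, List

def create_model_fn(config):
    ...  # user-supplied factory returning the model for a given knob config

cfg = RFModelConfig(
    training_args={
        "learning_rate": List([1e-4, 5e-5]),            # knobs to sweep
        "per_device_train_batch_size": List([4, 8]),
    },
)
config_group = RFGridSearch(configs=[cfg], trainer_type="SFT")

exp = Experiment(experiment_name="grid-demo")           # hypothetical name
exp.run_fit(config_group, create_model_fn, train_dataset, eval_dataset,
            num_chunks=4)                               # datasets assumed loaded
exp.end()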

Directory Structure

rapidfireai/
├── fit/
│   ├── automl/          # Search and AutoML algorithms for knob tuning
│   ├── backend/         # Core backend components (controller, scheduler, worker)
│   ├── db/              # Database interface and SQLite operations
│   ├── dispatcher/      # Flask-based web API for UI communication
│   ├── frontend/        # Frontend components (dashboard, IC Ops implementation)
│   ├── ml/              # ML training utilities and trainer classes
│   └── utils/           # Utility functions and helper modules
├── evals/
│   ├── actors/          # Ray-based workers for doc and query processing
│   ├── automl/          # Search and AutoML algorithms for knob tuning
│   ├── data/            # Data sharding and handling
│   ├── db/              # Database interface and SQLite operations
│   ├── dispatcher/      # Flask-based web API for UI communication
│   ├── metrics/         # Online aggregation logic and metrics handling
│   ├── rag/             # Stages of the RAG pipeline
│   ├── scheduling/      # Fair scheduler for multi-config resource sharing
│   └── utils/           # Utility functions and helper modules
└── experiment.py        # Main experiment lifecycle management

Architecture

RapidFire AI adopts a microservices-inspired loosely coupled distributed architecture with:

  • Dispatcher: Web API layer for UI communication
  • Database: SQLite for state persistence
  • Controller: Central orchestrator running in user process
  • Workers: GPU-based training processes (for SFT/RFT) or Ray-based Actors for doc and query processing (for RAG/context engineering)
  • Dashboard: Experiment tracking and visualization dashboard

This design enables efficient resource utilization while providing a seamless user experience for AI experimentation.

Components

Dispatcher

The dispatcher provides a REST API interface for the web UI. It can run as a single Flask app or behind Gunicorn for load balancing. It handles the interactive control features and displays the current state of the runs in the experiment.
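
A toy stand-in for that pattern (the endpoints and payloads here are invented for illustration; this is not the actual dispatcher API):

from flask import Flask, jsonify

# Toy dispatcher: a REST layer over in-memory run state; endpoint names
# and payloads are made up for illustration only.
app = Flask(__name__)

RUNS = {"run-1": "active", "run-2": "stopped"}  # fake state

@app.get("/runs")
def list_runs():
    return jsonify(RUNS)

@app.post("/runs/<run_id>/stop")
def stop_run(run_id):
    RUNS[run_id] = "stopped"                    # the Stop IC Op, crudely
    return jsonify({"run_id": run_id, "state": RUNS[run_id]})

# Single app:    flask --app toy_dispatcher run
# Load balanced: gunicorn -w 4 toy_dispatcher:app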

Database

Uses SQLite for persistent storage of metadata for experiments, runs, and artifacts. The Controller also uses it to exchange scheduling state with Workers. It exposes a clean asynchronous interface for all DB operations, including experiment lifecycle management and run tracking.

Controller

Runs as part of the user’s console or Notebook process. Orchestrates the entire training lifecycle including model creation, worker management, and scheduling, as well as the entire RAG/context engineering pipeline for evals. The run_fit logic handles sample preprocessing, model creation for given knob configurations, worker initialization, and continuous monitoring of training progress across distributed workers. The run_evals logic handles data chunking, embedding, retrieval, reranking, context construction, and generation for inference evals.
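
Sketched as a toy pipeline, the run_evals stages named above chain together like this (the logic is deliberately trivial and every name is a placeholder, not a real RapidFire AI API):

# Toy of the run_evals pipeline stages; placeholder logic for illustration only.
def chunk(doc):          return [doc[i:i+200] for i in range(0, len(doc), 200)]
def embed(chunks):       return {c: set(c.lower().split()) for c in chunks}
def retrieve(index, q):  return sorted(index, key=lambda c: -len(index[c] & set(q.lower().split())))[:5]
def rerank(q, hits):     return hits                       # no-op placeholder
def build_context(hits): return "\n".join(hits)
def generate(q, ctx):    return f"answer to {q!r} using {len(ctx)} chars of context"

docs, queries = ["some long document text ..."], ["what does the doc say?"]
chunks = [c for d in docs for c in chunk(d)]               # data chunking
index = embed(chunks)                                      # embedding
for q in queries:
    ctx = build_context(rerank(q, retrieve(index, q)))     # retrieval onward
    print(generate(q, ctx))                                # generation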

Worker

Handles the actual model training and inference on the GPUs for run_fit, and the data preprocessing and RAG inference evals for run_evals. Workers poll the Database for tasks, load dataset shards, and execute config-specific tasks: training runs with checkpointing (for SFT/RFT) and doc processing followed by query processing with online aggregation (for RAG/context engineering evals). Both also handle progress reporting. Currently, any given model at the given batch size is expected to fit on a single GPU (for self-hosted models); likewise, the OpenAI API key provided is expected to have sufficient balance for the given evals workload.
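
The polling pattern reduces to a loop like the following toy (the table name, columns, and states are invented; RapidFire AI's real schema differs):

import sqlite3

# Toy Controller/Worker handshake over SQLite; schema and states are
# invented for illustration only. A real worker would sleep between polls.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, config TEXT, "
           "shard INTEGER, state TEXT)")
db.execute("INSERT INTO tasks (config, shard, state) VALUES ('cfg_a', 0, 'pending')")
db.commit()

while True:
    row = db.execute("SELECT id, config, shard FROM tasks "
                     "WHERE state = 'pending' LIMIT 1").fetchone()
    if row is None:
        break                                    # no pending work left
    task_id, config, shard = row
    print(f"worker: {config} on shard {shard}")  # train or run evals here
    db.execute("UPDATE tasks SET state = 'done' WHERE id = ?", (task_id,))
    db.commit()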

Experiment

Manages the complete experiment lifecycle, including creation, naming conventions, and cleanup. Experiments are automatically named with unique suffixes if conflicts exist, and all experiment metadata is tracked in the Database. An experiment's running tasks are automatically cancelled when the process ends abruptly.
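
For instance, the unique-suffix behavior might look like this toy function (the suffix format is an assumption, not the actual convention):

# Toy sketch of conflict-free experiment naming; the "-N" suffix format is
# an assumption, not RapidFire AI's documented convention.
def unique_name(requested: str, existing: set[str]) -> str:
    if requested not in existing:
        return requested
    i = 1
    while f"{requested}-{i}" in existing:
        i += 1
    return f"{requested}-{i}"

print(unique_name("my-exp", {"my-exp", "my-exp-1"}))  # -> my-exp-2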

Dashboard

A fork of MLflow that enables full tracking and visualization of all experiments and runs for run_fit. It features a new panel for Interactive Control Ops that can be performed on any active runs. For run_evals, the metrics are displayed in an auto-updating table in the notebook itself, where the IC Ops panel also appears.

Developing with RapidFire AI

Development prerequisites

  • Python 3.12.x
  • Git
  • Ubuntu/Debian system (for apt package manager)
# Run these commands one after the other on a fresh Ubuntu machine

# install dependencies
sudo apt update -y

# clone the repository
git clone https://github.com/RapidFireAI/rapidfireai.git

# navigate to the repository
cd ./rapidfireai

# install basic dependencies
sudo apt install -y python3.12-venv
python3 -m venv .venv
source .venv/bin/activate
pip3 install ipykernel
pip3 install jupyter
pip3 install "huggingface-hub[cli]"
export PATH="$HOME/.local/bin:$PATH"
hf auth login --token <your_token>

# Due to current issue: https://github.com/huggingface/xet-core/issues/527
pip uninstall -y hf-xet

# checkout the main branch
git checkout main

# install the repository's Python dependencies
pip3 install -r requirements.txt

# install node
curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash - && sudo apt-get install -y nodejs

# Install the correct versions of vllm and flash-attn
# uv pip install vllm==0.10.1.1 --torch-backend=cu126   # or cu118
# uv pip install flash-attn==1.0.9 --no-build-isolation  # or 2.8.3

# if running into node versioning errors, remove the previous version of node then run the lines above again
sudo apt-get remove --purge nodejs libnode-dev libnode72 npm
sudo apt autoremove --purge

# check installations
node -v # 22.x

# still inside venv, run the start script to begin all 3 servers
chmod +x ./rapidfireai/start_dev.sh
./rapidfireai/start_dev.sh start

# run the notebook from within your IDE
# make sure the notebook is running in the .venv virtual environment
# head to settings in Cursor/VSCode and search for venv and add the path - $HOME/rapidfireai/.venv
# we cannot run a Jupyter notebook directly since there are restrictions on Jupyter being able to create child processes

# VSCode can port-forward localhost:3000 where the rf-frontend server will be running

# for port clash issues -
lsof -t -i:8081 | xargs kill -9 # dispatcher
lsof -t -i:5002 | xargs kill -9 # mlflow
lsof -t -i:3000 | xargs kill -9 # frontend

Community & Governance
