FAI-RL: Foundation AI - Reinforcement Learning Library

A production-ready framework for training, inference, and evaluation of language models using advanced reinforcement learning techniques. Built for researchers and practitioners who need a flexible, scalable solution for LLM fine-tuning.

Overview

FAI-RL provides a unified, extensible framework for fine-tuning language models with state-of-the-art algorithms:

  • 🎯 Multiple RL Algorithms: DPO, PPO, GRPO, and GSPO implementations, plus support for Supervised Fine-Tuning (SFT)
  • 🚀 Production Ready: Validated on AWS p4d instances with 8x A100 GPUs
  • 📦 Simple Configuration: YAML-based configs with CLI override support
  • ⚡ Memory Efficient: Full support for LoRA, QLoRA, and DeepSpeed ZeRO-3
  • 🔧 Highly Extensible: Custom reward functions, dataset templates, and API integrations

📦 Installation

Install the Package

For Linux/Windows with NVIDIA GPUs (CUDA):

pip install FAI-RL[cuda] --extra-index-url https://download.pytorch.org/whl/cu118

For macOS (Apple Silicon or Intel):

pip install FAI-RL==0.1.14

Clone the Repository for Configuration Recipes

git clone https://github.com/Roblox/FAI-RL.git
cd FAI-RL

Package: https://pypi.org/project/FAI-RL/
Note: The --extra-index-url flag ensures PyTorch is installed with CUDA 11.8 support (Linux/Windows only).

🔑 Authentication & Setup

Before training or using models, you'll need to authenticate with HuggingFace and optionally set up experiment tracking with Weights & Biases.

HuggingFace Authentication

Login to HuggingFace to access models and datasets:

huggingface-cli login

You'll be prompted to enter your HuggingFace access token. You can create a token at https://huggingface.co/settings/tokens.

What this enables:

  • Access to gated models and datasets (if you have permission)

Weights & Biases (Optional)

Login to Weights & Biases for experiment tracking and visualization:

wandb login

You'll be prompted to enter your W&B API key. Get your API key at https://wandb.ai/authorize.

Note: W&B integration is optional. If not logged in, training will proceed without experiment tracking.

🚀 Quick Start

Training

Train a model using any of the supported algorithms (DPO, PPO, GRPO, GSPO, SFT):

# Single GPU training with LoRA
fai-rl-train --recipe recipes/training/sft/llama3_3B_lora.yaml --num-gpus 1

# Multi-GPU training with DeepSpeed
fai-rl-train --recipe recipes/training/dpo/llama3_3B_lora.yaml --num-gpus 8

# Override parameters from CLI
fai-rl-train --recipe recipes/training/sft/llama3_3B_lora.yaml --num-gpus 4 \
  training.learning_rate=5e-5 \
  training.num_train_epochs=3

📖 Complete Training Guide →
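For orientation, a recipe that the dotted CLI overrides above would target might look roughly like the sketch below. The section and key names (model, training, use_lora) are assumptions inferred from the override syntax shown in the examples, not a verbatim recipe from the repository:

```yaml
# Hypothetical training recipe sketch; field names are assumptions based on
# the CLI overrides above (training.learning_rate, training.num_train_epochs).
model:
  name_or_path: "meta-llama/Llama-3.2-3B-Instruct"
  use_lora: true

training:
  learning_rate: 5e-5     # overridable via training.learning_rate=...
  num_train_epochs: 3     # overridable via training.num_train_epochs=...
  save_steps: 100
```

Consult the recipes/ directory in the cloned repository for the authoritative, fully documented configurations.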

Inference

Generate text completions from trained or base models:

# Run inference on a trained model
fai-rl-inference --recipe recipes/inference/llama3_3B.yaml

# Use debug mode for detailed logging
fai-rl-inference --recipe recipes/inference/llama3_3B.yaml --debug

📖 Complete Inference Guide →

Evaluation

Evaluate model performance on academic benchmarks (MMLU, GSM8K):

# Evaluate on MMLU benchmark
fai-rl-eval --recipe recipes/evaluation/mmlu/llama3_3B.yaml --debug

📖 Complete Evaluation Guide →

Supported Algorithms

FAI-RL implements supervised fine-tuning plus four state-of-the-art reinforcement learning algorithms for language model fine-tuning:

| Algorithm | Full Name | Description | Best For |
|-----------|-----------|-------------|----------|
| SFT | Supervised Fine-Tuning | Direct supervised learning from labeled examples | Instruction fine-tuning and foundational model fine-tuning |
| DPO | Direct Preference Optimization | Alignment via preference learning without explicit reward models | Human preference alignment, chat model training |
| PPO | Proximal Policy Optimization | Policy gradient method with value function and reward model | Complex reward functions, multi-objective optimization |
| GRPO | Group Relative Policy Optimization | Efficient preference learning with group-based comparison | Reasoning tasks, competitive response generation |
| GSPO | Group Sequence Policy Optimization | Advanced sequence-level policy optimization | Complex multi-step reasoning, mathematical problem-solving |

Training Configurations

All algorithms support three efficiency modes:

| Mode | Memory Usage | Training Speed | Best For |
|------|--------------|----------------|----------|
| Full Fine-tuning | High (baseline) | Fastest | Small models (<3B params), maximum performance |
| LoRA | Low (~10% of full) | Fast | Most use cases, balanced efficiency |
| QLoRA | Very Low (~3-4 GB for a 7B model) | Moderate | Large models on consumer GPUs |

Additional features supported across all algorithms:

  • ✅ Multi-GPU training with DeepSpeed ZeRO-3
  • ✅ Gradient checkpointing for memory efficiency
  • ✅ Custom reward functions and dataset templates
  • ✅ Weights & Biases integration for experiment tracking
  • ✅ Automatic S3 checkpoint upload (supports S3-compatible stores)

Key Features

🎯 Flexible Configuration System

  • YAML-based recipes with comprehensive inline documentation for all parameters
  • CLI overrides for runtime parameter changes without editing files
  • Pre-configured templates for popular models (Llama 3, Qwen 3, etc.)
  • Easy experimentation with hyperparameter tuning

🔧 Extensible Architecture

Custom Reward Functions:

  • exact_match_reward_func - Accuracy-based rewards for verifiable tasks
  • structured_xml_reward_func - Format-based rewards for structured outputs
  • Easy to add your own custom reward functions
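Custom reward functions in TRL-style trainers typically receive a batch of completions and return one score per completion. The sketch below illustrates an exact-match reward in that style; the exact signature FAI-RL expects is an assumption here based on the TRL convention, so check the functions in trainers/rewards/ for the real contract:

```python
import re
from typing import List

def exact_match_reward_sketch(completions: List[str],
                              answers: List[str], **kwargs) -> List[float]:
    """Hypothetical exact-match reward: 1.0 if the completion's final number
    matches the reference answer, else 0.0. One float per completion, in
    the TRL style; FAI-RL's actual API may differ."""
    rewards = []
    for completion, answer in zip(completions, answers):
        # Treat the last number in the completion as the model's final answer.
        numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
        predicted = numbers[-1] if numbers else None
        rewards.append(1.0 if predicted == answer else 0.0)
    return rewards
```

Verifiable rewards like this pair naturally with GRPO/GSPO training on math datasets such as GSM8K.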

Dataset Templates:

  • GSM8KTemplate - Math problem formatting with chain-of-thought
  • OpenMathInstructTemplate - Mathematical instruction formatting

Pluggable Components:

  • Extensible trainer base classes for new algorithms
  • HuggingFace Transformers and TRL integration
  • Custom dataset processing pipelines

๐ŸŒ Multi-Provider API Support

Native support for commercial LLM APIs with automatic provider detection for inference and evaluation:

Supported Providers:

  • 🤖 OpenAI (GPT-5, GPT-4.5, GPT-4.1, etc.)
  • 🧠 Google (Gemini Pro, Gemini Flash)
  • 💬 Anthropic (Claude 4.5 Sonnet, Opus, etc.)
  • 🏠 Hosted LLM (self-hosted or custom endpoints)

Configuration Example:

# OpenAI ChatGPT - provider detected from endpoint URL
inference:
  api_endpoint: "https://api.openai.com/v1/chat/completions"
  api_key: "sk-..."
  model: "gpt-4.1"  # Just the model name, no prefix needed!

# Google Gemini - provider detected from endpoint URL
inference:
  api_endpoint: "https://generativelanguage.googleapis.com/v1/models/gemini-pro:generateContent"
  api_key: "AIza..."
  model: "gemini-2.5-pro"

# Anthropic Claude - provider detected from endpoint URL
inference:
  api_endpoint: "https://api.anthropic.com/v1/messages"
  api_key: "sk-ant-..."
  model: "claude-sonnet-4-5-20250929"

# Hosted LLM - any custom or self-hosted model endpoint
inference:
  api_endpoint: "https://your-hosted-endpoint.com/v1/chat"
  api_key: "your-api-key"
  model: "your-model-name"

Customization for Custom APIs:

If your hosted LLM uses a non-OpenAI format, customize utils/hosted_llm_config.py:

  • build_hosted_llm_request() - Modify request payload format
  • parse_hosted_llm_response() - Customize response parsing
  • build_hosted_llm_headers() - Adjust authentication headers

Each function includes detailed examples and inline documentation.
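As a hedged illustration of what those three hooks might look like for an OpenAI-style chat endpoint, here is a minimal sketch. The function names come from the list above, but the argument lists and payload shapes are assumptions; defer to the documented examples in utils/hosted_llm_config.py:

```python
from typing import Any, Dict, List

# Hypothetical sketches of the three hosted-LLM hooks, assuming an
# OpenAI-style chat-completions wire format. The real functions in
# utils/hosted_llm_config.py may take different arguments.

def build_hosted_llm_request(model: str,
                             messages: List[Dict[str, str]],
                             max_tokens: int = 512) -> Dict[str, Any]:
    """Build the JSON payload sent to the hosted endpoint."""
    return {"model": model, "messages": messages, "max_tokens": max_tokens}

def parse_hosted_llm_response(response_json: Dict[str, Any]) -> str:
    """Extract the completion text from the endpoint's JSON response."""
    return response_json["choices"][0]["message"]["content"]

def build_hosted_llm_headers(api_key: str) -> Dict[str, str]:
    """Build auth headers; swap in whatever scheme your endpoint expects."""
    return {"Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"}
```

If your endpoint uses, say, a top-level "prompt" field instead of "messages", you would adjust the request builder and response parser accordingly while leaving the rest of the pipeline untouched.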

๐Ÿ“ Project Structure

FAI-RL/
├── core/                      # Core framework components
├── trainers/                  # Algorithm implementations
│   ├── rewards/               # Custom reward functions
│   │   ├── accuracy_rewards.py
│   │   └── format_rewards.py
│   └── templates/             # Dataset formatting templates
│       ├── gsm8k_template.py
│       └── openmathinstruct_template.py
├── inference/                 # Inference system
├── evaluations/               # Evaluation system
│   └── eval_datasets/         # Dataset-specific evaluation logic
│       ├── mmlu.py
│       └── gsm8k.py
├── recipes/                   # YAML configuration files
│   ├── training/              # Training recipes (sft/, dpo/, ppo/, grpo/, gspo/)
│   ├── inference/             # Inference recipes
│   └── evaluation/            # Evaluation recipes (mmlu/, gsm8k/)
├── configs/                   # DeepSpeed configurations
│   └── deepspeed/             # ZeRO-3 configs for 1/2/4/8 GPUs
├── utils/                     # Shared utilities
│   ├── s3_utils.py            # S3 checkpoint upload callback
│   └── hosted_llm_config.py   # Custom API endpoint configuration
└── [auto-generated]
    ├── models/                # Trained model checkpoints
    ├── outputs/               # Inference and evaluation results
    └── logs/                  # Training logs

โ˜๏ธ S3 Checkpoint Upload

FAI-RL can automatically upload checkpoints and the final fine-tuned model to Amazon S3 (or any S3-compatible store such as MinIO). Uploads run in background threads so they never block training.

Prerequisites

Configure AWS credentials using any standard method (environment variables, ~/.aws/credentials, IAM role, etc.):

# Option 1: Environment variables
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_DEFAULT_REGION="us-east-1"

# Option 2: AWS CLI
aws configure

Configuration

Add an s3 section to your training recipe YAML:

s3:
  enabled: true                                          # Enable S3 upload
  bucket: "your-s3-bucket"                               # S3 bucket name
  prefix: "your-s3-prefix"                               # Key prefix (folder path inside bucket)
  region: null                                           # AWS region (null = use default)
  endpoint_url: null                                     # Custom S3-compatible endpoint (e.g. MinIO)
  upload_checkpoints: true                               # Upload intermediate checkpoints (at every save_steps)
  upload_final_model: true                               # Upload the final model at end of training
  delete_local_after_upload: false                       # Delete local files after successful upload

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| enabled | bool | false | Master switch for the S3 upload feature |
| bucket | string | "" | Target S3 bucket name (required when enabled) |
| prefix | string | "" | Key prefix under which all uploads are stored |
| region | string | null | AWS region; falls back to AWS_DEFAULT_REGION or the boto3 default |
| endpoint_url | string | null | Custom endpoint for S3-compatible stores (e.g. http://minio:9000) |
| upload_checkpoints | bool | true | Upload each intermediate checkpoint saved at save_steps intervals |
| upload_final_model | bool | true | Upload the final model directory at the end of training |
| delete_local_after_upload | bool | false | Remove the local checkpoint directory after a successful upload |

How It Works

  1. Intermediate checkpoints: when the trainer saves a checkpoint (every training.save_steps steps), the S3 callback uploads the entire checkpoint directory to s3://<bucket>/<prefix>/checkpoint-<step>/ in a background thread.
  2. Final model: at the end of training, the output directory is uploaded to s3://<bucket>/<prefix>/final/.
  3. Non-blocking: all uploads happen on daemon threads. Training continues while files are being transferred. At the end of training, the callback waits for any remaining uploads to finish before the process exits.
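The non-blocking pattern described in these steps can be sketched with daemon threads as below. This is purely illustrative, not FAI-RL's actual callback: upload_fn stands in for a real boto3-based upload helper.

```python
import threading

class BackgroundS3Uploader:
    """Illustrative sketch of the non-blocking upload pattern described
    above; upload_fn stands in for a real boto3-based helper."""

    def __init__(self, upload_fn):
        self._upload_fn = upload_fn
        self._threads = []

    def submit(self, local_dir: str, s3_uri: str) -> None:
        # Daemon thread: training continues immediately, and a stray
        # upload can never keep the process alive on its own.
        t = threading.Thread(target=self._upload_fn,
                             args=(local_dir, s3_uri), daemon=True)
        t.start()
        self._threads.append(t)

    def wait_for_all(self) -> None:
        # Called once at the end of training so the final uploads
        # complete before the process exits.
        for t in self._threads:
            t.join()
```

The explicit wait_for_all at the end mirrors step 3: daemon threads would otherwise be killed when the main process exits, so the callback must join them before shutdown.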

S3 Upload Structure

Given the example config above, the resulting S3 layout would be:

s3://your-s3-bucket/
└── your-s3-prefix/
    ├── checkpoint-100/
    │   ├── adapter_config.json
    │   ├── adapter_model.safetensors
    │   └── ...
    ├── checkpoint-200/
    │   └── ...
    └── final/
        ├── adapter_config.json
        ├── adapter_model.safetensors
        └── ...

Memory Optimization

FAI-RL provides multiple techniques for efficient training of large models on limited hardware:

Optimization Techniques

| Technique | Memory Savings | Speed Impact | Configuration |
|-----------|----------------|--------------|---------------|
| LoRA | ~90% reduction | Minimal | use_lora: true + LoRA params |
| QLoRA | ~95% reduction | Moderate | load_in_4bit: true + LoRA params |
| 8-bit Quantization | ~50% reduction | Minimal | load_in_8bit: true |
| Gradient Checkpointing | ~30-50% reduction | ~20% slower | gradient_checkpointing: true |
| DeepSpeed ZeRO-3 | Distributed across GPUs | Varies | Auto-enabled for multi-GPU |

Optimization Strategy

  1. Start with QLoRA if GPU memory is limited (<16GB)
  2. Use LoRA for balanced efficiency on mid-range GPUs (16-40GB)
  3. Full fine-tuning only for small models or high-end GPUs (80GB+)
  4. Enable gradient checkpointing if still encountering OOM errors
  5. Use DeepSpeed ZeRO-3 for multi-GPU setups to distribute memory load
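Combining the flags from the table above, a memory-constrained setup might look like the following fragment. The flag names come from the Configuration column; their exact placement inside a recipe is an assumption, so mirror the structure of the shipped recipes:

```yaml
# Hypothetical memory-saving combination for a <16 GB GPU.
model:
  load_in_4bit: true            # QLoRA: 4-bit quantized base weights
  use_lora: true                # train low-rank adapters only
training:
  gradient_checkpointing: true  # trade ~20% speed for activation memory
```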

🧪 System Requirements

Validated on Hardware

This framework has been validated on:

  • Instance: AWS EC2 p4d.24xlarge
  • GPUs: 8 x NVIDIA A100-SXM4-80GB (80GB VRAM each)
  • CPU: 96 vCPUs
  • Memory: 1152 GiB
  • Storage: 8TB NVMe SSD
  • Network: 400 Gbps

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

For Maintainers

Publishing a New Release

  1. Update version in pyproject.toml:

[project]
name = "FAI-RL"
version = "X.Y.Z"  # Increment version

  2. Build and publish:
# Install build tools
pip install --upgrade pip build twine

# Clean previous builds
rm -rf dist/ build/ *.egg-info

# Build the package
python -m build

# Upload to PyPI (requires credentials)
python -m twine upload dist/*

# Or upload to TestPyPI (requires credentials)
python -m twine upload --repository testpypi dist/*

Project details

Download files

Source Distribution

fai_rl-0.1.14.tar.gz (90.4 kB), uploaded via twine/6.2.0 on CPython/3.10.8 (Trusted Publishing: No).

| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 251276d335c36b453c9e43eb43809477d9672fe4003773d3f7e340efc83c4d36 |
| MD5 | 561fe754aea7dfd79b223afb6cdc09d0 |
| BLAKE2b-256 | 9f2fae5151abeed12c5aaf895b33fdfb46fda8d0f2202fa826b236b7bab93fa0 |

Built Distribution

fai_rl-0.1.14-py3-none-any.whl (147.7 kB, Python 3), uploaded via twine/6.2.0 on CPython/3.10.8 (Trusted Publishing: No).

| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 51d9a4c9d9c08b84069391bcb75a7717b85dc16e531f6ffe154226661a873d8b |
| MD5 | b28ea8b52bafdcbffe453f5bea0dc768 |
| BLAKE2b-256 | acc6fab928310fb9aa2fa6653cf9b7967a340be742304155a15cef037582ed0a |
