AlignTune: Multi-backend alignment and fine-tuning library. Features TRL and Unsloth backends with complete RL coverage (DPO, PPO, GRPO, BOLT), 27+ reward functions, production-ready evaluation system, and unified configuration interface for LLM alignment.
Project description
AlignTune is a production-ready library that simplifies fine-tuning Large Language Models (LLMs) with both Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) methods. It provides a high-level, unified API that abstracts away the complexities of backend selection, algorithm configuration, and training loops, letting you focus on delivering results.
Core Features
Multi-Backend Architecture: Choose between TRL (reliable, battle-tested) and Unsloth (faster) backends with intelligent auto-selection.
Complete RLHF Coverage: 12+ RL algorithms including DPO, PPO, GRPO, GSPO, DAPO, Dr. GRPO, GBMPO, Counterfactual GRPO, and PACE.
Production-Ready: No mock code, comprehensive error handling, extensive testing, and robust validation.
Quick Start
Supervised Fine-Tuning (SFT)
```python
from aligntune.core.backend_factory import create_sft_trainer

# Create an SFT trainer on the TRL backend
trainer = create_sft_trainer(
    model_name="microsoft/DialoGPT-small",
    dataset_name="tatsu-lab/alpaca",
    backend="trl",
    num_epochs=3,
    max_steps=-1,
    batch_size=4,
    learning_rate=5e-5,
)

# Train the model
trainer.train()

# Evaluate
metrics = trainer.evaluate()
print(metrics)
```
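The example does not say how `num_epochs`, `max_steps` and `batch_size` interact. The sketch below assumes AlignTune forwards them with Hugging Face `TrainingArguments` semantics, where `max_steps=-1` means "train for `num_epochs`" and any positive `max_steps` overrides the epoch count. The helper `total_training_steps` and its `grad_accum`/`num_devices` parameters are hypothetical, not part of AlignTune.

```python
import math


def total_training_steps(
    num_examples: int,
    batch_size: int,
    num_epochs: int,
    max_steps: int = -1,
    grad_accum: int = 1,
    num_devices: int = 1,
) -> int:
    """Estimate optimizer steps, assuming Hugging Face Trainer semantics (hypothetical helper)."""
    if max_steps > 0:
        # A positive max_steps overrides num_epochs entirely.
        return max_steps
    # Batches per epoch without drop_last: a final partial batch still counts.
    batches_per_epoch = math.ceil(num_examples / (batch_size * num_devices))
    # One optimizer update every grad_accum batches, and at least one update per epoch.
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return updates_per_epoch * num_epochs


# tatsu-lab/alpaca's train split has 52,002 examples.
print(total_training_steps(52_002, batch_size=4, num_epochs=3))                 # 39003
print(total_training_steps(52_002, batch_size=4, num_epochs=3, max_steps=500))  # 500
```

With the settings above, three epochs at batch size 4 on a single device come to roughly 39k optimizer steps, so setting a small positive `max_steps` is the quicker way to smoke-test a configuration.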
Reinforcement Learning (DPO)
```python
from aligntune.core.backend_factory import create_rl_trainer

# Create a DPO trainer on the TRL backend
trainer = create_rl_trainer(
    model_name="Qwen/Qwen3-0.6B",
    dataset_name="Anthropic/hh-rlhf",
    algorithm="dpo",
    backend="trl",
    num_epochs=1,
    batch_size=4,
    learning_rate=5e-5,
)

# Train the model
trainer.train()
```
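For intuition about what `algorithm="dpo"` optimizes, here is the standard sigmoid DPO loss for a single preference pair, written in plain Python. Each `Anthropic/hh-rlhf` row supplies a `chosen` and a `rejected` response; the function takes their summed log-probabilities under the policy and under the frozen reference model. This is an illustrative sketch of the published objective, not AlignTune's internal code; `beta=0.1` matches the default of TRL's `DPOConfig`, and how AlignTune exposes `beta` is not documented here.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """-log sigmoid(beta * margin) for one (chosen, rejected) pair of summed log-probs."""
    # How far the policy has moved away from the frozen reference on each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) = log(1 + exp(-m)), split by sign so exp() never overflows.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: margin 0, loss = log 2.
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # 0.6931
# Policy raised the chosen response and lowered the rejected one: loss drops below log 2.
print(dpo_loss(-9.0, -13.0, -10.0, -12.0) < math.log(2))  # True
```

Because only log-probability ratios against the reference appear, DPO needs no separate reward model or sampling loop, which is why it is usually the cheapest RL-style method to run.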
Supported Algorithms
| Algorithm | TRL Backend | Unsloth Backend | Description |
|---|---|---|---|
| SFT | Yes | Yes | Supervised Fine-Tuning |
| DPO | Yes | Yes | Direct Preference Optimization |
| PPO | Yes | Yes | Proximal Policy Optimization |
| GRPO | Yes | Yes | Group Relative Policy Optimization |
| GSPO | Yes | Yes | Group Sequence Policy Optimization |
| DAPO | Yes | Yes | Decoupled Clip and Dynamic sAmpling Policy Optimization |
| Dr. GRPO | Yes | Yes | GRPO Done Right (unbiased variant) |
| GBMPO | Yes | No | Group-Based Mirror Policy Optimization |
| Counterfactual GRPO | Yes | Yes | Counterfactual GRPO variant |
| PACE | Yes | Yes | Baseline-Optimized Learning Technique |
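Several rows in the table differ mainly in how they turn per-completion rewards into advantages and how they clip the policy ratio. The scalar sketch below illustrates the published formulations, not AlignTune's code: GRPO normalizes each reward by its group's mean and standard deviation, Dr. GRPO drops the standard-deviation division (one of its changes; its length-normalization fix is not shown), and DAPO's Clip-Higher uses separate lower and upper clip bounds, with PPO as the symmetric case. The function names and the population-std choice are assumptions for illustration.

```python
from __future__ import annotations

import statistics


def group_advantages(rewards: list[float], scale_by_std: bool = True, eps: float = 1e-4) -> list[float]:
    """Advantages for G completions sampled from the same prompt.

    scale_by_std=True  -> GRPO-style (r - mean) / (std + eps)
    scale_by_std=False -> Dr. GRPO-style r - mean (no std division)
    """
    mean = statistics.fmean(rewards)
    centered = [r - mean for r in rewards]
    if not scale_by_std:
        return centered
    # Population std here; implementations differ on this detail. eps guards all-equal groups.
    std = statistics.pstdev(rewards)
    return [c / (std + eps) for c in centered]


def clipped_surrogate(ratio: float, advantage: float, eps_low: float = 0.2, eps_high: float = 0.2) -> float:
    """Per-token clipped objective to maximize, with ratio = pi_theta / pi_old.

    eps_low == eps_high is PPO's symmetric clip; eps_high > eps_low is DAPO's Clip-Higher.
    """
    clipped_ratio = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * advantage, clipped_ratio * advantage)


rewards = [1.0, 0.0, 0.0, 1.0]  # e.g. binary correctness rewards for 4 samples of one prompt
print(group_advantages(rewards, scale_by_std=False))  # [0.5, -0.5, -0.5, 0.5]
print(clipped_surrogate(1.25, 1.0))                   # 1.2 (clipped at 1 + 0.2)
print(clipped_surrogate(1.25, 1.0, eps_high=0.28))    # 1.25 (inside the wider upper bound)
```

Because advantages are relative within a group, a group whose completions all receive the same reward contributes zero advantage and therefore no learning signal.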
Installation
```bash
pip install aligntune

# Or install from source
git clone https://github.com/Lexsi-Labs/aligntune.git
cd aligntune
pip install -e .
```
Requirements
- Python 3.12+
- PyTorch 2.0+
- CUDA-compatible GPU (recommended for faster training)
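A quick way to check an environment against the requirements above before installing. This is a minimal sketch using only the standard library plus `torch` if present; `check_requirements` is a hypothetical helper, not an AlignTune utility.

```python
from __future__ import annotations

import importlib.util
import sys


def check_requirements() -> dict[str, bool]:
    """Report which of the listed requirements the current environment meets."""
    report = {"python>=3.12": sys.version_info >= (3, 12)}
    report["torch installed"] = importlib.util.find_spec("torch") is not None
    if report["torch installed"]:
        import torch

        # torch.__version__ looks like "2.3.1+cu121"; the major version is enough here.
        report["torch>=2.0"] = int(torch.__version__.split(".")[0]) >= 2
        # A GPU is recommended, not required.
        report["cuda available"] = torch.cuda.is_available()
    return report


for check, ok in check_requirements().items():
    print(f"{check}: {'ok' if ok else 'missing'}")
```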
Demo Notebooks
Interactive Colab notebooks demonstrating various AlignTune workflows:
Supervised Fine-Tuning (SFT)
Reinforcement Learning (RL)
Documentation
- Getting Started: Installation, setup, and basic usage
- User Guide: In-depth tutorials for SFT and RL training
- API Reference: Complete Python API and class/method details
- Examples: End-to-end code examples
- Advanced Topics: Architecture, custom backends, and performance optimization
- Notebooks: Interactive Colab notebooks and local Jupyter notebooks
Key Capabilities
- Multiple Training Paradigms: Supports SFT, DPO, PPO, GRPO, and advanced RL algorithms
- Backend Flexibility: TRL and Unsloth backends with automatic fallback
- Reward Model Training: Train custom reward models from rule-based functions
- Comprehensive Evaluation: Multi-level evaluation with lm-eval integration
- Production Ready: Model serialization, reproducible training, and deployment-ready pipelines
- Extensible Architecture: Modular design for easy integration of custom algorithms and backends
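The built-in reward functions and the interface for rule-based rewards are not shown on this page. As a hedged illustration, the sketch below follows the reward-function convention of TRL's `GRPOTrainer`: a callable that receives `completions` (plus dataset columns as keyword arguments) and returns one float per completion. Whether AlignTune accepts such callables directly is an assumption, and `answer_format_reward` is a hypothetical example rule.

```python
from __future__ import annotations

import re

# Hypothetical rule: reward completions that wrap their final answer in <answer>...</answer>.
ANSWER_PATTERN = re.compile(r"<answer>.+?</answer>", re.DOTALL)


def _text(completion) -> str:
    # TRL passes plain strings for standard datasets and lists of chat messages for conversational ones.
    if isinstance(completion, str):
        return completion
    return completion[0]["content"]


def answer_format_reward(completions: list, **kwargs) -> list[float]:
    """Return 1.0 for completions containing an <answer> block, else 0.0."""
    return [1.0 if ANSWER_PATTERN.search(_text(c)) else 0.0 for c in completions]


print(answer_format_reward(["The sum is <answer>4</answer>", "I think it's 4"]))  # [1.0, 0.0]
```

Rules like this are cheap and deterministic, which makes them a natural starting point before training a learned reward model.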
Architecture
AlignTune uses a flexible backend architecture:
```mermaid
flowchart TD
    Factory[Backend Factory] --> TRL[TRL Backend]
    Factory --> Unsloth[Unsloth Backend]
    TRL --> TRL_Algos[TRL Algorithms]
    Unsloth --> Unsloth_Algos[Unsloth Algorithms]
```
TRL Backend: SFT, DPO, PPO, GRPO, GSPO, DAPO, Dr. GRPO, GBMPO, Counterfactual GRPO, PACE
Unsloth Backend: SFT, DPO, PPO, GRPO, DAPO, Dr. GRPO, Counterfactual GRPO, PACE
See Architecture for details.
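The page does not spell out how auto-selection and fallback choose a backend. A minimal sketch, assuming the rule is "prefer Unsloth when it is installed and supports the algorithm, otherwise use TRL". The coverage sets are copied from the two backend lists above; the identifiers (`dr_grpo`, `counterfactual_grpo`, ...), `BACKEND_ALGORITHMS`, and `select_backend` are hypothetical, not AlignTune's API.

```python
from __future__ import annotations

import importlib.util

# Hypothetical identifiers; coverage taken from the TRL and Unsloth backend lists above.
BACKEND_ALGORITHMS = {
    "trl": {"sft", "dpo", "ppo", "grpo", "gspo", "dapo", "dr_grpo", "gbmpo", "counterfactual_grpo", "pace"},
    "unsloth": {"sft", "dpo", "ppo", "grpo", "dapo", "dr_grpo", "counterfactual_grpo", "pace"},
}


def select_backend(algorithm: str, backend: str = "auto") -> str:
    """Pick a backend for an algorithm (hypothetical sketch of auto-selection with fallback)."""
    algorithm = algorithm.lower()
    if backend != "auto":
        # An explicit choice is honored only if that backend supports the algorithm.
        if algorithm not in BACKEND_ALGORITHMS.get(backend, set()):
            raise ValueError(f"{backend!r} backend does not support {algorithm!r}")
        return backend
    # Prefer Unsloth for speed when it is installed and supports the algorithm...
    if importlib.util.find_spec("unsloth") is not None and algorithm in BACKEND_ALGORITHMS["unsloth"]:
        return "unsloth"
    # ...otherwise fall back to TRL, which covers every algorithm listed.
    if algorithm in BACKEND_ALGORITHMS["trl"]:
        return "trl"
    raise ValueError(f"no backend supports {algorithm!r}")


print(select_backend("gbmpo"))  # trl (GBMPO is TRL-only)
```

Keeping the coverage table as data rather than branching code means adding a backend or algorithm is a one-line change.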
Contributing
We welcome contributions! See our Contributing Guide for details.
License
This project is released under the MIT License. Please cite appropriately if used in academic or production projects. See the LICENSE file for details.
Key Points:
- Free for Research & Learning: Use, modify, and study for personal, academic, or research purposes
- Source Available: Full access to source code
- Commercial Use Restricted: Requires separate commercial license
- Contact: For commercial licensing, partnership, or redistribution rights, contact support@lexsi.ai
This is not an open-source license as defined by OSI, but provides broad access for non-commercial use.
Citation
If you use AlignTune in your research, please cite:
BibTeX:
```bibtex
@software{alignTune2025,
  title = {{AlignTune}: Modular Toolkit for Post-Training Alignment of Large Language Models},
  author = {Lyngkhoi, R E Zera Marveen and Chawla, Chirag and Seth, Pratinav and Avaiya, Utsav and Bhattacharjee, Soham and Khandoga, Mykola and Yuan, Rui and Sankarapu, Vinay Kumar},
  year = {2025},
  note = {Equal contribution: R E Zera Marveen Lyngkhoi, Chirag Chawla, Pratinav Seth},
  organization = {Lexsi Labs},
  url = {https://github.com/Lexsi-Labs/aligntune},
  version = {0.0.0}
}
```
Plain Text:
Lyngkhoi, R. E. Z. M., Chawla, C., Seth, P., Avaiya, U., Bhattacharjee, S., Khandoga, M.,
Yuan, R., & Sankarapu, V. K. (2025). AlignTune: Modular Toolkit for Post-Training Alignment
of Large Language Models. Lexsi Labs. https://github.com/Lexsi-Labs/aligntune
*Equal contribution: R E Zera Marveen Lyngkhoi, Chirag Chawla, Pratinav Seth
Acknowledgments
AlignTune is built upon the excellent work of the following projects:
- HuggingFace Transformers - Model architectures and tokenizers
- TRL - Transformer Reinforcement Learning library
- Unsloth - Fast and memory-efficient training
- HuggingFace Datasets - Dataset loading and processing
Support
- Documentation: aligntune.lexsi.ai/
- GitHub Issues: github.com/Lexsi-Labs/aligntune/issues
- Discussions: github.com/Lexsi-Labs/aligntune/discussions
- Email: hello@lexsi.ai
- Discord: Lexsi Labs
Get started with AlignTune and accelerate your LLM fine-tuning workflows today!
Download files
Source Distribution
Built Distribution
File details
Details for the file aligntune-0.1.7.tar.gz.
File metadata
- Download URL: aligntune-0.1.7.tar.gz
- Upload date:
- Size: 669.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 67469c1f2cc875fde5ee0514fa8e78b8564f2a433082173e5d4e67f2de00bb7b |
| MD5 | 09524c72cbf8264a0e3200cd2c0efb32 |
| BLAKE2b-256 | d82f3270eaf0ddf7292c571b6e9d83f1ddf95363df85ae745428ba985c893a7e |
File details
Details for the file aligntune-0.1.7-py3-none-any.whl.
File metadata
- Download URL: aligntune-0.1.7-py3-none-any.whl
- Upload date:
- Size: 745.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1660090479e2595f0d016a798865dde7c51bdc7e304e09455e3ba5f056502f9c |
| MD5 | 9641dc37377ba827a42c4fae388b67b5 |
| BLAKE2b-256 | 7c93894cea4f25210709c4b5fb52b7a6df5ce8ce2f1d6ad555854fe602613da1 |
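To confirm that a downloaded 0.1.7 archive matches the published SHA256 digests, hash it locally and compare. A minimal sketch using Python's standard `hashlib`; `sha256_of` and `verify_download` are illustrative helper names.

```python
import hashlib
from pathlib import Path

# SHA256 digests published for the aligntune 0.1.7 release files.
EXPECTED_SHA256 = {
    "aligntune-0.1.7.tar.gz": "67469c1f2cc875fde5ee0514fa8e78b8564f2a433082173e5d4e67f2de00bb7b",
    "aligntune-0.1.7-py3-none-any.whl": "1660090479e2595f0d016a798865dde7c51bdc7e304e09455e3ba5f056502f9c",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path) -> bool:
    """True if the file's SHA256 matches the published digest for its file name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected
```

For example, `verify_download(Path("aligntune-0.1.7-py3-none-any.whl"))` returns `True` only for an untampered wheel. pip's hash-checking mode enforces the same check at install time when a requirements file pins `aligntune==0.1.7 --hash=sha256:<digest>`.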