
AlignTune: Multi-backend alignment and fine-tuning library. Features TRL and Unsloth backends with complete RL coverage (DPO, PPO, GRPO, BOLT), 27+ reward functions, production-ready evaluation system, and unified configuration interface for LLM alignment.

AlignTune is a production-ready fine-tuning library designed to simplify training and fine-tuning of Large Language Models (LLMs) with both Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) methods. It provides a high-level, unified API that abstracts away the complexities of backend selection, algorithm configuration, and training loops, letting you focus on delivering results.

Core Features

Multi-Backend Architecture: Choose between TRL (reliable, battle-tested) and Unsloth (faster) backends with intelligent auto-selection.

Complete RLHF Coverage: 12+ RL algorithms including DPO, PPO, GRPO, GSPO, DAPO, Dr. GRPO, GBMPO, Counterfactual GRPO, and PACE.

Production-Ready: No mock code, comprehensive error handling, extensive testing, and robust validation.

Quick Start

Supervised Fine-Tuning (SFT)

from aligntune.core.backend_factory import create_sft_trainer

# Configure the SFT trainer
trainer = create_sft_trainer(
    model_name="microsoft/DialoGPT-small",
    dataset_name="tatsu-lab/alpaca",
    backend="trl",
    num_epochs=3,
    max_steps=-1,
    batch_size=4,
    learning_rate=5e-5,
)

# Train the model
trainer.train()

# Evaluate
metrics = trainer.evaluate()
print(metrics)
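
In the Hugging Face `Trainer` convention that TRL builds on, a `max_steps` value of -1 means "no step cap", so the run length is set by `num_epochs`. Assuming AlignTune passes these values through unchanged (this README does not confirm it), the length of the run above can be estimated with the sketch below. `estimate_optimizer_steps` is a hypothetical helper, not part of AlignTune, and the figure of 52,002 rows for tatsu-lab/alpaca, single-device training, and no gradient accumulation are all assumptions.

```python
import math


def estimate_optimizer_steps(num_examples, batch_size, num_epochs, max_steps=-1):
    """Estimate optimizer steps for a single-device run without gradient accumulation.

    Hypothetical helper for illustration only; not an AlignTune function.
    """
    if max_steps > 0:
        # A positive max_steps caps the run regardless of num_epochs.
        return max_steps
    # The last batch of each epoch may be partial, hence the ceiling.
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * num_epochs


# Assumed dataset size: tatsu-lab/alpaca has roughly 52K rows (52,002 used here).
total_steps = estimate_optimizer_steps(52_002, batch_size=4, num_epochs=3)
print(total_steps)
```

An estimate like this helps when choosing warmup or logging intervals before launching a long run.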

Reinforcement Learning (DPO)

from aligntune.core.backend_factory import create_rl_trainer

# Configure the DPO trainer
trainer = create_rl_trainer(
    model_name="Qwen/Qwen3-0.6B",
    dataset_name="Anthropic/hh-rlhf",
    algorithm="dpo",
    backend="trl",
    num_epochs=1,
    batch_size=4,
    learning_rate=5e-5
)

# Train the model
trainer.train()
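
DPO learns from preference pairs; Anthropic/hh-rlhf provides them as `chosen` and `rejected` texts. For readers new to the method, the per-pair objective can be sketched in plain Python as follows. This is the standard DPO loss written for illustration, not AlignTune's or TRL's implementation; the function name, the example log-probabilities, and `beta=0.1` are assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss from summed sequence log-probabilities (illustrative)."""
    # How much more the policy likes each response than the frozen reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed in a stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: the margin is zero.
loss_neutral = dpo_loss(-12.0, -15.0, -12.0, -15.0)
# Policy has shifted probability toward the chosen response.
loss_better = dpo_loss(-10.0, -16.0, -12.0, -15.0)
print(loss_neutral, loss_better)
```

The loss falls as the policy raises the chosen response's likelihood relative to the reference and lowers the rejected one's; `beta` controls how strongly the policy is held close to the reference.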

Supported Algorithms

| Algorithm | TRL Backend | Unsloth Backend | Description |
|---|---|---|---|
| SFT | Yes | Yes | Supervised Fine-Tuning |
| DPO | Yes | Yes | Direct Preference Optimization |
| PPO | Yes | Yes | Proximal Policy Optimization |
| GRPO | Yes | Yes | Group Relative Policy Optimization |
| GSPO | Yes | Yes | Group Sequence Policy Optimization |
| DAPO | Yes | Yes | Decouple Clip and Dynamic sAmpling Policy Optimization |
| Dr. GRPO | Yes | Yes | GRPO Done Right (unbiased variant) |
| GBMPO | Yes | No | Group-Based Mirror Policy Optimization |
| Counterfactual GRPO | Yes | Yes | Counterfactual GRPO variant |
| PACE | Yes | Yes | Baseline-Optimized Learning Technique |
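
To make the "group relative" idea concrete, the sketch below computes advantages for one prompt's group of sampled completions. GRPO centers each reward on the group mean and divides by the group standard deviation; Dr. GRPO, as published, drops that division (and also removes per-response length normalization in the loss, which is not shown here). The function name, the `scale_by_std` flag, and the use of the population standard deviation are assumptions made for illustration, not AlignTune's API.

```python
import statistics


def group_advantages(rewards, scale_by_std=True, eps=1e-6):
    """Advantages for one prompt's group of sampled completions (illustrative)."""
    mean = statistics.fmean(rewards)
    centered = [r - mean for r in rewards]
    if not scale_by_std:
        # Dr. GRPO-style: keep the group-mean baseline, skip the std division.
        return centered
    std = statistics.pstdev(rewards)
    # eps guards against division by zero when every reward in the group is equal.
    return [c / (std + eps) for c in centered]


rewards = [1.0, 0.0, 0.0, 1.0]  # e.g. correct / incorrect answers in one group
grpo_adv = group_advantages(rewards)
dr_grpo_adv = group_advantages(rewards, scale_by_std=False)
print(grpo_adv, dr_grpo_adv)
```

Because the baseline is the group mean, no separate value model is needed; completions are only compared with their siblings for the same prompt.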

Installation

# Install from PyPI
pip install aligntune

# Or install from source
git clone https://github.com/Lexsi-Labs/aligntune.git
cd aligntune
pip install -e .

Requirements

  • Python 3.12+
  • PyTorch 2.0+
  • CUDA-compatible GPU (recommended for faster training)
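
A quick way to check these requirements before installing is sketched below. `check_environment` is a hypothetical helper written for this README, not something AlignTune ships; it uses only the standard library plus PyTorch's `torch.cuda.is_available()` when PyTorch is present.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 12)):
    """Return a list of problems or warnings; an empty list means the basics look fine."""
    problems = []
    if sys.version_info < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed")
    else:
        import torch

        # torch.__version__ looks like "2.3.1+cu121"; only the major part matters here.
        if int(torch.__version__.split(".")[0]) < 2:
            problems.append(f"PyTorch 2.0+ required, found {torch.__version__}")
        if not torch.cuda.is_available():
            problems.append("No CUDA GPU detected; training will fall back to CPU")
    return problems


for line in check_environment():
    print(line)
```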

Demo Notebooks

Interactive Colab notebooks demonstrating various AlignTune workflows:

Supervised Fine-Tuning (SFT)

| Backend | Model | Dataset | Link |
|---|---|---|---|
| TRL | Qwen/Qwen3-4B-Instruct-2507 | sohamb37lexsi/bitext-wealth-management-llm-chatbot-splits | Open In Colab |
| TRL | Qwen3-4B-Instruct | sohamb37lexsi/bitext-retail-banking-llm-chatbot-splits | Open In Colab |
| Unsloth | Qwen/Qwen2.5-0.5B-Instruct | bebechien/MobileGameNPC | Open In Colab |
| TRL | google/txgemma-2b-predict | trialbench_adverse-event-rate-prediction | Open In Colab |
| Unsloth | Qwen/Qwen2.5-0.5B-Instruct | bebechien/MobileGameNPC | Open In Colab |

Reinforcement Learning (RL)

| Backend | Algorithm | Model | Dataset | Link |
|---|---|---|---|---|
| Unsloth | DPO | microsoft/phi-2 | argilla/distilabel-intel-orca-dpo-pairs | Open In Colab |
| TRL | DPO | google/gemma-2-2b-it | Anthropic/hh-rlhf | Open In Colab |
| TRL | DPO | sohamb37lexsi/wealth_management_Qwen3-4B-Instruct-2507 | sohamb37lexsi/bitext_wealth_management_preference_data | Open In Colab |
| Unsloth | PPO | Qwen/Qwen2.5-0.5B-Instruct | HuggingFaceH4/ultrachat_200k | Open In Colab |
| TRL | PPO | EleutherAI/pythia-1.4b | CarperAI/openai_summarize_tldr | Open In Colab |
| TRL | GRPO (Coding) | Qwen/Qwen3-4B | google-research-datasets/mbpp | Open In Colab |
| Unsloth | GRPO (Math) | meta-llama/Llama-3.2-3B-Instruct | openai/gsm8k | Open In Colab |
| TRL | GRPO | meta-llama/Llama-3.2-3B-Instruct | openai/gsm8k | Open In Colab |
| Unsloth | DRGRPO | Qwen/Qwen2.5-3B-Instruct | yahma/alpaca-cleaned | Open In Colab |
| TRL | DRGRPO | Qwen/Qwen2-0.5B-Instruct | AI-MO/NuminaMath-TIR | Open In Colab |
| Unsloth | GSPO | Qwen/Qwen3-1.7B | CyberNative/Code_Vulnerability_Security_DPO | Open In Colab |
| TRL | GSPO | meta-llama/Llama-3.2-3B-Instruct | HuggingFaceH4/ultrachat_200k | Open In Colab |
| Unsloth | DAPO | microsoft/Phi-3.5-mini-instruct | HuggingFaceH4/ultrachat_200k | Open In Colab |
| TRL | DAPO | meta-llama/Llama-3.2-3B-Instruct | google-research-datasets/mbpp | Open In Colab |

Documentation

Key Capabilities

  • Multiple Training Paradigms: Supports SFT, DPO, PPO, GRPO, and advanced RL algorithms
  • Backend Flexibility: TRL and Unsloth backends with automatic fallback
  • Reward Model Training: Train custom reward models from rule-based functions
  • Comprehensive Evaluation: Multi-level evaluation with lm-eval integration
  • Production Ready: Model serialization, reproducible training, and deployment-ready pipelines
  • Extensible Architecture: Modular design for easy integration of custom algorithms and backends
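
As an illustration of the rule-based functions mentioned under Reward Model Training, the sketch below scores a math-style completion by its final number and applies a small length penalty. The function name, its signature, and the penalty values are hypothetical; this README does not show AlignTune's actual reward-function interface.

```python
import re


def numeric_answer_reward(completion, reference_answer, max_chars=2000):
    """Return 1.0 if the last number in the completion matches the reference,
    else 0.0, then subtract 0.2 if the completion is overly long (illustrative)."""
    # Drop thousands separators so "1,250" is read as one number.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    if not numbers:
        return 0.0
    reward = 1.0 if float(numbers[-1]) == float(reference_answer) else 0.0
    if len(completion) > max_chars:
        reward -= 0.2
    return reward


print(numeric_answer_reward("7 boxes of 6 gives a total of 42.", "42"))
print(numeric_answer_reward("I am not sure.", "42"))
```

Scores from functions like this can serve directly as RL rewards or as labels when fitting a learned reward model.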

Architecture

AlignTune uses a flexible backend architecture:

flowchart TD
    Factory[Backend Factory] --> TRL[TRL Backend]
    Factory --> Unsloth[Unsloth Backend]
    TRL --> TRL_Algos[TRL Algorithms]
    Unsloth --> Unsloth_Algos[Unsloth Algorithms]

TRL Backend: SFT, DPO, PPO, GRPO, GSPO, DAPO, Dr. GRPO, GBMPO, Counterfactual GRPO, PACE

Unsloth Backend: SFT, DPO, PPO, GRPO, GSPO, DAPO, Dr. GRPO, Counterfactual GRPO, PACE

See Architecture for details.
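
The factory-with-fallback idea can be sketched as a small lookup over a support matrix. This is a minimal illustration, not AlignTune's backend factory: `select_backend`, the `installed` parameter, the algorithm keys other than `"dpo"`, and the "prefer Unsloth, fall back to TRL" ordering for `"auto"` are all assumptions. The support sets follow the Supported Algorithms table.

```python
ALL_ALGOS = {
    "sft", "dpo", "ppo", "grpo", "gspo", "dapo",
    "drgrpo", "gbmpo", "counterfactual_grpo", "pace",
}

# Which algorithms each backend can run; GBMPO is TRL-only per the table.
SUPPORTED = {
    "trl": set(ALL_ALGOS),
    "unsloth": ALL_ALGOS - {"gbmpo"},
}


def select_backend(algorithm, preferred="auto", installed=("unsloth", "trl")):
    """Pick a backend for `algorithm`, falling back when the preferred one cannot run it."""
    algorithm = algorithm.lower()
    if preferred != "auto" and preferred not in SUPPORTED:
        raise ValueError(f"Unknown backend: {preferred!r}")
    if preferred == "auto":
        # Assumed policy: try the faster backend first.
        candidates = ["unsloth", "trl"]
    else:
        candidates = [preferred] + [b for b in SUPPORTED if b != preferred]
    for backend in candidates:
        if backend in installed and algorithm in SUPPORTED[backend]:
            return backend
    raise ValueError(f"No installed backend supports {algorithm!r}")


print(select_backend("dpo"))
print(select_backend("gbmpo", preferred="unsloth"))
```

Keeping the support matrix as data rather than branching logic means a new algorithm or backend is a one-line change.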

Contributing

We welcome contributions! See our Contributing Guide for details.

License

This project is released under the MIT License. Please cite appropriately if used in academic or production projects. See the LICENSE file for details.

Key Points:

  • Free for Research & Learning: Use, modify, and study for personal, academic, or research purposes
  • Source Available: Full access to source code
  • Commercial Use Restricted: Requires separate commercial license
  • Contact: For commercial licensing, partnership, or redistribution rights, contact support@lexsi.ai

This is not an open-source license as defined by OSI, but provides broad access for non-commercial use.

Citation

If you use AlignTune in your research, please cite:

BibTeX:

@software{alignTune2025,
  title        = {{AlignTune}: Modular Toolkit for Post-Training Alignment of Large Language Models},
  author       = {Lyngkhoi, R E Zera Marveen and Chawla, Chirag and Seth, Pratinav and Avaiya, Utsav and Bhattacharjee, Soham and Khandoga, Mykola and Yuan, Rui and Sankarapu, Vinay Kumar},
  year         = {2025},
  note         = {Equal contribution: R E Zera Marveen Lyngkhoi, Chirag Chawla, Pratinav Seth},
  organization = {Lexsi Labs},
  url          = {https://github.com/Lexsi-Labs/aligntune},
  version      = {0.0.0}
}

Plain Text:

Lyngkhoi, R. E. Z. M., Chawla, C., Seth, P., Avaiya, U., Bhattacharjee, S., Khandoga, M.,
Yuan, R., & Sankarapu, V. K. (2025). AlignTune: Modular Toolkit for Post-Training Alignment
of Large Language Models. Lexsi Labs. https://github.com/Lexsi-Labs/aligntune

*Equal contribution: R E Zera Marveen Lyngkhoi, Chirag Chawla, Pratinav Seth

Acknowledgments

AlignTune is built upon the excellent work of the following projects:

Support


Get started with AlignTune and accelerate your LLM fine-tuning workflows today!

Contact


https://www.lexsi.ai

Paris 🇫🇷 · Mumbai 🇮🇳 · London 🇬🇧
