Implementation of the Leeroo LLM composer.
Mergoo

mergoo is a library for easily merging multiple LLM experts and efficiently training the merged LLM. With mergoo, you can efficiently integrate the knowledge of different generic or domain-based LLM experts.
🚀 Features
- Supports recent merging methods, including Mixture-of-Experts and layer-wise merging
- Flexible merging choice for each layer
- Base models supported: Llama and Mistral
- Trainers supported: 🤗 Trainer, SFTTrainer
- Devices supported: CPU, MPS, GPU
- Training choices: fine-tune only the routers of MoE layers, or fully fine-tune the merged LLM
Installation

Install from PyPI:

```sh
pip install mergoo
```

Install the latest (unstable) version from GitHub:

```sh
pip install git+https://github.com/Leeroo-AI/mergoo
```

Install from source:

```sh
git clone https://github.com/Leeroo-AI/mergoo
cd mergoo
pip install -e .
```
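A quick sanity check after installing, using the same import the Quick Start below relies on:

```python
# smoke test: confirm mergoo is importable after installation
from mergoo.compose_experts import ComposeExperts

print(ComposeExperts.__name__)  # -> "ComposeExperts"
```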
Quick Start

Merging Models

A sample config for composing a merged MoE model. Each entry in "experts" points to a source checkpoint, "router_layers" names the linear layers that receive MoE routing, and "num_experts_per_tok" sets how many experts the router selects per token:
```python
import torch
from mergoo.compose_experts import ComposeExperts

model_id = "data/mistral-math-code-moe"
config = {
    "model_type": "mistral",
    "num_experts_per_tok": 2,
    "experts": [
        {"expert_name": "base_expert", "model_id": "mistralai/Mistral-7B-v0.1"},
        {"expert_name": "expert_1", "model_id": "meta-math/MetaMath-Mistral-7B"},
        {"expert_name": "expert_2", "model_id": "ajibawa-2023/Code-Mistral-7B"},
    ],
    "router_layers": ["gate_proj", "up_proj", "down_proj"],
}

# create checkpoint
expertmerger = ComposeExperts(config, torch_dtype=torch.float16)
expertmerger.compose()
expertmerger.save_checkpoint(model_id)
```
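For intuition on what `num_experts_per_tok: 2` configures, here is a minimal, illustrative sketch of top-k expert routing in plain PyTorch. It is not mergoo's internal code; the toy shapes and the linear router are assumptions chosen to mirror the three experts above:

```python
import torch

tokens = torch.randn(4, 16)          # 4 token embeddings, hidden size 16 (toy sizes)
router = torch.nn.Linear(16, 3)      # one routing score per expert (3 experts, as above)

logits = router(tokens)                            # (4, 3) routing scores
weights, chosen = torch.topk(logits, k=2, dim=-1)  # top-2 experts per token
weights = torch.softmax(weights, dim=-1)           # normalize the selected scores
# each token's MoE output is the weighted sum of its two chosen experts' outputs
print(chosen)  # which experts each token is routed to
```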
Loading / Fine-tuning Merged Models

```python
from transformers import Trainer
from mergoo.models.modeling_mistral import MistralForCausalLM

model = MistralForCausalLM.from_pretrained("data/mistral-math-code-moe")
# NOTE: the 'gate' / router layers are untrained, so a weight-loading warning will appear for them
trainer = Trainer( ... )
trainer.train()
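```

To take the "fine-tune only the routers" path from the feature list, a minimal sketch is to freeze everything except the router parameters before building the Trainer. Matching on `"gate"` in the parameter name is an assumption based on the warning above; verify the exact naming with `model.named_parameters()` for your checkpoint:

```python
# Sketch: train only the MoE router ('gate') layers, freezing all other weights.
# The "gate" name filter is an assumption; check model.named_parameters().
n_trainable = 0
for name, param in model.named_parameters():
    param.requires_grad = "gate" in name
    if param.requires_grad:
        n_trainable += param.numel()
print(f"trainable parameters: {n_trainable}")
# then construct the Trainer with this model as usual
```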
📚 Learn More:
After finishing the Quick Start guide, you can explore the tutorials below to further familiarize yourself with mergoo.

| Notebook | Details |
|---|---|
| Unified MoE with Domain Experts | Build a unified Mixture-of-Experts model with domain-based LLM experts, inspired by the BTX research. |
Mergoo Roadmap and Contributing

As an open-source library in a fast-evolving domain, we welcome contributions, whether introducing new features, enhancing infrastructure, or improving documentation.

Here is the mergoo roadmap:
- Support MoE for Transformer Block
- Compatibility with Huggingface 🤗
- Support Trainer, SFTTrainer
- Loading Unified Checkpoint in BTX
- Feature: Convertible QKV linear layers
- Feature: Convertible FF linear layers
- Feature: Routers only for a list of decoder layer indexes
- Sharded Safetensor Saving
- Support experts based on Llama and Mistral
- Router Load balancing loss
- Lazy loading of tensors for low memory usage in Merging
- Support Mixture of LoRA Experts (base model with multiple trained LoRAs)
- Support Layer-wise merging, including Mergekit
- Support experts based on Gemma and Mamba
- Support flash-attention
- Support Mixture of Depths Transformer
Feel free to suggest new features and/or contribute to the mergoo roadmap!
Join our community!
🚀 We'd love to hear your feedback; please join the Leeroo community:
Have a question not listed here? Open a GitHub Issue or send us an email!