Implementation of the Leeroo LLM composer.
Project description
Mergoo
mergoo is a library for easily merging multiple LLM experts and efficiently training the merged LLM. With mergoo, you can efficiently integrate the knowledge of different generic or domain-based LLM experts.
🚀 Features
- Supports several merging methods: Mixture-of-Experts, Mixture-of-Adapters, and Layer-wise merging
- Flexible merging for each layer
- Base models supported: Llama (including LLaMa3), Mistral, Phi3, and BERT
- Trainers supported: 🤗 Trainer, SFTTrainer, PEFT
- Devices supported: CPU, MPS, GPU
- Training choices: train only the routers of the MoE layers, or fully fine-tune the merged LLM
If you like the project, consider leaving a ⭐️
Installation
Install with pip:
pip install mergoo
Install the latest (unstable) version from GitHub:
pip install git+https://github.com/Leeroo-AI/mergoo
Install from source:
git clone https://github.com/Leeroo-AI/mergoo
cd mergoo
pip install -e .
Quick Start
Configuration Setup
Specify the config for merging:
- model_type: type of the base model. Choices: mistral, llama, or bert.
- num_experts_per_tok: number of experts routed to for each token in the MoE layers.
- experts: config for the experts to merge; each entry includes an expert_name and a Hugging Face 🤗 model_id.
- router_layers: layers chosen for applying Mixture-of-Experts.
Fully Fine-tuned Experts
This is a sample config when merging fully fine-tuned LLM experts.
config = {
"model_type": "mistral",
"num_experts_per_tok": 2,
"experts": [
{"expert_name": "base_expert", "model_id": "mistralai/Mistral-7B-v0.1"},
{"expert_name": "expert_1", "model_id": "meta-math/MetaMath-Mistral-7B"},
{"expert_name": "expert_2", "model_id": "ajibawa-2023/Code-Mistral-7B"}
],
"router_layers": ["gate_proj", "up_proj", "down_proj"]
}
In the above example, we merge math and code Mistral-based experts. Please refer to this notebook for further details!
Mixture of Adapters (MoE on LoRA)
This is a sample config when merging LoRA fine-tuned LLM experts. mergoo builds a routing layer on top of the LoRAs, resulting in a mixture of adapters.
config = {
"model_type": "mistral",
"num_experts_per_tok": 2,
"base_model": "mistralai/Mistral-7B-v0.1",
"experts": [
{"expert_name": "adapter_1", "model_id": "predibase/customer_support"},
{"expert_name": "adapter_2", "model_id": "predibase/customer_support_accounts"},
{"expert_name": "adapter_3", "model_id": "predibase/customer_support_orders"},
{"expert_name": "adapter_4", "model_id": "predibase/customer_support_payments"}
],
}
The expert_name starts with adapter instead of expert. Please refer to this notebook for further details!
Merge Experts
Following the config setup, mergoo creates the merged LLM as:
import torch
from mergoo.compose_experts import ComposeExperts
# create checkpoint
model_id = "data/mistral_lora_moe"
expertmerger = ComposeExperts(config, torch_dtype=torch.float16)
expertmerger.compose()
expertmerger.save_checkpoint(model_id)
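Before any fine-tuning, you can sanity-check the composed checkpoint with a short generation pass. The following is only an illustrative sketch: the prompt, generation settings, and the choice of tokenizer (any of the Mistral-7B-based experts shares the vocabulary) are assumptions, and the routers are still untrained at this point, so treat the output purely as a smoke test.
from transformers import AutoTokenizer
from mergoo.models.modeling_mistral import MistralForCausalLM

# Load the freshly composed checkpoint and a matching tokenizer (assumed here
# to be the base expert's tokenizer).
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
model = MistralForCausalLM.from_pretrained("data/mistral_lora_moe")

# Run a short generation as a smoke test (hypothetical prompt and settings).
inputs = tokenizer("What is 2 + 2?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))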
Load / Finetune Merged Expert
Now, you can easily train the merged LLM with Hugging Face Trainer:
from transformers import Trainer
from mergoo.models.modeling_mistral import MistralForCausalLM
model = MistralForCausalLM.from_pretrained("data/mistral_lora_moe")
# NOTE: the 'gate' / router layers are untrained, so a weight-loading warning will appear for them
trainer = Trainer( ... )
trainer.train()
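The feature list above also mentions training only the routers of the MoE layers. Below is a minimal sketch of that option, assuming the router weights can be identified by name (the 'gate' note above suggests they end in gate.weight, but inspect model.named_parameters() to confirm for your checkpoint); the TrainingArguments values and train_dataset are placeholders.
from transformers import Trainer, TrainingArguments
from mergoo.models.modeling_mistral import MistralForCausalLM

model = MistralForCausalLM.from_pretrained("data/mistral_lora_moe")

# Router-only training: freeze everything, then unfreeze the MoE router
# ("gate") weights. The name filter is an assumption; verify it against
# model.named_parameters().
for param in model.parameters():
    param.requires_grad = False
for name, param in model.named_parameters():
    if name.endswith("gate.weight") or name.endswith("gate.bias"):
        param.requires_grad = True

training_args = TrainingArguments(
    output_dir="checkpoints/router_only",  # placeholder values
    per_device_train_batch_size=1,
    num_train_epochs=1,
    learning_rate=1e-4,
)
trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)  # train_dataset: your tokenized dataset
trainer.train()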
📚 Learn More:
After finishing the Quick Start guide, you can explore the tutorials below to further familiarize yourself with mergoo.
Notebook | Details |
---|---|
MoE with fully fine-tuned LLM experts | Build a unified Mixture-of-Experts model with fully fine-tuned experts. Inspired by BTX Research (Meta AI). |
MoE with LoRA fine-tuned experts | Build a Mixture-of-Adapters expert. Inspired by xlora, Mixture-of-LoRAs, MoLE, PHATGOOSE, and MoELoRA. |
Hugging Face Blog | Deep dive into the research details behind the merging methods of the mergoo library. |
LLaMa3-based Experts | Build your own MoE-style LLM experts by integrating LLaMa3-based domain experts. |
Phi3-based Experts | Create an MoE-style LLM architecture by merging Phi3-based fine-tuned models. |
Mergoo Roadmap and Contributing
As an open-source library in a fast-evolving domain, we welcome contributions, whether that is introducing new features, enhancing infrastructure, or improving documentation.
Here is the mergoo roadmap:
- Support MoE for Transformer Block
- Compatibility with Huggingface 🤗
- Support Trainer, SFTTrainer
- Loading Unified Checkpoint in BTX
- Feature: Convertible QKV linear layers
- Feature: Convertible FF linear layers
- Feature: Routers only for a list of decoder layer indexes
- Sharded Safetensor Saving
- Support experts based on LLaMa and Mistral
- Support experts based on Phi3
- Support Mixture of LoRA Experts (Mixture of Adapters)
- Router load-balancing loss
- Lazy loading of tensors for low memory usage during merging
- Support other Layer-wise merging methods, including Mergekit
- Support experts based on Gemma and Mamba
- Support flash-attention
- Support Mixture of Depths Transformer
Feel free to suggest new features and/or contribute to the mergoo roadmap!
Join our community!
🚀 We would love to hear your feedback, please join the Leeroo community:
Have a question not listed here? Open a GitHub Issue or send us an email!