
Implementation of Leeroo LLM composer.



License: LGPLv3.0

mergoo is a library for easily merging multiple LLM experts and efficiently training the merged LLM. With mergoo, you can efficiently integrate the knowledge of different generic or domain-based LLM experts.

🚀 Features

  • Supports several merging methods: Mixture-of-Experts, Mixture-of-Adapters, and Layer-wise merging
  • Flexible merging for each layer
  • Base models supported: Llama (including LLaMa3), Mistral, Phi3, and BERT
  • Trainers supported: 🤗 Trainer, SFTTrainer, PEFT
  • Devices supported: CPU, MPS, GPU
  • Training choices: train only the routers of the MoE layers, or fully fine-tune the merged LLM

If you like the project, consider leaving a ⭐️


Install by pip:

pip install mergoo

Install the latest unstable version from GitHub:

pip install git+

Install it from the source:

git clone
cd mergoo
pip install -e .

Quick Start

Configuration Setup

Specify the config for merging:

  • model_type: Type of the base model. Choices: mistral, llama, or bert.
  • num_experts_per_tok: Number of experts selected for each token in the MoE layers.
  • experts: Configs of the experts to merge. Each entry includes an expert_name and a Hugging Face 🤗 model_id.
  • router_layers: Layers to which Mixture-of-Experts is applied.
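To make the role of num_experts_per_tok concrete, here is a minimal sketch of generic top-k routing for a single token. It illustrates the general Mixture-of-Experts idea only and is not mergoo's internal implementation; the function name top_k_route and the toy numbers are assumptions made for this example.

```python
import math

def top_k_route(router_logits, expert_outputs, k):
    """Combine the outputs of the k highest-scoring experts for one token."""
    # Indices of the k experts with the largest router logits.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, so the k weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    weights = {i: e / total for i, e in exps.items()}
    # Weighted sum of the selected experts' output vectors.
    dim = len(expert_outputs[0])
    mixed = [sum(weights[i] * expert_outputs[i][d] for i in top)
             for d in range(dim)]
    return mixed, weights

# Three toy experts, two of which are used per token (k = 2).
logits = [0.1, 2.0, 2.0]
outputs = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
mixed, weights = top_k_route(logits, outputs, k=2)
```

With k = 2, only the two highest-scoring experts contribute to the token's output, so the third expert costs nothing for that token.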

Fully Fine-tuned Experts

This is a sample config when merging fully fine-tuned LLM experts.

config = {
    "model_type": "mistral",
    "num_experts_per_tok": 2,
    "experts": [
        {"expert_name": "base_expert", "model_id": "mistralai/Mistral-7B-v0.1"},
        {"expert_name": "expert_1", "model_id": "meta-math/MetaMath-Mistral-7B"},
        {"expert_name": "expert_2", "model_id": "ajibawa-2023/Code-Mistral-7B"}
    ],
    "router_layers": ["gate_proj", "up_proj", "down_proj"]
}

In the above example, we merge math and code Mistral-based experts. Please refer to this notebook for further details!
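Before starting a long merge, it can help to sanity-check the config dictionary. The sketch below is a hypothetical helper, not part of mergoo: the names REQUIRED_KEYS and check_expert_config are invented for this example, and the checks only mirror the keys described in this section for fully fine-tuned experts.

```python
REQUIRED_KEYS = ("model_type", "num_experts_per_tok", "experts", "router_layers")

def check_expert_config(config):
    """Return a list of problems found in a fully fine-tuned expert config."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]
    experts = config.get("experts", [])
    for index, expert in enumerate(experts):
        for field in ("expert_name", "model_id"):
            if field not in expert:
                problems.append(f"expert {index} is missing {field}")
    # Top-k routing cannot select more experts than exist.
    k = config.get("num_experts_per_tok")
    if isinstance(k, int) and experts and not 1 <= k <= len(experts):
        problems.append("num_experts_per_tok must be between 1 and the number of experts")
    return problems

sample = {
    "model_type": "mistral",
    "num_experts_per_tok": 2,
    "experts": [
        {"expert_name": "base_expert", "model_id": "mistralai/Mistral-7B-v0.1"},
        {"expert_name": "expert_1", "model_id": "meta-math/MetaMath-Mistral-7B"},
    ],
    "router_layers": ["gate_proj", "up_proj", "down_proj"],
}
problems = check_expert_config(sample)
```

An empty list means none of these basic checks failed. It does not guarantee that the model IDs exist or that the experts share an architecture.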

Mixture of Adapters (MoE on LoRA)

This is a sample config when merging LoRA fine-tuned LLM experts. mergoo builds a routing layer on top of LoRAs, resulting in a mixture of adapters.

config = {
    "model_type": "mistral",
    "num_experts_per_tok": 2,
    "base_model": "mistralai/Mistral-7B-v0.1",
    "experts": [
        {"expert_name": "adapter_1", "model_id": "predibase/customer_support"},
        {"expert_name": "adapter_2", "model_id": "predibase/customer_support_accounts"},
        {"expert_name": "adapter_3", "model_id": "predibase/customer_support_orders"},
        {"expert_name": "adapter_4", "model_id": "predibase/customer_support_payments"}
    ]
}

The expert_name starts with adapter instead of expert. Please refer to this notebook for further details!
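Conceptually, a mixture of adapters keeps one shared base weight and adds a routed combination of low-rank LoRA updates. The sketch below shows that idea for a single linear layer in plain Python. It is an illustration under simplifying assumptions (no LoRA scaling factor, gate weights supplied by the caller), and the function names are invented here rather than taken from mergoo.

```python
def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def mixture_of_adapters(x, base_weight, adapters, gate_weights):
    """Compute y = W x + sum_i g_i * B_i (A_i x) over the routed adapters."""
    y = matvec(base_weight, x)
    for (lora_a, lora_b), gate in zip(adapters, gate_weights):
        # Low-rank update of one adapter: project down with A, back up with B.
        delta = matvec(lora_b, matvec(lora_a, x))
        y = [yi + gate * di for yi, di in zip(y, delta)]
    return y

x = [1.0, 2.0]
base_weight = [[1.0, 0.0], [0.0, 1.0]]      # shared base layer (identity here)
adapters = [
    ([[1.0, 1.0]], [[1.0], [0.0]]),         # rank-1 adapter: (A, B)
    ([[1.0, 0.0]], [[0.0], [2.0]]),
]
y = mixture_of_adapters(x, base_weight, adapters, gate_weights=[0.75, 0.25])
```

Because the base weight is shared, each extra expert only adds its small A and B matrices rather than a full copy of the layer.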

Merge Experts

Following the config setup, mergoo creates the merged LLM as:

import torch
from mergoo.compose_experts import ComposeExperts

# Merge the experts and save the merged checkpoint
model_id = "data/mistral_lora_moe"
expertmerger = ComposeExperts(config, torch_dtype=torch.float16)
expertmerger.compose()
expertmerger.save_checkpoint(model_id)

Load / Finetune Merged Expert

Now, you can easily train the merged LLM with Hugging Face Trainer:

from transformers import Trainer
from mergoo.models.modeling_mistral import MistralForCausalLM

model = MistralForCausalLM.from_pretrained("data/mistral_lora_moe")
# NOTE: the 'gate' (router) layers are untrained, so a weight-loading warning will appear for them

trainer = Trainer( ... )
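If you want to train only the routers, one common approach is to disable gradients for every parameter except the router ones before building the trainer. The sketch below assumes that router parameters have a dotted-name component called gate, as the note above suggests. The helper train_router_only and the parameter names in the demo are hypothetical, and stand-in objects are used so the example runs without loading a model.

```python
from types import SimpleNamespace

def train_router_only(named_parameters, router_key="gate"):
    """Enable gradients only for parameters whose dotted name has a `router_key` component."""
    trainable = []
    for name, param in named_parameters:
        # Match whole name components so "gate_proj" alone does not count as a router.
        is_router = router_key in name.split(".")
        param.requires_grad = is_router
        if is_router:
            trainable.append(name)
    return trainable

# Stand-in parameters; with a real model, pass model.named_parameters() instead.
params = {
    "model.layers.0.mlp.gate_proj.gate.weight": SimpleNamespace(requires_grad=True),
    "model.layers.0.mlp.gate_proj.experts.0.weight": SimpleNamespace(requires_grad=True),
    "model.layers.0.self_attn.q_proj.weight": SimpleNamespace(requires_grad=True),
}
trainable = train_router_only(params.items())
```

It is worth printing the returned names once to confirm that they match the router layers of your merged model before training.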

📚 Learn More:

After finishing the Quick Start guide, you can explore the tutorials below to further familiarize yourself with mergoo.

Notebook: Details

  • MoE with fully fine-tuned LLM experts: Build a unified Mixture-of-Experts model with fully fine-tuned experts. Inspired by BTX Research (Meta AI).
  • MoE with LoRA fine-tuned experts: Build a Mixture-of-Adapters expert. Inspired by xlora, Mixture-of-LoRAs, MoLE, PHATGOOSE, and MoELoRA.
  • Hugging Face Blog: Deep dive into the research behind the merging methods of the mergoo library.
  • LLaMa3-based Experts: Build your own MoE-style LLM experts by integrating LLaMa3-based domain experts.
  • Phi3-based Experts: Create an MoE-style LLM architecture by merging Phi3-based fine-tuned models.

Mergoo Roadmap and Contributing

As an open-source library in a fast-evolving domain, we welcome contributions, whether they introduce new features, enhance infrastructure, or improve documentation.

Here is the mergoo roadmap:

  • Support MoE for Transformer Block
  • Compatibility with Hugging Face 🤗
  • Support Trainer, SFTTrainer
  • Loading Unified Checkpoint in BTX
  • Feature: Convertible QKV linear layers
  • Feature: Convertible FF linear layers
  • Feature: Routers only for a list of decoder layers indexes
  • Sharded Safetensor Saving
  • Support experts based on LLaMa and Mistral
  • Support experts based on Phi3
  • Support Mixture of LoRA Experts (Mixture of Adapters)
  • Router Load balancing loss
  • Lazy loading of tensors for low memory usage in Merging
  • Support other Layer-wise merging methods, including Mergekit
  • Support experts based on Gemma and Mamba
  • Support flash-attention
  • Support Mixture of Depths Transformer

Feel free to suggest new features and/or contribute to the mergoo roadmap!

Join our community!

🚀 We would love to hear your feedback. Please join the Leeroo community:

Have a question not listed here? Open a GitHub Issue or send us an email!


