Implementation of the Leeroo LLM composer.

Project description


mergoo is a library for easily merging multiple LLM experts and efficiently training the merged LLM. With mergoo, you can integrate the knowledge of different generic or domain-specific LLM experts.

🚀 Features

  • Supports recent merging methods, including Mixture-of-Experts and layer-wise merging
  • Flexible merging choice for each layer
  • Base models supported: Llama and Mistral
  • Trainers supported: 🤗 Trainer, SFTTrainer
  • Devices supported: CPU, MPS, GPU
  • Training choices: fine-tune only the routers of the MoE layers, or fully fine-tune the merged LLM

Installation

Install with pip:

pip install mergoo

Install the latest (unstable) version from GitHub:

pip install git+https://github.com/Leeroo-AI/mergoo

Install from source:

git clone https://github.com/Leeroo-AI/mergoo
cd mergoo
pip install -e .
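
To confirm the installation, check that the package imports (the same import is used in the Quick Start below):

python -c "from mergoo.compose_experts import ComposeExperts"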

Quick Start

Merging Models
A sample config for composing experts and creating the merged model:

import torch
from mergoo.compose_experts import ComposeExperts

model_id = "data/mistral-math-code-moe"
config = {
    "model_type": "mistral",
    "num_experts_per_tok": 2,
    "experts": [
        {"expert_name": "base_expert", "model_id": "mistralai/Mistral-7B-v0.1"},
        {"expert_name": "expert_1", "model_id": "meta-math/MetaMath-Mistral-7B"},
        {"expert_name": "expert_2", "model_id": "ajibawa-2023/Code-Mistral-7B"}
    ],
    "router_layers": ["gate_proj", "up_proj", "down_proj"]
}

# create checkpoint
expertmerger = ComposeExperts(config, torch_dtype=torch.float16)
expertmerger.compose()
expertmerger.save_checkpoint(model_id)
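
In this config, model_type selects the base architecture, experts lists the checkpoints to merge (with the base model registered as base_expert), num_experts_per_tok is the number of experts the router activates per token, and router_layers names the linear layers that receive routers (here the MLP projections). compose() builds the merged weights and save_checkpoint() writes them to model_id, ready to be loaded as shown next.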

Loading / Fine-tuning Merged Models

from transformers import Trainer
from mergoo.models.modeling_mistral import MistralForCausalLM

model = MistralForCausalLM.from_pretrained("data/mistral-math-code-moe") 
# NOTE: 'gate' / router layers are untrained, so weight-loading warnings will appear for them

trainer = Trainer( ... )
trainer.train()
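
The "fine-tune only the routers" option from the feature list amounts to freezing everything except the router parameters before handing the model to a trainer. Below is a minimal sketch, assuming the routers are the sub-modules named 'gate' (as the loading note above suggests); the name filter is an assumption, so inspect model.named_parameters() and adjust it if your checkpoint's naming differs:

from mergoo.models.modeling_mistral import MistralForCausalLM

model = MistralForCausalLM.from_pretrained("data/mistral-math-code-moe")

# Freeze every parameter that does not belong to a router ('gate') module.
# Assumption: router sub-modules are named 'gate'; verify against
# model.named_parameters() before training.
for name, param in model.named_parameters():
    param.requires_grad = ".gate." in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable:,} / {total:,}")

# The partially frozen model can now be passed to Trainer / SFTTrainer as above.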

📚 Learn More:

After finishing the Quick Start guide, you can explore the tutorials below to further familiarize yourself with mergoo.

  • Unified MoE with Domain Experts: Build a unified Mixture-of-Experts model with domain-based LLM experts, inspired by BTX research.

Mergoo Roadmap and Contributing

As an open-source library in a fast-evolving domain, we welcome contributions, whether that means introducing new features, enhancing infrastructure, or improving documentation.

Here is the mergoo roadmap:

  • Support MoE for Transformer Block
  • Compatibility with Huggingface 🤗
  • Support Trainer, SFTTrainer
  • Loading Unified Checkpoint in BTX
  • Feature: Convertible QKV linear layers
  • Feature: Convertible FF linear layers
  • Feature: Routers only for a list of decoder layer indices
  • Sharded Safetensor Saving
  • Support experts based on Llama and Mistral
  • Router Load balancing loss
  • Lazy loading of tensors for low memory usage in Merging
  • Support Mixture of LoRA Experts (base model with multiple trained LoRAs)
  • Support Layer-wise merging, including Mergekit
  • Support experts based on Gemma and Mamba
  • Support flash-attention
  • Support Mixture of Depths Transformer

Feel free to suggest new features and/or contribute to the mergoo roadmap!

Join our community!

🚀 We'd love to hear your feedback; please join the Leeroo community:

Have a question not listed here? Open a GitHub Issue or send us an email!

Download files

Source distribution: mergoo-0.0.5.tar.gz (18.0 kB)

Built distribution: mergoo-0.0.5-py3-none-any.whl (15.8 kB, Python 3)
