Tools for merging pre-trained large language models

mergekitty

mergekitty is a toolkit for merging pre-trained language models. It uses an out-of-core approach so you can run surprisingly complex merges on modest hardware — entirely on CPU, or with as little as 8 GB of VRAM.

What's this fork?

Forked from mergekit (originally by Charles Goddard, then maintained by Arcee.ai). The original project switched to a BSL license after a ton of community contribution, then switched back to LGPL but added a CLA that lets them relicense at will. So here we are.

What changed?

A few things from upstream mergekit:

  • All names/imports/scripts renamed to mergekitty (find-replace and you're good)
  • VLM support with templated pre/post-weights (architecture files are incompatible with mergekit's)
  • tokenizer_source now defaults to "base"; legacy tokenizer copying is gone
  • nuslerp is now slerp (the old slerp implementation is removed). Supports both t (SLERP) and weight (NuSLERP) params
  • bakllama, mergekit-legacy, and mergekit-evolve removed
  • LoRA merging script via mergekitty-merge-lora
  • Switched to ruff for formatting/linting and hatch for builds

Why merge models?

Model merging is chaos magick. Done right, the result can be better than any of its inputs. The effect has been demonstrated repeatedly, and nobody fully understands why. Ship it.

Features

  • Works with Llama 3, Qwen 3 (Dense & MoE), Mistral, GLM4, GPT-NeoX, BERT, and more
  • Tons of merge methods — arguably too many
  • GPU or CPU — your call
  • Lazy tensor loading for low memory use
  • Interpolated gradient parameters for fine control
  • Layer-stacking / "Frankenmerging" (à la Goliath, Midnight Miqu)
  • MoE merging and LoRA extraction

Install

# recommended — isolated tool install
uv tool install mergekitty

# or just pip
pip install mergekitty

# from source
git clone https://github.com/allura-org/mergekitty.git
cd mergekitty
pip install -e .

Usage

mergekitty-yaml path/to/config.yml ./output-model [--cuda] [--lazy-unpickle] [--allow-crimes]

Run mergekitty-yaml --help for the full list of options.

Sharing on Huggingface

mergekitty generates a README.md for your merge. Edit it, keep it as-is, whatever — then upload:

huggingface-cli login
huggingface-cli upload your_username/my-cool-model ./output-model .

Merge Configuration

Configs are YAML. The main fields:

Field            Description
merge_method     Which algorithm to use (see below)
slices / models  Input model definitions (mutually exclusive)
base_model       Base model, for methods that need one
parameters       Weights, densities, etc. — specifiable at multiple levels
dtype            Data type for the merge
tokenizer        Vocabulary and embedding configuration
chat_template    Override the output chat template
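For orientation, a minimal two-model linear merge might look like this (the model IDs are placeholders, not recommendations):

```yaml
# hypothetical minimal config: average two models evenly
models:
  - model: org/model-a        # placeholder model IDs
    parameters:
      weight: 0.5
  - model: org/model-b
    parameters:
      weight: 0.5
merge_method: linear
dtype: bfloat16
```

Save it as config.yml and run mergekitty-yaml config.yml ./output-model.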

Parameters

Parameters (weight, density, etc.) can be set at four levels, most-specific wins:

  1. slices.*.sources.parameters — per input slice
  2. slices.*.parameters — per output slice
  3. models.*.parameters — per input model
  4. parameters — global fallback

Values can be scalars or interpolated gradients (a list of floats for smooth transitions across layers).
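As a sketch, here is a gradient combined with a tensor-name filter (the filter syntax follows upstream mergekit; model definitions omitted):

```yaml
parameters:
  t:
    - filter: self_attn                   # applies only to attention tensors
      value: [0.0, 0.3, 0.5, 0.7, 1.0]    # interpolated smoothly across layers
    - filter: mlp
      value: [1.0, 0.7, 0.5, 0.3, 0.0]
    - value: 0.5                          # fallback for all other tensors
```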

Tokenizer

Use the tokenizer field for full control, or tokenizer_source for the simple legacy behavior.

tokenizer:
  source: union          # "union", "base", or a model path
  tokens:                # optional: per-token embedding overrides
    <|im_start|>:
      source: "chatml_model"
    <|start_header_id|>:
      source: "llama3_model"
      force: true
  pad_to_multiple_of: null

The defaults are sensible: if a token exists in the base model, the base embedding is used; if exactly one input model has the token, that model's embedding is used; otherwise the embeddings are averaged. Any of this can be overridden per token.

Chat Template

chat_template: "auto"    # picks the most common template from inputs
# or: "alpaca", "chatml", "llama3", "mistral", "exaone"
# or: a raw Jinja2 template string

Examples

Check examples/ for real configs.

Merge Methods

Method                merge_method       Multi-Model  Needs Base
Linear (Model Soups)  linear             ✅           ❌
SLERP                 slerp              ✅*          ✅
Nearswap              nearswap           ❌           ✅
Task Arithmetic       task_arithmetic    ✅           ✅
TIES                  ties               ✅           ✅
DARE + TIES           dare_ties          ✅           ✅
DARE + Linear         dare_linear        ✅           ✅
Passthrough           passthrough        ❌           ❌
Model Breadcrumbs     breadcrumbs        ✅           ✅
Breadcrumbs + TIES    breadcrumbs_ties   ✅           ✅
Model Stock           model_stock        ✅           ✅
DELLA                 della              ✅           ✅
DELLA + Linear        della_linear       ✅           ✅
SCE                   sce                ✅           ✅

* SLERP supports two to three models.

Linear

Weighted average. Simple, classic, effective.

  • weight — relative weighting per tensor
  • normalize — normalize weights across models (default: true)

SLERP

Spherical interpolation. Supports t (classic SLERP, 0 = base, 1 = other) or weight (NuSLERP-style per-tensor weighting).

  • nuslerp_flatten — set to false to interpolate row- or column-wise instead of treating each tensor as a flat vector
  • nuslerp_row_wise — SLERP row vectors instead of column vectors
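A classic-SLERP config sketch (model paths are placeholders):

```yaml
models:
  - model: org/base-model
  - model: org/other-model
merge_method: slerp
base_model: org/base-model
parameters:
  t: 0.4          # 0 = base, 1 = other
dtype: bfloat16
```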

Nearswap

Interpolates between base and secondary model when similarity drops below threshold t.

Task Arithmetic

Subtract base model → get "task vectors" → merge them linearly → add base back. Great for models fine-tuned from a common ancestor. Also the mental model behind most of the fancier methods.
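A task-arithmetic sketch, assuming both finetunes share the same base (names are placeholders):

```yaml
models:
  - model: org/finetune-a
    parameters:
      weight: 1.0
  - model: org/finetune-b
    parameters:
      weight: 0.7    # scale this task vector down
merge_method: task_arithmetic
base_model: org/shared-base
dtype: bfloat16
```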

TIES

Task arithmetic + sparsification + sign consensus. Lets you merge more models without them stepping on each other.

  • density — fraction of task vector weights to keep
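A TIES sketch keeping 60% of each task vector (placeholder names; values illustrative, not tuned):

```yaml
models:
  - model: org/finetune-a
    parameters:
      weight: 0.5
      density: 0.6   # keep the top 60% of each task vector by magnitude
  - model: org/finetune-b
    parameters:
      weight: 0.5
      density: 0.6
merge_method: ties
base_model: org/shared-base
parameters:
  normalize: true
dtype: bfloat16
```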

DARE

Random pruning with rescaling, instead of TIES's magnitude-based sparsification. Works with TIES sign consensus (dare_ties) or without (dare_linear).

Passthrough

No-op. Passes tensors through unchanged. Useful for layer-stacking / frankenmerging where you only have one input per slice.
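A layer-stacking sketch: the bottom of one model stacked with the top of another, with overlap (layer counts are illustrative, assuming 32-layer inputs):

```yaml
slices:
  - sources:
      - model: org/model-a
        layer_range: [0, 24]    # layers 0-23 from model-a
  - sources:
      - model: org/model-b
        layer_range: [8, 32]    # layers 8-31 from model-b
merge_method: passthrough
dtype: bfloat16
```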

Model Breadcrumbs

Drops both tiny and huge differences from base. Works with (breadcrumbs_ties) or without (breadcrumbs) TIES.

  • density — fraction of weights to keep
  • gamma — fraction of largest-magnitude differences to remove (paper's β)
  • Defaults: density: 0.9, gamma: 0.01

Model Stock

Geometric trick to compute good linear weights. Needs at least three models including a base.

DELLA

Adaptive pruning based on magnitude ranking — keeps important changes, drops the rest. Like DARE but smarter about what it prunes.

  • density — fraction of weights to keep
  • epsilon — spread of drop probabilities (range: density ± epsilon)
  • lambda — scaling factor for merged deltas
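A DELLA sketch (placeholder names; parameter values are illustrative, not tuned):

```yaml
models:
  - model: org/finetune-a
    parameters:
      weight: 1.0
      density: 0.7
      epsilon: 0.1   # drop probabilities spread over density ± epsilon
  - model: org/finetune-b
    parameters:
      weight: 1.0
      density: 0.7
      epsilon: 0.1
merge_method: della
base_model: org/shared-base
parameters:
  lambda: 1.1        # scale the merged deltas slightly up
dtype: bfloat16
```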

SCE

Selects high-variance elements, computes matrix-level weights, erases minority contributions.

  • select_topk — fraction of high-variance elements to retain

LoRA Extraction

Extract PEFT-compatible LoRA adapters from finetuned models:

mergekitty-extract-lora finetuned_model base_model output_path --rank=32

MoE Merging

Merge dense models into a Mixture of Experts with mergekitty-moe. See the MoE docs.

Development

Uses Hatch + uv:

uv tool install hatch
hatch test              # run tests
hatch run lint          # ruff linting
hatch run format        # ruff formatting
hatch run mergekitty-yaml examples/bio-merge.yml ./bio-merge --cuda

Citation

If you use mergekitty in research, please cite the original mergekit paper:

@inproceedings{goddard-etal-2024-arcees,
    title = "Arcee{'}s {M}erge{K}it: A Toolkit for Merging Large Language Models",
    author = "Goddard, Charles  and
      Siriwardhana, Shamane  and
      Ehghaghi, Malikeh  and
      Meyers, Luke  and
      Karpukhin, Vladimir  and
      Benedict, Brian  and
      McQuade, Mark  and
      Solawetz, Jacob",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track",
    month = nov,
    year = "2024",
    pages = "477--485",
    url = "https://aclanthology.org/2024.emnlp-industry.36",
}
