ConcAdptr
Concocting Adapters — Brew multiple LoRA adapters into Mixture-of-Experts systems.
Train → Concoct → Serve.
ConcAdptr takes independently trained LoRA adapters and concocts them into MoE-style expert systems with learned routing. Model-agnostic. Privacy-preserving. Built for production.
The Problem
You fine-tune a base model with LoRA for your product. Then each customer or user group needs its own specialization — but they can't share their data. You end up with multiple LoRA adapters trained in isolation. How do you combine them into something smarter than any individual adapter?
The Solution
ConcAdptr takes your independently trained LoRA adapters and concocts them into experts within a Mixture-of-Experts system. A lightweight router learns which expert(s) to activate for each input — without needing access to any customer's original training data.
Base Model ─┬─ LoRA Adapter A (medical) ──┐
├─ LoRA Adapter B (legal) ──┼── Router ── Concocted Output
└─ LoRA Adapter C (finance) ──┘
Key Features
- Model-agnostic — Works with any HuggingFace transformer (Qwen, LLaMA, Mistral, Gemma, etc.)
- Privacy-preserving — Customer data never leaves their environment; only adapters travel
- 3 routing strategies — Soft merging (MoLoRA), Top-K sparse routing (MixLoRA), X-LoRA learned scaling
- Static merging fallback — Linear, TIES, and DARE merging when routing overhead is undesirable
- HuggingFace Hub integration — Push/pull adapters and full models to/from the Hub
- Full pipeline — Train adapters → Concoct with router → Serve — one library
- Production-ready — FastAPI serving, adapter registry, compatibility validation
- Consumer GPU friendly — 4-bit quantization, runs on 16GB VRAM
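As a rough back-of-envelope check on the 16GB claim (this is an illustration, not library code; it ignores KV cache, activations, and framework overhead), 4-bit quantization puts a 7B base model well within budget:

```python
def weight_memory_gb(num_params: float, bits: int) -> float:
    """Approximate weight memory: params * bits / 8 bytes, in GB."""
    return num_params * bits / 8 / 1e9

base = weight_memory_gb(7e9, 4)   # ~3.5 GB for 4-bit base weights
adapters = 3 * 0.15               # three LoRA adapters at ~150 MB each
print(f"~{base + adapters:.2f} GB of weights")  # leaves ample headroom on 16 GB
```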
Installation
pip install concadptr
With optional dependencies:
pip install concadptr[training] # + bitsandbytes, trl
pip install concadptr[serving] # + fastapi, uvicorn
pip install concadptr[hub] # + huggingface_hub
pip install concadptr[all] # everything
Quick Start
1. Define your configuration
from concadptr import ConcAdptrConfig
config = ConcAdptrConfig(
base_model="Qwen/Qwen2.5-7B-Instruct",
adapters={
"medical": "./adapters/medical_invoices",
"legal": "./adapters/legal_contracts",
"finance": "./adapters/financial_reports",
},
routing_strategy="xlora", # or "soft_merging", "top_k"
quantization="4bit",
)
Or load from YAML:
config = ConcAdptrConfig.from_yaml("config.yaml")
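A `config.yaml` mirroring the Python example above might look like the following (field names are assumed to match the `ConcAdptrConfig` keyword arguments; check the bundled `examples/config.yaml` for the exact schema):

```yaml
base_model: Qwen/Qwen2.5-7B-Instruct
adapters:
  medical: ./adapters/medical_invoices
  legal: ./adapters/legal_contracts
  finance: ./adapters/financial_reports
routing_strategy: xlora   # or soft_merging, top_k
quantization: 4bit
```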
2. Build the concocted model
from concadptr import ConcAdptrModel
model = ConcAdptrModel.from_config(config)
3. Train the router
from concadptr import ConcAdptrTrainer
from datasets import load_dataset, concatenate_datasets
# Mix of domain samples — not customer data
router_dataset = concatenate_datasets([
load_dataset("medical_qa", split="train[:500]"),
load_dataset("legal_docs", split="train[:500]"),
load_dataset("finance_qa", split="train[:500]"),
])
trainer = ConcAdptrTrainer(
model=model,
train_dataset=router_dataset,
learning_rate=1e-4,
num_epochs=3,
batch_size=4,
)
trainer.train()
model.save_pretrained("./concocted_model")
4. Analyze routing patterns
from concadptr.utils import print_routing_summary
model.router.enable_history(True)
# Run some inference...
stats = model.router.get_routing_stats()
print_routing_summary(stats, expert_names=["medical", "legal", "finance"])
5. Serve
from concadptr.serving import serve
serve("./concocted_model", host="0.0.0.0", port=8000)
curl -X POST http://localhost:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{"prompt": "Analyze this medical invoice...", "max_tokens": 256}'
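The same request can be made from Python with only the standard library. This sketch mirrors the curl example above (the endpoint path and payload shape are taken from it; the response schema is not shown here since it depends on the server):

```python
import json
from urllib import request

def build_completion_request(base_url: str, prompt: str, max_tokens: int = 256):
    """Build a POST request matching the curl example above."""
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode()
    return request.Request(
        f"{base_url}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_completion_request("http://localhost:8000", "Analyze this medical invoice...")
# with request.urlopen(req) as resp:   # requires the server from step 5 to be running
#     print(json.load(resp))
```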
Routing Strategies
| Strategy | How It Works | Best For |
|---|---|---|
| `soft_merging` | Weighted average of ALL experts per token | Few experts (2-8), overlapping domains |
| `top_k` | Activate only top-k experts per token | Many experts (8+), distinct domains |
| `xlora` | Learned scaling with frozen adapters, layer-wise | Independent adapters, privacy-critical |
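To make the difference between dense and sparse gating concrete, here is a toy, framework-free sketch of the weighting logic (the actual router operates on hidden states, per layer or per token; this only illustrates how expert weights are produced from router logits):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_merging_weights(logits):
    """Dense gating: every expert receives a nonzero weight."""
    return softmax(logits)

def top_k_weights(logits, k=2):
    """Sparse gating: keep the k largest logits, renormalize, zero the rest."""
    keep = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    masked = [logits[i] if i in keep else float("-inf") for i in range(len(logits))]
    return softmax(masked)

router_logits = [2.0, 0.5, -1.0]  # one logit per expert: medical, legal, finance
print(soft_merging_weights(router_logits))  # all three experts contribute
print(top_k_weights(router_logits, k=2))    # the weakest expert is zeroed out
```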
Static Merging (No Router)
When routing overhead is undesirable, merge adapters statically into a single PEFT adapter:
from concadptr import merge_adapters
# Linear weighted average
output = merge_adapters(
adapters={"medical": "./adapters/medical", "legal": "./adapters/legal"},
output_path="./merged",
method="linear", # "linear", "ties", "dare", "dare_ties"
weights=[0.6, 0.4],
)
# TIES — reduces interference between adapters
output = merge_adapters(adapters=..., output_path="./merged", method="ties", trim_fraction=0.2)
# DARE — stochastic drop + rescale before merging
output = merge_adapters(adapters=..., output_path="./merged", method="dare", density=0.7)
Or via the registry:
registry.merge(["medical", "legal"], output_path="./merged", method="ties")
The output is a standard PEFT adapter directory — usable with PeftModel.from_pretrained().
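For intuition, DARE's drop-and-rescale step can be sketched on a flat vector of delta weights (the library operates on full LoRA state dicts; `density` here is assumed to be the keep probability, matching the `density` argument above):

```python
import random

def dare_drop_and_rescale(delta, density, seed=0):
    """Drop each delta weight with probability (1 - density), then rescale
    survivors by 1/density so the merged update is preserved in expectation."""
    rng = random.Random(seed)
    return [w / density if rng.random() < density else 0.0 for w in delta]

delta = [0.2, -0.1, 0.05, 0.3]
sparse = dare_drop_and_rescale(delta, density=0.7)
# survivors are scaled by 1/0.7; dropped entries become exactly 0.0
```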
HuggingFace Hub
# Push a full concocted model
model.push_to_hub("username/my-concocted-model", token="hf_...")
# Load it back
model = ConcAdptrModel.from_hub("username/my-concocted-model")
# Push/pull individual adapters
registry.push_adapter_to_hub("medical", repo_id="username/medical-adapter")
registry.load_adapter_from_hub("username/medical-adapter", name="medical")
Architecture
┌──────────────────────────────────────────────┐
│ ConcAdptrModel │
│ │
│ ┌──────────┐ ┌───────────────────────────┐ │
│ │ Base │ │ Adapter Registry │ │
│ │ Model │ │ ┌─────┐ ┌─────┐ ┌─────┐ │ │
│ │ (frozen) │ │ │LoRA │ │LoRA │ │LoRA │ │ │
│ │ │ │ │ A │ │ B │ │ C │ │ │
│ │ │ │ │froze│ │froze│ │froze│ │ │
│ └──────────┘ │ └──┬──┘ └──┬──┘ └──┬──┘ │ │
│ └─────┼───────┼───────┼─────┘ │
│ │ │ │ │
│ ┌─────▼───────▼───────▼─────┐ │
│ │ Router │ │
│ │ (trainable) │ │
│ └─────────────┬─────────────┘ │
│ │ │
│ ┌─────────▼─────────┐ │
│ │ Concocted Output │ │
│ └───────────────────┘ │
└──────────────────────────────────────────────┘
The Multi-Customer Use Case
ConcAdptr was designed for a specific real-world pattern:
- You fine-tune a base model on your general training data → your product's foundation model
- Each customer fine-tunes on their private data (on-premise) → produces a LoRA adapter
- The adapter (50-200MB, no raw data) is transferred back to you
- ConcAdptr concocts all customer adapters into a MoE system with learned routing
Customer data never leaves their environment. The router learns which expert(s) to activate without seeing the original training data. This is federated expertise — cross-customer knowledge transfer without data sharing.
Project Structure
concadptr/
├── concadptr/
│ ├── __init__.py # Public API
│ ├── config.py # Configuration classes (ConcAdptrConfig, MergeConfig, ...)
│ ├── model.py # ConcAdptrModel (core)
│ ├── trainer.py # ConcAdptrTrainer (router training)
│ ├── router/
│ │ ├── base.py # BaseRouter ABC
│ │ ├── soft_merging.py # Dense/soft routing (MoLoRA)
│ │ ├── top_k.py # Sparse top-k routing (MixLoRA)
│ │ └── xlora.py # Learned scaling (X-LoRA)
│ ├── adapters/
│ │ └── __init__.py # AdapterRegistry
│ ├── merging/
│ │ ├── __init__.py # merge_adapters() functional API
│ │ ├── base.py # AdapterMerger ABC
│ │ ├── linear.py # Weighted average
│ │ ├── ties.py # TIES (Trim, Elect Sign, Merge)
│ │ ├── dare.py # DARE (Drop And REscale)
│ │ └── utils.py # Weight loading utilities
│ ├── serving/
│ │ └── server.py # FastAPI inference server
│ └── utils/
│ └── visualization.py # Routing analysis tools
├── tests/
├── examples/
│ └── config.yaml
├── pyproject.toml
└── README.md
Development
git clone https://github.com/irfanalii/concadptr.git
cd concadptr
pip install -e ".[dev]"
pytest
Roadmap
- Core library architecture
- 3 routing strategies (soft, top-k, X-LoRA)
- Per-layer routing hooks (2-pass forward with LoRA delta weighting)
- Adapter registry with compatibility validation
- Router training pipeline
- FastAPI serving
- Routing visualization and analysis
- Static merging — Linear, TIES, DARE, DARE+TIES
- HuggingFace Hub push/pull (models and adapters)
- Hook per-layer routing into the generation loop
- vLLM integration for high-throughput serving
- Benchmarking suite across model families (Qwen2.5, LLaMA 3.1, Mistral)
- Adapter version metadata and progressive merging pipeline
- Federated LoRA training (FedAvg on adapter weights)
References
- Hu et al. (2021) — LoRA: Low-Rank Adaptation of Large Language Models
- Zadouri et al. (2023) — Pushing Mixture of Experts to the Limit (MoLoRA)
- Yadav et al. (2023) — TIES-Merging: Resolving Interference When Merging Models
- Yu et al. (2023) — Language Models are Super Mario (DARE)
- Wu et al. (2024) — Mixture of LoRA Experts (MoLE)
- Li et al. (2024) — MixLoRA
- Buehler & Buehler (2024) — X-LoRA: Mixture of Low-Rank Adapter Experts
License
Apache 2.0 — see LICENSE for details.
Author
Irfan Ali — GitHub · HuggingFace
ConcAdptr — because the best models are concocted, not just trained.
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file concadptr-0.2.0.tar.gz.
File metadata
- Download URL: concadptr-0.2.0.tar.gz
- Upload date:
- Size: 48.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `0fe0f2e6b168d19de70b7f4d66e9d86a42a2c73faaf156a59461063130cf61af` |
| MD5 | `140e9edb3080476356af5701dbf6ef6e` |
| BLAKE2b-256 | `031659e798e3e2d8927fecdab667adb63c06325f946d8ac9c1eb1c62c68330d7` |
File details
Details for the file concadptr-0.2.0-py3-none-any.whl.
File metadata
- Download URL: concadptr-0.2.0-py3-none-any.whl
- Upload date:
- Size: 42.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `a400c3192a5a22746437a991946362cd1fbb330a65b9c1b2351e88344b581140` |
| MD5 | `ed956c04cbe2a16b8565924c09016c9d` |
| BLAKE2b-256 | `2b9abf8569bfa7ca24ee609f611b2dee1b0b4848ab73acd36929cb0e4443cbc2` |