Composable Expert Layer MoE primitives with hierarchical routing, fusion, and multi-loss orchestration.
Project description
celmoe-vp
celmoe-vp is a domain-agnostic library for building hierarchical expert systems with explicit routing, learned fusion, and multi-scope loss composition.
PyPI package name:

```shell
pip install celmoe-vp
```

Import name:

```python
import celmoe
```
This package is not morphology-specific. It is the reusable architectural layer underneath higher-level applications such as morphoformer.
Core idea
CELMoE stands for Composable Expert Layer MoE: the framework gives you mechanisms rather than a task-specific network:
- hierarchical expert registration
- per-sample routing across levels
- optional gradient isolation between parent and child experts
- learned fusion of level outputs
- an API for global, per-level, and bridge losses
You bring the actual expert implementation.
Main concepts
Configuration objects
The public config layer is built from dataclasses:
- `ExpertConfig`
- `HierarchyLevelConfig`
- `CELMoEConfig`
ExpertConfig describes a single expert and carries:
- `name`
- `loss_weight`
- `stop_gradient_from_parent`
- `metadata`
HierarchyLevelConfig describes a level such as universal, family, language, vision, region, or any other hierarchy you want.
CELMoEConfig defines:
- `hidden_size`
- `levels`
- `use_learned_fusion`
- `fusion_dropout`
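To make the shapes concrete, here is an illustrative sketch of these configs written as plain dataclasses. The field names come from the descriptions above; the defaults and type choices are assumptions, and the real classes shipped by celmoe may differ.

```python
# Illustrative reconstruction of the config layer, NOT celmoe's actual code.
# Field names follow the docs; defaults are assumptions.
from dataclasses import dataclass, field
from typing import List, Mapping, Optional


@dataclass
class ExpertConfig:
    name: str
    loss_weight: float = 1.0                      # assumed default
    stop_gradient_from_parent: bool = False
    metadata: Mapping[str, object] = field(default_factory=dict)


@dataclass
class HierarchyLevelConfig:
    name: str
    experts: Mapping[str, ExpertConfig] = field(default_factory=dict)
    fallback_expert: Optional[str] = None


@dataclass
class CELMoEConfig:
    hidden_size: int
    levels: List[HierarchyLevelConfig] = field(default_factory=list)
    use_learned_fusion: bool = True               # assumed default
    fusion_dropout: float = 0.0                   # assumed default


# Building a one-level hierarchy with a single expert:
cfg = CELMoEConfig(
    hidden_size=256,
    levels=[
        HierarchyLevelConfig(
            name="universal",
            experts={"core": ExpertConfig(name="core")},
        )
    ],
)
```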
Expert contract
Experts implement the ExpertModule interface:
```python
from celmoe import ExpertBatch, ExpertModule, ExpertResult


class MyExpert(ExpertModule):
    def forward(self, batch: ExpertBatch) -> ExpertResult:
        return ExpertResult(hidden=batch.hidden)
```
ExpertBatch contains:
- `hidden`
- `context`
- `mask`
- `state`
ExpertResult contains:
- `hidden`
- `losses`
- `aux`
That keeps the framework generic enough for sequence models, classifiers, multimodal systems, or other expert topologies.
Hierarchical execution
HierarchicalCELMoE runs enabled levels in order, applies the routed expert for each sample, collects level-local losses, and fuses the resulting hidden states.
The default fusion module is LearnedFusion.
Routing format:

```python
{
    "universal": ["core", "core", "core"],
    "family": ["slavic", "romance", "slavic"],
    "language": ["rus", "spa", "bul"],
}
```
Each list is batch-aligned. If a level has only one expert, CELMoE can auto-broadcast it. If a level defines fallback_expert, that fallback can also be broadcast automatically.
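The broadcast rule above can be sketched as a standalone helper. This is not celmoe's internal code, and `broadcast_routing` is a hypothetical name; it only illustrates the behavior: a level with a single expert, or with a `fallback_expert`, can be expanded to a batch-aligned list automatically.

```python
# Sketch of the auto-broadcast rule described above (hypothetical helper,
# not part of celmoe's public API).
from typing import List, Optional, Sequence


def broadcast_routing(
    routing: Optional[Sequence[str]],
    batch_size: int,
    experts: Sequence[str],
    fallback_expert: Optional[str] = None,
) -> List[str]:
    if routing is not None:
        return list(routing)              # already batch-aligned
    if len(experts) == 1:
        return [experts[0]] * batch_size  # single expert: broadcast it
    if fallback_expert is not None:
        return [fallback_expert] * batch_size
    raise ValueError("explicit routing required: multiple experts, no fallback")


broadcast_routing(None, 3, ["core"])                          # ['core', 'core', 'core']
broadcast_routing(None, 2, ["a", "b"], fallback_expert="a")   # ['a', 'a']
```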
Multi-loss API
One of the main reasons this package exists is the loss interface.
`CELMoELossAPI` lets you register three different loss scopes:
- global losses over the whole `CELMoEOutput`
- level losses for a specific `LevelOutput`
- bridge losses between parent and child levels
Public methods:
- `add_global`
- `add_level`
- `add_bridge`
- `compute`
That means you can express setups like:
- a parent regularization loss
- a child specialization loss
- a shared final consistency loss
- a bridge loss between parent and child representations
without hardcoding any task-specific assumptions inside the framework.
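To show what the three scopes buy you, here is a minimal, illustrative re-implementation using plain floats instead of tensors. The method names mirror the public API listed above; the signatures, key format, and return value are simplifications, not celmoe's actual behavior.

```python
# Toy re-implementation of the three loss scopes (illustrative only).
class TinyLossAPI:
    def __init__(self):
        self.global_fns = {}   # name -> fn(output)
        self.level_fns = {}    # (level, name) -> fn(level_value)
        self.bridge_fns = {}   # (parent, child, name) -> fn(parent_value, child_value)

    def add_global(self, name, fn):
        self.global_fns[name] = fn

    def add_level(self, level, name, fn):
        self.level_fns[(level, name)] = fn

    def add_bridge(self, parent, child, name, fn):
        self.bridge_fns[(parent, child, name)] = fn

    def compute(self, output, levels):
        terms = {}
        for name, fn in self.global_fns.items():
            terms[f"global/{name}"] = fn(output)
        for (level, name), fn in self.level_fns.items():
            terms[f"level/{level}/{name}"] = fn(levels[level])
        for (parent, child, name), fn in self.bridge_fns.items():
            terms[f"bridge/{parent}->{child}/{name}"] = fn(levels[parent], levels[child])
        return sum(terms.values()), terms


api = TinyLossAPI()
api.add_global("norm", lambda out: abs(out))                  # parent regularization
api.add_level("child", "spec", lambda lvl: lvl * 0.5)         # child specialization
api.add_bridge("parent", "child", "align", lambda p, c: abs(p - c))

total, terms = api.compute(-2.0, {"parent": 1.0, "child": 4.0})
# total = 2.0 + 2.0 + 3.0 = 7.0
```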
Quick example
```python
from typing import Mapping

import torch
import torch.nn as nn

from celmoe import (
    CELMoEConfig,
    CELMoELossAPI,
    ExpertBatch,
    ExpertConfig,
    ExpertModule,
    ExpertResult,
    HierarchicalCELMoE,
    HierarchyLevelConfig,
)


class LinearExpert(ExpertModule):
    def __init__(self, hidden_size: int) -> None:
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, batch: ExpertBatch) -> ExpertResult:
        hidden = self.proj(batch.hidden)
        return ExpertResult(hidden=hidden)


def build_expert(level_name: str, expert_name: str, metadata: Mapping[str, object]) -> ExpertModule:
    del level_name, expert_name, metadata
    return LinearExpert(hidden_size=256)


config = CELMoEConfig(
    hidden_size=256,
    levels=[
        HierarchyLevelConfig(
            name="parent",
            experts={"core": ExpertConfig(name="core")},
            fallback_expert="core",
        ),
        HierarchyLevelConfig(
            name="child",
            experts={
                "a": ExpertConfig(name="a", stop_gradient_from_parent=True),
                "b": ExpertConfig(name="b", stop_gradient_from_parent=True),
            },
        ),
    ],
)

model = HierarchicalCELMoE(config, expert_factory=build_expert)

hidden = torch.randn(4, 32, 256)
output = model(
    hidden,
    routing={
        "parent": ["core"] * 4,
        "child": ["a", "b", "a", "b"],
    },
)

loss_api = CELMoELossAPI()
loss_api.add_global("l2", lambda out, targets, context: out.fused_hidden.pow(2).mean())
bundle = loss_api.compute(output)
print(bundle.total)
```
Data structures
Important public objects:
- `LevelOutput`
- `CELMoEOutput`
- `LossTerm`
- `LossBundle`
LossBundle is especially useful when you want both:
- a differentiable total loss for backpropagation
- a flat scalar dictionary for logs and dashboards
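Those two uses can be sketched with a stand-in bundle. Only `bundle.total` is confirmed by the quick example above; the name of the flat scalar mapping below (`scalars`) is an assumption for illustration.

```python
# Stand-in for LossBundle (illustrative; field name `scalars` is assumed).
from dataclasses import dataclass, field

import torch


@dataclass
class Bundle:
    total: torch.Tensor          # differentiable, used for backpropagation
    scalars: dict = field(default_factory=dict)  # plain floats for logging


x = torch.tensor(2.0, requires_grad=True)
loss = x.pow(2)
bundle = Bundle(total=loss, scalars={"global/l2": float(loss)})

bundle.total.backward()          # training path: gradients flow
print(bundle.scalars)            # logging path: {'global/l2': 4.0}
```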
Gradient isolation
Per-expert gradient isolation is controlled through stop_gradient_from_parent.
When enabled, the hidden state passed to that expert is detached before the expert runs. This is useful when you want:
- hierarchical specialization
- reduced interference across levels
- staged training or freeze/unfreeze workflows
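A minimal sketch of what this isolation amounts to, assuming `stop_gradient_from_parent` is implemented as a `detach()` on the incoming hidden state (consistent with the description above):

```python
# Plain-torch sketch of gradient isolation between a parent and a child expert.
import torch
import torch.nn as nn

parent = nn.Linear(8, 8)
child = nn.Linear(8, 8)

x = torch.randn(2, 8)
parent_hidden = parent(x)

# With stop_gradient_from_parent=True, the child sees a detached copy:
child_out = child(parent_hidden.detach())
child_out.sum().backward()

assert parent.weight.grad is None      # no gradient flows back into the parent
assert child.weight.grad is not None   # the child still trains normally
```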
Good use cases
celmoe-vp fits well when you have:
- multilingual systems with language or family experts
- domain experts on top of a shared trunk
- regional or product-specific experts
- curriculum setups with parent/child objectives
- research code that needs explicit loss composition
What it intentionally does not include
This package does not prescribe:
- embeddings
- tokenization
- Transformer block internals
- data loading
- optimizer logic
- checkpointing
Those belong in adjacent packages. celmoe-vp is the orchestration layer, not the full stack.
Typing and packaging
The package is strictly typed and ships py.typed.
That matters because CELMoE is meant to be published and consumed as an independent package, not just as an internal module of a monorepo. Downstream packages can rely on its public dataclasses and expert contracts without importing project-specific code.
File details
Details for the file celmoe_vp-1.1.0.tar.gz.
File metadata
- Download URL: celmoe_vp-1.1.0.tar.gz
- Upload date:
- Size: 10.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `386c58e78b1910545e730cf9dff7d41e33d4e6ab7222a4acfb44fae96a773bc4` |
| MD5 | `376870fc660b1124bb4122d55060316e` |
| BLAKE2b-256 | `571972c3d12485199e2ed280fed74689ba2b38dd9ba07e306e439e35855a5a40` |
File details
Details for the file celmoe_vp-1.1.0-py3-none-any.whl.
File metadata
- Download URL: celmoe_vp-1.1.0-py3-none-any.whl
- Upload date:
- Size: 7.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `38d1e7f90304d42a23f4c9dbcf5be9bac7d3d2eadb3d35a5f7b8b4e1d9109005` |
| MD5 | `6075675e829c6dd113769a8b93903462` |
| BLAKE2b-256 | `52e6b6707fe7484d375c832547f655b57769ccbc98fe3c09a62908996af95b12` |