
Composable Expert Layer MoE primitives with hierarchical routing, fusion, and multi-loss orchestration.


celmoe-vp

celmoe-vp is a domain-agnostic library for building hierarchical expert systems with explicit routing, learned fusion, and multi-scope loss composition.

PyPI package name:

pip install celmoe-vp

Import name:

import celmoe

This package is not morphology-specific. It is the reusable architectural layer underneath higher-level applications such as morphoformer.

Core idea

CELMoE (Composable Expert Layer MoE) names a design where the framework gives you mechanisms rather than a task-specific network:

  • hierarchical expert registration
  • per-sample routing across levels
  • optional gradient isolation between parent and child experts
  • learned fusion of level outputs
  • an API for global, per-level, and bridge losses

You bring the actual expert implementation.

Main concepts

Configuration objects

The public config layer is built from dataclasses:

  • ExpertConfig
  • HierarchyLevelConfig
  • CELMoEConfig

ExpertConfig describes a single expert and carries:

  • name
  • loss_weight
  • stop_gradient_from_parent
  • metadata

HierarchyLevelConfig describes a level such as universal, family, language, vision, region, or any other hierarchy you want.

CELMoEConfig defines:

  • hidden_size
  • levels
  • use_learned_fusion
  • fusion_dropout

Expert contract

Experts implement the ExpertModule interface:

from celmoe import ExpertBatch, ExpertModule, ExpertResult

class MyExpert(ExpertModule):
    def forward(self, batch: ExpertBatch) -> ExpertResult:
        # minimal pass-through expert: return the hidden state unchanged
        return ExpertResult(hidden=batch.hidden)

ExpertBatch contains:

  • hidden
  • context
  • mask
  • state

ExpertResult contains:

  • hidden
  • losses
  • aux

That keeps the framework generic enough for sequence models, classifiers, multimodal systems, or other expert topologies.

Hierarchical execution

HierarchicalCELMoE runs enabled levels in order, applies the routed expert for each sample, collects level-local losses, and fuses the resulting hidden states.

The default fusion module is LearnedFusion.
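The package does not document LearnedFusion's internals, so the following is only a dependency-free sketch of one plausible scheme: a softmax-weighted combination of per-level hidden states. The function names and the scalar-per-level weighting are assumptions for illustration; the real module presumably learns its weights and applies fusion_dropout.

```python
import math


def softmax(xs):
    # numerically stable softmax over a list of floats
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def fuse(level_hiddens, level_weights):
    """Combine per-level hidden vectors with softmax-normalized weights.

    level_hiddens: equal-length float vectors, one per level
    level_weights: one scalar per level (learnable in a real module)
    """
    coeffs = softmax(level_weights)
    dim = len(level_hiddens[0])
    fused = [0.0] * dim
    for coeff, hidden in zip(coeffs, level_hiddens):
        for i in range(dim):
            fused[i] += coeff * hidden[i]
    return fused


# two levels with equal weights reduce to a plain average
print(fuse([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0]))  # [2.0, 3.0]
```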

Routing format:

{
    "universal": ["core", "core", "core"],
    "family": ["slavic", "romance", "slavic"],
    "language": ["rus", "spa", "bul"],
}

Each list is batch-aligned. If a level has only one expert, CELMoE can auto-broadcast it. If a level defines fallback_expert, that fallback can also be broadcast automatically.
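The broadcast rules above can be sketched in plain Python. This is a toy re-implementation of the described behavior, not the library's actual routing code; the helper name and its argument shapes are assumptions.

```python
def resolve_routing(levels, routing, batch_size):
    """Expand a routing dict so every level has one expert name per sample.

    levels: mapping of level name -> (list of expert names, optional fallback)
    routing: user-supplied partial routing, as in the format shown above
    """
    resolved = {}
    for level, (experts, fallback) in levels.items():
        names = routing.get(level)
        if names is None:
            if len(experts) == 1:          # single expert: auto-broadcast it
                names = [experts[0]] * batch_size
            elif fallback is not None:     # otherwise broadcast the fallback
                names = [fallback] * batch_size
            else:
                raise KeyError(f"no routing for level {level!r}")
        if len(names) != batch_size:
            raise ValueError(f"routing for {level!r} is not batch-aligned")
        resolved[level] = list(names)
    return resolved


resolved = resolve_routing(
    {"universal": (["core"], None), "family": (["slavic", "romance"], "slavic")},
    {},   # nothing supplied: both levels fall back to broadcasting
    3,
)
print(resolved)  # {'universal': ['core', 'core', 'core'], 'family': ['slavic', 'slavic', 'slavic']}
```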

Multi-loss API

One of the main reasons this package exists is the loss interface.

CELMoELossAPI lets you register three different loss scopes:

  • global losses over the whole CELMoEOutput
  • level losses for a specific LevelOutput
  • bridge losses between parent and child levels

Public methods:

  • add_global
  • add_level
  • add_bridge
  • compute

That means you can express setups like:

  • a parent regularization loss
  • a child specialization loss
  • a shared final consistency loss
  • a bridge loss between parent and child representations

without hardcoding any task-specific assumptions inside the framework.
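To make the three scopes concrete, here is a toy, self-contained re-implementation of the idea: register callables per scope, then compute one total plus a flat scalar dictionary. It mirrors the shape of CELMoELossAPI and LossBundle described in this document, but the class name, signatures, and key format are assumptions, not the real API.

```python
class TinyLossAPI:
    """Toy sketch of global / level / bridge loss scopes (not the real API)."""

    def __init__(self):
        self._global, self._level, self._bridge = [], [], []

    def add_global(self, name, fn):
        self._global.append((name, fn))

    def add_level(self, level, name, fn):
        self._level.append((level, name, fn))

    def add_bridge(self, parent, child, name, fn):
        self._bridge.append((parent, child, name, fn))

    def compute(self, output, level_outputs):
        scalars = {}
        for name, fn in self._global:
            scalars[f"global/{name}"] = fn(output)
        for level, name, fn in self._level:
            scalars[f"{level}/{name}"] = fn(level_outputs[level])
        for parent, child, name, fn in self._bridge:
            scalars[f"{parent}->{child}/{name}"] = fn(
                level_outputs[parent], level_outputs[child]
            )
        # one differentiable total for backprop, one flat dict for logging
        return sum(scalars.values()), scalars


api = TinyLossAPI()
api.add_global("l2", lambda out: out)
api.add_level("child", "spec", lambda lv: lv * 0.5)
api.add_bridge("parent", "child", "gap", lambda p, c: abs(p - c))
total, scalars = api.compute(1.0, {"parent": 2.0, "child": 3.0})
print(total)  # 3.5
```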

Quick example

from typing import Mapping

import torch
import torch.nn as nn

from celmoe import (
    CELMoEConfig,
    CELMoELossAPI,
    ExpertBatch,
    ExpertConfig,
    ExpertModule,
    ExpertResult,
    HierarchicalCELMoE,
    HierarchyLevelConfig,
)


class LinearExpert(ExpertModule):
    def __init__(self, hidden_size: int) -> None:
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, batch: ExpertBatch) -> ExpertResult:
        hidden = self.proj(batch.hidden)
        return ExpertResult(hidden=hidden)


def build_expert(level_name: str, expert_name: str, metadata: Mapping[str, object]) -> ExpertModule:
    del level_name, expert_name, metadata
    return LinearExpert(hidden_size=256)


config = CELMoEConfig(
    hidden_size=256,
    levels=[
        HierarchyLevelConfig(
            name="parent",
            experts={"core": ExpertConfig(name="core")},
            fallback_expert="core",
        ),
        HierarchyLevelConfig(
            name="child",
            experts={
                "a": ExpertConfig(name="a", stop_gradient_from_parent=True),
                "b": ExpertConfig(name="b", stop_gradient_from_parent=True),
            },
        ),
    ],
)

model = HierarchicalCELMoE(config, expert_factory=build_expert)

hidden = torch.randn(4, 32, 256)
output = model(
    hidden,
    routing={
        "parent": ["core"] * 4,
        "child": ["a", "b", "a", "b"],
    },
)

loss_api = CELMoELossAPI()
loss_api.add_global("l2", lambda out, targets, context: out.fused_hidden.pow(2).mean())
bundle = loss_api.compute(output)
print(bundle.total)

Data structures

Important public objects:

  • LevelOutput
  • CELMoEOutput
  • LossTerm
  • LossBundle

LossBundle is especially useful when you want both:

  • a differentiable total loss for backpropagation
  • a flat scalar dictionary for logs and dashboards

Gradient isolation

Per-expert gradient isolation is controlled through stop_gradient_from_parent.

When enabled, the hidden state passed to that expert is detached before the expert runs. This is useful when you want:

  • hierarchical specialization
  • reduced interference across levels
  • staged training or freeze/unfreeze workflows
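A minimal torch sketch of what the documented behavior implies: with stop_gradient_from_parent enabled, the child operates on a detached copy, so a child-side loss cannot push gradients into the parent. This only illustrates the mechanism; the real forward path may differ.

```python
import torch

parent_hidden = torch.randn(4, 8, requires_grad=True)

# stop_gradient_from_parent=True: the child sees a detached copy, so
# no graph connects the child's loss back to the parent's parameters.
child_input = parent_hidden.detach()
child_loss = child_input.pow(2).mean()
# child_loss.backward() would raise here: the detached graph reaches
# no leaf that requires grad, which is exactly the isolation we want.

# stop_gradient_from_parent=False: gradients flow back to the parent.
coupled_loss = parent_hidden.pow(2).mean()
coupled_loss.backward()
print(child_input.requires_grad, parent_hidden.grad is not None)  # False True
```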

Good use cases

celmoe-vp fits well when you have:

  • multilingual systems with language or family experts
  • domain experts on top of a shared trunk
  • regional or product-specific experts
  • curriculum setups with parent/child objectives
  • research code that needs explicit loss composition

What it intentionally does not include

This package does not prescribe:

  • embeddings
  • tokenization
  • Transformer block internals
  • data loading
  • optimizer logic
  • checkpointing

Those belong in adjacent packages. celmoe-vp is the orchestration layer, not the full stack.

Typing and packaging

The package is strictly typed and ships py.typed.

That matters because CELMoE is meant to be published and consumed as an independent package, not just as an internal module within a monorepo. Downstream packages can rely on its public dataclasses and expert contracts without importing project-specific code.
