Composable Expert Layer MoE primitives with hierarchical routing, fusion, and multi-loss orchestration.

celmoe-vp

celmoe-vp is a domain-agnostic library for building hierarchical expert systems with explicit routing, learned fusion, and multi-scope loss composition.

PyPI package name:

pip install celmoe-vp

Import name:

import celmoe

This package is not morphology-specific. It is the reusable architectural layer underneath higher-level applications such as morphoformer.

Core idea

CELMoE is short for Composable Expert Layer MoE. The framework gives you mechanisms rather than a task-specific network:

  • hierarchical expert registration
  • per-sample routing across levels
  • optional gradient isolation between parent and child experts
  • learned fusion of level outputs
  • an API for global, per-level, and bridge losses

You bring the actual expert implementation.

Main concepts

Configuration objects

The public config layer is built from dataclasses:

  • ExpertConfig
  • HierarchyLevelConfig
  • CELMoEConfig

ExpertConfig describes a single expert and carries:

  • name
  • loss_weight
  • stop_gradient_from_parent
  • metadata

HierarchyLevelConfig describes one level of the hierarchy, such as universal, family, language, vision, region, or any other grouping you define.

CELMoEConfig defines:

  • hidden_size
  • levels
  • use_learned_fusion
  • fusion_dropout
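
A compact sketch tying the three dataclasses together. The field names come from the lists above; the exact types and defaults of the optional fields are assumptions here:

from celmoe import CELMoEConfig, ExpertConfig, HierarchyLevelConfig

config = CELMoEConfig(
    hidden_size=256,
    levels=[
        HierarchyLevelConfig(
            name="family",
            experts={
                # loss_weight and metadata are the optional per-expert knobs
                # listed above; their types are assumed in this sketch.
                "slavic": ExpertConfig(name="slavic", loss_weight=0.5),
                "romance": ExpertConfig(name="romance", metadata={"script": "latin"}),
            },
        ),
    ],
    use_learned_fusion=True,
    fusion_dropout=0.1,
)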

Expert contract

Experts implement the ExpertModule interface:

from celmoe import ExpertBatch, ExpertModule, ExpertResult

class MyExpert(ExpertModule):
    def forward(self, batch: ExpertBatch) -> ExpertResult:
        # Minimal contract: consume an ExpertBatch, produce an ExpertResult.
        return ExpertResult(hidden=batch.hidden)

ExpertBatch contains:

  • hidden
  • context
  • mask
  • state

ExpertResult contains:

  • hidden
  • losses
  • aux

That keeps the framework generic enough for sequence models, classifiers, multimodal systems, or other expert topologies.
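
For instance, an expert can attach level-local losses and diagnostics through the losses and aux fields. A hedged sketch follows; the exact expected types (name-to-tensor mappings are assumed here) should be checked against the dataclass definitions:

import torch
import torch.nn as nn

from celmoe import ExpertBatch, ExpertModule, ExpertResult

class RegularizedExpert(ExpertModule):
    """Sketch of an expert that reports a local loss alongside its output."""

    def __init__(self, hidden_size: int) -> None:
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, batch: ExpertBatch) -> ExpertResult:
        hidden = self.proj(batch.hidden)
        if batch.mask is not None:
            # Zero out padded positions; the mask shape is assumed to be
            # (batch, seq), broadcastable after unsqueezing.
            hidden = hidden * batch.mask.unsqueeze(-1)
        return ExpertResult(
            hidden=hidden,
            # Assumed: losses is a name-to-tensor mapping collected per level.
            losses={"activation_l2": hidden.pow(2).mean()},
            aux={"hidden_norm": hidden.norm().detach()},
        )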

Hierarchical execution

HierarchicalCELMoE runs enabled levels in order, applies the routed expert for each sample, collects level-local losses, and fuses the resulting hidden states.

The default fusion module is LearnedFusion.

Routing format:

{
    "universal": ["core", "core", "core"],
    "family": ["slavic", "romance", "slavic"],
    "language": ["rus", "spa", "bul"],
}

Each list is batch-aligned: entry i names the expert applied to sample i. If a level has only one expert, CELMoE can auto-broadcast it, and if a level defines fallback_expert, that fallback can also be broadcast automatically.
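
A minimal sketch of the normalization this implies. This is an illustrative helper written for this description, not part of the celmoe API:

def normalize_routing(names, batch_size, fallback=None):
    # Broadcast a single expert name (or a configured fallback) to a
    # batch-aligned list; otherwise require one name per sample.
    if names is None:
        if fallback is None:
            raise ValueError("level has no routing and no fallback_expert")
        names = [fallback]
    if len(names) == 1 and batch_size > 1:
        return list(names) * batch_size
    if len(names) != batch_size:
        raise ValueError("routing list must be batch-aligned")
    return list(names)

# normalize_routing(["core"], 4) -> ["core", "core", "core", "core"]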

Multi-loss API

One of the main reasons this package exists is the loss interface.

CELMoELossAPI lets you register three different loss scopes:

  • global losses over the whole CELMoEOutput
  • level losses for a specific LevelOutput
  • bridge losses between parent and child levels

Public methods:

  • add_global
  • add_level
  • add_bridge
  • compute

That means you can express setups like:

  • a parent regularization loss
  • a child specialization loss
  • a shared final consistency loss
  • a bridge loss between parent and child representations

without hardcoding any task-specific assumptions inside the framework.
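
As a sketch, the three scopes might be registered like this. The add_global signature matches the quick example below; the add_level and add_bridge callback signatures are assumptions patterned after it, so check them against the actual API:

from celmoe import CELMoELossAPI

loss_api = CELMoELossAPI()

# Global scope: the callback sees the whole CELMoEOutput.
loss_api.add_global(
    "consistency",
    lambda out, targets, context: out.fused_hidden.pow(2).mean(),
)

# Level scope (callback signature assumed): sees one LevelOutput.
loss_api.add_level(
    "parent",
    "parent_reg",
    lambda level, targets, context: level.hidden.pow(2).mean(),
)

# Bridge scope (callback signature assumed): sees a parent/child pair.
loss_api.add_bridge(
    "parent",
    "child",
    "alignment",
    lambda parent, child, targets, context: (parent.hidden - child.hidden).pow(2).mean(),
)

# `output` is a CELMoEOutput from a HierarchicalCELMoE forward pass.
bundle = loss_api.compute(output)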

Quick example

from typing import Mapping

import torch
import torch.nn as nn

from celmoe import (
    CELMoEConfig,
    CELMoELossAPI,
    ExpertBatch,
    ExpertConfig,
    ExpertModule,
    ExpertResult,
    HierarchicalCELMoE,
    HierarchyLevelConfig,
)


class LinearExpert(ExpertModule):
    def __init__(self, hidden_size: int) -> None:
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, batch: ExpertBatch) -> ExpertResult:
        hidden = self.proj(batch.hidden)
        return ExpertResult(hidden=hidden)


def build_expert(level_name: str, expert_name: str, metadata: Mapping[str, object]) -> ExpertModule:
    # The factory receives each (level, expert) pair from the config; this
    # example ignores the identifiers and builds identical linear experts.
    del level_name, expert_name, metadata
    return LinearExpert(hidden_size=256)


# Two-level hierarchy: one shared parent expert above two gradient-isolated
# child experts.
config = CELMoEConfig(
    hidden_size=256,
    levels=[
        HierarchyLevelConfig(
            name="parent",
            experts={"core": ExpertConfig(name="core")},
            fallback_expert="core",
        ),
        HierarchyLevelConfig(
            name="child",
            experts={
                "a": ExpertConfig(name="a", stop_gradient_from_parent=True),
                "b": ExpertConfig(name="b", stop_gradient_from_parent=True),
            },
        ),
    ],
)

model = HierarchicalCELMoE(config, expert_factory=build_expert)

hidden = torch.randn(4, 32, 256)  # (batch, sequence, hidden_size)
output = model(
    hidden,
    routing={
        # Batch-aligned routing: one expert name per sample.
        "parent": ["core"] * 4,
        "child": ["a", "b", "a", "b"],
    },
)

loss_api = CELMoELossAPI()
loss_api.add_global("l2", lambda out, targets, context: out.fused_hidden.pow(2).mean())
bundle = loss_api.compute(output)
print(bundle.total)  # differentiable total over all registered losses

Data structures

Important public objects:

  • LevelOutput
  • CELMoEOutput
  • LossTerm
  • LossBundle

LossBundle is especially useful when you want both:

  • a differentiable total loss for backpropagation
  • a flat scalar dictionary for logs and dashboards
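
For example, a training step might consume the bundle like this. The name of the scalar dictionary attribute is an assumption; inspect LossBundle for the real field:

bundle = loss_api.compute(output)

# Differentiable total: drives the optimizer step.
bundle.total.backward()

# Flat scalar dictionary for logs and dashboards (attribute name assumed).
for name, value in bundle.scalars.items():
    print(f"{name}: {value:.4f}")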

Gradient isolation

Per-expert gradient isolation is controlled through stop_gradient_from_parent.

When enabled, the hidden state passed to that expert is detached before the expert runs. This is useful when you want:

  • hierarchical specialization
  • reduced interference across levels
  • staged training or freeze/unfreeze workflows
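
Conceptually, the isolation amounts to the following. This is an illustrative sketch of the semantics, not the library's internals, and the keyword construction of ExpertBatch is assumed from its listed fields:

# Illustrative: how stop_gradient_from_parent affects an expert's input.
hidden_in = batch.hidden.detach() if cfg.stop_gradient_from_parent else batch.hidden
result = expert(ExpertBatch(hidden=hidden_in, context=batch.context,
                            mask=batch.mask, state=batch.state))

The expert's own parameters still receive gradients from its losses; only the path back into earlier levels is cut.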

Good use cases

celmoe-vp fits well when you have:

  • multilingual systems with language or family experts
  • domain experts on top of a shared trunk
  • regional or product-specific experts
  • curriculum setups with parent/child objectives
  • research code that needs explicit loss composition

What it intentionally does not include

This package does not prescribe:

  • embeddings
  • tokenization
  • Transformer block internals
  • data loading
  • optimizer logic
  • checkpointing

Those belong in adjacent packages. celmoe-vp is the orchestration layer, not the full stack.

Typing and packaging

The package is strictly typed and ships a py.typed marker, so type checkers can consume its annotations directly.

That matters because CELMoE is meant to be published and consumed as an independent package, not only as an internal module in a monorepo. Downstream packages can rely on its public dataclasses and expert contracts without importing project-specific code.
