
Composable Expert Layer MoE primitives with hierarchical routing, fusion, and multi-loss orchestration.


celmoe-vp

celmoe-vp is a domain-agnostic library for building hierarchical expert systems with explicit routing, learned fusion, and multi-scope loss composition.

The PyPI package name is celmoe-vp; install with:

pip install celmoe-vp

Import name:

import celmoe

This package is not morphology-specific. It is the reusable architectural layer underneath higher-level applications such as morphoformer.

Core idea

CELMoE stands for Composable Expert Layer MoE: the framework gives you mechanisms rather than a task-specific network:

  • hierarchical expert registration
  • per-sample routing across levels
  • optional gradient isolation between parent and child experts
  • learned fusion of level outputs
  • an API for global, per-level, and bridge losses

You bring the actual expert implementation.

Main concepts

Configuration objects

The public config layer is built from dataclasses:

  • ExpertConfig
  • HierarchyLevelConfig
  • CELMoEConfig

ExpertConfig describes a single expert and carries:

  • name
  • loss_weight
  • stop_gradient_from_parent
  • metadata

HierarchyLevelConfig describes a level such as universal, family, language, vision, region, or any other hierarchy you want.

CELMoEConfig defines:

  • hidden_size
  • levels
  • use_learned_fusion
  • fusion_dropout
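For intuition, the shape of the config layer can be approximated with plain dataclasses. This is a sketch built from the field lists above; the defaults and types are assumptions, not the library's actual definitions:

```python
from dataclasses import dataclass, field
from typing import List, Mapping, Optional


@dataclass
class ExpertConfigSketch:
    # Fields as listed above; defaults are assumed for illustration.
    name: str
    loss_weight: float = 1.0
    stop_gradient_from_parent: bool = False
    metadata: Mapping[str, object] = field(default_factory=dict)


@dataclass
class HierarchyLevelConfigSketch:
    name: str
    experts: Mapping[str, ExpertConfigSketch] = field(default_factory=dict)
    fallback_expert: Optional[str] = None


@dataclass
class CELMoEConfigSketch:
    hidden_size: int
    levels: List[HierarchyLevelConfigSketch]
    use_learned_fusion: bool = True
    fusion_dropout: float = 0.0
```

Nesting is the point: each level owns its expert configs, and the top-level config owns the ordered list of levels.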

Expert contract

Experts implement the ExpertModule interface:

from celmoe import ExpertBatch, ExpertModule, ExpertResult

class MyExpert(ExpertModule):
    def forward(self, batch: ExpertBatch) -> ExpertResult:
        return ExpertResult(hidden=batch.hidden)

ExpertBatch contains:

  • hidden
  • context
  • mask
  • state

ExpertResult contains:

  • hidden
  • losses
  • aux

That keeps the framework generic enough for sequence models, classifiers, multimodal systems, or other expert topologies.
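To make the contract concrete, here is a self-contained stand-in that uses plain dataclasses in place of the real celmoe types, showing an expert that returns both a hidden state and a level-local loss. Treating losses and aux as string-keyed dicts is an assumption:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ExpertBatchSketch:  # stand-in for celmoe.ExpertBatch
    hidden: Any
    context: Optional[dict] = None
    mask: Any = None
    state: Optional[dict] = None


@dataclass
class ExpertResultSketch:  # stand-in for celmoe.ExpertResult
    hidden: Any
    losses: Dict[str, float] = field(default_factory=dict)
    aux: Dict[str, Any] = field(default_factory=dict)


class ScalingExpert:
    """Toy expert: scales the hidden values and reports an L2-style penalty."""

    def __init__(self, scale: float) -> None:
        self.scale = scale

    def forward(self, batch: ExpertBatchSketch) -> ExpertResultSketch:
        hidden = [h * self.scale for h in batch.hidden]
        penalty = sum(h * h for h in hidden) / len(hidden)
        return ExpertResultSketch(hidden=hidden, losses={"l2": penalty})
```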

Hierarchical execution

HierarchicalCELMoE runs enabled levels in order, applies the routed expert for each sample, collects level-local losses, and fuses the resulting hidden states.

The default fusion module is LearnedFusion.

Routing format:

{
    "universal": ["core", "core", "core"],
    "family": ["slavic", "romance", "slavic"],
    "language": ["rus", "spa", "bul"],
}

Each list is batch-aligned. If a level has only one expert, CELMoE can auto-broadcast it. If a level defines fallback_expert, that fallback can also be broadcast automatically.
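The broadcast rule can be sketched as a small helper: given a level's routing entry (possibly missing) and the batch size, expand it to a batch-aligned list. The exact precedence between a sole expert and fallback_expert is an assumption here:

```python
from typing import List, Mapping, Optional, Sequence


def broadcast_routing(
    entry: Optional[Sequence[str]],
    batch_size: int,
    experts: Mapping[str, object],
    fallback_expert: Optional[str] = None,
) -> List[str]:
    """Expand one level's routing to one expert name per sample (illustrative)."""
    if entry is not None:
        if len(entry) != batch_size:
            raise ValueError("routing must be batch-aligned")
        return list(entry)
    if len(experts) == 1:               # single expert: auto-broadcast it
        return [next(iter(experts))] * batch_size
    if fallback_expert is not None:     # otherwise broadcast the fallback
        return [fallback_expert] * batch_size
    raise ValueError("no routing entry and no unambiguous default")
```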

Multi-loss API

One of the main reasons this package exists is the loss interface.

CELMoELossAPI lets you register three different loss scopes:

  • global losses over the whole CELMoEOutput
  • level losses for a specific LevelOutput
  • bridge losses between parent and child levels

Public methods:

  • add_global
  • add_level
  • add_bridge
  • compute

That means you can express setups like:

  • a parent regularization loss
  • a child specialization loss
  • a shared final consistency loss
  • a bridge loss between parent and child representations

without hardcoding any task-specific assumptions inside the framework.
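As a conceptual sketch (plain Python, not the real CELMoELossAPI signatures, which are assumed here), the three scopes amount to a registry of named callables that compute evaluates against an output:

```python
class LossRegistrySketch:
    """Illustrative stand-in for the three loss scopes of CELMoELossAPI."""

    def __init__(self) -> None:
        self._global, self._level, self._bridge = [], [], []

    def add_global(self, name, fn):
        self._global.append((name, fn))

    def add_level(self, level, name, fn):           # signature assumed
        self._level.append((level, name, fn))

    def add_bridge(self, parent, child, name, fn):  # signature assumed
        self._bridge.append((parent, child, name, fn))

    def compute(self, output):
        # Evaluate every registered term against the matching scope of output.
        terms = {}
        for name, fn in self._global:
            terms[f"global/{name}"] = fn(output)
        for level, name, fn in self._level:
            terms[f"level/{level}/{name}"] = fn(output["levels"][level])
        for parent, child, name, fn in self._bridge:
            terms[f"bridge/{parent}->{child}/{name}"] = fn(
                output["levels"][parent], output["levels"][child]
            )
        return terms
```

The point of the scoping is that each callable only sees the part of the output it is responsible for: a global loss sees everything, a level loss sees one level, and a bridge loss sees exactly the parent/child pair.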

Quick example

from typing import Mapping

import torch
import torch.nn as nn

from celmoe import (
    CELMoEConfig,
    CELMoELossAPI,
    ExpertBatch,
    ExpertConfig,
    ExpertModule,
    ExpertResult,
    HierarchicalCELMoE,
    HierarchyLevelConfig,
)


class LinearExpert(ExpertModule):
    def __init__(self, hidden_size: int) -> None:
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, batch: ExpertBatch) -> ExpertResult:
        hidden = self.proj(batch.hidden)
        return ExpertResult(hidden=hidden)


def build_expert(level_name: str, expert_name: str, metadata: Mapping[str, object]) -> ExpertModule:
    del level_name, expert_name, metadata
    return LinearExpert(hidden_size=256)


config = CELMoEConfig(
    hidden_size=256,
    levels=[
        HierarchyLevelConfig(
            name="parent",
            experts={"core": ExpertConfig(name="core")},
            fallback_expert="core",
        ),
        HierarchyLevelConfig(
            name="child",
            experts={
                "a": ExpertConfig(name="a", stop_gradient_from_parent=True),
                "b": ExpertConfig(name="b", stop_gradient_from_parent=True),
            },
        ),
    ],
)

model = HierarchicalCELMoE(config, expert_factory=build_expert)

hidden = torch.randn(4, 32, 256)
output = model(
    hidden,
    routing={
        "parent": ["core"] * 4,
        "child": ["a", "b", "a", "b"],
    },
)

loss_api = CELMoELossAPI()
loss_api.add_global("l2", lambda out, targets, context: out.fused_hidden.pow(2).mean())
bundle = loss_api.compute(output)
print(bundle.total)

Data structures

Important public objects:

  • LevelOutput
  • CELMoEOutput
  • LossTerm
  • LossBundle

LossBundle is especially useful when you want both:

  • a differentiable total loss for backpropagation
  • a flat scalar dictionary for logs and dashboards
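That split can be pictured as a bundle carrying both the summed total and a flat name-to-float mapping. This is a sketch of the idea, not the actual LossBundle definition:

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class LossBundleSketch:
    total: float               # in practice a differentiable tensor
    scalars: Dict[str, float]  # detached values for logs and dashboards


def bundle_from_terms(terms: Dict[str, float]) -> LossBundleSketch:
    # Sum the terms for backprop; keep each named value for logging.
    return LossBundleSketch(total=sum(terms.values()), scalars=dict(terms))
```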

Gradient isolation

Per-expert gradient isolation is controlled through stop_gradient_from_parent.

When enabled, the hidden state passed to that expert is detached before the expert runs. This is useful when you want:

  • hierarchical specialization
  • reduced interference across levels
  • staged training or freeze/unfreeze workflows
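Mechanically, the isolation is a detach at the level boundary. Assuming standard PyTorch semantics, a minimal illustration of what stop_gradient_from_parent implies:

```python
import torch

parent_hidden = torch.randn(2, 4, requires_grad=True)

# With stop_gradient_from_parent=True, the child expert would see a detached
# tensor, so the child's loss cannot push gradients into the parent levels.
child_input = parent_hidden.detach()

assert parent_hidden.requires_grad
assert not child_input.requires_grad
```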

Good use cases

celmoe-vp fits well when you have:

  • multilingual systems with language or family experts
  • domain experts on top of a shared trunk
  • regional or product-specific experts
  • curriculum setups with parent/child objectives
  • research code that needs explicit loss composition

What it intentionally does not include

This package does not prescribe:

  • embeddings
  • tokenization
  • Transformer block internals
  • data loading
  • optimizer logic
  • checkpointing

Those belong in adjacent packages. celmoe-vp is the orchestration layer, not the full stack.

Typing and packaging

The package is strictly typed and ships py.typed.

That matters because CELMoE is meant to be published and consumed as an independent package, not only as an internal module of a monorepo. Downstream packages can rely on its public dataclasses and expert contracts without importing project-specific code.

