Composable Expert Layer MoE primitives with hierarchical routing, fusion, and multi-loss orchestration.
Project description
celmoe-vp
celmoe-vp is a domain-agnostic library for building hierarchical expert systems with explicit routing, learned fusion, and multi-scope loss composition.
PyPI package name:

```shell
pip install celmoe-vp
```

Import name:

```python
import celmoe
```
This package is not morphology-specific. It is the reusable architectural layer underneath higher-level applications such as morphoformer.
Core idea
CELMoE stands for Composable Expert Layer MoE: the framework gives you mechanisms rather than a task-specific network:
- hierarchical expert registration
- per-sample routing across levels
- optional gradient isolation between parent and child experts
- learned fusion of level outputs
- an API for global, per-level, and bridge losses
You bring the actual expert implementation.
Main concepts
Configuration objects
The public config layer is built from dataclasses:
- `ExpertConfig`
- `HierarchyLevelConfig`
- `CELMoEConfig`
ExpertConfig describes a single expert and carries:
- `name`
- `loss_weight`
- `stop_gradient_from_parent`
- `metadata`
HierarchyLevelConfig describes a level such as universal, family, language, vision, region, or any other hierarchy you want.
CELMoEConfig defines:
- `hidden_size`
- `levels`
- `use_learned_fusion`
- `fusion_dropout`
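To make the shapes concrete, here is an illustrative sketch of these configs written as plain dataclasses. The field names come from the descriptions above; the defaults and type choices are assumptions, and the real classes shipped by celmoe may differ.

```python
# Illustrative reconstruction of the config layer, NOT celmoe's actual code.
# Field names follow the docs; defaults are assumptions.
from dataclasses import dataclass, field
from typing import List, Mapping, Optional


@dataclass
class ExpertConfig:
    name: str
    loss_weight: float = 1.0                      # assumed default
    stop_gradient_from_parent: bool = False
    metadata: Mapping[str, object] = field(default_factory=dict)


@dataclass
class HierarchyLevelConfig:
    name: str
    experts: Mapping[str, ExpertConfig] = field(default_factory=dict)
    fallback_expert: Optional[str] = None


@dataclass
class CELMoEConfig:
    hidden_size: int
    levels: List[HierarchyLevelConfig] = field(default_factory=list)
    use_learned_fusion: bool = True               # assumed default
    fusion_dropout: float = 0.0                   # assumed default


# Building a one-level hierarchy with a single expert:
cfg = CELMoEConfig(
    hidden_size=256,
    levels=[
        HierarchyLevelConfig(
            name="universal",
            experts={"core": ExpertConfig(name="core")},
        )
    ],
)
```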
Expert contract
Experts implement the ExpertModule interface:
```python
from celmoe import ExpertBatch, ExpertModule, ExpertResult


class MyExpert(ExpertModule):
    def forward(self, batch: ExpertBatch) -> ExpertResult:
        return ExpertResult(hidden=batch.hidden)
```
ExpertBatch contains:
- `hidden`
- `context`
- `mask`
- `state`
ExpertResult contains:
- `hidden`
- `losses`
- `aux`
That keeps the framework generic enough for sequence models, classifiers, multimodal systems, or other expert topologies.
Hierarchical execution
HierarchicalCELMoE runs enabled levels in order, applies the routed expert for each sample, collects level-local losses, and fuses the resulting hidden states.
The default fusion module is LearnedFusion.
Routing format:

```python
{
    "universal": ["core", "core", "core"],
    "family": ["slavic", "romance", "slavic"],
    "language": ["rus", "spa", "bul"],
}
```
Each list is batch-aligned. If a level has only one expert, CELMoE can auto-broadcast it. If a level defines fallback_expert, that fallback can also be broadcast automatically.
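The broadcast rule above can be sketched as a standalone helper. This is not celmoe's internal code, and `broadcast_routing` is a hypothetical name; it only illustrates the behavior: a level with a single expert, or with a `fallback_expert`, can be expanded to a batch-aligned list automatically.

```python
# Sketch of the auto-broadcast rule described above (hypothetical helper,
# not part of celmoe's public API).
from typing import List, Optional, Sequence


def broadcast_routing(
    routing: Optional[Sequence[str]],
    batch_size: int,
    experts: Sequence[str],
    fallback_expert: Optional[str] = None,
) -> List[str]:
    if routing is not None:
        return list(routing)              # already batch-aligned
    if len(experts) == 1:
        return [experts[0]] * batch_size  # single expert: broadcast it
    if fallback_expert is not None:
        return [fallback_expert] * batch_size
    raise ValueError("explicit routing required: multiple experts, no fallback")


broadcast_routing(None, 3, ["core"])                          # ['core', 'core', 'core']
broadcast_routing(None, 2, ["a", "b"], fallback_expert="a")   # ['a', 'a']
```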
Multi-loss API
One of the main reasons this package exists is the loss interface.
`CELMoELossAPI` lets you register three different loss scopes:
- global losses over the whole `CELMoEOutput`
- level losses for a specific `LevelOutput`
- bridge losses between parent and child levels
Public methods:
- `add_global`
- `add_level`
- `add_bridge`
- `compute`
That means you can express setups like:
- a parent regularization loss
- a child specialization loss
- a shared final consistency loss
- a bridge loss between parent and child representations
without hardcoding any task-specific assumptions inside the framework.
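To show what the three scopes buy you, here is a minimal, illustrative re-implementation using plain floats instead of tensors. The method names mirror the public API listed above; the signatures, key format, and return value are simplifications, not celmoe's actual behavior.

```python
# Toy re-implementation of the three loss scopes (illustrative only).
class TinyLossAPI:
    def __init__(self):
        self.global_fns = {}   # name -> fn(output)
        self.level_fns = {}    # (level, name) -> fn(level_value)
        self.bridge_fns = {}   # (parent, child, name) -> fn(parent_value, child_value)

    def add_global(self, name, fn):
        self.global_fns[name] = fn

    def add_level(self, level, name, fn):
        self.level_fns[(level, name)] = fn

    def add_bridge(self, parent, child, name, fn):
        self.bridge_fns[(parent, child, name)] = fn

    def compute(self, output, levels):
        terms = {}
        for name, fn in self.global_fns.items():
            terms[f"global/{name}"] = fn(output)
        for (level, name), fn in self.level_fns.items():
            terms[f"level/{level}/{name}"] = fn(levels[level])
        for (parent, child, name), fn in self.bridge_fns.items():
            terms[f"bridge/{parent}->{child}/{name}"] = fn(levels[parent], levels[child])
        return sum(terms.values()), terms


api = TinyLossAPI()
api.add_global("norm", lambda out: abs(out))                  # parent regularization
api.add_level("child", "spec", lambda lvl: lvl * 0.5)         # child specialization
api.add_bridge("parent", "child", "align", lambda p, c: abs(p - c))

total, terms = api.compute(-2.0, {"parent": 1.0, "child": 4.0})
# total = 2.0 + 2.0 + 3.0 = 7.0
```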
Quick example
```python
from typing import Mapping

import torch
import torch.nn as nn

from celmoe import (
    CELMoEConfig,
    CELMoELossAPI,
    ExpertBatch,
    ExpertConfig,
    ExpertModule,
    ExpertResult,
    HierarchicalCELMoE,
    HierarchyLevelConfig,
)


class LinearExpert(ExpertModule):
    def __init__(self, hidden_size: int) -> None:
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, batch: ExpertBatch) -> ExpertResult:
        hidden = self.proj(batch.hidden)
        return ExpertResult(hidden=hidden)


def build_expert(level_name: str, expert_name: str, metadata: Mapping[str, object]) -> ExpertModule:
    del level_name, expert_name, metadata
    return LinearExpert(hidden_size=256)


config = CELMoEConfig(
    hidden_size=256,
    levels=[
        HierarchyLevelConfig(
            name="parent",
            experts={"core": ExpertConfig(name="core")},
            fallback_expert="core",
        ),
        HierarchyLevelConfig(
            name="child",
            experts={
                "a": ExpertConfig(name="a", stop_gradient_from_parent=True),
                "b": ExpertConfig(name="b", stop_gradient_from_parent=True),
            },
        ),
    ],
)

model = HierarchicalCELMoE(config, expert_factory=build_expert)

hidden = torch.randn(4, 32, 256)
output = model(
    hidden,
    routing={
        "parent": ["core"] * 4,
        "child": ["a", "b", "a", "b"],
    },
)

loss_api = CELMoELossAPI()
loss_api.add_global("l2", lambda out, targets, context: out.fused_hidden.pow(2).mean())
bundle = loss_api.compute(output)
print(bundle.total)
```
Data structures
Important public objects:
- `LevelOutput`
- `CELMoEOutput`
- `LossTerm`
- `LossBundle`
LossBundle is especially useful when you want both:
- a differentiable total loss for backpropagation
- a flat scalar dictionary for logs and dashboards
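Those two uses can be sketched with a stand-in bundle. Only `bundle.total` is confirmed by the quick example above; the name of the flat scalar mapping below (`scalars`) is an assumption for illustration.

```python
# Stand-in for LossBundle (illustrative; field name `scalars` is assumed).
from dataclasses import dataclass, field

import torch


@dataclass
class Bundle:
    total: torch.Tensor          # differentiable, used for backpropagation
    scalars: dict = field(default_factory=dict)  # plain floats for logging


x = torch.tensor(2.0, requires_grad=True)
loss = x.pow(2)
bundle = Bundle(total=loss, scalars={"global/l2": float(loss)})

bundle.total.backward()          # training path: gradients flow
print(bundle.scalars)            # logging path: {'global/l2': 4.0}
```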
Gradient isolation
Per-expert gradient isolation is controlled through stop_gradient_from_parent.
When enabled, the hidden state passed to that expert is detached before the expert runs. This is useful when you want:
- hierarchical specialization
- reduced interference across levels
- staged training or freeze/unfreeze workflows
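A minimal sketch of what this isolation amounts to, assuming `stop_gradient_from_parent` is implemented as a `detach()` on the incoming hidden state (consistent with the description above):

```python
# Plain-torch sketch of gradient isolation between a parent and a child expert.
import torch
import torch.nn as nn

parent = nn.Linear(8, 8)
child = nn.Linear(8, 8)

x = torch.randn(2, 8)
parent_hidden = parent(x)

# With stop_gradient_from_parent=True, the child sees a detached copy:
child_out = child(parent_hidden.detach())
child_out.sum().backward()

assert parent.weight.grad is None      # no gradient flows back into the parent
assert child.weight.grad is not None   # the child still trains normally
```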
Good use cases
celmoe-vp fits well when you have:
- multilingual systems with language or family experts
- domain experts on top of a shared trunk
- regional or product-specific experts
- curriculum setups with parent/child objectives
- research code that needs explicit loss composition
What it intentionally does not include
This package does not prescribe:
- embeddings
- tokenization
- Transformer block internals
- data loading
- optimizer logic
- checkpointing
Those belong in adjacent packages. celmoe-vp is the orchestration layer, not the full stack.
Typing and packaging
The package is strictly typed and ships py.typed.
That matters because CELMoE is meant to be published and consumed as an independent package, not just as an internal module of a monorepo. Downstream packages can rely on its public dataclasses and expert contracts without importing project-specific code.
File details
Details for the file celmoe_vp-1.1.0.tar.gz.
File metadata
- Download URL: celmoe_vp-1.1.0.tar.gz
- Upload date:
- Size: 10.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `386c58e78b1910545e730cf9dff7d41e33d4e6ab7222a4acfb44fae96a773bc4` |
| MD5 | `376870fc660b1124bb4122d55060316e` |
| BLAKE2b-256 | `571972c3d12485199e2ed280fed74689ba2b38dd9ba07e306e439e35855a5a40` |
File details
Details for the file celmoe_vp-1.1.0-py3-none-any.whl.
File metadata
- Download URL: celmoe_vp-1.1.0-py3-none-any.whl
- Upload date:
- Size: 7.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `38d1e7f90304d42a23f4c9dbcf5be9bac7d3d2eadb3d35a5f7b8b4e1d9109005` |
| MD5 | `6075675e829c6dd113769a8b93903462` |
| BLAKE2b-256 | `52e6b6707fe7484d375c832547f655b57769ccbc98fe3c09a62908996af95b12` |