A lightweight diagnostic and audit library for Mixture-of-Experts (MoE) models in HuggingFace Transformers
The pytest for Mixture-of-Experts models. Catch expert collapse, routing entropy collapse, and load imbalance — before they silently wreck your training run.
MoEWatch is a lightweight diagnostic and audit library for MoE models in HuggingFace Transformers. Drop it into any training loop — it instruments router modules with zero-weight-modification PyTorch hooks, aggregates routing statistics, and surfaces structured alerts the moment something goes wrong.
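To make "zero-weight-modification hooks" concrete, here is a minimal sketch of the general technique using PyTorch's `register_forward_hook`. It is not MoEWatch's internal code: `ToyRouter` and `attach_counter` are hypothetical names invented for this example, and the assumption that a router's forward output is a `(tokens, num_experts)` logit tensor will not hold for every architecture.

```python
import torch
from torch import nn


class ToyRouter(nn.Module):
    """Hypothetical stand-in for an MoE router: hidden states -> expert logits."""

    def __init__(self, hidden=8, num_experts=4):
        super().__init__()
        self.gate = nn.Linear(hidden, num_experts, bias=False)

    def forward(self, x):
        return self.gate(x)  # shape: (tokens, num_experts)


def attach_counter(router, num_experts, top_k=1):
    """Register a read-only forward hook that tallies tokens per expert."""
    counts = torch.zeros(num_experts, dtype=torch.long)

    def hook(module, inputs, output):
        # Only reads the router output; parameters are never touched.
        with torch.no_grad():
            chosen = output.detach().topk(top_k, dim=-1).indices.reshape(-1)
            counts.add_(torch.bincount(chosen, minlength=num_experts).cpu())

    handle = router.register_forward_hook(hook)
    return counts, handle


torch.manual_seed(0)
router = ToyRouter()
before = router.gate.weight.clone()

counts, handle = attach_counter(router, num_experts=4)
router(torch.randn(32, 8))   # counted: 32 tokens, top-1 routing
handle.remove()              # detach the hook
router(torch.randn(32, 8))   # not counted any more
```

Removing the handle restores the module to its original behaviour, which is why a hook-based monitor can be attached and detached without leaving a trace on the model.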
Features
- Expert collapse detection: tracks dead and cold experts per layer across the full training run
- Routing entropy analysis: catches distribution collapse relative to theoretical maximum entropy
- Load imbalance alerts: fires when any single expert dominates token dispatch (max/mean ratio)
- Auto-detection: recognises Mixtral, OLMoE, DeepSeek-MoE, Qwen-MoE, Phi-MoE, Switch Transformer, and more out of the box; falls back to heuristic scan for unknown architectures
- Two integration modes: one-shot `audit()` for offline diagnostics, or `MoEWatch` for live training-time monitoring
- HuggingFace `Trainer` support: attach as a `TrainerCallback` with one line
- Structured output: console (coloured ASCII), JSON (for log pipelines), or silent (results only via `AuditReport`)
- Configurable overhead: `sample_every=10` keeps instrumentation below 2 % in production; `sample_every=1` for maximum fidelity during debugging
- Fixed memory footprint: ring buffer with configurable capacity; no unbounded growth over long runs
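The three routing diagnostics above can be sketched from a per-expert token count alone. The function below is an illustration, not MoEWatch's implementation: `routing_metrics` is a hypothetical helper, the entropy ratio is assumed to be Shannon entropy divided by `log(num_experts)`, and "dead" is assumed to mean a token share below `dead_threshold` (the 0.001 default mirrors the value shown under Configuration).

```python
import math


def routing_metrics(token_counts, dead_threshold=0.001):
    """Compute illustrative routing health metrics for one MoE layer.

    token_counts: number of tokens dispatched to each expert.
    """
    total = sum(token_counts)
    num_experts = len(token_counts)
    if total == 0 or num_experts < 2:
        raise ValueError("need at least two experts and one routed token")

    shares = [c / total for c in token_counts]
    # Shannon entropy of the dispatch distribution (0 * log 0 treated as 0).
    entropy = -sum(p * math.log(p) for p in shares if p > 0)
    h_max = math.log(num_experts)  # entropy of a perfectly uniform router
    mean_load = total / num_experts

    return {
        "dead_experts": [i for i, p in enumerate(shares) if p < dead_threshold],
        "entropy_ratio": entropy / h_max,                   # 1.0 means uniform
        "load_imbalance": max(token_counts) / mean_load,    # max/mean ratio
    }


balanced = routing_metrics([250, 250, 250, 250])
skewed = routing_metrics([970, 30, 0, 0])
```

For the balanced layer both ratios sit at 1.0 with no dead experts. For the skewed layer, experts 2 and 3 are flagged as dead and the entropy ratio drops well below 0.40, even though the max/mean ratio (3.88) is still under 5. This is one reason to track entropy and load imbalance separately.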
Supported Architectures
Auto-detected via registry (no configuration needed):
| Family | Models |
|---|---|
| Mixtral | mistralai/Mixtral-* |
| OLMoE | allenai/OLMoE-* |
| DeepSeek-MoE | deepseek-ai/DeepSeek-V2, DeepSeek-V3 |
| Qwen-MoE | Qwen/Qwen2-MoE-*, Qwen3-MoE-* |
| Phi-MoE | microsoft/Phi-*-MoE |
| Switch Transformer | Google's HuggingFace port |
| NLLB-MoE | facebook/nllb-moe-* |
| Arctic | Snowflake/snowflake-arctic-* |
| Jamba | ai21labs/Jamba-* |
Any custom architecture can be targeted via WatchConfig(router_modules=[...]).
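For a custom architecture, one way to find candidates for `router_modules` is to scan module names for a router-like suffix. This is a rough sketch under stated assumptions: `find_router_candidates` and its regex are hypothetical, the module names are illustrative, and whether `router_modules` expects module names or something else is not specified here, so check the configuration reference before passing the result in.

```python
import re

# Hypothetical heuristic: the last path component is exactly "router" or "gate".
# This deliberately excludes projections such as "gate_proj".
ROUTER_NAME_PATTERN = re.compile(r"(^|\.)(router|gate)$")


def find_router_candidates(module_names):
    """Return the module names whose final component looks like a router."""
    return [name for name in module_names if ROUTER_NAME_PATTERN.search(name)]


# Illustrative names in the style of PyTorch's dotted module paths.
names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.block_sparse_moe.gate",
    "model.layers.0.block_sparse_moe.experts.0.w1",
    "model.layers.1.mlp.gate_proj",
    "model.layers.1.mlp.router",
]
candidates = find_router_candidates(names)
```

With a real model, the names can be taken from `name for name, _ in model.named_modules()`. Treat the output as a shortlist to review by hand, since naming conventions vary between codebases.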
Installation
pip install moewatch
Requires Python ≥ 3.8 and PyTorch ≥ 1.10. Transformers is optional and needed only for `MoEWatch.attach(trainer)`.
Quick Start
Offline audit (one-shot)
Run a diagnostic against a model and dataloader without modifying your training loop:
import moewatch
report = moewatch.audit(model, dataloader, steps=200)
print(report.summary())
Live monitoring (HuggingFace Trainer)
from moewatch import MoEWatch, WatchConfig
watcher = MoEWatch(model, config=WatchConfig())
watcher.attach(trainer) # injects as a TrainerCallback
trainer.train()
watcher.detach()
Live monitoring (custom loop)
from moewatch import MoEWatch
watcher = MoEWatch(model)
watcher.start()
for step, batch in enumerate(dataloader):
    loss = model(**batch).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()  # clear gradients before the next batch
    alerts = watcher.step(step)  # returns List[Alert]; empty when healthy
watcher.stop()
Configuration
All thresholds and options live in WatchConfig. Three presets cover most use cases:
from moewatch import WatchConfig
WatchConfig.default() # balanced — recommended starting point
WatchConfig.aggressive() # tighter thresholds, every-step sampling — for debugging
WatchConfig.lightweight() # minimal overhead — for large-scale production runs
Common overrides:
config = WatchConfig(
    dead_threshold=0.001,      # < 0.1 % token share → expert is DEAD
    entropy_warn=0.60,         # < 60 % of H_max → WARN
    entropy_critical=0.40,     # < 40 % of H_max → ERROR
    load_imbalance_error=5.0,  # max/mean > 5× → ERROR
    sample_every=10,           # instrument every 10th forward pass
    output="json",             # "console" | "json" | "silent"
)
See the Configuration reference for all fields and their defaults.
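To show how `sample_every` and a fixed-capacity ring buffer keep overhead and memory bounded, here is a small standalone sketch built on `collections.deque`. `SampledBuffer` and its `capacity` argument are hypothetical names for illustration; MoEWatch's actual buffer class and option names may differ.

```python
from collections import deque


class SampledBuffer:
    """Record every Nth observation into a fixed-size ring buffer."""

    def __init__(self, sample_every=10, capacity=100):
        self.sample_every = sample_every
        # deque with maxlen drops the oldest entry once capacity is reached.
        self.records = deque(maxlen=capacity)
        self._calls = 0

    def observe(self, stats):
        """Store stats on the 1st, (N+1)th, (2N+1)th ... call; skip the rest."""
        self._calls += 1
        if (self._calls - 1) % self.sample_every == 0:
            self.records.append(stats)
            return True
        return False


buf = SampledBuffer(sample_every=10, capacity=5)
for step in range(100):
    buf.observe({"step": step})

kept_steps = [record["step"] for record in buf.records]
```

Out of 100 forward passes, only 10 are sampled and only the 5 most recent samples are retained, so memory stays constant however long the run lasts. The trade-off is that short-lived routing anomalies between samples can be missed, which is why every-step sampling is the better choice while debugging.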
Alert Levels
| Level | Meaning |
|---|---|
| `INFO` | Routine routing statistics; everything healthy |
| `WARN` | Degraded routing; investigate soon |
| `ERROR` | Severe collapse or imbalance; likely harming training |
MoEWatch never stops your training run. It diagnoses; you decide.
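As a rough illustration of how the thresholds from the Configuration section could map onto these levels, the sketch below classifies two metrics and returns alerts instead of raising, in keeping with the diagnose-only behaviour. `classify` and the `(level, message)` tuple format are assumptions made for this example; the real `Alert` type is not described here.

```python
def classify(
    entropy_ratio,
    load_imbalance,
    entropy_warn=0.60,
    entropy_critical=0.40,
    load_imbalance_error=5.0,
):
    """Map routing metrics to (level, message) tuples; never raises on bad routing."""
    alerts = []

    # Entropy is expressed as a fraction of the theoretical maximum.
    if entropy_ratio < entropy_critical:
        alerts.append(("ERROR", f"routing entropy at {entropy_ratio:.0%} of maximum"))
    elif entropy_ratio < entropy_warn:
        alerts.append(("WARN", f"routing entropy at {entropy_ratio:.0%} of maximum"))

    # Load imbalance is the max/mean ratio of tokens per expert.
    if load_imbalance > load_imbalance_error:
        alerts.append(("ERROR", f"max/mean load ratio is {load_imbalance:.1f}"))

    return alerts


healthy = classify(entropy_ratio=0.90, load_imbalance=1.2)
degraded = classify(entropy_ratio=0.50, load_imbalance=1.2)
collapsed = classify(entropy_ratio=0.30, load_imbalance=6.0)
```

An empty list corresponds to a healthy step. What to do with a `WARN` or `ERROR` (log it, checkpoint, or abort) is left to the calling code.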
Output Modes
# Human-readable console output (default)
WatchConfig(output="console")
# Newline-delimited JSON — pipe to Grafana, Splunk, or a custom pipeline
WatchConfig(output="json")
# No real-time output — results available only via AuditReport
WatchConfig(output="silent")
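If you consume the JSON output in your own pipeline, the parsing side can be as simple as the sketch below. The record fields (`level`, `layer`, `message`) are hypothetical placeholders chosen for this example, since the actual schema is not documented in this README; only the one-JSON-object-per-line framing is taken from the description above.

```python
import json


def filter_by_level(ndjson_text, level="ERROR", level_key="level"):
    """Parse newline-delimited JSON and keep records at the given level."""
    matches = []
    for line in ndjson_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if record.get(level_key) == level:
            matches.append(record)
    return matches


# Made-up records that stand in for real log output.
sample_log = "\n".join(
    json.dumps(record)
    for record in [
        {"level": "INFO", "layer": 0, "message": "routing healthy"},
        {"level": "ERROR", "layer": 3, "message": "entropy below critical threshold"},
        {"level": "WARN", "layer": 5, "message": "cold expert"},
    ]
)
errors = filter_by_level(sample_log)
```

Adjust `level_key` and the field names once you have inspected a real log line.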
Contributing
Issues and pull requests are welcome. To add a new architecture to the auto-detection registry, open an issue or add the router class name(s) to _ARCHITECTURE_REGISTRY in hooks/detection.py and submit a PR.
For full contribution guidelines, see CONTRIBUTING.md.
License
Apache 2.0 — see LICENSE.
Built by Abinesh.