Morphoformer with CELMoE-based multilingual morphology, typed training pipeline, and publishable CLI.
morphoformer
morphoformer is the application package of the Morph_v4 stack. It combines character-level vocabularies, dataset tooling, typed training utilities, reusable Transformer blocks, and the generic CELMoE hierarchy into a trainable multilingual morphology system.
PyPI package name:
pip install morphoformer
Import name:
import morphoformer
What this package is
Unlike the libraries under libs/, morphoformer is not just a toolkit piece. It is the runnable application layer:
- configuration loading
- CLI commands
- model wiring
- trainer
- inference entry points
It depends on these independently publishable packages:
- chartoken-vp
- celmoe-vp
- sigmorphon-vp
- torchblocks-vp
- trainkit-vp
Architecture summary
The current model builds a three-level expert hierarchy:
universal → family → language
The actual orchestration is handled by HierarchicalCELMoE. morphoformer supplies the morphology-specific expert blocks, embeddings, routing, and output heads.
Input side:
- character embeddings
- feature embeddings
- language embeddings
- feature-to-token broadcast fusion (sketched after this summary)
Expert side:
- MorphExpertStack built from torchblocks-vp
- configurable attention, norm, feedforward, adapter, convolution, and position modules
- routing by language family and language code
Output side:
- logits
- universal_logits
- family_logits
- language_logits
Those outputs are consumed by the multi-loss training setup in trainkit-vp.
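For intuition, here is a minimal sketch of the broadcast-fusion idea on the input side. The tensor names, the additive combination, and the shared d_model for the language embedding are illustrative assumptions, not the package's actual code; d_model = 768 and feature_dim = 128 match the config example further down.

```python
import torch

# Minimal sketch of feature-to-token broadcast fusion (illustrative only;
# the additive combination and the d_model-sized language embedding are
# assumptions, not morphoformer's actual code).
batch, seq_len, d_model, feature_dim = 2, 16, 768, 128

char_emb = torch.randn(batch, seq_len, d_model)  # per-character embeddings
feat_emb = torch.randn(batch, feature_dim)       # pooled morphological features
lang_emb = torch.randn(batch, d_model)           # one language embedding per example

project = torch.nn.Linear(feature_dim, d_model)

# Broadcast the sequence-level vectors over every token position and fuse.
fused = char_emb + project(feat_emb).unsqueeze(1) + lang_emb.unsqueeze(1)
print(fused.shape)  # torch.Size([2, 16, 768])
```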
Installation
Requirements:
- Python >= 3.14
- PyTorch >= 2.0
Install from PyPI:
pip install morphoformer
For local development from this repository, publish or install the dependent libraries first, because they are versioned as separate packages.
CLI
The package exposes the morphoformer console command.
Available subcommands:
download, inspect-config, init-config, train, and infer
Download data
List languages:
morphoformer download --list-languages
Download specific languages and merge them:
morphoformer download --lang rus,krl,afb --out-dir data --merge
Download everything known by the downloader:
morphoformer download --lang all --out-dir data
Inspect config
morphoformer inspect-config --config dev/config.toml
Initialize config
morphoformer init-config --path dev/config.toml
Overwrite existing file:
morphoformer init-config --path dev/config.toml --force
Train
morphoformer train --config dev/config.toml
The trainer writes the best checkpoint into the configured output directory.
Infer
morphoformer infer `
--config dev/config.toml `
--checkpoint artifacts/v4_omni/best.pt `
--lemma write `
--tags "V;PST" `
--lang eng
Configuration
The TOML config is loaded into typed dataclasses:
DataConfig, LanguageConfig, ModelConfig, OptimizerConfig, TrainConfig, DecodeConfig, and MorphoformerConfig.
Main config sections:
[data], [model], [optimizer], [train], [decode], and [languages.<code>]
Example:
[data]
train_path = "data/merged_train.tsv"
dev_path = "data/merged_dev.tsv"
max_len = 96
max_features = 12
[model]
d_model = 768
dim_ff = 2304
num_heads = 12
num_kv_heads = 4
dropout = 0.12
max_positions = 256
feature_dim = 128
attention = "gqa"
feedforward = "swiglu"
norm = "rmsnorm"
adapter = "language_conditioned"
universal_layers = 8
family_layers = 2
language_layers = 2
[train]
stage = "joint"
epochs = 10
batch_size = 64
warmup_steps = 500
total_steps = 12000
output_dir = "artifacts/v4_omni"
[languages.rus]
family = "slavic"
Training flow
The trainer does the following:
- load train and dev TSV data
- build character and feature vocabularies
- build the language-to-id map from config
- pre-encode datasets into MorphDataset
- instantiate Morphoformer
- freeze or unfreeze stages according to train.stage
- optimize with AdamW, a warmup cosine schedule, and AMP when enabled (sketched after this list)
- evaluate on the dev set each epoch
- save the best checkpoint
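As a rough sketch of that optimization recipe (AdamW + warmup-cosine schedule + AMP) in plain PyTorch; trainkit-vp's actual trainer may differ in details. The stand-in model and loss are placeholders, and warmup_steps / total_steps come from the [train] example below.

```python
import math
import torch

# Rough sketch of the recipe above; trainkit-vp's implementation may differ.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(768, 768).to(device)  # stand-in for Morphoformer
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

warmup_steps, total_steps = 500, 12_000  # values from the [train] example

def lr_lambda(step: int) -> float:
    if step < warmup_steps:                  # linear warmup
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
scaler = torch.cuda.amp.GradScaler(enabled=device == "cuda")  # AMP on GPU only

x, y = torch.randn(64, 768, device=device), torch.randn(64, 768, device=device)
optimizer.zero_grad(set_to_none=True)
with torch.autocast(device_type=device, enabled=device == "cuda"):
    loss = torch.nn.functional.mse_loss(model(x), y)  # placeholder loss
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
scheduler.step()
```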
The loss is a weighted combination of the following terms (schematic sketch after this list):
- final output loss
- universal expert loss
- family expert loss
- language expert loss
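Schematically, the combination looks like the snippet below. The weight values are illustrative, and per-head cross-entropy against the gold target sequence is an assumption; the head names match the model outputs listed earlier.

```python
import torch
import torch.nn.functional as F

# Schematic weighted multi-head loss (weights illustrative; per-head
# cross-entropy against the gold sequence is assumed).
vocab, batch, seq = 100, 2, 8
targets = torch.randint(vocab, (batch, seq))
weights = {
    "logits": 1.0,            # final output loss
    "universal_logits": 0.3,  # universal expert loss
    "family_logits": 0.3,     # family expert loss
    "language_logits": 0.3,   # language expert loss
}
outputs = {name: torch.randn(batch, seq, vocab) for name in weights}

def ce(logits: torch.Tensor) -> torch.Tensor:
    return F.cross_entropy(logits.reshape(-1, vocab), targets.reshape(-1))

total = sum(w * ce(outputs[name]) for name, w in weights.items())
```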
Checkpoint contents
Saved checkpoints include:
model_state, optimizer_state, char_vocab, feature_vocab, language_to_id, and epoch.
That is enough to restore the model together with the exact vocabularies used during training.
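A checkpoint saved this way can be inspected with plain torch.load; the path below is the one from the infer example above. Passing weights_only=False is an assumption, needed only if the vocab objects are stored as pickled Python objects rather than plain tensors.

```python
import torch

# Inspect a saved checkpoint (path from the infer example above).
# weights_only=False is assumed because the vocab objects are likely
# pickled Python objects rather than plain tensors.
ckpt = torch.load("artifacts/v4_omni/best.pt", map_location="cpu", weights_only=False)
print(sorted(ckpt))  # char_vocab, epoch, feature_vocab, language_to_id,
                     # model_state, optimizer_state

# model.load_state_dict(ckpt["model_state"])  # after rebuilding the model from config
```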
Inference path
predict_form(...):
- encodes the lemma with CharVocab
- encodes tags with FeatureVocab
- maps the language string to a language_id
- runs greedy decoding through the model
- decodes predicted ids back into a surface string
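The greedy-decoding step works as in the generic loop below; this is an illustrative sketch of the technique, not morphoformer's actual decode, which lives behind predict_form. The dummy step function stands in for the model, and max_len = 96 matches the [data] example.

```python
import torch

# Generic greedy decoding loop of the kind described above (illustrative;
# not morphoformer's actual implementation).
def greedy_decode(step_fn, bos_id: int, eos_id: int, max_len: int) -> list[int]:
    ids = [bos_id]
    for _ in range(max_len):
        logits = step_fn(torch.tensor([ids]))   # (1, len, vocab)
        next_id = int(logits[0, -1].argmax())   # greedy: take the argmax
        if next_id == eos_id:
            break
        ids.append(next_id)
    return ids[1:]

# Dummy step function standing in for the model: emits token 7 three times,
# then prefers EOS (id 1).
vocab = 10
def dummy_step(ids: torch.Tensor) -> torch.Tensor:
    logits = torch.zeros(1, ids.shape[1], vocab)
    logits[0, -1, 7 if ids.shape[1] < 4 else 1] = 1.0
    return logits

print(greedy_decode(dummy_step, bos_id=0, eos_id=1, max_len=96))  # [7, 7, 7]
```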
Hierarchical model and loss (advanced)
By default the model uses the classic three levels (universal → family → language) with one global stack depth per segment (universal_layers, family_layers, language_layers). You can go further:
Per-layer block overrides (torchblocks)
Each expert can carry a list of LayerBlockPartial entries (attention, feedforward, norm, adapter, conv, dropout, head dims, etc.). They are stored on ExpertDefinition.layer_overrides and serialized through celmoe metadata into MorphExpertStack, which builds one MorphExpertBlock per layer via resolve_block_config (morphoformer.model.block_config).
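A hedged TOML sketch of per-layer overrides on one expert, using the expert-table syntax described in the next subsection. The override field names are assumed to mirror the [model] keys above; the exact LayerBlockPartial fields live in morphoformer.model.block_config.

```toml
# Hypothetical sketch: layer_overrides as an array of inline tables with
# LayerBlockPartial fields (field names assumed to mirror the [model] keys).
[[model.hierarchy.levels.experts]]
name = "slavic"
num_layers = 2
layer_overrides = [
    {},                                       # layer 0: keep the level defaults
    { attention = "gqa", norm = "rmsnorm" },  # layer 1: override two blocks
]
```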
Declarative hierarchy
MorphoformerHierarchySpec (morphoformer.config.schema) lists HierarchyLevelDef entries: level name, an experts map (ExpertDefinition: num_layers, layer_overrides, stop_gradient_from_parent), an optional fallback_expert, and routing: auto | constant | family | language (auto infers the routing from the level name when possible).
TOML (recommended): define the tree under [model] with an array of tables, [[model.hierarchy.levels]]. After each such block, [model.hierarchy.levels.experts.<expert_id>] tables attach to the most recently declared level (this is how TOML array-of-tables scoping works). Alternatively, define experts as a list of nested tables, [[model.hierarchy.levels.experts]], with fields name, num_layers, and optionally layer_overrides (an array of inline tables whose fields come from LayerBlockPartial). See dev/config.toml.
- [model.hierarchy.expert_pools] — named lists of expert ids (arbitrary keys).
- expert_pool on a level — the name of a pool from expert_pools; expert_ids — the same kind of list, but inline in the level.
- experts_from: only "languages" or "families" — substitutes ids from [languages.*] (language names, or the unique family values). Any other string value in experts_from (when expert_pool is not set) is treated as a pool name, so a separate expert_pool key is not required.
- default_num_layers / default_stop_gradient_from_parent on a level — the base used for the template before merging with expert_template; otherwise a heuristic based on the level name and ModelConfig.universal_layers / family_layers / language_layers applies.
- expert_template — a partial ExpertDefinition (num_layers = 0 inherits the level default); per-expert overrides go in [model.hierarchy.levels.experts.<id>].
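Putting those pieces together, a hedged sketch of a declarative three-level hierarchy; the level names, pool contents, and layer counts are illustrative, and dev/config.toml remains the authoritative reference.

```toml
# Hedged sketch of a declarative hierarchy (values illustrative).
[model.hierarchy.expert_pools]
my_families = ["slavic", "semitic"]   # arbitrary pool key

[[model.hierarchy.levels]]
name = "universal"
routing = "constant"
expert_ids = ["universal"]            # single expert for constant routing (assumed)
default_num_layers = 8

[[model.hierarchy.levels]]
name = "family"
routing = "family"
expert_pool = "my_families"           # or: experts_from = "families"
default_num_layers = 2

[[model.hierarchy.levels]]
name = "language"
routing = "language"
experts_from = "languages"            # ids taken from [languages.*]
default_num_layers = 2

# Point override for one expert of the last declared level:
[model.hierarchy.levels.experts.rus]
num_layers = 3
```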
JSON: model.hierarchy_path is a path relative to the config file; the JSON root is an object with a levels key (matching hierarchy_spec_from_dict).
Default: if model.hierarchy is not set in the TOML and hierarchy_path is empty, the hierarchy is built from [languages.*] and universal_layers / family_layers / language_layers.
If both hierarchy_path and an inline model.hierarchy are given, the inline TOML takes priority.
Loss composition
Training uses FlexibleHierarchyLoss (morphoformer.training.hierarchy_loss) with LossCompositionConfig:
- [[train.loss.components]] — if at least one entry is set, the total is built only from the listed key × weight pairs (intermediate keys: final, level/<level_name>, bridge/<bridge_name>). In that case final_weight, level_weights, and the weight on [[train.loss.bridges]] do not contribute to the sum; bridges are still computed from [[train.loss.bridges]], and their contribution to the total is given by a components entry with key = "bridge/...".
- final_weight — fused head CE (used when components is empty).
- level_weights — map of level name → weight (used when components is empty).
- aliases — short display name → internal key in the logs (universal, level_family, level_language, …).
- groups — extra terms added to the total: members — keys of intermediate losses (final, level/<level_name>, bridge/<name>), combine: mean | sum, and weight.
- bridges — consistency between levels: name, parent_level, child_level, weight (ignored when components is non-empty), metric: mse_logits | kl_logits | l2_hidden (for l2_hidden the model must return level_hiddens in its output).
TOML: use the [train.loss] section with nested [train.loss.level_weights] and [train.loss.aliases], plus optional [[train.loss.components]], [[train.loss.groups]], and [[train.loss.bridges]]. See dev/config.toml for a full example, and the sketch below.
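A hedged sketch of an explicit composition; the weights and the bridge name are illustrative, and the key strings follow the intermediate-key scheme listed above.

```toml
# Hedged sketch of an explicit loss composition (weights illustrative).
[train.loss]
final_weight = 1.0            # ignored here, because components is non-empty

[train.loss.aliases]
universal = "level/universal" # short log name -> internal key

[[train.loss.components]]
key = "final"
weight = 1.0

[[train.loss.components]]
key = "level/language"
weight = 0.3

[[train.loss.components]]
key = "bridge/family_to_language"   # contributes the bridge below to the total
weight = 0.1

[[train.loss.bridges]]
name = "family_to_language"
parent_level = "family"
child_level = "language"
metric = "kl_logits"
```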
Legacy final_loss_weight, universal_loss_weight, … remain supported; if train.loss is omitted, they are folded into a LossCompositionConfig via from_legacy_train_weights.
Freezing stages
For non-classic hierarchies, trainkit.freeze_stages trains only the named level when stage matches a model.level_names entry; joint enables all levels. The original three-level behavior is unchanged when levels are exactly universal / family / language.
Relationship to celmoe-vp
This package is where the task-specific part begins.
celmoe-vp itself stays generic and knows nothing about morphology. morphoformer is responsible for:
- choosing hierarchy levels
- defining expert block structure
- mapping languages to families
- attaching morphology-specific heads
- converting expert outputs into token logits
That split is important because the architecture package and the application package are published separately.
Publishing and versioning
In Morph_v4 the libraries are not bundled into one mega-package. Each package is published independently and morphoformer depends on versioned releases of the lower-level libs.
That means before publishing morphoformer, you should publish compatible versions of:
- chartoken-vp
- celmoe-vp
- sigmorphon-vp
- torchblocks-vp
- trainkit-vp
The repository includes publish.ps1 to build, version, and publish the stack in dependency order.