Morphoformer with CELMoE-based multilingual morphology, typed training pipeline, and publishable CLI.

morphoformer

morphoformer is the application package of the Morph_v4 stack. It combines character-level vocabularies, dataset tooling, typed training utilities, reusable Transformer blocks, and the generic CELMoE hierarchy into a trainable multilingual morphology system.

PyPI package name:

pip install morphoformer

Import name:

import morphoformer

What this package is

Unlike the libraries under libs/, morphoformer is not just a toolkit piece. It is the runnable application layer:

  • configuration loading
  • CLI commands
  • model wiring
  • trainer
  • inference entry points

It depends on these independently publishable packages:

  • chartoken-vp
  • celmoe-vp
  • sigmorphon-vp
  • torchblocks-vp
  • trainkit-vp

Architecture summary

The current model builds a three-level expert hierarchy:

  • universal
  • family
  • language

The actual orchestration is handled by HierarchicalCELMoE. morphoformer supplies the morphology-specific expert blocks, embeddings, routing, and output heads.

Input side:

  • character embeddings
  • feature embeddings
  • language embeddings
  • feature-to-token broadcast fusion
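
The input fusion above can be sketched roughly as follows. This is an illustrative module, not the real Morphoformer code: the class name `InputFusion`, the mean-pooling of features, and the vocabulary sizes are assumptions; only the dimension names (`d_model`, `feature_dim`) follow the `[model]` config section.

```python
import torch
import torch.nn as nn

class InputFusion(nn.Module):
    """Sketch: fuse character, feature, and language embeddings."""

    def __init__(self, n_chars=100, n_feats=50, n_langs=10,
                 d_model=768, feature_dim=128):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, d_model)
        self.feat_emb = nn.Embedding(n_feats, feature_dim)
        self.lang_emb = nn.Embedding(n_langs, d_model)
        self.feat_proj = nn.Linear(feature_dim, d_model)

    def forward(self, chars, feats, lang_id):
        # chars: (B, T) character ids; feats: (B, F) feature ids
        x = self.char_emb(chars)                 # (B, T, d_model)
        f = self.feat_emb(feats).mean(dim=1)     # pool features: (B, feature_dim)
        f = self.feat_proj(f).unsqueeze(1)       # (B, 1, d_model)
        l = self.lang_emb(lang_id).unsqueeze(1)  # (B, 1, d_model)
        # broadcast the feature and language vectors across all token positions
        return x + f + l                         # (B, T, d_model)

fusion = InputFusion()
out = fusion(torch.zeros(2, 9, dtype=torch.long),
             torch.zeros(2, 4, dtype=torch.long),
             torch.zeros(2, dtype=torch.long))
print(out.shape)  # torch.Size([2, 9, 768])
```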

Expert side:

  • MorphExpertStack built from torchblocks-vp
  • configurable attention, norm, feedforward, adapter, convolution, and position modules
  • routing by language family and language code
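
The routing idea can be shown in plain Python. The actual orchestration lives in HierarchicalCELMoE; the `select_experts` helper below is hypothetical, and the family assignments for `krl` and `afb` are illustrative (only `rus = slavic` appears in the example config).

```python
# The language-to-family map mirrors the [languages.<code>] config sections.
LANGUAGE_TO_FAMILY = {"rus": "slavic", "krl": "uralic", "afb": "semitic"}

def select_experts(lang: str) -> list[str]:
    """Return the expert stacks an input passes through, top to bottom."""
    family = LANGUAGE_TO_FAMILY[lang]
    # Every input visits the universal stack, then its family stack,
    # then its language-specific stack.
    return ["universal", f"family:{family}", f"language:{lang}"]

print(select_experts("rus"))  # ['universal', 'family:slavic', 'language:rus']
```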

Output side:

  • logits
  • universal_logits
  • family_logits
  • language_logits

Those outputs are consumed by the multi-loss training setup in trainkit-vp.

Installation

Requirements:

  • Python >=3.14
  • PyTorch >=2.0

Install from PyPI:

pip install morphoformer

For local development from this repository, publish or install the dependent libraries first, because they are versioned as separate packages.

CLI

The package exposes the morphoformer console command.

Available subcommands:

  • download
  • inspect-config
  • train
  • infer

Download data

List languages:

morphoformer download --list-languages

Download specific languages and merge them:

morphoformer download --lang rus,krl,afb --out-dir data --merge

Download everything known by the downloader:

morphoformer download --lang all --out-dir data

Inspect config

morphoformer inspect-config --config dev/config.toml

Train

morphoformer train --config dev/config.toml

The trainer writes the best checkpoint into the configured output directory.

Infer

morphoformer infer `
  --config dev/config.toml `
  --checkpoint artifacts/v4_omni/best.pt `
  --lemma write `
  --tags "V;PST" `
  --lang eng

(The trailing backticks are PowerShell line continuations; on POSIX shells use \ instead.)

Configuration

The TOML config is loaded into typed dataclasses:

  • DataConfig
  • LanguageConfig
  • ModelConfig
  • OptimizerConfig
  • TrainConfig
  • DecodeConfig
  • MorphoformerConfig

Main config sections:

  • [data]
  • [model]
  • [optimizer]
  • [train]
  • [decode]
  • [languages.<code>]

Example:

[data]
train_path = "data/merged_train.tsv"
dev_path = "data/merged_dev.tsv"
max_len = 96
max_features = 12

[model]
d_model = 768
dim_ff = 2304
num_heads = 12
num_kv_heads = 4
dropout = 0.12
max_positions = 256
feature_dim = 128
attention = "gqa"
feedforward = "swiglu"
norm = "rmsnorm"
adapter = "language_conditioned"
universal_layers = 8
family_layers = 2
language_layers = 2

[train]
stage = "joint"
epochs = 10
batch_size = 64
warmup_steps = 500
total_steps = 12000
output_dir = "artifacts/v4_omni"

[languages.rus]
family = "slavic"

Training flow

The trainer does the following:

  1. load train and dev TSV data
  2. build character and feature vocabularies
  3. build the language-to-id map from config
  4. pre-encode datasets into MorphDataset
  5. instantiate Morphoformer
  6. freeze or unfreeze stages according to train.stage
  7. optimize with AdamW, warmup cosine schedule, and AMP when enabled
  8. evaluate on the dev set each epoch
  9. save the best checkpoint
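
The warmup-plus-cosine schedule in step 7 is a standard shape; a minimal sketch using the example config's warmup_steps = 500 and total_steps = 12000 (the exact formula in trainkit-vp may differ):

```python
import math

def warmup_cosine(step: int, base_lr: float,
                  warmup_steps: int = 500, total_steps: int = 12000) -> float:
    """Linear warmup to base_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))

print(warmup_cosine(250, 1e-3))    # halfway through warmup: 0.0005
print(warmup_cosine(500, 1e-3))    # peak: 0.001
print(warmup_cosine(12000, 1e-3))  # end of schedule: 0.0
```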

The loss is a weighted combination of:

  • final output loss
  • universal expert loss
  • family expert loss
  • language expert loss
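
The combination is plain weighted arithmetic over the four per-head losses. The weight values here are illustrative, not the defaults shipped in trainkit-vp:

```python
def combined_loss(final, universal, family, language,
                  weights=(1.0, 0.3, 0.3, 0.3)) -> float:
    """Weighted sum of the four per-head losses (weights are illustrative)."""
    w_final, w_uni, w_fam, w_lang = weights
    return (w_final * final + w_uni * universal
            + w_fam * family + w_lang * language)

# 1.0*2.0 + 0.3*(3.0 + 2.5 + 2.2) = 4.31 (up to float rounding)
print(combined_loss(2.0, 3.0, 2.5, 2.2))
```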

Checkpoint contents

Saved checkpoints include:

  • model_state
  • optimizer_state
  • char_vocab
  • feature_vocab
  • language_to_id
  • epoch

That is enough to restore the model together with the exact vocabularies used during training.
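
The checkpoint layout can be sketched as a plain dict round-tripped through pickle. The trainer itself presumably serializes with torch.save (which pickles under the hood); the values below are toy stand-ins, but the keys follow the list above:

```python
import io
import pickle

# Illustrative stand-ins for the real state dicts and vocab objects.
checkpoint = {
    "model_state": {"embed.weight": [0.1, 0.2]},
    "optimizer_state": {"step": 1200},
    "char_vocab": ["<pad>", "<bos>", "<eos>", "a", "b"],
    "feature_vocab": ["V", "PST"],
    "language_to_id": {"rus": 0, "krl": 1, "afb": 2},
    "epoch": 7,
}

buf = io.BytesIO()
pickle.dump(checkpoint, buf)
buf.seek(0)
restored = pickle.load(buf)

# Model weights and the exact training-time vocabularies come back together.
print(restored == checkpoint)  # True
```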

Inference path

predict_form(...):

  • encodes the lemma with CharVocab
  • encodes tags with FeatureVocab
  • maps the language string to language_id
  • runs greedy decoding through the model
  • decodes predicted ids back into a surface string
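
The greedy loop in the abstract looks like the sketch below, with a toy scoring function standing in for the model forward pass; predict_form's real loop runs the Morphoformer decoder and works over vocabulary ids rather than strings.

```python
# Toy next-token scorer: deterministically spells out "wrote" then EOS.
EOS = "<eos>"
SCRIPT = list("wrote") + [EOS]

def next_token_scores(prefix: list[str]) -> dict[str, float]:
    """Stand-in for a model forward pass: score candidate next tokens."""
    target = SCRIPT[len(prefix)]
    return {tok: (1.0 if tok == target else 0.0) for tok in set(SCRIPT)}

def greedy_decode(max_len: int = 32) -> str:
    out: list[str] = []
    while len(out) < max_len:
        scores = next_token_scores(out)
        best = max(scores, key=scores.get)  # greedy: always take the argmax
        if best == EOS:
            break
        out.append(best)
    return "".join(out)

print(greedy_decode())  # wrote
```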

Relationship to celmoe-vp

This package is where the task-specific part begins.

celmoe-vp itself stays generic and knows nothing about morphology. morphoformer is responsible for:

  • choosing hierarchy levels
  • defining expert block structure
  • mapping languages to families
  • attaching morphology-specific heads
  • converting expert outputs into token logits

That split is important because the architecture package and the application package are published separately.

Publishing and versioning

In Morph_v4 the libraries are not bundled into one mega-package. Each package is published independently and morphoformer depends on versioned releases of the lower-level libs.

That means before publishing morphoformer, you should publish compatible versions of:

  • chartoken-vp
  • celmoe-vp
  • sigmorphon-vp
  • torchblocks-vp
  • trainkit-vp

The repository includes publish.ps1 to build, version, and publish the stack in dependency order.
