MorphFormer: multilingual morphological reinflection via character-level Transformer
MorphFormer v3
Character-level Transformer for multilingual morphological reinflection.
Installation
pip install morphoformer
Requires Python >= 3.14 and PyTorch >= 2.0.
Dependencies (chartoken, torchblocks, sigmorphon, trainkit) are installed automatically.
Quick Start
# Download data
morphoformer download --lang rus,deu,fra --merge
# Train
morphoformer train --preset medium --data "data/collections/*_train.tsv" --device cuda
# Infer
morphoformer infer --checkpoint checkpoints/morphformer_epoch50.pt --word "laufen" --morph "V;IND;PST;3;SG" --lang deu
# Interactive REPL
morphoformer serve --checkpoint checkpoints/morphformer_epoch50.pt
Presets
| Preset | d_model | Encoder | Decoder | ~Params | VRAM |
|---|---|---|---|---|---|
| small | 384 | 4 layers | 3 layers | ~7M | < 4 GB |
| medium | 512 | 8 layers | 6 layers | ~45M | 4-8 GB |
| large | 768 | 10 layers | 8 layers | ~120M | >= 8 GB |
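The VRAM column can be read as a lookup from available memory to preset. The following is a minimal sketch of that reading, not part of morphoformer: the `Preset` class, the `PRESETS` list, and `pick_preset` are hypothetical names, and the thresholds are simply the lower bounds of the VRAM column above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    name: str
    d_model: int
    encoder_layers: int
    decoder_layers: int
    min_vram_gb: float  # lower bound of the VRAM column; 0 for "< 4 GB"


# Values copied from the presets table; the structure itself is illustrative.
PRESETS = [
    Preset("small", 384, 4, 3, 0.0),
    Preset("medium", 512, 8, 6, 4.0),
    Preset("large", 768, 10, 8, 8.0),
]


def pick_preset(vram_gb: float) -> Preset:
    """Return the largest preset whose VRAM lower bound fits in vram_gb."""
    if vram_gb < 0:
        raise ValueError("vram_gb must be non-negative")
    fitting = [p for p in PRESETS if p.min_vram_gb <= vram_gb]
    return max(fitting, key=lambda p: p.min_vram_gb)


print(pick_preset(6).name)
```

The returned name would then be passed to `--preset`; actual memory use also depends on batch size and sequence length, which the table does not specify.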
CLI Commands
| Command | Description |
|---|---|
| train | Train model from TSV data |
| infer | Single-word inference |
| serve | Interactive REPL |
| download | Download SigMorphon datasets |
| modules | List registered NN modules |
| init-config | Generate TOML config template |
Data Format
TSV with four tab-separated columns: lemma, features (semicolon-separated tags), surface_form, language. Example row:
laufen V;IND;PST;3;SG lief deu
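To check a data file before training, the four-column layout can be parsed with the standard library alone. This is a sketch under the assumption that files have no header row and no quoting; `Example` and `read_examples` are hypothetical names, not morphoformer functions.

```python
import csv
import io
from typing import Iterator, NamedTuple


class Example(NamedTuple):
    lemma: str
    features: tuple[str, ...]
    surface_form: str
    language: str


def read_examples(handle) -> Iterator[Example]:
    """Yield one Example per non-empty TSV row, failing loudly on bad rows."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_no, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 4:
            raise ValueError(f"line {line_no}: expected 4 columns, got {len(row)}")
        lemma, features, surface_form, language = row
        # The feature bundle is split on ';' into individual tags.
        yield Example(lemma, tuple(features.split(";")), surface_form, language)


sample = io.StringIO("laufen\tV;IND;PST;3;SG\tlief\tdeu\n")
examples = list(read_examples(sample))
print(examples[0])
```

For a real file, replace the `StringIO` with `open(path, encoding="utf-8", newline="")`.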
Python API
import torch
from chartoken import CharVocab, FeatureVocab
from morphoformer.model import MorphFormer
from morphoformer.inference import greedy_decode
# weights_only=False allows arbitrary unpickling, so only load checkpoints from a trusted source.
checkpoint = torch.load("checkpoints/morphformer_epoch50.pt", map_location="cpu", weights_only=False)
char_vocab = CharVocab.from_dict(checkpoint["char_vocab"])
feature_vocab = FeatureVocab.from_dict(checkpoint["feature_vocab"])
lang_to_id = checkpoint["lang_to_id"]
# ... build model, load state_dict, call greedy_decode()
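The signature of `greedy_decode` is not shown here, so the snippet below does not call it. It only illustrates the general idea of greedy decoding: at each step take the highest-scoring next character and stop at the end-of-sequence symbol. Everything in it (`greedy_decode_sketch`, `step_fn`, the `<s>` and `</s>` markers, the toy scorer) is hypothetical and stands in for a real model.

```python
from typing import Callable, Dict, List


def greedy_decode_sketch(
    step_fn: Callable[[List[str]], Dict[str, float]],
    bos: str = "<s>",
    eos: str = "</s>",
    max_len: int = 32,
) -> str:
    """Repeatedly append the best-scoring symbol until eos or max_len."""
    prefix = [bos]
    for _ in range(max_len):
        scores = step_fn(prefix)            # symbol -> score for the next position
        best = max(scores, key=scores.get)  # greedy choice, no beam search
        if best == eos:
            break
        prefix.append(best)
    return "".join(prefix[1:])              # drop the start marker


def toy_step(prefix: List[str]) -> Dict[str, float]:
    """Stand-in for a model: always prefers the next character of 'lief'."""
    target = "lief"
    pos = len(prefix) - 1
    nxt = target[pos] if pos < len(target) else "</s>"
    return {nxt: 1.0, "x": 0.1}


print(greedy_decode_sketch(toy_step))
```

In a real decoder, `step_fn` would run the Transformer decoder on the prefix (reusing the KV cache) and return scores over the character vocabulary.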
Architecture
- Encoder-Decoder Transformer at character level
- Grouped Query Attention (GQA) with KV cache
- RoPE positional embeddings
- SwiGLU feed-forward networks
- Language-conditioned adapters
- Conformer-style local convolution in encoder
- Structured morphological feature encoding
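As an illustration of one item in this list, here is a minimal pure-Python sketch of RoPE. It assumes the interleaved-pair convention and the common base of 10000; the layout and base morphoformer actually uses are not stated here, and `rope` and `dot` are hypothetical helper names. The sketch shows the property that makes RoPE useful: the query-key dot product depends only on the relative offset between positions.

```python
import math
from typing import List


def rope(vec: List[float], position: int, base: float = 10000.0) -> List[float]:
    """Rotate consecutive pairs (x0, x1), (x2, x3), ... by position-dependent angles."""
    dim = len(vec)
    if dim % 2:
        raise ValueError("RoPE needs an even vector dimension")
    out: List[float] = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)  # lower pairs rotate faster
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]

# Both pairs of positions are 3 apart, so the attention scores agree
# up to floating-point error.
score_a = dot(rope(q, 5), rope(k, 2))
score_b = dot(rope(q, 13), rope(k, 10))
assert math.isclose(score_a, score_b, rel_tol=1e-9, abs_tol=1e-9)
```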
Supported Devices
| Device | Flag |
|---|---|
| Auto-detect | --device auto |
| NVIDIA GPU | --device cuda |
| AMD GPU | --device rocm |
| Intel Arc | --device xpu |
| Apple Silicon | --device mps |
| CPU | --device cpu |
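How `--device auto` chooses a backend is not documented here. The sketch below shows one plausible resolution order using PyTorch's availability checks; `resolve_device` is a hypothetical helper and the priority (CUDA, then XPU, then MPS, then CPU) is an assumption. It relies on the fact that ROCm builds of PyTorch expose AMD GPUs through the `cuda` device type.

```python
def resolve_device(flag: str = "auto") -> str:
    """Map a --device style flag to a torch device string (assumed mapping)."""
    try:
        import torch
    except ImportError:
        return "cpu"  # without PyTorch installed, fall back to CPU
    if flag == "rocm":
        # ROCm builds of PyTorch address AMD GPUs as "cuda" devices.
        return "cuda"
    if flag != "auto":
        return flag  # explicit choice: pass through unchanged
    if torch.cuda.is_available():
        return "cuda"
    # torch.xpu and torch.backends.mps are absent in some builds, hence getattr.
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(resolve_device("auto"))
```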
License
See LICENSE file in the repository root.
File details
Details for the file morphoformer-3.0.0.tar.gz.
File metadata
- Download URL: morphoformer-3.0.0.tar.gz
- Upload date:
- Size: 19.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c958456ea7cd030bf42fde5261e02c571b9ea961cd7788a1fcf16c077990b46d |
| MD5 | 8c27b052e865212d7399f3bfaf2f0464 |
| BLAKE2b-256 | 958ad9c84e9ba81bea973347950125438f33899c6547279a94e178010d87dd0f |
File details
Details for the file morphoformer-3.0.0-py3-none-any.whl.
File metadata
- Download URL: morphoformer-3.0.0-py3-none-any.whl
- Upload date:
- Size: 23.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0dda340d4bd1ab18178aeedc312f64b2706ff44fbcda1b410aedfd910f1f103c |
| MD5 | 2ff2194f77cc1f330c67cf8b07acff13 |
| BLAKE2b-256 | 73f090597f6dd1b65746ef6e60e2ac9e8a4ee52785d2b181ef9928c12a6aeee0 |