MorphFormer: multilingual morphological reinflection via character-level Transformer

MorphFormer v3

Character-level Transformer for multilingual morphological reinflection.

Installation

pip install morphoformer

Requires Python >= 3.14 and PyTorch >= 2.0.

Dependencies (chartoken, torchblocks, sigmorphon, trainkit) are installed automatically.

Quick Start

# Download data
morphoformer download --lang rus,deu,fra --merge

# Train
morphoformer train --preset medium --data "data/collections/*_train.tsv" --device cuda

# Infer
morphoformer infer --checkpoint checkpoints/morphformer_epoch50.pt --word "laufen" --morph "V;IND;PST;3;SG" --lang deu

# Interactive REPL
morphoformer serve --checkpoint checkpoints/morphformer_epoch50.pt
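When the infer command is driven from a script rather than typed by hand, the feature bundle needs care: its semicolons are command separators in most shells. The sketch below assembles the same call as an argument list, using only the flags shown above. The helper name build_infer_command is hypothetical and not part of the package.

```python
import shlex


def build_infer_command(checkpoint, word, morph, lang):
    """Return the argv list for a single-word `morphoformer infer` call.

    Hypothetical helper: only the flags documented in Quick Start are used.
    """
    return [
        "morphoformer", "infer",
        "--checkpoint", checkpoint,
        "--word", word,
        "--morph", morph,
        "--lang", lang,
    ]


cmd = build_infer_command(
    "checkpoints/morphformer_epoch50.pt", "laufen", "V;IND;PST;3;SG", "deu"
)

# shlex.join quotes the feature bundle so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) avoids the shell entirely, so no quoting is needed in that case.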

Presets

Preset   d_model   Encoder     Decoder    ~Params   VRAM
small    384       4 layers    3 layers   ~7M       < 4 GB
medium   512       8 layers    6 layers   ~45M      4-8 GB
large    768       10 layers   8 layers   ~120M     >= 8 GB
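The table can be read as a simple rule for choosing a --preset value from available GPU memory. The sketch below encodes that reading. The dictionary keys and the function suggest_preset are illustrative names, not part of the package, and the choice of "large" at exactly 8 GB is an assumption, since the table lists 8 GB under both medium and large.

```python
# Preset values copied from the table above; the key names are illustrative.
PRESETS = {
    "small": {"d_model": 384, "encoder_layers": 4, "decoder_layers": 3},
    "medium": {"d_model": 512, "encoder_layers": 8, "decoder_layers": 6},
    "large": {"d_model": 768, "encoder_layers": 10, "decoder_layers": 8},
}


def suggest_preset(vram_gb: float) -> str:
    """Pick a preset name from available VRAM, following the table's ranges."""
    if vram_gb < 4:
        return "small"
    if vram_gb < 8:
        return "medium"
    # Assumption: the boundary value of 8 GB is treated as ">= 8 GB".
    return "large"


for gb in (2, 6, 8, 24):
    name = suggest_preset(gb)
    print(gb, "GB ->", name, PRESETS[name])
```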

CLI Commands

Command       Description
train         Train model from TSV data
infer         Single-word inference
serve         Interactive REPL
download      Download SigMorphon datasets
modules       List registered NN modules
init-config   Generate TOML config template

Data Format

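Tab-separated values (TSV) with four columns, in this order: lemma, features, surface_form, language. The features column is a semicolon-separated bundle of morphological tags.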

laufen	V;IND;PST;3;SG	lief	deu
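For checking or preprocessing data files outside the CLI, a row in this format can be read with the standard library alone. This is a minimal sketch, assuming there is no header row and that every line has exactly four fields. The Example type and read_examples function are illustrative names, not part of the package.

```python
import csv
import io
from typing import Iterator, NamedTuple, Tuple


class Example(NamedTuple):
    lemma: str
    features: Tuple[str, ...]
    surface_form: str
    language: str


def read_examples(handle) -> Iterator[Example]:
    """Yield one Example per non-empty TSV row (assumes no header row)."""
    # QUOTE_NONE keeps quote characters in word forms as literal text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue
        lemma, features, surface_form, language = row
        yield Example(lemma, tuple(features.split(";")), surface_form, language)


sample = io.StringIO("laufen\tV;IND;PST;3;SG\tlief\tdeu\n")
examples = list(read_examples(sample))
print(examples[0])
```

For a file on disk, open it with encoding="utf-8" and newline="" and pass the handle to read_examples.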

Python API

import torch
from chartoken import CharVocab, FeatureVocab
from morphoformer.model import MorphFormer
from morphoformer.inference import greedy_decode

# weights_only=False unpickles arbitrary Python objects: load only checkpoints you trust.
checkpoint = torch.load("checkpoints/morphformer_epoch50.pt", map_location="cpu", weights_only=False)
char_vocab = CharVocab.from_dict(checkpoint["char_vocab"])
feature_vocab = FeatureVocab.from_dict(checkpoint["feature_vocab"])
lang_to_id = checkpoint["lang_to_id"]

# ... build model, load state_dict, call greedy_decode()
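The signature of greedy_decode is not shown above, so the following is only a generic illustration of greedy decoding, not the library's implementation. At each step the highest-scoring next character is appended until an end marker appears. The function greedy_decode_sketch, the toy_step scorer, and the marker strings are all made up for this example; a real model would supply the scores.

```python
from typing import Callable, Dict, List

# Illustrative start/end markers; the library's actual special tokens may differ.
BOS, EOS = "<bos>", "<eos>"


def greedy_decode_sketch(
    step: Callable[[List[str]], Dict[str, float]], max_len: int = 32
) -> str:
    """Repeatedly append the highest-scoring next symbol until EOS or max_len."""
    output = [BOS]
    for _ in range(max_len):
        scores = step(output)
        best = max(scores, key=scores.get)
        if best == EOS:
            break
        output.append(best)
    return "".join(output[1:])


def toy_step(prefix: List[str]) -> Dict[str, float]:
    """Stand-in for a model: always prefers the next character of 'lief'."""
    target = "lief"
    pos = len(prefix) - 1  # number of characters generated so far
    preferred = target[pos] if pos < len(target) else EOS
    return {preferred: 1.0, "x": 0.1}


print(greedy_decode_sketch(toy_step))
```

Greedy decoding makes one pass and keeps a single hypothesis, which is cheap but can miss outputs that a beam search would find.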

Architecture

  • Encoder-Decoder Transformer at character level
  • Grouped Query Attention (GQA) with KV cache
  • RoPE positional embeddings
  • SwiGLU feed-forward networks
  • Language-conditioned adapters
  • Conformer-style local convolution in encoder
  • Structured morphological feature encoding
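Of the components listed, RoPE is the easiest to show in a few lines. The sketch below is a pure-Python illustration of the general technique, not code from this package. It uses the interleaved-pair layout and the common base of 10000, both of which are assumptions about this project. Each pair of coordinates is rotated by an angle proportional to the position, so the dot product of a rotated query and key depends only on their relative offset.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate each pair (vec[i], vec[i+1]) by position * base**(-i/d), i even."""
    d = len(vec)
    if d % 2:
        raise ValueError("RoPE needs an even number of dimensions")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [0.5, -1.0, 2.0, 0.25]
k = [1.0, 0.3, -0.7, 1.5]

# Both pairs of positions are 2 apart, so the attention scores should match.
score_near = dot(rope(q, 3), rope(k, 1))
score_far = dot(rope(q, 10), rope(k, 8))
print(abs(score_near - score_far) < 1e-9)
```

Because the transform is a rotation, it also leaves vector norms unchanged, which is why it can be applied to queries and keys without rescaling.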

Supported Devices

Device          Flag
Auto-detect     --device auto
NVIDIA GPU      --device cuda
AMD GPU         --device rocm
Intel Arc       --device xpu
Apple Silicon   --device mps
CPU             --device cpu
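How the CLI turns these flags into PyTorch devices is not documented here, so the following is a hypothetical sketch rather than the package's logic. It relies on one known fact: ROCm builds of PyTorch expose AMD GPUs under the "cuda" device type. The auto-detect priority order (cuda, then xpu, then mps, then cpu) and the function name resolve_device are assumptions.

```python
# Flag -> PyTorch device type. ROCm builds of PyTorch report AMD GPUs as "cuda".
FLAG_TO_TORCH = {
    "cuda": "cuda",
    "rocm": "cuda",
    "xpu": "xpu",
    "mps": "mps",
    "cpu": "cpu",
}

# Assumed priority order for auto-detection; the real order may differ.
AUTO_ORDER = ("cuda", "xpu", "mps")


def resolve_device(flag: str, available: set) -> str:
    """Map a --device flag to a PyTorch device type.

    `available` holds the accelerator backends usable on this machine,
    e.g. {"cuda"}; it is passed in so the function needs no torch import.
    """
    if flag == "auto":
        for candidate in AUTO_ORDER:
            if candidate in available:
                return candidate
        return "cpu"
    if flag not in FLAG_TO_TORCH:
        raise ValueError(f"unknown device flag: {flag!r}")
    return FLAG_TO_TORCH[flag]


print(resolve_device("auto", {"mps"}))
print(resolve_device("rocm", {"cuda"}))
```

In a real script the available set could be filled from torch.cuda.is_available() and torch.backends.mps.is_available(), plus torch.xpu.is_available() on PyTorch builds that include XPU support.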

License

See LICENSE file in the repository root.
