Un-fairseq: UnFormers (Universal Transformers) — config-driven enc-dec chassis covering NLLB/mBART/Marian/mT5/UL2/t5gemma/TranslateGemma/Qwen/Gemma, plus Matryoshka encoder, Garg 2019 supervised attention, PyTorch IBM Models 1/2/HMM/4, Brown+k-means clustering, and portable char/byte alignment.


unfairseq / UnFormers

UnFormers (Universal Transformers) is a single configurable encoder-decoder Transformer implementation that covers the architectural choices of modern NMT / seq2seq model families through presets. One codebase, one set of modules, one HF-compatible PreTrainedModel — and a preset picks the knobs (attention kind, positional encoding, norm, FFN, bias policy, …) to reconstruct NLLB / mBART / Marian / mT5 / UL2 / t5gemma / TranslateGemma / Qwen / Gemma.

On top of the core it ships:

Matryoshka encoder (MatFormer-style depth pruning): train once, serve at multiple depths, prune permanently after training.
Supervised attention word alignment (Garg 2019) on a configurable decoder-layer / head, applied at every Matryoshka granularity.
Neural IBM alignment (IBM Model 1 / 2 / HMM) in pure PyTorch, GPU-batched, subword-native: an eflomal replacement that aligns directly on your tokenizer's ids.
Portable alignment format: char-span (or UTF-8 byte-span) records plus word-level aggregation via ICU, so alignments are usable by any downstream tokenizer.
UL2 mixture-of-denoisers corpus preprocessing (R / X / S denoisers).
Expert-parallel MoE, KV cache for generation, gradient checkpointing, and warm_start (Net2Net + bert2bert) to seed UnFormer weights from any HF checkpoint.

Installation

UnFormers has one native dependency chain you need to handle before pip install: PyICU, which wraps ICU4C (the Unicode library Chrome/Firefox/Java all use). ICU ships the word-break dictionaries for CJK / Thai / Khmer / Lao / Myanmar that make word-level alignment work for those languages.

1. Install ICU4C (system library)

macOS (Homebrew):

brew install icu4c
# Homebrew doesn't symlink icu4c by default; tell pkg-config where to find it:
echo 'export PATH="/usr/local/opt/icu4c/bin:/usr/local/opt/icu4c/sbin:$PATH"' >> ~/.zshrc
echo 'export PKG_CONFIG_PATH="/usr/local/opt/icu4c/lib/pkgconfig"' >> ~/.zshrc

Apple Silicon paths are under /opt/homebrew/opt/icu4c/... instead of /usr/local/opt/....

Debian / Ubuntu:

sudo apt install pkg-config libicu-dev

Fedora / RHEL:

sudo dnf install libicu-devel

Alpine:

apk add icu-dev pkgconfig

Windows: grab the ICU binaries from https://icu.unicode.org/download and ensure icu-config is on PATH, or use a pre-built PyICU wheel from the Python wheels index (2.16+ has Windows wheels).

2. Install PyICU (Python binding)

# after the system icu4c is in place (quote the spec so the shell
# does not treat ">" as an output redirect):
pip install "PyICU>=2.11"

If PyICU's build fails with "u_init_74 not found" or similar, you have a version mismatch — icu-config --version must match the ICU the wheel was built against. Rebuild against your local ICU with:

PYICU_INCLUDES="$(icu-config --cppflags)" \
PYICU_LFLAGS="$(icu-config --ldflags)" \
pip install --no-binary=:all: PyICU

3. ICU data / dictionaries

ICU's word-break dictionaries for zh / ja / th / km / lo / my ship with the ICU4C install — you do not need to download anything separately. To verify the bundled dictionaries are available:

import icu

print(icu.ICU_VERSION)
bi = icu.BreakIterator.createWordInstance(icu.Locale("zh"))
bi.setText("我爱北京天安门")
print(list(bi))  # all word-boundary offsets after the start of the text

If icu.ICU_VERSION prints and BreakIterator segments Chinese correctly, you have the dictionaries. They live inside icudt{VERSION}l.dat in the ICU data directory (icu-config --icudatadir). On a minimal ICU install ("lite") the dict files are stripped; install the full ICU package (default on every major distro).

If you ever need a newer or language-specific ICU data bundle, download icu4c-*-data-bin-l.zip from https://icu.unicode.org/download and drop the .dat file into icu-config --icudatadir.

4. UnFormers itself

# run from the repo root; pick one (quotes keep zsh from globbing the brackets):
pip install -e .                 # dev (editable) install from a checkout
pip install .                    # regular install
pip install '.[align]'           # + eflomal (optional, we ship our own)
pip install '.[dev]'             # + pytest, ruff

Once installed, sanity-check ICU integration:

python -c "import icu; print('ICU', icu.ICU_VERSION, 'PyICU', icu.__version__)"
python -c "from unformers.align import get_segmenter; print(get_segmenter('zh')('机器翻译系统'))"

Quick start

Build a model from a preset with any HF tokenizer

from transformers import AutoTokenizer
from unformers import UnFormerForConditionalGeneration
from unformers.presets import from_preset

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")
# len(tok) counts added special tokens; tok.vocab_size does not, and on
# Qwen tokenizers the pad / eos ids can sit at or above tok.vocab_size.
bos_id = tok.bos_token_id if tok.bos_token_id is not None else tok.eos_token_id
cfg = from_preset("ul2-mini-6-3", vocab_size=len(tok),
                  pad_token_id=tok.pad_token_id,
                  bos_token_id=bos_id,
                  eos_token_id=tok.eos_token_id)
model = UnFormerForConditionalGeneration(cfg)

Train IBM-2 alignments and emit portable JSONL

python -m unformers.align.cli \
    --input parallel.tsv --src-col 0 --tgt-col 1 \
    --src-lang eng_Latn --tgt-lang zho_Hans \
    --tokenizer Qwen/Qwen2.5-0.5B \
    --aligner-epochs 5 \
    --output aligned.jsonl

Each output line is tokenizer-agnostic:

{
  "src_text": "hello world",
  "tgt_text": "你好 世界",
  "src_lang": "eng_Latn",
  "tgt_lang": "zho_Hans",
  "char_alignments": [{"src": [0, 5], "tgt": [0, 2]}, {"src": [6, 11], "tgt": [3, 5]}],
  "word_alignments": [{"src": [0, 5], "tgt": [0, 2]}, {"src": [6, 11], "tgt": [3, 5]}],
  "byte_offsets": false,
  "segmenter_src": "icu:eng_Latn",
  "segmenter_tgt": "icu:zho_Hans"
}

Use --byte for UTF-8 byte offsets instead of char offsets.
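
As a rough sketch of how a downstream consumer might read one of these records, the snippet below slices the aligned substrings out of both texts and converts a char span to a UTF-8 byte span. The field names come from the record above. The helper names (aligned_pairs, char_span_to_byte_span) are illustrative, not part of the unformers API, and spans are assumed to be half-open [start, end), which matches the example record.

```python
import json

record = json.loads("""
{"src_text": "hello world", "tgt_text": "你好 世界",
 "char_alignments": [{"src": [0, 5], "tgt": [0, 2]},
                     {"src": [6, 11], "tgt": [3, 5]}],
 "byte_offsets": false}
""")


def aligned_pairs(rec, key="char_alignments"):
    """Return (source substring, target substring) for each alignment link."""
    if rec["byte_offsets"]:
        raise ValueError("this helper expects char offsets, not byte offsets")
    src, tgt = rec["src_text"], rec["tgt_text"]
    return [(src[a["src"][0]:a["src"][1]], tgt[a["tgt"][0]:a["tgt"][1]])
            for a in rec[key]]


def char_span_to_byte_span(text, start, end):
    """Map a half-open char span to the matching UTF-8 byte span."""
    return (len(text[:start].encode("utf-8")), len(text[:end].encode("utf-8")))


pairs = aligned_pairs(record)
byte_span = char_span_to_byte_span(record["tgt_text"], 3, 5)
print(pairs)
print(byte_span)
```

Because the spans index the raw text rather than token ids, the same record can be re-projected onto any tokenizer that reports character offsets.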

Train UL2-mini with Matryoshka + supervised attention

python examples/train_pure_pytorch.py \
    --tokenizer Qwen/Qwen2.5-0.5B \
    --n-pairs 5000 --max-steps 1000 --batch-size 16 \
    --d-model 256 --num-heads 8 --ffn-size 512

Warm-start from an HF checkpoint

from transformers import AutoModelForSeq2SeqLM
from unformers import UnFormerForConditionalGeneration
from unformers.interop import warm_start
from unformers.presets import from_preset

source = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-base")
cfg = from_preset("mt5", vocab_size=source.config.vocab_size, size="mt5-base")
target = UnFormerForConditionalGeneration(cfg)
manifest = warm_start(target, source, strategy="auto")
print(manifest.summary())   # copied=..., padded=..., randomised=...

Preset capability matrix

Preset	Positional	Norm	FFN	Attention	Bias	Align default	Notes
`marian`	sinusoidal	LayerNorm post	ReLU	MHA	✓	opt-in	classic vanilla Transformer
`nllb`	sinusoidal	LayerNorm pre	ReLU (MoE opt)	MHA	✓	opt-in	lang codes, MoE via `moe_num_experts`
`mbart`	learned abs	LayerNorm pre	GELU	MHA	✓	opt-in
`mt5`	T5 rel-bias	RMSNorm pre	GeGLU	MHA	✗	opt-in	untied lm_head
`ul2`	T5 rel-bias	RMSNorm pre	SwiGLU (MoE opt)	MHA	✗	opt-in	prefix-LM, `[R]`/`[X]`/`[S]` tags
`t5gemma`	RoPE	RMSNorm pre	GeGLU	GQA	✗	opt-in	tied embed, √d scale
`translategemma`	RoPE 1M base	RMSNorm+preresid	GeGLU	GQA, QK-norm, logit-cap, sliding	✗	opt-in	5:1 local/global interleave
`qwen3.5`	RoPE 1M base	RMSNorm pre	SwiGLU	GQA, QK-norm	✗	opt-in	decoder-only family → enc-dec adapted
`gemma4`	RoPE multi-freq	RMSNorm+preresid	GeGLU	GQA, QK-norm, logit-cap, sliding	✗	opt-in	local 10k / global 1M RoPE bases
`ul2-mini-6-3`	RoPE	RMSNorm pre	SwiGLU	MHA	✗	on	Matryoshka [2,4,6], Garg 2019 demo

All presets accept **kwargs to override d_model / encoder_layers / decoder_layers / num_heads / intermediate_size etc. so you can shrink a 2B preset into a test-sized version:

cfg = from_preset("gemma4", vocab_size=32000, d_model=64,
                  encoder_layers=2, decoder_layers=2,
                  num_heads=4, num_kv_heads=2, head_dim=16, intermediate_size=128)

Alignment supervision is available on every preset

Every preset exposes the same set of alignment_* kwargs to from_preset. Garg 2019 supervised cross-attention is off by default for all presets except ul2-mini-6-3 (the demo preset), where it's on. Enable and tune on any preset:

cfg = from_preset(
    "nllb",
    vocab_size=tok.vocab_size, size="nllb-600m-distilled",
    alignment_enabled=True,                     # turn Garg loss on
    alignment_loss_weight=0.05,                 # λ in total = ce + λ * align
    alignment_decoder_layer=-1,                 # which decoder layer to supervise (-1 = top)
    alignment_num_heads=1,                      # first N cross-attn heads, averaged
    alignment_full_context=False,               # second decoder pass w/o causal mask
    alignment_apply_to_all_granularities=True,  # Matryoshka × alignment
)

To disable on ul2-mini-6-3: pass alignment_enabled=False. For full control pass alignment=AlignmentConfig(...) as a kwarg — the explicit config overrides any individual alignment_* kwargs.
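
To make the loss combination concrete, here is a minimal pure-Python sketch of an alignment NLL and the weighted total from the comment above (total = ce + λ * align). It is one plausible reading of Garg 2019 and not necessarily what unformers computes. It assumes that each attention row sums to 1, that a target token with several gold links spreads its weight evenly over them, and that the loss is averaged over aligned target tokens. The function names are hypothetical.

```python
import math


def alignment_nll(attn, links, eps=1e-9):
    """Negative log-likelihood of gold links under cross-attention.

    attn[t][s] is the attention probability from target position t to
    source position s; links holds (s, t) pairs in pharaoh order.
    """
    by_target = {}
    for s, t in links:
        by_target.setdefault(t, []).append(s)
    if not by_target:
        return 0.0
    loss = 0.0
    for t, sources in by_target.items():
        weight = 1.0 / len(sources)  # assumed: even split across a token's links
        loss -= sum(weight * math.log(max(attn[t][s], eps)) for s in sources)
    return loss / len(by_target)


def total_loss(ce, align, alignment_loss_weight=0.05):
    """total = ce + lambda * align, with lambda = alignment_loss_weight."""
    return ce + alignment_loss_weight * align


attn = [[0.9, 0.1],
        [0.2, 0.8]]
align = alignment_nll(attn, [(0, 0), (1, 1)])
print(align, total_loss(2.0, align))
```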

What's in the box

Architecture (config-driven)

Attention: MHA / MQA / GQA, QK-norm, attention logit soft-cap, sliding window, per-layer local/global interleave.
Positional: sinusoidal, learned abs, T5 bucketed rel-bias, ALiBi, RoPE (single-freq + per-layer multi-freq with NTK / linear scaling).
Norm: LayerNorm (bias / no-bias), RMSNorm; pre- / post-norm; pre-residual norm (Gemma-style).
FFN: Dense (GELU/ReLU/SiLU), GLU (SwiGLU/GeGLU/ReGLU), MoE (single-GPU + expert-parallel).
Embedding: tied / untied; √d_model scale; final-logit soft-cap & scale.
Decoder: causal or prefix-LM; every-N cross-attention layers.
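
Three of the attention knobs above are small enough to sketch in plain Python: logit soft-capping, a causal sliding-window mask, and a local/global layer interleave. The tanh form of the soft-cap is the one commonly used by Gemma-style models. The ordering of the interleave (N local layers followed by one global layer) and all function names are assumptions for illustration, not the unformers implementation.

```python
import math


def soft_cap(logit, cap):
    """Squash a logit smoothly into the range [-cap, cap] via cap * tanh(logit / cap)."""
    return cap * math.tanh(logit / cap)


def sliding_window_mask(seq_len, window):
    """mask[q][k] is True when causal query q may attend to key k inside the window."""
    return [[0 <= q - k < window for k in range(seq_len)] for q in range(seq_len)]


def layer_kinds(num_layers, local_per_global=5):
    """Assumed pattern: `local_per_global` sliding-window layers, then one global layer."""
    period = local_per_global + 1
    return ["global" if (i + 1) % period == 0 else "local" for i in range(num_layers)]


print(soft_cap(30.0, 50.0))        # cap value 50.0 is only an illustrative choice
print(sliding_window_mask(4, 2)[3])
print(layer_kinds(12))
```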

Training

UnFormerTrainer subclasses HF Seq2SeqTrainer; use it or fall back to examples/train_pure_pytorch.py when you don't want accelerate.
Losses: label-smoothed CE, Garg 2019 alignment NLL, Switch-style MoE aux.
Matryoshka depth sampling: joint / stochastic / sandwich.
Gradient checkpointing via model.gradient_checkpointing_enable() — skips the alignment-supervised layer so the Garg loss still backprops.
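
The three Matryoshka sampling policies are named above but not defined, so the sketch below only illustrates one plausible interpretation: "joint" trains every granularity each step, "stochastic" picks one at random, and "sandwich" takes the shallowest and deepest plus one random depth in between. These semantics and the function name sample_depths are assumptions, not taken from the unformers source.

```python
import random


def sample_depths(granularities, policy, rng=random):
    """Pick the encoder depths to train on for one step (assumed semantics)."""
    depths = sorted(granularities)
    if policy == "joint":
        return depths                      # every granularity, every step
    if policy == "stochastic":
        return [rng.choice(depths)]        # a single random granularity
    if policy == "sandwich":
        picked = {depths[0], depths[-1]}   # always the smallest and the largest
        middle = depths[1:-1]
        if middle:
            picked.add(rng.choice(middle))  # plus one random depth in between
        return sorted(picked)
    raise ValueError(f"unknown policy: {policy!r}")


# [2, 4, 6] are the granularities listed for the ul2-mini-6-3 preset.
for policy in ("joint", "stochastic", "sandwich"):
    print(policy, sample_depths([2, 4, 6], policy))
```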

Alignment

unformers.align.NeuralIBMAligner — IBM Model 1 / 2 / HMM, factored lexical table, GPU-batched, pharaoh output, fwd/rev + grow-diag-final-and symmetrisation.
unformers.align.PortableAlignment — char (default) or byte spans + word aggregation; python -m unformers.align.cli end-to-end runner.
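
For readers unfamiliar with the IBM models, the following is a textbook IBM Model 1 EM loop in plain Python, followed by a best-link alignment printed in pharaoh format. It is a didactic sketch only: NeuralIBMAligner is described above as GPU-batched with a factored lexical table, which this does not attempt to reproduce, and the names train_ibm1 and align are hypothetical. Tokens can be any hashable values, so subword ids work the same way as the words used here.

```python
from collections import defaultdict


def train_ibm1(pairs, iterations=20):
    """EM for IBM Model 1. pairs: list of (src_tokens, tgt_tokens).

    Returns t with t[(tgt, src)] approximating P(tgt | src).
    """
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(tgt_vocab)
    t = defaultdict(lambda: uniform)          # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)            # expected link counts
        total = defaultdict(float)            # per-source-token normaliser
        for src, tgt in pairs:
            for e in tgt:
                z = sum(t[(e, f)] for f in src)
                for f in src:
                    c = t[(e, f)] / z         # E-step: posterior of the link e <- f
                    count[(e, f)] += c
                    total[f] += c
        # M-step: renormalise the expected counts per source token
        t = defaultdict(float, {k: v / total[k[1]] for k, v in count.items()})
    return t


def align(src, tgt, t):
    """Link each target token to its most probable source token (pharaoh: src-tgt)."""
    links = []
    for j, e in enumerate(tgt):
        i = max(range(len(src)), key=lambda i: t[(e, src[i])])
        links.append(f"{i}-{j}")
    return " ".join(links)


corpus = [("das haus".split(), "the house".split()),
          ("das buch".split(), "the book".split()),
          ("ein buch".split(), "a book".split())]
table = train_ibm1(corpus)
print(align("das haus".split(), "the house".split(), table))
```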

Data

TokenizerWrapper — any HF tokenizer, handles UL2 denoiser tags and lang codes.
UL2 mixture-of-denoisers (R / X / S) preprocessing.
Seq2SeqWithAlignmentCollator — pads src/tgt, shifts decoder input, turns pharaoh alignments into flat loss-index tensors with inverse-frequency weights.
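
As an illustration of what the UL2 denoisers produce, the sketch below builds an R/X-style span-corruption example and an S-style prefix-LM example from a token list. The [R] / [X] / [S] tags come from the preset table. The <extra_id_N> sentinel naming follows the T5 convention and is an assumption here, as are the function names. Span sampling (span lengths and corruption rates, which are what distinguish R from X) is left out, so the spans are passed in explicitly.

```python
def span_corrupt(tokens, spans, tag="[R]"):
    """Replace each span with a sentinel; the target lists sentinel + masked tokens.

    spans: sorted, non-overlapping half-open (start, end) token ranges.
    """
    inputs, targets, cursor = [tag], [], 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"           # assumed T5-style sentinel name
        inputs += tokens[cursor:start] + [sentinel]
        targets += [sentinel] + tokens[start:end]
        cursor = end
    inputs += tokens[cursor:]
    return inputs, targets


def prefix_lm(tokens, split, tag="[S]"):
    """S-denoiser: the encoder sees a prefix, the decoder predicts the continuation."""
    return [tag] + tokens[:split], tokens[split:]


words = "the quick brown fox jumps over the lazy dog".split()
r_inputs, r_targets = span_corrupt(words, [(1, 3), (6, 7)])
s_inputs, s_targets = prefix_lm(words, 4)
print(r_inputs, r_targets)
print(s_inputs, s_targets)
```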

Interop

warm_start(target, source, strategy="auto") — Net2Net (wider / deeper identity insertion) + bert2bert (cross-attn init from self-attn when source lacks cross-attn) + key-normalisation aliases for T5 / BART / NLLB / Marian / mBART / Llama / Qwen / Gemma naming. Returns a CopyManifest listing copied / padded / identity-inserted / randomised tensors.
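
The "wider" half of Net2Net can be shown on a tiny two-layer network: duplicate one hidden unit and split its outgoing weight in half, and the widened network computes the same function. The sketch below uses plain Python lists and a ReLU. It illustrates the general technique only and is not the code behind warm_start; all names are hypothetical.

```python
def matvec(W, x):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


def forward(W1, W2, x):
    """Two-layer network: W2 @ relu(W1 @ x)."""
    return matvec(W2, relu(matvec(W1, x)))


def widen(W1, W2, k):
    """Net2WiderNet step: copy hidden unit k and halve its outgoing weights."""
    W1_new = [row[:] for row in W1] + [W1[k][:]]        # duplicate incoming weights
    W2_new = []
    for row in W2:
        half = row[k] / 2.0
        W2_new.append(row[:k] + [half] + row[k + 1:] + [half])  # split outgoing weight
    return W1_new, W2_new


W1 = [[1.0, -2.0], [0.5, 1.5]]   # 2 inputs -> 2 hidden units
W2 = [[2.0, -1.0]]               # 2 hidden units -> 1 output
x = [3.0, 1.0]
W1_wide, W2_wide = widen(W1, W2, k=1)
print(forward(W1, W2, x), forward(W1_wide, W2_wide, x))
```

The same idea extends to depth: a newly inserted layer initialised to the identity leaves the outputs unchanged, which is what "identity-inserted" in the manifest refers to.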

Generation

model.generate(...) via HF GenerationMixin. KV cache verified against full-forward parity to 2e-5. Greedy and beam search both work.

Development

Run the tests

pip install -e '.[dev]'
pytest                           # fast tests
pytest -v -m slow -k 0.5B        # large-scale param-tier tests
pytest tests/test_portable_alignment.py -v  # alignment + ICU tests

Layout

unformers/
  config.py          # UnFormerConfig + all nested dataclasses
  modules/           # attention, positional, norm, ffn, moe, embedding
  blocks/            # encoder_layer, decoder_layer
  model/             # encoder, decoder, seq2seq (PreTrainedModel)
  presets/           # one file per family + _helpers.py
  align/             # NeuralIBMAligner, portable alignment, segmenters, CLI
  data/              # tokenizer wrapper, collator, UL2 denoisers
  train/             # trainer, losses, Matryoshka policy
  interop/           # warm_start
examples/
  smoke_test.py             # HF Trainer path
  train_pure_pytorch.py     # plain torch loop
tests/
  test_presets.py
  test_preset_sizes.py      # 0.5B / 1B / 2B / 3B tiers (slow)
  test_warm_start.py
  test_gradient_checkpointing.py
  test_moe.py
  test_portable_alignment.py

License

See LICENSE in the repo root.


Release history

0.6.0.5 (Jul 11, 2019)
0.6.0.4 (Jul 11, 2019)
0.6.0.3 (Dec 3, 2018)
0.0.1 (Apr 18, 2026), this version

