OMGFormer
Parallel Diffusion Language Model — 60 features, production-ready
OMGFormer is a modular PyTorch library for building and training parallel masked diffusion language models. It ships with a comprehensive set of attention variants, MoE routing strategies, LoRA fine-tuning methods, and training utilities.
Installation
pip install omgformer
# with safetensors support (recommended for LoRA saving)
pip install omgformer[safetensors]
Quick Start
from omgformer import OMGConfig, OMGModel
config = OMGConfig(
    vocab_size=32000,
    hidden_size=768,
    num_layers=12,
    num_heads=12,
)
model = OMGModel(config)
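A hypothetical forward pass for orientation — this assumes the model consumes token IDs directly and returns per-token outputs, as the MoE example further below suggests; check the actual API before relying on it:

import torch

input_ids = torch.randint(0, config.vocab_size, (1, 128))  # dummy batch of token IDs
logits = model(input_ids)                                  # assumed: per-token logits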
Feature Highlights
Attention (Features #1–#16)
- Grouped Query Attention (GQA) — reduced KV heads for efficient inference (variant selection is sketched after this list)
- Multi-Head Latent Attention (MLA) — DeepSeek-style latent compression
- Sliding Window Attention — Mistral-style local context windows
- Linear Attention — O(n) complexity via kernel feature maps
- Block Sparse Attention — memory-efficient sparse patterns
- RoPE variants: standard, YaRN, NTK-aware, LongRoPE
- ALiBi and T5 relative position biases
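In practice the variants above would be selected through config flags. The field names in this sketch are assumptions for illustration, not OMGFormer's documented API:

# Hypothetical field names — consult OMGConfig for the real ones.
config = OMGConfig(
    attention_type="gqa",   # pick one of the variants above
    num_kv_heads=4,         # GQA: fewer KV heads than query heads
    rope_type="yarn",       # RoPE variant: standard, YaRN, NTK-aware, LongRoPE
    sliding_window=4096,    # local context size for sliding window attention
)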
Layers (Features #17–#28)
- SwiGLU, GeGLU, ReGLU FFN variants
- RMSNorm and ScaleNorm
- AdaLN modulation for diffusion timestep conditioning (sketched after this list)
- Token Merging — dynamic sequence length reduction
- Stochastic Depth — drop-path regularization
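AdaLN predicts a per-channel scale and shift from the timestep embedding and applies them after normalization. A minimal generic PyTorch sketch of the idea, independent of OMGFormer's actual implementation:

import torch
import torch.nn as nn

class AdaLN(nn.Module):
    """Generic AdaLN sketch: scale and shift come from a conditioning vector."""
    def __init__(self, hidden_size: int, cond_size: int):
        super().__init__()
        self.norm = nn.LayerNorm(hidden_size, elementwise_affine=False)
        self.proj = nn.Linear(cond_size, 2 * hidden_size)  # predicts (scale, shift)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, hidden); cond: (batch, cond_size), e.g. a timestep embedding
        scale, shift = self.proj(cond).chunk(2, dim=-1)
        return self.norm(x) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)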
Mixture of Experts (Features #37–#40)
- Standard top-K token-choice routing (a generic router is sketched after this list)
- Expert Choice routing (Zhou et al., 2022) — experts pick their tokens, giving perfect load balance by construction
- Soft MoE (Puigcerver et al., 2023) — fully differentiable routing
- Shared expert (DeepSeek MoE) — always-on dense + sparse experts
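To make the routing options concrete, here is a generic top-K token-choice router with the standard Switch-style load-balancing auxiliary loss — a sketch of the common pattern, not OMGFormer's code:

import torch
import torch.nn.functional as F

def top_k_route(router_logits: torch.Tensor, k: int = 2):
    """router_logits: (num_tokens, num_experts) scores from a linear router."""
    probs = F.softmax(router_logits, dim=-1)
    weights, experts = probs.topk(k, dim=-1)               # each token picks its top-k experts
    weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize the k gate weights
    # Load-balancing loss: fraction of tokens routed to each expert
    # times the mean router probability for that expert.
    num_experts = router_logits.size(-1)
    token_frac = F.one_hot(experts, num_experts).float().sum(dim=(0, 1))
    token_frac = token_frac / token_frac.sum()
    aux_loss = num_experts * (token_frac * probs.mean(dim=0)).sum()
    return weights, experts, aux_loss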
LoRA / PEFT (Features #41–#44)
- Standard LoRA with configurable rank and alpha (see the sketch after this list)
- DoRA — weight-decomposed LoRA (Liu et al., 2024)
- rsLoRA — rank-stabilized scaling for high-rank training
- LoRA+ — different learning rates for A and B matrices
- Save/load only adapter weights (~MB, not GB)
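These methods differ mainly in how the low-rank update is scaled and decomposed. A generic LoRA linear layer with the rsLoRA scaling noted inline — a sketch of the technique, not OMGFormer's internals:

import math
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Generic LoRA sketch: y = W x + scale * B A x, with frozen W."""
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 32.0,
                 rank_stabilized: bool = False):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the base layer
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no-op at start
        # rsLoRA replaces alpha/r with alpha/sqrt(r) so the update's magnitude
        # stays stable as the rank grows.
        self.scale = alpha / math.sqrt(rank) if rank_stabilized else alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)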
Training (Features #45–#52)
- Lion optimizer (Chen et al., 2023)
- EMA with warmup annealing
- Warmup + cosine LR schedule with optional restarts (see the sketch after this list)
- Gradient checkpointing helper
- FSDP wrapping helper
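The warmup + cosine schedule (here without restarts) is standard and easy to express with PyTorch's LambdaLR; this generic version is illustrative, not the library's helper:

import math
from torch.optim.lr_scheduler import LambdaLR

def warmup_cosine(optimizer, warmup_steps: int, total_steps: int, min_ratio: float = 0.1):
    def lr_lambda(step: int) -> float:
        if step < warmup_steps:
            return step / max(1, warmup_steps)  # linear warmup from 0
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # decays 1 -> 0
        return min_ratio + (1.0 - min_ratio) * cosine        # floor at min_ratio
    return LambdaLR(optimizer, lr_lambda)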
Advanced (Features #53–#60)
- KV Cache for autoregressive decoding
- Multi-Token Prediction Head (MTP)
- Model merging: SLERP, DARE, TIES (SLERP is sketched after this list)
- Reward model head + PPO step
- INT8 / INT4 quantization stubs
- GGUF export stub
- RAG context injector
- Dynamic batching engine
- Chunked long-document attention
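Of the merging methods, SLERP is the simplest to state: interpolate along the arc between corresponding weight tensors rather than along the chord. A generic per-tensor sketch, not OMGFormer's merge API:

import torch

def slerp(w1: torch.Tensor, w2: torch.Tensor, t: float, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors at fraction t."""
    v1, v2 = w1.flatten().float(), w2.flatten().float()
    u1, u2 = v1 / (v1.norm() + eps), v2 / (v2.norm() + eps)
    omega = torch.acos(u1.dot(u2).clamp(-1 + 1e-7, 1 - 1e-7))  # angle between directions
    so = torch.sin(omega)
    if so.abs() < eps:  # nearly parallel: fall back to linear interpolation
        return (1 - t) * w1 + t * w2
    out = (torch.sin((1 - t) * omega) * v1 + torch.sin(t * omega) * v2) / so
    return out.reshape(w1.shape).to(w1.dtype)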
LoRA Fine-tuning Example
from omgformer import OMGConfig, OMGModel, add_lora, merge_lora, save_lora

model = OMGModel(OMGConfig())

# Add LoRA adapters (freezes base weights automatically)
model = add_lora(model, rank=16, alpha=32)

# ... fine-tune ...

# Save the adapter weights first, then merge them into the base model for inference
save_lora(model, "./my_adapter")
model = merge_lora(model)
MoE Example
import torch
from omgformer import OMGConfig, OMGModel

config = OMGConfig(
    use_moe=True,
    num_experts=8,
    num_experts_per_token=2,
    moe_expert_choice=False,  # set True for Expert Choice routing
)
model = OMGModel(config)

input_ids = torch.randint(0, config.vocab_size, (1, 128))
output, aux_loss = model(input_ids)
loss = ce_loss + aux_loss  # add the load-balancing loss to the task loss during training
License
Apache 2.0 — see LICENSE.
File details
Details for the file omgformer-2.0.5.tar.gz.
File metadata
- Download URL: omgformer-2.0.5.tar.gz
- Upload date:
- Size: 70.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 429ccba0eb76a6b68aa671e0146e41ae3874e80c511b0b87b29059a2c7af80a1 |
| MD5 | 53485c7aa34cbbf0b9f906c629eea726 |
| BLAKE2b-256 | 7fdf93ce240aedaa313d8ffc3fd61138aff9bf7eec0cec1b26c6f300937e6508 |
File details
Details for the file omgformer-2.0.5-py3-none-any.whl.
File metadata
- Download URL: omgformer-2.0.5-py3-none-any.whl
- Upload date:
- Size: 73.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | eeae0cdbb431d6433103634e6e69cd947820445550a2db24a61912d1b55bdba0 |
| MD5 | cdd4d3381a4d235a192bb0ca5e802f41 |
| BLAKE2b-256 | 943298a08f21d1f5679e9af74b3a846588632a73c2222b0fd45c6a370ca23115 |