
OMGFormer

Parallel Diffusion Language Model — 60 features, production-ready


OMGFormer is a modular PyTorch library for building and training parallel masked diffusion language models. It ships with a comprehensive set of attention variants, MoE routing strategies, LoRA fine-tuning methods, and training utilities.
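
For orientation, here is a minimal sketch of the masked-diffusion training objective such models use (an illustration only, not OMGFormer's actual API): sample a mask rate per sequence, hide that fraction of tokens, and train the model to predict every hidden token in parallel.

import torch

mask_id, vocab_size = 0, 32000                        # assumed special-token id
input_ids = torch.randint(1, vocab_size, (2, 128))    # dummy batch

t = torch.rand(2, 1)                                  # per-sequence mask rate
mask = torch.rand(2, 128) < t                         # positions to hide
noisy = input_ids.masked_fill(mask, mask_id)

# logits = model(noisy); cross-entropy is computed on masked positions only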


Installation

pip install omgformer
# with safetensors support (recommended for LoRA saving)
pip install omgformer[safetensors]

Quick Start

from omgformer import OMGConfig, OMGModel

config = OMGConfig(
    vocab_size=32000,
    hidden_size=768,
    num_layers=12,
    num_heads=12,
)
model = OMGModel(config)

Feature Highlights

Attention (Features #1–#16)

  • Grouped Query Attention (GQA) — reduced KV heads for efficient inference (sketched after this list)
  • Multi-Head Latent Attention (MLA) — DeepSeek-style latent compression
  • Sliding Window Attention — Mistral-style local context windows
  • Linear Attention — O(n) complexity via kernel feature maps
  • Block Sparse Attention — memory-efficient sparse patterns
  • RoPE variants: standard, YaRN, NTK-aware, LongRoPE
  • ALiBi and T5 relative position biases
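
As a concrete illustration of GQA (a plain-PyTorch sketch, not OMGFormer's internal code), eight query heads can share two KV heads, shrinking the KV cache fourfold:

import torch
import torch.nn.functional as F

B, T, n_q_heads, n_kv_heads, head_dim = 2, 16, 8, 2, 64
q = torch.randn(B, n_q_heads, T, head_dim)
k = torch.randn(B, n_kv_heads, T, head_dim)
v = torch.randn(B, n_kv_heads, T, head_dim)

# Each KV head serves a contiguous group of query heads
group = n_q_heads // n_kv_heads
k = k.repeat_interleave(group, dim=1)   # (B, n_q_heads, T, head_dim)
v = v.repeat_interleave(group, dim=1)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)   # torch.Size([2, 8, 16, 64])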

Layers (Features #17–#28)

  • SwiGLU, GeGLU, ReGLU FFN variants (SwiGLU sketched after this list)
  • RMSNorm and ScaleNorm
  • AdaLN modulation for diffusion timestep conditioning
  • Token Merging — dynamic sequence length reduction
  • Stochastic Depth — drop-path regularization
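
For reference, SwiGLU computes down(silu(gate(x)) * up(x)); a self-contained sketch (not OMGFormer's implementation):

import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    # y = W_down( silu(W_gate x) * W_up x )
    def __init__(self, dim, hidden):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

print(SwiGLU(768, 2048)(torch.randn(2, 16, 768)).shape)   # torch.Size([2, 16, 768])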

Mixture of Experts (Features #37–#40)

  • Standard top-K token-choice routing (sketched after this list)
  • Expert Choice routing (Zhou et al., 2022) — perfect load balance
  • Soft MoE (Puigcerver et al., 2023) — fully differentiable routing
  • Shared expert (DeepSeek MoE) — always-on dense + sparse experts
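
Top-K token-choice routing, sketched in plain PyTorch (illustration only, not OMGFormer's router): each token scores all experts, keeps its top 2, and renormalizes the kept weights.

import torch
import torch.nn as nn

num_tokens, dim, num_experts = 4, 768, 8
tokens = torch.randn(num_tokens, dim)
router = nn.Linear(dim, num_experts, bias=False)

probs = router(tokens).softmax(dim=-1)                 # (num_tokens, num_experts)
weights, expert_ids = probs.topk(2, dim=-1)            # top-2 experts per token
weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize kept weights
# Token i is dispatched to expert_ids[i]; expert outputs are mixed by weights[i]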

LoRA / PEFT (Features #41–#44)

  • Standard LoRA with configurable rank and alpha (sketched after this list)
  • DoRA — weight-decomposed LoRA (Liu et al., 2024)
  • rsLoRA — rank-stabilized scaling for high-rank training
  • LoRA+ — different learning rates for A and B matrices
  • Save/load only adapter weights (~MB, not GB)
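
The core LoRA mechanism, as a minimal sketch (not the internals of add_lora): wrap a frozen linear layer with a low-rank update scaled by alpha/rank, zero-initializing B so training starts exactly at the base model.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # y = W x + (alpha / rank) * B(A(x)), with W frozen
    def __init__(self, base: nn.Linear, rank=16, alpha=32):
        super().__init__()
        for p in base.parameters():
            p.requires_grad_(False)        # freeze the base layer
        self.base = base
        self.A = nn.Linear(base.in_features, rank, bias=False)
        self.B = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.B.weight)      # the update starts at zero
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.B(self.A(x))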

Training (Features #45–#52)

  • Lion optimizer (Chen et al., 2023)
  • EMA with warmup annealing
  • Warmup + cosine LR schedule with optional restarts (sketched after this list)
  • Gradient checkpointing helper
  • FSDP wrapping helper
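
The warmup + cosine schedule can be written as a LambdaLR multiplier; a sketch with assumed step counts (not OMGFormer's helper):

import math
import torch

def warmup_cosine(step, warmup=1_000, total=100_000):
    if step < warmup:
        return step / max(1, warmup)                      # linear warmup
    progress = (step - warmup) / max(1, total - warmup)
    return 0.5 * (1.0 + math.cos(math.pi * progress))     # cosine decay to 0

params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.AdamW(params, lr=3e-4)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, warmup_cosine)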

Advanced (Features #53–#60)

  • KV Cache for autoregressive decoding
  • Multi-Token Prediction Head (MTP)
  • Model merging: SLERP, DARE, TIES (SLERP sketched after this list)
  • Reward model head + PPO step
  • INT8 / INT4 quantization stubs
  • GGUF export stub
  • RAG context injector
  • Dynamic batching engine
  • Chunked long-document attention
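
Of the merging methods, SLERP is the easiest to sketch (illustration only, not OMGFormer's merge API): interpolate two weight tensors along the arc between them rather than linearly, which better preserves their norm.

import torch

def slerp(w0: torch.Tensor, w1: torch.Tensor, t: float = 0.5) -> torch.Tensor:
    a, b = w0.flatten(), w1.flatten()
    cos = (a @ b) / (a.norm() * b.norm() + 1e-8)
    omega = torch.acos(cos.clamp(-1.0, 1.0))       # angle between the tensors
    so = torch.sin(omega).clamp(min=1e-6)
    out = (torch.sin((1 - t) * omega) / so) * a + (torch.sin(t * omega) / so) * b
    return out.reshape(w0.shape)

merged = slerp(torch.randn(768, 768), torch.randn(768, 768), t=0.5)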

LoRA Fine-tuning Example

from omgformer import OMGConfig, OMGModel, add_lora, merge_lora, save_lora

model = OMGModel(OMGConfig())

# Add LoRA adapters (freezes base weights automatically)
model = add_lora(model, rank=16, alpha=32)

# ... fine-tune ...

# Save only the adapter weights (~MB, not GB)
save_lora(model, "./my_adapter")

# Merge adapters into the base weights for deployment
model = merge_lora(model)

MoE Example

import torch
from omgformer import OMGConfig, OMGModel

config = OMGConfig(
    use_moe=True,
    num_experts=8,
    num_experts_per_token=2,
    moe_expert_choice=False,   # or True for Expert Choice routing
)
model = OMGModel(config)

input_ids = torch.randint(0, config.vocab_size, (2, 128))  # dummy batch
output, aux_loss = model(input_ids)
loss = ce_loss + aux_loss  # ce_loss: your task loss; aux_loss: load-balancing term

License

Apache 2.0 — see LICENSE.



