
Megatron Bridge: Training Recipes for Megatron-based LLM and VLM models

Project description

📣 News

Overview

NeMo Megatron Bridge is a PyTorch-native library within the NeMo Framework that provides pretraining, SFT, and LoRA for popular LLM and VLM models. It serves as a powerful bridge, conversion, and verification layer between 🤗 Hugging Face and Megatron Core. It provides bidirectional checkpoint conversion between these formats, enabling other projects to leverage Megatron Core's parallelism capabilities or export models for various inference engines. The bridge includes built-in verification mechanisms to ensure conversion accuracy and checkpoint integrity across different model formats.

On top of the bridge, NeMo Megatron Bridge provides a performant and scalable PyTorch-native training loop that leverages Megatron Core to deliver state-of-the-art training throughput. It supports pretraining and fine-tuning with features like tensor and pipeline parallelism, and mixed precision (FP8, BF16, FP4, etc.). Users can either use existing 🤗 Hugging Face models or define custom PyTorch model definitions for flexible end-to-end workflows.

NeMo Megatron Bridge is a refactor of the previous NeMo training stack that adopts a PyTorch-native training loop to provide greater flexibility and customizability for developers.


🔧 Installation

๐Ÿณ NeMo Framework container

The best experience, highest performance, and full feature support are provided by the NeMo Framework container. Fetch the most recent $TAG and run the following to start a container:

docker run --rm -it -w /workdir -v $(pwd):/workdir \
  --entrypoint bash \
  --gpus all \
  nvcr.io/nvidia/nemo:${TAG}

For development installation and additional details, please refer to our Contribution guide.

⚡ Quickstart

To get started, install Megatron Bridge or download a NeMo Framework container as described above.

Log in to Hugging Face Hub:

huggingface-cli login --token <your token>
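
If you prefer to authenticate from Python rather than the CLI, the Hugging Face Hub client provides an equivalent login call. This is a minimal sketch; the token string below is a placeholder for your own access token:

from huggingface_hub import login

# Authenticate this process with the Hugging Face Hub so that gated
# checkpoints such as meta-llama/Llama-3.2-1B can be downloaded.
login(token="<your token>")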

Conversion-only quickstart (✅ Core):

from megatron.bridge import AutoBridge

# 1) Create a bridge from a Hugging Face model (hub or local path)
bridge = AutoBridge.from_hf_pretrained("meta-llama/Llama-3.2-1B", trust_remote_code=True)

# 2) Get a Megatron provider and configure parallelism before instantiation
provider = bridge.to_megatron_provider()
provider.tensor_model_parallel_size = 1
provider.pipeline_model_parallel_size = 1
provider.finalize()
# 3) Materialize Megatron Core model(s)
model = provider.provide_distributed_model(wrap_with_ddp=False)

# 4a) Export Megatron → Hugging Face (full HF folder with config/tokenizer/weights)
bridge.save_hf_pretrained(model, "./hf_exports/llama32_1b")

# 4b) Or stream only weights (Megatron → HF)
for name, weight in bridge.export_hf_weights(model, cpu=True):
    print(name, tuple(weight.shape))
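
Because save_hf_pretrained writes a standard Hugging Face checkpoint folder, the round trip can be sanity-checked with the transformers library. This is a minimal sketch, assuming transformers is installed and reusing the export path from step 4a:

from transformers import AutoModelForCausalLM

# Reload the folder written by bridge.save_hf_pretrained above.
hf_model = AutoModelForCausalLM.from_pretrained("./hf_exports/llama32_1b")

# Spot-check parameter names and shapes against the weights streamed in step 4b.
for name, weight in list(hf_model.named_parameters())[:5]:
    print(name, tuple(weight.shape))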

Training quickstart using pre-configured recipes:

from megatron.bridge.recipes.llama import llama32_1b_pretrain_config
from megatron.bridge.training.gpt_step import forward_step
from megatron.bridge.training.pretrain import pretrain

if __name__ == "__main__":
    # The recipe uses the Llama 3.2 1B model configuration from HuggingFace
    cfg = llama32_1b_pretrain_config(seq_length=1024)

    # Override training parameters
    cfg.train.train_iters = 10
    cfg.scheduler.lr_decay_iters = 10000
    cfg.model.vocab_size = 8192
    cfg.tokenizer.vocab_size = cfg.model.vocab_size

    pretrain(cfg, forward_step)

You can launch the above script with:

torchrun --nproc-per-node=<num devices> /path/to/script.py

More examples of conversion and training workflows are available in the examples/ directory. For a deeper dive into conversion design and advanced usage, see the models README.

🚀 Key Features

  • Bridge with 🤗 Hugging Face: Seamless bidirectional conversion between 🤗 Hugging Face and Megatron formats for interoperability (model bridges, auto bridge, conversion examples)
    • Online import/export without intermediate full checkpoints
    • Parallelism-aware (TP/PP/VPP/CP/EP/ETP) during conversion
    • Memory-efficient per-parameter streaming
    • Simple high-level AutoBridge API with architecture auto-detection
    • Optimized paths when Transformer Engine is available
  • Flexible Customization: A lightweight, PyTorch-native training loop that makes it easy to plug custom logic into data loading, distributed training, checkpointing, evaluation, and logging (training framework, training utilities)
  • Supervised & Parameter-Efficient Finetuning: SFT and PEFT implementations tailored for Megatron-based models, supporting LoRA, DoRA, and user-defined PEFT methods (PEFT implementations, finetune module, SFT dataset)
  • SOTA Training Recipes: Pre-configured, production-ready training recipes for popular models like Llama 3, with optimized hyperparameters and distributed training configuration (Llama recipes, recipe examples)
  • Performance Optimization: Built-in support for FP8 training, model parallelism, and memory-efficient techniques, delivering high utilization and near-linear scalability to thousands of nodes; see the sketch after this list for how these settings are applied to a recipe (mixed precision, communication overlap, optimizer utilities)
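
To make the interplay of these features concrete, the sketch below takes the pre-configured Llama 3.2 1B recipe from the quickstart and overrides its training and parallelism settings before launching. It is a minimal sketch: cfg.train.train_iters comes from the quickstart above, while the tensor/pipeline parallel fields on cfg.model are an assumption that mirrors the attributes the provider exposes in the conversion quickstart.

from megatron.bridge.recipes.llama import llama32_1b_pretrain_config
from megatron.bridge.training.gpt_step import forward_step
from megatron.bridge.training.pretrain import pretrain

if __name__ == "__main__":
    cfg = llama32_1b_pretrain_config(seq_length=2048)

    # Training-loop override shown in the quickstart above.
    cfg.train.train_iters = 100

    # Assumed: the recipe's model config exposes the same parallelism fields
    # as the provider in the conversion quickstart. Adjust to your GPU budget;
    # the product of the parallel sizes must divide the number of launched ranks.
    cfg.model.tensor_model_parallel_size = 2
    cfg.model.pipeline_model_parallel_size = 1

    pretrain(cfg, forward_step)

Launch it with torchrun exactly as in the quickstart, choosing --nproc-per-node to match the parallel layout.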

Supported Models

Megatron Bridge provides out-of-the-box bridges and training recipes for a wide range of models, built on top of base model architectures from Megatron Core. Refer to the models directory for the most up-to-date list of model bridges.

Supported Models Overview

For more details on supported models, see our documentation:

| Model | Checkpoint Conversion | Pretrain Recipes | SFT & LoRA Recipes |
| --- | --- | --- | --- |
| DeepSeek V2 | ✅ | ✅ (v2) | Coming soon |
| DeepSeek V2 Lite | ✅ | ✅ (v2-lite) | Coming soon |
| DeepSeek V3 | ✅ | ✅ (v3) | Coming soon |
| Gemma | ✅ | Coming soon | Coming soon |
| Gemma 2 | ✅ | Coming soon | Coming soon |
| Gemma 3 | ✅ | ✅ (1B) | ✅ (1B) |
| Gemma 3-VL | ✅ | Coming soon | ✅ (4B/12B/27B) |
| GLM-4.5 | ✅ | ✅ (106B-Air/355B) | ✅ (106B-Air/355B) |
| GPT-oss | ✅ | ✅ (20B/120B) | ✅ (20B/120B) |
| Llama 2 | ✅ | ✅ (7B) | Coming soon |
| Llama 3 | ✅ | ✅ (8B/70B) | ✅ (8B/70B) |
| Llama 3.1 | ✅ | ✅ (8B/70B/405B) | ✅ (8B/70B/405B) |
| Llama 3.2 | ✅ | ✅ (1B/3B) | ✅ (1B/3B) |
| Llama 3.3 | ✅ | Coming soon | Coming soon |
| Llama Nemotron | ✅ | Coming soon | Coming soon |
| Mistral | ✅ | Coming soon | Coming soon |
| Ministral | ✅ | ✅ (3B/8B/14B) | ✅ (3B/8B/14B) |
| Moonlight | ✅ | ✅ (16B) | ✅ (16B) |
| Nemotron | ✅ | Coming soon | Coming soon |
| Nemotron-3 | ✅ | ✅ (A3B) | ✅ (A3B) |
| Nemotron-H | ✅ | ✅ (4B/8B/47B/56B) | Coming soon |
| Nemotron Nano v2 | ✅ | ✅ (9B/12B) | Coming soon |
| Nemotron Nano v2 VL | ✅ | Coming soon | ✅ (9B/12B) |
| OlMoE | ✅ | ✅ (7B) | ✅ (7B) |
| Qwen2 | ✅ | ✅ (500M/1.5B/7B/72B) | ✅ (500M/1.5B/7B/72B) |
| Qwen2.5 | ✅ | ✅ (500M/1.5B/7B/14B/32B/72B) | ✅ (500M/1.5B/7B/14B/32B/72B) |
| Qwen2.5-VL | ✅ | Coming soon | ✅ (3B/7B/32B/72B) |
| Qwen3 | ✅ | ✅ (600M/1.7B/4B/8B/14B/32B) | ✅ (600M/1.7B/4B/8B/14B/32B) |
| Qwen3-MoE | ✅ | ✅ (A3B/A22B) | ✅ (A3B/A22B) |
| Qwen3 Next | ✅ | ✅ (80B-A3B) | ✅ (80B-A3B) |
| Qwen3-VL | ✅ | Coming soon | ✅ (8B/A3B-A30B-MoE) |

Launching Recipes

For a conceptual overview of how recipes are structured, overridden, and launched with either torchrun or NeMo-Run, read the Using Recipes guide.

Runnable tutorials live in tutorials/recipes/llama and cover:

  • 00_quickstart_pretrain.py for mock-data pretraining
  • 01_quickstart_finetune.py + LoRA configs
  • YAML-driven flows and launch helpers

Performance Benchmarks

For detailed performance benchmarks including throughput metrics across different GPU systems (DGX-GB200, DGX-B200, DGX-H100) and model configurations, see the Performance Summary in our documentation.

Project Structure

Megatron-Bridge/
├── examples/
│   ├── models/                  # Bridge usage examples
│   └── recipes/                 # Training examples
├── src/megatron/bridge/
│   ├── data/                    # Dataloaders and iterators
│   ├── models/                  # Hugging Face bridge infrastructure and model-specific implementations
│   │   ├── llama/               # Llama model providers
│   │   └── .../                 # Other models (gpt, t5, etc.)
│   ├── peft/                    # PEFT transformations and wrappers
│   ├── recipes/                 # Complete training recipes
│   ├── training/                # Training loop components
│   │   ├── tokenizers/          # Tokenizer library
│   │   └── utils/               # Training-specific utilities
│   └── utils/                   # Generic utilities for repo-wide usage
└── tests/                       # Comprehensive test suite

Acknowledgement & Contributing

Megatron-Bridge is the continuation of MBridge by Yan Bai. We appreciate all the contributions and adoption by our community partners:

  • Mind Lab successfully used Megatron-Bridge and VeRL to train a trillion-parameter model with GRPO and LoRA on 64 H800 GPUs; see their tech blog.
  • VeRL has adopted Megatron-Bridge as a connector to Megatron-Core and for LoRA support.
  • Slime has adopted Megatron-Bridge as a Megatron-Core checkpoint converter.
  • SkyRL has adopted Megatron-Bridge as a Megatron-Core connector.
  • Nemo-RL has adopted Megatron-Bridge as a Megatron-Core connector.
  • Community contributions: Special thanks to Guanyou He and Junyu Wu from Weixin Group Infrastructure Center.

Please see our Contributor Guidelines for more information on how to get involved.

Project details


Download files

Download the file for your platform.

Source Distribution

megatron_bridge-0.3.1.tar.gz (610.3 kB)

Uploaded Source

Built Distribution


megatron_bridge-0.3.1-py3-none-any.whl (825.6 kB)

Uploaded Python 3

File details

Details for the file megatron_bridge-0.3.1.tar.gz.

File metadata

  • Download URL: megatron_bridge-0.3.1.tar.gz
  • Upload date:
  • Size: 610.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.3

File hashes

Hashes for megatron_bridge-0.3.1.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 3d5b7400fcabd6e49ae2dfabebd6db2a6a5de3bb6c2b2f96c02e2c825301a763 |
| MD5 | 719e8a54a214ac7a3f128cf377c4fb48 |
| BLAKE2b-256 | c3cc9332fe885379763568a691d778b5427453106366ab0027b69050d1020bfb |


File details

Details for the file megatron_bridge-0.3.1-py3-none-any.whl.

File metadata

File hashes

Hashes for megatron_bridge-0.3.1-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 4d9a9ad679bd927a84f923e5239512d72af7fa47331b749a8f275592c2920616 |
| MD5 | f1157a1656a03827373465405a2472e9 |
| BLAKE2b-256 | ffa627b97395cbb5a12c68a95a2eedd95541894245a28dea54900b7c54122577 |

