
Megatron Core - a library for efficient and scalable training of transformer-based models

Project description

Megatron-LM and Megatron Core

GPU-optimized library for training transformer models at scale


About

This repository contains two components: Megatron-LM and Megatron Core.

Megatron-LM is a reference example that includes Megatron Core plus pre-configured training scripts. Best for research teams, learning distributed training, and quick experimentation.

Megatron Core is a composable library with GPU-optimized building blocks for custom training frameworks. It provides transformer building blocks, advanced parallelism strategies (TP, PP, DP, EP, CP), mixed precision support (FP16, BF16, FP8, FP4), and model architectures. Best for framework developers and ML engineers building custom training pipelines.
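The column-parallel flavor of tensor parallelism (TP) mentioned above can be illustrated without GPUs: a linear layer's weight matrix is split column-wise across ranks, each rank computes a partial output, and an all-gather reconstructs the full result. The sketch below emulates two TP ranks with NumPy; it is a conceptual illustration only, not Megatron Core's actual API (which shards `nn.Linear`-style layers and communicates via NCCL).

```python
# Conceptual sketch of column-parallel tensor parallelism, emulating
# tp_size = 2 ranks with NumPy. Megatron Core does this on GPUs with
# NCCL collectives; here the "all-gather" is a simple concatenate.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))        # activations: (batch, hidden)
W = rng.standard_normal((8, 16))       # full weight: (hidden, out)

tp_size = 2
shards = np.split(W, tp_size, axis=1)  # each "rank" holds an (8, 8) shard

# Each rank computes its partial output independently (no communication).
partials = [x @ w for w in shards]

# An all-gather along the output dimension reconstructs the full output.
y_parallel = np.concatenate(partials, axis=1)
y_serial = x @ W

assert np.allclose(y_parallel, y_serial)
print("column-parallel output matches serial:", y_parallel.shape)
```

Row-parallel layers work dually (splitting the input dimension and all-reducing partial sums), which is why Megatron pairs the two to keep communication out of the inner MLP.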

Megatron Bridge provides bidirectional Hugging Face ↔ Megatron checkpoint conversion with production-ready recipes.

Quick Start

Install Megatron Core with pip:

  1. Install Megatron Core with required dependencies:

    pip install --no-build-isolation "megatron-core[mlm,dev]"
    
  2. Clone repository for examples:

    git clone https://github.com/NVIDIA/Megatron-LM.git
    cd Megatron-LM
    pip install --no-build-isolation ".[mlm,dev]"
    

Latest News

  • [2026/01] Dynamic Context Parallelism - Up to 1.48x speedup for variable-length sequence training with adaptive CP sizing.
  • [2025/12] Megatron Core development has moved to GitHub! All development and CI now happens in the open. We welcome community contributions.
  • [2025/10] Megatron Dev Branch - early access branch with experimental features.
  • [2025/10] Megatron Bridge - Bidirectional converter for interoperability between Hugging Face and Megatron checkpoints, featuring production-ready recipes for popular models.
  • [2025/08] MoE Q3-Q4 2025 Roadmap - Comprehensive roadmap for MoE features including DeepSeek-V3, Qwen3, advanced parallelism strategies, FP8 optimizations, and Blackwell performance enhancements.
  • [2025/08] GPT-OSS Model - Advanced features including YaRN RoPE scaling, attention sinks, and custom activation functions are being integrated into Megatron Core.
  • [2025/06] Megatron MoE Model Zoo - Best practices and optimized configurations for training DeepSeek-V3, Mixtral, and Qwen3 MoE models with performance benchmarking and checkpoint conversion tools.
  • [2025/05] Megatron Core v0.11.0 brings new capabilities for multi-data center LLM training (blog).
Previous News
  • [2024/07] Megatron Core v0.7 improves scalability and training resiliency and adds support for multimodal training (blog).
  • [2024/06] Megatron Core added support for Mamba-based models. Check out our paper, An Empirical Study of Mamba-based Language Models, and the code example.
  • [2024/01 Announcement] NVIDIA has released the core capabilities in Megatron-LM into Megatron Core in this repository. Megatron Core expands upon Megatron-LM's GPU-optimized techniques with more cutting-edge innovations on system-level optimizations, featuring composable and modular APIs.

Project Structure

Megatron-LM/
├── megatron/
│   ├── core/                    # Megatron Core (kernels, parallelism, building blocks)
│   │   ├── models/              # Transformer models
│   │   ├── transformer/         # Transformer building blocks
│   │   ├── tensor_parallel/     # Tensor parallelism
│   │   ├── pipeline_parallel/   # Pipeline parallelism
│   │   ├── distributed/         # Distributed training (FSDP, DDP)
│   │   ├── optimizer/           # Optimizers
│   │   ├── datasets/            # Dataset loaders
│   │   ├── inference/           # Inference engines
│   │   └── export/              # Model export (e.g. TensorRT-LLM)
│   ├── training/                # Training scripts
│   ├── inference/               # Inference server
│   ├── legacy/                  # Legacy components
│   └── post_training/           # Post-training (RLHF, etc.)
├── examples/                    # Ready-to-use training examples
├── tools/                       # Utility tools
├── tests/                       # Comprehensive test suite
└── docs/                        # Documentation

Performance Benchmarking

For our latest performance benchmarking results, please refer to NVIDIA Megatron Bridge Performance Summary.

Our codebase efficiently trains models from 2B to 462B parameters across thousands of GPUs, achieving up to 47% Model FLOP Utilization (MFU) on H100 clusters.

(Table: benchmarked model configurations)

Benchmark Configuration:

  • Vocabulary size: 131,072 tokens
  • Sequence length: 4096 tokens
  • Model scaling: Varied hidden size, attention heads, and layers to achieve target parameter counts
  • Communication optimizations: Fine-grained overlapping with DP (--overlap-grad-reduce, --overlap-param-gather), TP (--tp-comm-overlap), and PP (enabled by default)
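The "varied hidden size, attention heads, and layers" step above can be sketched with the standard back-of-envelope parameter formula for GPT-style models. The coefficients below (roughly 4h² for attention, 8h² for a 4h-wide MLP, per layer, plus the vocabulary embedding) are a common approximation introduced here for illustration, not the exact configurations behind the benchmark table.

```python
# Rough parameter-count estimate for a GPT-style transformer. The 12*h^2
# per-layer coefficient (QKV + output projection ~4h^2, 4h-wide MLP ~8h^2)
# is the usual approximation; biases and layer norms are ignored.
def approx_params(layers: int, hidden: int, vocab: int = 131_072) -> float:
    per_layer = 12 * hidden ** 2
    embedding = vocab * hidden   # token embedding (output layer often tied)
    return layers * per_layer + embedding

# A GPT-3-like shape (96 layers, hidden 12288) with the benchmark's
# 131,072-token vocabulary lands slightly above 175B parameters.
print(f"{approx_params(96, 12_288) / 1e9:.1f}B")
```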

Key Results:

  • 6144 H100 GPUs: Successfully benchmarked 462B parameter model training
  • Superlinear scaling: MFU increases from 41% to 47-48% with model size
  • End-to-end measurement: Throughputs include all operations (data loading, optimizer steps, communication, logging)
  • Production ready: Full training pipeline with checkpointing and fault tolerance
  • Note: Performance results measured without training to convergence

Weak Scaling Results

Our weak scaled results show superlinear scaling (MFU increases from 41% for the smallest model considered to 47-48% for the largest models); this is because larger GEMMs have higher arithmetic intensity and are consequently more efficient to execute.
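The arithmetic-intensity argument can be made concrete: for an m x k by k x n GEMM, FLOPs grow cubically (2mnk) while minimum memory traffic grows only quadratically, so intensity (FLOPs per byte) rises with matrix size and larger models keep the GPU's compute units busier. A minimal sketch, assuming BF16 operands (2 bytes per element) and counting only compulsory traffic:

```python
# Arithmetic intensity of an (m x k) @ (k x n) GEMM: FLOPs = 2*m*n*k,
# minimum bytes moved = bytes_per_el * (m*k + k*n + m*n). For square
# matrices this ratio grows linearly with the side length, which is why
# larger GEMMs shift from memory-bound toward compute-bound execution.
def gemm_intensity(m: int, n: int, k: int, bytes_per_el: int = 2) -> float:
    flops = 2 * m * n * k
    traffic = bytes_per_el * (m * k + k * n + m * n)
    return flops / traffic

for size in (1024, 4096, 16384):
    print(f"{size:>6}: {gemm_intensity(size, size, size):.0f} FLOP/byte")
```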

(Figure: weak-scaling results)

Strong Scaling Results

We also strong scaled the standard GPT-3 model (our version has slightly more than 175 billion parameters due to larger vocabulary size) from 96 H100 GPUs to 4608 GPUs, using the same batch size of 1152 sequences throughout. Communication becomes more exposed at larger scale, leading to a reduction in MFU from 47% to 42%.
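As a rough illustration of how the MFU figures above are derived: training an N-parameter model costs about 6N FLOPs per token (forward plus backward), so MFU is that achieved rate divided by hardware peak. The 989 TFLOP/s BF16 dense peak for H100 and the example throughput below are assumptions for illustration, not measurements from this benchmark.

```python
# Model FLOP Utilization (MFU): achieved model FLOPs/s divided by peak.
# Uses the common 6*N FLOPs-per-token training approximation; the H100
# BF16 dense peak (~989 TFLOP/s) and throughput are illustrative values.
def mfu(params: float, tokens_per_sec_per_gpu: float,
        peak_flops: float = 989e12) -> float:
    achieved = 6 * params * tokens_per_sec_per_gpu
    return achieved / peak_flops

# e.g. a 175B model at a hypothetical 470 tokens/s per GPU:
print(f"{mfu(175e9, 470):.2f}")
```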

(Figure: strong-scaling results)

Roadmaps

  • MoE Roadmap - DeepSeek-V3, Qwen3, advanced parallelism, FP8 optimizations, and Blackwell enhancements

Resources

Getting Help

Contributing

We ❤️ contributions! Ways to contribute:

  • 🐛 Report bugs - Help us improve reliability
  • 💡 Suggest features - Shape the future of Megatron Core
  • 📝 Improve docs - Make Megatron Core more accessible
  • 🔧 Submit PRs - Contribute code improvements

Contributing Guide

Citation

If you use Megatron in your research or project, please cite the following paper:

@article{megatron-lm,
  title={Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism},
  author={Shoeybi, Mohammad and Patwary, Mostofa and Puri, Raul and LeGresley, Patrick and Casper, Jared and Catanzaro, Bryan},
  journal={arXiv preprint arXiv:1909.08053},
  year={2019}
}

Project details



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • megatron_core-0.16.1.tar.gz (1.0 MB) - Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

  • megatron_core-0.16.1-cp313-cp313-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (2.6 MB) - CPython 3.13, manylinux glibc 2.24+/2.28+, x86-64
  • megatron_core-0.16.1-cp313-cp313-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl (2.6 MB) - CPython 3.13, manylinux glibc 2.24+/2.28+, ARM64
  • megatron_core-0.16.1-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (2.6 MB) - CPython 3.12, manylinux glibc 2.24+/2.28+, x86-64
  • megatron_core-0.16.1-cp312-cp312-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl (2.6 MB) - CPython 3.12, manylinux glibc 2.24+/2.28+, ARM64
  • megatron_core-0.16.1-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (2.6 MB) - CPython 3.11, manylinux glibc 2.24+/2.28+, x86-64
  • megatron_core-0.16.1-cp311-cp311-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl (2.6 MB) - CPython 3.11, manylinux glibc 2.24+/2.28+, ARM64
  • megatron_core-0.16.1-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (2.6 MB) - CPython 3.10, manylinux glibc 2.24+/2.28+, x86-64
  • megatron_core-0.16.1-cp310-cp310-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl (2.5 MB) - CPython 3.10, manylinux glibc 2.24+/2.28+, ARM64

File details

Details for the file megatron_core-0.16.1.tar.gz:

  • Download URL: megatron_core-0.16.1.tar.gz
  • Upload date:
  • Size: 1.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

megatron_core-0.16.1.tar.gz
  SHA256      86522feacfa5ef384a6ac627e7d88cf4866811aed3bc9b6c2a85cbf15cff91a3
  MD5         43a75f30d914a98f19151f310b83f8d1
  BLAKE2b-256 e593b8725486a5d2cda20f003d26fd8c777fe95451303bd661cedc37d0d005eb

megatron_core-0.16.1-cp313-cp313-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
  SHA256      de0ec63a1c413124a4214b341754335db865aaf75db59823efedc70b50d9a11c
  MD5         7187802409dc94b3f1a83e97309da18f
  BLAKE2b-256 e383938642610f0b5ba1ab4d4a256d5a6525eaed216efdc56e092a7b121685f6

megatron_core-0.16.1-cp313-cp313-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl
  SHA256      8f5ce1a8660f947925b8d7d070566d89163ea9ec2fd00ffd19a31ead8243ce30
  MD5         9cb0f9e7e54a3dffd9c7532a27909944
  BLAKE2b-256 0504d70af2e3f2fc20687a7cb6bfc70900228e8a4da187c33ce436ef9b11fa8e

megatron_core-0.16.1-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
  SHA256      3fb26cb407aa3c4b609c08664022c08146f1648b3d6f2ef2bbe9d3b1294ace61
  MD5         8ab0cf57165a3f43923b5b4d48b6ee70
  BLAKE2b-256 0e9ea14599528cd2c3162ee3ba6c372266b588a422f9ef8941e7d2d35946c67f

megatron_core-0.16.1-cp312-cp312-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl
  SHA256      c96f37f8a630fed5fb7c132d8088da176bf8e6f9345c23532493af0f9ba214ba
  MD5         cca87671dd78c02776f8fa839359b90b
  BLAKE2b-256 e3ee2cd3db404122708f957f138fe1fa94e7f616686d49323da4e63a33d5c140

megatron_core-0.16.1-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
  SHA256      38f6bef63cd4ebc4c39c1eccbaf578596accba5c47d012faffa0670913aec131
  MD5         0e8dd49445dee1c95e1b616634c4b2e1
  BLAKE2b-256 e0fd2c3a62d09e2afca4cef6171e6f6aae3d0a704d5929690a7ee44fae65d538

megatron_core-0.16.1-cp311-cp311-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl
  SHA256      dab8b9a16bff77f5677e5afbcdb9e4f5c7f794d34dedd2ea9902e4bcc878d417
  MD5         5cf1787bf834d4acc69094c5c4b3d933
  BLAKE2b-256 e965f95666a61e569f8460ecf058af05fd4af2c7692724ceaa4fe7b2c53f2040

megatron_core-0.16.1-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl
  SHA256      fac268d9f4b0153388a75f584d53796686f6327702c2b8b1c91077e60765cff3
  MD5         3d70c7157dbce261f80399b1e839a379
  BLAKE2b-256 1575ea59d5767588fdfab3dab4028d7115ce63c81ab42e7c10f16cd7b1978d93

megatron_core-0.16.1-cp310-cp310-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl
  SHA256      b17840ea891a8b109134772f90dac080316ddf787d26ce9d4cacdeb466ccd5ce
  MD5         4ec131a72459c3f151793f42000861ed
  BLAKE2b-256 310736e7eaa96fe01ad412a0e468c8cff904d8d0c9b0939baf5cfd67f903e471

See more details on using hashes here.
