DTensor-native pretraining and fine-tuning for LLMs/VLMs with day-0 Hugging Face support, GPU-acceleration, and memory efficiency.
🚀 NeMo AutoModel
NeMo Framework is NVIDIA's GPU-accelerated, end-to-end training framework for large language models (LLMs), multi-modal models, and speech models. It enables seamless scaling of training (both pretraining and post-training) workloads from a single GPU to thousand-node clusters for both 🤗 Hugging Face/PyTorch and Megatron models, and it includes a suite of libraries and recipe collections to help users train models end to end. The AutoModel library ("NeMo AutoModel") provides GPU-accelerated PyTorch training for 🤗 Hugging Face models on Day-0. Users can start training and fine-tuning models instantly, without conversion delays, and scale effortlessly with PyTorch-native parallelisms, optimized custom kernels, and memory-efficient recipes, all while preserving the original checkpoint format for seamless use across the Hugging Face ecosystem.
⚠️ Note: NeMo AutoModel is under active development. New features, improvements, and documentation updates are released regularly. We are working toward a stable release, so expect the interface to solidify over time. Your feedback and contributions are welcome, and we encourage you to follow along as new updates roll out.
Feature Roadmap
✅ Available now | 🔜 Coming in 25.09
- ✅ Hugging Face Integration - Works with 1B–70B models (Qwen, Llama).
- ✅ Distributed Training - Fully Sharded Data Parallel (FSDP2) support.
- ✅ Environment Support - Runs on SLURM clusters and in interactive sessions.
- ✅ Learning Algorithms - SFT (Supervised Fine-Tuning) and PEFT (Parameter-Efficient Fine-Tuning).
- ✅ Large Model Support - Native PyTorch support for models up to 70B parameters.
- ✅ Advanced Parallelism - PyTorch-native FSDP2, TP, CP, and SP for efficient training.
- ✅ Sequence Packing - Sequence packing in both DTensor and MCore for large training performance gains.
- ✅ DCP - Distributed Checkpoint support with SafeTensors output.
- ✅ HSDP - Hybrid Sharded Data Parallelism based on FSDP2.
- 🔜 Pipeline Support - Torch-native pipelining composable with FSDP2 and DTensor (3D parallelism).
- 🔜 Pre-training - Support for model pre-training, including DeepSeek-V3, GPT-OSS, and Qwen3 (Coder-480B-A35B, etc.).
- 🔜 Knowledge Distillation - Support for knowledge distillation with LLMs; VLM support will be added after 25.09.
🗂️ Supported Models
NeMo AutoModel provides native support for a wide range of models available on the Hugging Face Hub, enabling efficient fine-tuning for various domains. Below is a comprehensive list of all supported models with their available recipes:
🚀 Ready-to-Use Recipes
To get started quickly, NeMo AutoModel provides a collection of ready-to-use recipes for common LLM and VLM fine-tuning tasks. Simply select the recipe that matches your model and training setup (e.g., single-GPU, multi-GPU, or multi-node).
| Domain | Model Family | Model ID | Recipes |
|---|---|---|---|
| LLM | LLaMA | meta-llama/Llama-3.2-1B | SFT, PEFT |
| | | meta-llama/Llama-3.2-3B-Instruct | SFT, PEFT |
| | | meta-llama/Llama-3.1-8B | FP8 |
| LLM | Mistral | mistralai/Mistral-7B-v0.1 | SFT, PEFT, FP8 |
| | | mistralai/Mistral-Nemo-Base-2407 | SFT, PEFT, FP8 |
| | | mistralai/Mixtral-8x7B-Instruct-v0.1 | PEFT |
| LLM | Qwen | Qwen/Qwen2.5-7B | SFT, PEFT, FP8 |
| | | Qwen/Qwen3-0.6B | SFT, PEFT |
| | | Qwen/QwQ-32B | SFT, PEFT |
| LLM | Gemma | google/gemma-3-270m | SFT, PEFT |
| | | google/gemma-2-9b-it | SFT, PEFT, FP8 |
| | | google/gemma-7b | SFT, PEFT |
| LLM | Phi | microsoft/phi-2 | SFT, PEFT |
| | | microsoft/Phi-3-mini-4k-instruct | SFT, PEFT |
| | | microsoft/phi-4 | SFT, PEFT, FP8 |
| LLM | Seed | ByteDance-Seed/Seed-Coder-8B-Instruct | SFT, PEFT, FP8 |
| | | ByteDance-Seed/Seed-OSS-36B-Instruct | SFT, PEFT |
| LLM | Baichuan | baichuan-inc/Baichuan2-7B-Chat | SFT, PEFT, FP8 |
| VLM | Gemma | google/gemma-3-4b-it | SFT, PEFT |
| | | google/gemma-3n-e4b-it | SFT, PEFT |
And more: check out additional LLM and VLM examples! Any causal LM on the Hugging Face Hub can be used with the base recipe template.
Run a Recipe
To run a NeMo AutoModel recipe, you need a recipe script (e.g., LLM, VLM) and a YAML config file (e.g., LLM, VLM):
# Command invocation format:
uv run <recipe_script_path> --config <yaml_config_path>
# LLM example: multi-GPU with FSDP2
uv run torchrun --nproc-per-node=8 recipes/llm_finetune/finetune.py --config recipes/llm_finetune/llama/llama3_2_1b_hellaswag.yaml
# VLM example: single GPU fine-tuning (Gemma-3-VL) with LoRA
uv run recipes/vlm_finetune/finetune.py --config recipes/vlm_finetune/gemma3/gemma3_vl_3b_cord_v2_peft.yaml
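Note that the multi-GPU example launches the recipe through torchrun (one process per GPU), while the single-GPU example invokes the recipe script directly; in both cases uv run supplies the managed environment.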
🔑 Key Features
- Day-0 Hugging Face Support: Instantly fine-tune any model from the Hugging Face Hub (see the sketch after this list)
- Lightning-Fast Performance: Custom CUDA kernels and memory optimizations deliver 2–5× speedups
- Large-Scale Distributed Training: Built-in FSDP2 and Megatron-FSDP for seamless multi-node scaling
- Vision-Language Model Ready: Native support for VLMs (Qwen2-VL, Gemma-3-VL, etc.)
- Advanced PEFT Methods: LoRA and an extensible PEFT system out of the box
- Seamless HF Ecosystem: Fine-tuned models work directly with the Transformers pipeline, vLLM, and other Hub tooling
- Robust Infrastructure: Distributed checkpointing with integrated logging and monitoring
- Optimized Recipes: Pre-built configurations for common models and datasets
- Flexible Configuration: YAML-based configuration system for reproducible experiments
- FP8 Precision: Native FP8 training & inference for higher throughput and lower memory use
- INT4 / INT8 Quantization: Turn-key quantization workflows for ultra-compact, low-memory training
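To make the Day-0 workflow concrete, here is a minimal sketch. It assumes NeMoAutoModelForCausalLM is the causal-LM counterpart of the NeMoAutoModelForImageTextToText class used in the YAML examples below; treat the exact import path as an assumption rather than documented API.

# Minimal sketch: load a Hub checkpoint through the AutoModel wrapper.
# NeMoAutoModelForCausalLM is assumed by analogy with
# NeMoAutoModelForImageTextToText (see the VLM YAML example below).
from nemo_automodel import NeMoAutoModelForCausalLM

# Loads the original Hugging Face checkpoint directly; no format conversion.
model = NeMoAutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")

# The HF checkpoint format is preserved, so the fine-tuned model can be
# saved and reused with standard Hugging Face tooling.
model.save_pretrained("./llama3_2_1b_finetuned")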
✨ Install NeMo AutoModel
NeMo AutoModel is offered both as a standard Python package installable via pip and as a ready-to-run NeMo Framework Docker container.
Prerequisites
# We use `uv` for package management and environment isolation.
pip3 install uv
# If you cannot install at the system level, you can install for your user with
# pip3 install --user uv
Run every command with uv run. It auto-installs the virtual environment from the lock file and keeps it up to date, so you never need to activate a venv manually. Example: uv run recipes/llm_finetune/finetune.py. If you prefer to install NeMo AutoModel explicitly, follow the instructions below.
📦 Install from a Wheel Package
# Install the latest stable release from PyPI
# We first need to initialize the virtual environment using uv
uv venv
uv pip install nemo_automodel # or: uv pip install --upgrade nemo_automodel
🔧 Install from Source
# Install the latest NeMo AutoModel from the GitHub repo (best for development).
# We first need to initialize the virtual environment using uv
uv venv
# We can now install from source
uv pip install git+https://github.com/NVIDIA-NeMo/Automodel.git
Verify the Installation
uv run python -c "import nemo_automodel; print('✅ NeMo AutoModel ready')"
📋 YAML Configuration Examples
1. Distributed Training Configuration
distributed:
  _target_: nemo_automodel.distributed.megatron_fsdp.MegatronFSDPManager
  dp_size: 8
  tp_size: 1
  cp_size: 1
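As a sanity check, the product dp_size × tp_size × cp_size (here 8 × 1 × 1 = 8) should match the number of GPU processes you launch, e.g. torchrun --nproc-per-node=8 in the LLM example above.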
2. LoRA Configuration
peft:
  peft_fn: nemo_automodel._peft.lora.apply_lora_to_linear_modules
  match_all_linear: True
  dim: 8
  alpha: 32
  use_triton: True
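For intuition on these numbers: with dim: 8 the adapters add only a sliver of trainable parameters, and alpha: 32 gives an effective scaling of alpha/dim = 4. A quick back-of-envelope check in Python (the 4096×4096 layer shape is purely illustrative, not taken from any specific model):

# Hypothetical linear layer shape; real shapes depend on the base model.
d_in, d_out, r, alpha = 4096, 4096, 8, 32
frozen = d_in * d_out               # parameters in the frozen weight
trainable = r * (d_in + d_out)      # LoRA A (r x d_in) + B (d_out x r)
print(f"trainable fraction: {trainable / frozen:.4%}")  # -> 0.3906%
print(f"LoRA scaling alpha/r: {alpha / r}")             # -> 4.0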
3. Vision-Language Model Fine-Tuning
model:
  _target_: nemo_automodel._transformers.NeMoAutoModelForImageTextToText.from_pretrained
  pretrained_model_name_or_path: Qwen/Qwen2.5-VL-3B-Instruct
processor:
  _target_: transformers.AutoProcessor.from_pretrained
  pretrained_model_name_or_path: Qwen/Qwen2.5-VL-3B-Instruct
  min_pixels: 200704
  max_pixels: 1003520
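The min_pixels/max_pixels bounds follow the Qwen-VL processor convention of counting 28×28 vision patches; the two values correspond to 256 and 1280 patches per image, respectively:

# The pixel bounds above are multiples of Qwen's 28x28 vision patch area.
assert 256 * 28 * 28 == 200704    # min_pixels -> at least 256 patches
assert 1280 * 28 * 28 == 1003520  # max_pixels -> at most 1280 patches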
4. Checkpointing and Resume
checkpoint:
  enabled: true
  checkpoint_dir: ./checkpoints
  save_consolidated: true   # HF-compatible safetensors
  model_save_format: safetensors
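Because save_consolidated: true writes HF-compatible safetensors, a finished run should be loadable with plain transformers. A hedged sketch; the exact output layout under checkpoint_dir is an assumption, so point the path at the consolidated directory your run actually produces:

# Hedged sketch: load a consolidated checkpoint back with vanilla transformers.
# "./checkpoints/consolidated" is a placeholder path, not a documented layout.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("./checkpoints/consolidated")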
🏗️ Project Structure
NeMo-Automodel/
├── nemo_automodel/        # Core library
│   ├── _peft/             # PEFT implementations (LoRA)
│   ├── _transformers/     # HF model integrations
│   ├── checkpoint/        # Distributed checkpointing
│   ├── datasets/          # Dataset loaders
│   │   ├── llm/           # LLM datasets (HellaSwag, SQuAD, etc.)
│   │   └── vlm/           # VLM datasets (CORD-v2, rdr, etc.)
│   ├── distributed/       # FSDP2, Megatron-FSDP, parallelization
│   ├── loss/              # Optimized loss functions
│   └── training/          # Training recipes and utilities
├── recipes/               # Ready-to-use training recipes
│   ├── llm/               # LLM fine-tuning recipes
│   └── vlm/               # VLM fine-tuning recipes
└── tests/                 # Comprehensive test suite
🤝 Contributing
We welcome contributions! Please see our Contributing Guide for details.
📄 License
NVIDIA NeMo AutoModel is licensed under the Apache License 2.0.
🔗 Links
- Documentation: https://docs.nvidia.com/nemo-framework/user-guide/latest/automodel/index.html
- Hugging Face Hub: https://huggingface.co/models
- Issues: https://github.com/NVIDIA-NeMo/Automodel/issues
- Discussions: https://github.com/NVIDIA-NeMo/Automodel/discussions
Made with ❤️ by NVIDIA
Accelerating AI for everyone