veomni

VeOmni: Scaling any Modality Model Training to any Accelerators with PyTorch native Training Framework

News

[2025/04/03] 🔥 We release VeOmni.

Overview

VeOmni is a framework for pre-training and post-training both single-modal and multi-modal models. With VeOmni, users can scale a model of any modality to any accelerator.

Our guiding principles when building VeOmni are:

Flexibility and Modularity: The framework is modular. Most pieces can be decoupled, so users can replace them with their own implementations.
No trainer: Avoid structured Trainer classes such as PyTorch Lightning or the Hugging Face Trainer. Instead, keep training scripts linear so that the complete training logic is exposed to users.
Omni model native: Users can scale any omni model easily.
Torch native: Each part of the framework uses native torch functionality as far as possible.

Table of Contents

- News
- Overview
- Key Features
  - Upcoming Features
- Getting Started
- Training Examples
- Supported Models
- Performance
- Acknowledgement
- Citation
- Awesome work using VeOmni
- Contribution Guide
- About ByteDance Seed Team

Key Features

Parallelism
- Parallel state by DeviceMesh
- Torch FSDP1/2
- Expert parallelism (experimental)
- Easy to add new parallelism plan
- Sequence parallelism
  - Ulysses
  - Async Ulysses
- Activation offloading
- Activation checkpointing
Kernels
- GroupGemm ops for MoE
- Liger-Kernel integrations
Model
- Any transformers model
- Multi-modal
  - Qwen2VL
  - Seed-Omni
  - ...
Data IO
- Dynamic batching strategy
- Omnidata processing
Distributed Checkpointing
- ByteCheckpoint (recommended)
- Torch distributed checkpointing
- DCP merge tools
Other tools
- Profiling tools
- Easy YAML configuration and argument parsing

Upcoming Features

veScale FSDP
Torch native parallelism
torch.compile
Flux: Fine-grained Computation-communication Overlapping GPU Kernel integrations
Better offloading strategy
Support for more models
Torch native pipeline parallelism

Getting Started

Read the VeOmni Best Practice for more details.

Installation

pip3 install -e .

Install veScale (not available yet)

git clone https://github.com/volcengine/veScale.git
cd veScale
pip3 install .

Quick Start

Start training like this:

bash train.sh $TRAIN_SCRIPT $CONFIG.yaml

You can override arguments from the YAML config by passing them on the command line:

bash train.sh $TRAIN_SCRIPT $CONFIG.yaml \
    --model.model_path PATH/TO/MODEL \
    --data.train_path PATH/TO/DATA \
    --train.global_batch_size GLOBAL_BATCH_SIZE
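
To make the override behaviour concrete, here is a minimal sketch of how dotted flags such as --train.lr could be folded into a nested config loaded from YAML. It is an illustration only and not VeOmni's actual argument parser: parse_overrides and merge are hypothetical helper names, and the base dictionary stands in for a parsed YAML file.

```python
import ast
from typing import Any


def parse_overrides(argv: list[str]) -> dict[str, Any]:
    """Turn ["--train.lr", "5e-7"] into {"train": {"lr": 5e-07}}."""
    overrides: dict[str, Any] = {}
    # Flags and values are expected to alternate: --key value --key value ...
    for flag, raw in zip(argv[0::2], argv[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"expected a --dotted.key flag, got {flag!r}")
        try:
            value = ast.literal_eval(raw)  # numbers, booleans, lists
        except (ValueError, SyntaxError):
            value = raw  # anything else stays a string, e.g. a path
        node = overrides
        *parents, leaf = flag[2:].split(".")
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return overrides


def merge(base: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Recursively apply overrides on top of base without mutating base."""
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical values standing in for a parsed YAML config.
base = {"model": {"model_path": "unset"}, "train": {"lr": 1e-5, "max_steps": 100}}
cli = ["--model.model_path", "./Qwen2.5-7B", "--train.lr", "5e-7"]
config = merge(base, parse_overrides(cli))
```

Keys that are not overridden (train.max_steps here) keep their YAML values, which is the behaviour the command above relies on.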

Here is an end-to-end workflow: prepare a subset of the fineweb dataset, continue training a qwen2_5 model with a sequence parallel size of 2 for 20 steps, and then merge the global_step_10 distributed checkpoint into Hugging Face weights with ByteCheckpoint.

Download fineweb dataset

python3 scripts/download_hf_data.py \
  --repo_id HuggingFaceFW/fineweb \
  --local_dir ./fineweb/ \
  --allow_patterns 'sample/10BT/*'

Download qwen2_5 model

python3 scripts/download_hf_model.py \
  --repo_id Qwen/Qwen2.5-7B \
  --local_dir .

Start training

bash train.sh tasks/train_torch.py configs/pretrain/qwen2_5.yaml \
    --model.model_path ./Qwen2.5-7B \
    --data.train_path ./fineweb/sample/10BT/ \
    --train.global_batch_size 512 \
    --train.lr 5e-7 \
    --train.ulysses_parallel_size 2 \
    --train.save_steps 10 \
    --train.max_steps 20 \
    --train.output_dir Qwen2.5-7B_CT
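
When combining --train.global_batch_size with --train.ulysses_parallel_size, it can help to check the arithmetic first. The sketch below reflects how Ulysses-style sequence parallelism usually composes with data parallelism (ranks in one sequence-parallel group share the same samples) and is not taken from VeOmni's code. The world size of 8 and the micro batch size of 1 are assumptions chosen for illustration, and both function names are hypothetical.

```python
def data_parallel_size(world_size: int, ulysses_parallel_size: int) -> int:
    """Number of distinct data replicas when each sequence is split across a group."""
    if world_size % ulysses_parallel_size != 0:
        raise ValueError("world_size must be divisible by ulysses_parallel_size")
    return world_size // ulysses_parallel_size


def grad_accumulation_steps(global_batch_size: int, micro_batch_size: int, dp_size: int) -> int:
    """Micro-steps needed per optimizer step to reach the global batch size."""
    samples_per_micro_step = micro_batch_size * dp_size
    if global_batch_size % samples_per_micro_step != 0:
        raise ValueError("global_batch_size must be divisible by micro_batch_size * dp_size")
    return global_batch_size // samples_per_micro_step


# Assumed setup: 8 accelerators, micro batch size 1 (neither is stated in the README).
dp_size = data_parallel_size(world_size=8, ulysses_parallel_size=2)
accum_steps = grad_accumulation_steps(512, micro_batch_size=1, dp_size=dp_size)
print(dp_size, accum_steps)  # 4 128
```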

Merge checkpoints

python3 scripts/mereg_dcp_to_hf.py \
    --load-dir Qwen2.5-7B_CT/checkpoints/global_step_10 \
    --model_assets_dir Qwen2.5-7B_CT/model_assets \
    --save-dir Qwen2.5-7B_CT/checkpoints/global_step_10/hf_ckpt

Test inference

python3 tasks/infer.py \
  --infer.model_path Qwen2.5-7B_CT/checkpoints/global_step_10/hf_ckpt

Merge checkpoints

We use ByteCheckpoint to save checkpoints in torch.distributed.checkpoint (DCP) format. You can merge the DCP files into Hugging Face weights with this command:

python3 scripts/mereg_dcp_to_hf.py \
    --load-dir PATH/TO/CHECKPOINTS \
    --model_assets_dir PATH/TO/MODEL_ASSETS \
    --save-dir PATH/TO/SAVE_HF_WEIGHT

For example, if your output_dir is seed_omni and you want to merge the global_step_100 checkpoint into Hugging Face weights:

python3 scripts/mereg_dcp_to_hf.py \
    --load-dir seed_omni/checkpoints/global_step_100 \
    --model_assets_dir seed_omni/model_assets \
    --save-dir seed_omni/hf_ckpt
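
The three paths in the merge command follow directly from the output directory and the step number. As a small sketch, the helper below assembles the command line from those two inputs. The directory layout and flag names are the ones shown above; merge_command is a hypothetical helper, not part of VeOmni.

```python
import shlex
from pathlib import PurePosixPath


def merge_command(output_dir: str, step: int) -> str:
    """Build the merge command for one checkpoint step under output_dir."""
    root = PurePosixPath(output_dir)
    args = [
        "python3", "scripts/mereg_dcp_to_hf.py",
        "--load-dir", str(root / "checkpoints" / f"global_step_{step}"),
        "--model_assets_dir", str(root / "model_assets"),
        "--save-dir", str(root / "hf_ckpt"),
    ]
    # shlex.join quotes any argument that would need it in a POSIX shell.
    return shlex.join(args)


print(merge_command("seed_omni", 100))
```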

Build Docker

cd docker/
docker compose up -d
docker compose exec VeOmni bash

Training Examples

PyTorch FSDP2 Qwen2VL

bash train.sh tasks/multimodal/omni/train_qwen2_vl.py configs/multimodal/qwen2_vl/qwen2_vl.yaml

PyTorch FSDP2 Qwen2 CT

bash train.sh tasks/train_torch.py configs/pretrain/qwen2_5.yaml

PyTorch FSDP2 llama3-8b-instruct CT

bash train.sh tasks/train_torch.py configs/pretrain/llama3.yaml

Supported Models

Model	Model size	Example config file
DeepSeek 2.5/3/R1	236B/671B	deepseek.yaml
Llama 3-3.3	1B/3B/8B/70B	llama3.yaml
Qwen 2-2.5	0.5B/1.5B/3B/7B/14B/32B/72B	qwen2_5.yaml
Qwen2-VL/Qwen2.5-VL/QVQ	2B/3B/7B/32B/72B	qwen2_vl.yaml
Seed_omni	Any foundation model with any omni encoder and decoder	seed_omni.yaml

VeOmni supports all transformers models if you do not need sequence parallelism, expert parallelism, other parallelism strategies, or the CUDA kernel optimizations in VeOmni. We designed a model registry mechanism: when a model is registered in veomni, the model and optimizer are loaded from VeOmni automatically. Otherwise, loading defaults to the modeling file in transformers.

To add a new model, register it in the model registry. See the Support custom model docs.
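
The registry-with-fallback idea can be sketched in a few lines. This is a generic illustration of the pattern, not VeOmni's registry API: MODEL_REGISTRY, register_model, build_model and the string return values are all hypothetical stand-ins, and the fallback merely represents "load the modeling file from transformers".

```python
from typing import Callable, Dict

# Hypothetical registry: model type name -> factory for the optimized implementation.
MODEL_REGISTRY: Dict[str, Callable[[], str]] = {}


def register_model(model_type: str) -> Callable[[Callable[[], str]], Callable[[], str]]:
    """Decorator that records a factory under the given model type."""
    def decorator(factory: Callable[[], str]) -> Callable[[], str]:
        MODEL_REGISTRY[model_type] = factory
        return factory
    return decorator


def load_from_transformers(model_type: str) -> str:
    """Stand-in for the default path that uses the transformers modeling file."""
    return f"transformers implementation of {model_type}"


def build_model(model_type: str) -> str:
    """Prefer a registered implementation, otherwise fall back to the default."""
    factory = MODEL_REGISTRY.get(model_type)
    if factory is not None:
        return factory()
    return load_from_transformers(model_type)


@register_model("qwen2_vl")
def build_qwen2_vl() -> str:
    return "registered implementation of qwen2_vl"


print(build_model("qwen2_vl"))   # registered implementation of qwen2_vl
print(build_model("some_other")) # transformers implementation of some_other
```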

Performance

Coming soon with the tech report.

Acknowledgement

Thanks to the following projects for their excellent work:

Citation

If you find VeOmni useful for your research and applications, feel free to give us a star ⭐ or cite us using:

@software{VeOmni,
      title={VeOmni: Scaling any Modality Model Training to any Accelerators with PyTorch native Training Framework},
      author={Qianli Ma and Yaowei Zheng and Zhelun Shi and Zhongkai Zhao and Bin Jia and Ziyue Huang and Zhi Zhang},
      year={2025},
      howpublished={GitHub repository},
      publisher={ByteDance Seed},
      url={https://github.com/ByteDance-Seed/VeOmni},
}

Awesome work using VeOmni

UI-TARS

Contribution Guide

Contributions from the community are welcome! Please check out CONTRIBUTING.md and our project roadmap (to be updated).

About ByteDance Seed Team

Founded in 2023, ByteDance Seed Team is dedicated to crafting the industry's most advanced AI foundation models. The team aspires to become a world-class research team and make significant contributions to the advancement of science and society.

You can get to know us better through the following channels👇

Release history

Version	Release date	Notes
0.1.11	May 26, 2026	
0.1.10	May 21, 2026	
0.1.9a5	May 7, 2026	pre-release
0.1.9a4	May 5, 2026	pre-release
0.1.9a2	Apr 27, 2026	pre-release
0.1.9a1	Apr 15, 2026	pre-release
0.1.4	Dec 9, 2025	
0.1.0	Sep 19, 2025	
0.0.3	Apr 2, 2025	
0.0.2	Apr 2, 2025	
0.0.1	Apr 2, 2025	This version

Download files

Source Distribution

veomni-0.0.1.tar.gz (200.6 kB)

Uploaded Apr 2, 2025 Source

Built Distribution

veomni-0.0.1-py3-none-any.whl (267.9 kB)

Uploaded Apr 2, 2025 Python 3

File details

Details for the file veomni-0.0.1.tar.gz.

File metadata

Download URL: veomni-0.0.1.tar.gz
Upload date: Apr 2, 2025
Size: 200.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.4

File hashes

Hashes for veomni-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`813e85e68dff62c90b5c8f6f9b52d624699621d50a1f95590e1e6da1601b026c`
MD5	`995050e985662f3e04ac18b5297bf2f4`
BLAKE2b-256	`09c61a8be257959dc518da74eee438a30fc2b2a588fa9f7f3b64d56b89824d34`
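
A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the Python standard library. The digest is the one from the table, while the local file name passed to verify is an assumption about where the download was saved.

```python
import hashlib

# SHA256 digest for veomni-0.0.1.tar.gz, copied from the table above.
EXPECTED_SHA256 = "813e85e68dff62c90b5c8f6f9b52d624699621d50a1f95590e1e6da1601b026c"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()


# Usage, assuming the archive sits in the current directory:
# verify("veomni-0.0.1.tar.gz")
```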

File details

Details for the file veomni-0.0.1-py3-none-any.whl.

File metadata

Download URL: veomni-0.0.1-py3-none-any.whl
Upload date: Apr 2, 2025
Size: 267.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.4

File hashes

Hashes for veomni-0.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a032d7588b919aa566dcc070a0285c4fb4e748def0dbce7cc3324e9b1adcf65b`
MD5	`53eb19b185810d67488cc72f7f1ac661`
BLAKE2b-256	`a3b816f2a5c64771bc7344a5364b635ec50e766bc7293896258db319f059dfa1`
