ELF: PyTorch Framework for Memory-Aware Pipeline Parallel Training
ELF – Efficient Deep Learning Framework
ELF is a research-oriented framework built on top of PyTorch that makes training very large neural networks on multi-GPU setups effortless. It automates everything that usually hurts when scaling a model beyond the memory of a single GPU: graph extraction, partitioning, device placement, scheduling, communication, and memory optimisation – all exposed through a single high-level API.
Highlights
- One-line pipeline parallelism – wrap (almost) any torch.nn.Module inside elf.Pipeline and train it across any number of GPUs.
- Automatic model partitioning – integrates several model-splitting algorithms, and respects manual splits when you prefer full control.
- Static schedule zoo – GPipe, 1F1B, Hanayo, the Zero-Bubble family, full-remat and inference-only variants.
- Data + pipeline parallelism – mix pipeline stages (pp) with data-parallel replicas (dp) in the same job.
- Fine-grained rematerialisation control – inject your own policy to trade memory for extra compute, or use ILP-based optimisation to fit your budget.
- Plugin registries – add new schedulers, partitioners or tracers without touching the core code.
Installation
Install ELF from the repository for local development or use.
# Clone the repository
git clone https://github.com/topal-team/elf.git
cd elf
# Create and activate a virtual environment (recommended)
python3 -m venv .venv
source .venv/bin/activate
# Install ELF in editable mode
pip install -e .
Optional extras (installed from PyPI when you request the extra):
- Dev – pytest and ruff (tests and linting); ruff uses the repo's ruff.toml for its configuration: pip install -e ".[dev]"
- Docs – Sphinx and the Read the Docs theme (to build the documentation): pip install -e ".[docs]"

(On zsh, quote the package spec as above so the brackets are not interpreted as globs.)
Quick start
import torch
from elf import Pipeline  # main entry point

torch.distributed.init_process_group("nccl")

model = MyBigModel()                              # any torch.nn.Module
sample = torch.randn(input_shape, device='cuda')  # only needed for profiling
inputs = ...
targets = ...

pipe = Pipeline(model, sample)  # pass placement / partitioner / scheduler as needed
loss_fn = torch.nn.CrossEntropyLoss()
y, loss = pipe(inputs, targets, loss_fn)  # forward + backward (+ DP gradient sync)

# usual optimizer step (optimizer construction not shown)
optimizer.step()
pipe.zero_grad()
Call pipe.clear() once you are done to gracefully destroy the underlying process groups.
More detailed examples and use cases can be found under examples/.
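Like any torch.distributed program, the script above is expected to run once per GPU; a launcher such as torchrun --nproc_per_node=<num_gpus> train.py supplies the environment variables that init_process_group needs.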
The Pipeline API
Pipeline is a wrapper around your nn.Module. The most useful kwargs are:
- placement – list of CUDA ranks (or "auto") describing where each stage runs.
- partitioner – registry key or callable used to cut the graph (set to False if you already partitioned the model yourself).
- scheduler – registry key or callable that returns a static list of operations for every micro-batch.
- dp – integer giving the data-parallel replication factor.
- memory_budget – maximum amount of memory that may be used per GPU during training; this includes model parameters, activations and gradients, but not optimizer states or anything else.
The full argument list is defined in the documentation; a sketch combining these kwargs is shown below.
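An illustrative sketch (the registry key "1f1b" and the byte-valued budget are assumptions, not confirmed names; check the documentation for the exact values your release accepts):

pipe = Pipeline(
    model,
    sample,
    placement="auto",          # let ELF map stages to CUDA ranks
    partitioner="my_algo",     # registry key; pass False for a pre-split model
    scheduler="1f1b",          # hypothetical registry key for the 1F1B schedule
    dp=2,                      # two data-parallel replicas of the pipeline
    memory_budget=20 * 2**30,  # per-GPU cap, assumed here to be in bytes
)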
Registries: plug & play algorithms
ELF exposes four global registries in elf.registry:
from elf.registry import SCHEDULERS, COMM_SCHEDULERS, PARTITIONERS, TRACERS
Register a new component by key:
def my_partitioner(graph, times, memories, n_parts):
    ...  # cut `graph` into `n_parts` stages, guided by the profiled times and memories

PARTITIONERS.register("my_algo", my_partitioner, description="Algo from paper ...")
Then simply reference it when building a pipeline: Pipeline(..., partitioner="my_algo").
The signatures of the functions expected by each registry are detailed in elf/registry.py.
Process topologies
Placement.default(scheduler, pp) gives a good default mapping, but you can pass any explicit list, enabling exotic layouts such as:
placement = [0,1,2,3, 3,2,1,0] # bidirectional pipeline for Hanayo / ZBV
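Such a list is passed straight to the Pipeline constructor; here the eight stages fold back across four GPUs:

pipe = Pipeline(model, sample, placement=placement)  # 8 stages folded over 4 GPUs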
Environment variables
- ELF_TIMINGS: enables accurate time measurements in the detailed_stats field of the Pipeline object after an iteration. May affect performance.
- ELF_MEMORY: enables accurate kept and peak memory measurements in the detailed_stats field of the Pipeline object after an iteration. May affect performance.
- ELF_TIMEOUT: number of seconds to wait before shutting down process groups (passed to the NCCL watchdog).
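A minimal sketch, assuming any non-empty value enables a flag (the exact convention is not specified here):

import os

os.environ["ELF_TIMINGS"] = "1"  # assumed truthy flag; set before the iteration runs
os.environ["ELF_MEMORY"] = "1"

y, loss = pipe(inputs, targets, loss_fn)
print(pipe.detailed_stats)  # now carries accurate timing and memory measurements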
Docs
The full documentation can be generated with Sphinx. Go to docs/ and run make html.
Citation
If you use this project, please cite:
@article{aguila2025optimized,
title={Optimized Forward-Backward Rematerialization for Memory-Efficient Pipeline Parallel Training},
author={Aguila--Multner, Adrien and Beaumont, Olivier and Eyraud-Dubois, Lionel and Gusak, Julia},
year={2025}
}
File details
Details for the file elf_pipeline-0.1.0.tar.gz.
File metadata
- Download URL: elf_pipeline-0.1.0.tar.gz
- Upload date:
- Size: 81.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6af9518f9750fa09747a00236e285b2bcfd7624fa791b561ee2e6e231213d169 |
| MD5 | b5b7369823947be51894fb4dd9625e34 |
| BLAKE2b-256 | d629fbb9d7219f0564d0f6b3f15cca04aa1928593d2f61d38995e43c3104351e |
File details
Details for the file elf_pipeline-0.1.0-py3-none-any.whl.
File metadata
- Download URL: elf_pipeline-0.1.0-py3-none-any.whl
- Upload date:
- Size: 88.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 54db7a8ff17f11a2c4414ccc4f353a85a17435eeae9d5d09161bb6e7a4d8b9a9 |
| MD5 | 7062e4e3ed9ae18dc3bf45af12ee806c |
| BLAKE2b-256 | a844da23c8ba600a305011ed0635b3dd14b6ae7d7a4e2ea5bdaa907c9ebcab21 |