ELF: PyTorch Framework for Memory-Aware Pipeline Parallel Training

ELF – Efficient Deep Learning Framework

ELF is a research-oriented framework built on top of PyTorch that makes training very large neural networks on multi-GPU setups effortless. It automates everything that usually hurts when scaling a model beyond the memory of a single GPU: graph extraction, partitioning, device placement, scheduling, communication, and memory optimisation – all exposed through a single high-level API.

Highlights

  • One-line pipeline parallelism – wrap (almost) any torch.nn.Module inside elf.Pipeline and train it across any number of GPUs.
  • Automatic model partitioning – integrates different model splitting algorithms, and respects manual splits when you prefer full control.
  • Static schedule zoo – GPipe, 1F1B, Hanayo, Zero-Bubble family, full-remat and inference-only variants.
  • Data + pipeline parallelism – mix pipeline stages (pp) with data-parallel replicas (dp) in the same job.
  • Fine-grained rematerialisation control – inject your own policy to trade memory for extra compute, or use ILP-based optimisation to fit your budget.
  • Plugin registries – add new schedulers, partitioners or tracers without touching the core code.

Installation

Install ELF from the repository for local development or use.

# Clone the repository
git clone https://github.com/topal-team/elf.git
cd elf

# Create and activate a virtual environment (recommended)
python3 -m venv .venv
source .venv/bin/activate

# Install ELF in editable mode
pip install -e .

Optional extras (install from PyPI when you add the extra):

  • Dev — pytest and ruff (tests and linting). Ruff uses the repo’s ruff.toml for config: pip install -e ".[dev]"
  • Docs — Sphinx and Read the Docs theme (build docs): pip install -e ".[docs]" (On zsh, quote the package spec so brackets are not interpreted as globs)

Quick start

import torch
from elf import Pipeline            # main entry point

torch.distributed.init_process_group("nccl")

model   = MyBigModel()
sample  = torch.randn(input_shape, device='cuda')   # only needed for profiling
inputs  = ...
targets = ...

pipe = Pipeline(model, sample)               # pass placement / partitioner / scheduler as needed

loss_fn = torch.nn.CrossEntropyLoss()
y, loss = pipe(inputs, targets, loss_fn)   # forward + backward (+ DP gradient sync)

# usual optimizer step
optimizer.step()
pipe.zero_grad()

Call pipe.clear() once you are done to gracefully destroy the underlying process groups.
See the examples/ directory for more details and use cases.
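Since the script calls torch.distributed.init_process_group, it has to be started through a distributed launcher. One common way, assuming the code above is saved as train.py and the machine has 4 GPUs (the script name and GPU count are illustrative):

```shell
torchrun --nproc_per_node=4 train.py
```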

The Pipeline API

Pipeline is a wrapper around your nn.Module. The most useful kwargs are:

  • placement – list of CUDA ranks (or "auto") describing where each stage runs.
  • partitioner – registry key or callable used to cut the graph (set to False if you already partitioned the model yourself).
  • scheduler – registry key or callable that returns a static list of operations for every micro-batch.
  • dp – integer giving the data-parallel replication factor.
  • memory_budget – maximum amount of memory to use per GPU during training. This budget covers model parameters, activations and gradients, but not optimizer states or anything else.

The full argument list is defined in the documentation.

Registries: plug & play algorithms

ELF exposes four global registries in elf.registry:

from elf.registry import SCHEDULERS, COMM_SCHEDULERS, PARTITIONERS, TRACERS

Register a new component by key:

def my_partitioner(graph, times, memories, n_parts):
    ...

PARTITIONERS.register("my_algo", my_partitioner, description="Algo from paper ...")

Then simply reference it when building a pipeline: Pipeline(..., partitioner="my_algo").

The signatures of the functions expected by each registry are detailed in elf/registry.py.
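To illustrate the registry pattern, here is a minimal standalone sketch of a name-to-callable registry with a toy partitioner. This is not ELF's actual Registry implementation – only the register call and the partitioner signature (graph, times, memories, n_parts) come from the text above; everything else is an assumption:

```python
class Registry:
    """Minimal name -> callable registry, mirroring the plug-in pattern."""

    def __init__(self):
        self._entries = {}

    def register(self, key, fn, description=""):
        self._entries[key] = (fn, description)

    def get(self, key):
        return self._entries[key][0]


PARTITIONERS = Registry()


def my_partitioner(graph, times, memories, n_parts):
    # Toy policy: split the node list into n_parts contiguous chunks,
    # ignoring the per-node time/memory profiles.
    chunk = max(1, len(graph) // n_parts)
    return [graph[i:i + chunk] for i in range(0, len(graph), chunk)]


PARTITIONERS.register("my_algo", my_partitioner, description="toy contiguous split")

# Look up the partitioner by key and cut an 8-node "graph" into 4 stages.
parts = PARTITIONERS.get("my_algo")(list(range(8)), None, None, 4)
```

Keeping algorithms behind string keys like this is what lets Pipeline(..., partitioner="my_algo") pick up a user-supplied algorithm without any change to the core code.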

Process topologies

Placement.default(scheduler, pp) gives a good default mapping, but you can pass any explicit list, enabling exotic layouts such as:

placement = [0,1,2,3, 3,2,1,0]   # bidirectional pipeline for Hanayo / ZBV
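Such a V-shaped placement generalises to any number of GPUs: stage i of the downward pass runs on GPU i, and stage i of the upward pass runs on GPU pp - 1 - i. A small helper (sketched here, not part of ELF's API) makes the pattern explicit:

```python
def bidirectional_placement(pp):
    """Build a V-shaped placement list over `pp` GPUs: the first `pp`
    stages walk down the devices, the last `pp` stages walk back up."""
    ranks = list(range(pp))
    return ranks + ranks[::-1]


placement = bidirectional_placement(4)   # [0, 1, 2, 3, 3, 2, 1, 0]
```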

Environment variables

  • ELF_TIMINGS: when set, collects accurate time measurements in the detailed_stats field of the Pipeline object after an iteration. May affect performance.
  • ELF_MEMORY: when set, collects accurate kept- and peak-memory measurements in the detailed_stats field of the Pipeline object after an iteration. May affect performance.
  • ELF_TIMEOUT: number of seconds to wait before shutting down process groups (passed to the NCCL watchdog).

Docs

The full documentation can be generated with Sphinx. Go to docs/ and run make html.

Citation

If you use this project, please cite:

@article{aguila2025optimized,
  title={Optimized Forward-Backward Rematerialization for Memory-Efficient Pipeline Parallel Training},
  author={Aguila--Multner, Adrien and Beaumont, Olivier and Eyraud-Dubois, Lionel and Gusak, Julia},
  year={2025}
}

