ELF: PyTorch Framework for Memory-Aware Pipeline Parallel Training
ELF – Efficient Deep Learning Framework
ELF is a research-oriented framework built on top of PyTorch that makes training very large neural networks on multi-GPU setups effortless. It automates everything that usually hurts when scaling a model beyond the memory of a single GPU: graph extraction, partitioning, device placement, scheduling, communication, and memory optimisation – all exposed through a single high-level API.
Highlights
- One-line pipeline parallelism – wrap (almost) any torch.nn.Module inside elf.Pipeline and train it across any number of GPUs.
- Automatic model partitioning – integrates several model-splitting algorithms, and respects manual splits when you prefer full control.
- Static schedule zoo – GPipe, 1F1B, Hanayo, the Zero-Bubble family, full-remat and inference-only variants.
- Data + pipeline parallelism – mix pipeline stages (pp) with data-parallel replicas (dp) in the same job.
- Fine-grained rematerialisation control – inject your own policy to trade memory for extra compute, or use ILP-based optimisation to fit your budget.
- Plugin registries – add new schedulers, partitioners or tracers without touching the core code.
Installation
Install ELF from the repository for local development or use.
# Clone the repository
git clone https://github.com/topal-team/elf.git
cd elf
# Create and activate a virtual environment (recommended)
python3 -m venv .venv
source .venv/bin/activate
# Install ELF in editable mode
pip install -e .
Optional extras (installed from PyPI when you request the extra):
- Dev – pytest and ruff (tests and linting); ruff uses the repo's ruff.toml for its configuration: pip install -e ".[dev]"
- Docs – Sphinx and the Read the Docs theme (to build the documentation): pip install -e ".[docs]"

(On zsh, quote the package spec as above so the brackets are not interpreted as globs.)
Quick start
import torch
from elf import Pipeline  # main entry point

torch.distributed.init_process_group("nccl")

model = MyBigModel()                              # any torch.nn.Module
sample = torch.randn(input_shape, device='cuda')  # only needed for profiling
inputs = ...
targets = ...

pipe = Pipeline(model, sample)  # pass placement / partitioner / scheduler as needed
loss_fn = torch.nn.CrossEntropyLoss()
y, loss = pipe(inputs, targets, loss_fn)  # forward + backward (+ DP gradient sync)

# usual optimizer step (optimizer construction not shown)
optimizer.step()
pipe.zero_grad()
Call pipe.clear() once you are done to gracefully destroy the underlying process groups.
More detailed examples and use cases can be found under examples/.
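Like any torch.distributed program, the script above is expected to run once per GPU; a launcher such as torchrun --nproc_per_node=<num_gpus> train.py supplies the environment variables that init_process_group needs.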
The Pipeline API
Pipeline is a wrapper around your nn.Module. The most useful kwargs are:
- placement – list of CUDA ranks (or "auto") describing where each stage runs.
- partitioner – registry key or callable used to cut the graph (set to False if you already partitioned the model yourself).
- scheduler – registry key or callable that returns a static list of operations for every micro-batch.
- dp – integer giving the data-parallel replication factor.
- memory_budget – maximum amount of memory that may be used per GPU during training; this includes model parameters, activations and gradients, but not optimizer states or anything else.
The full argument list is defined in the documentation; a sketch combining these kwargs is shown below.
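An illustrative sketch (the registry key "1f1b" and the byte-valued budget are assumptions, not confirmed names; check the documentation for the exact values your release accepts):

pipe = Pipeline(
    model,
    sample,
    placement="auto",          # let ELF map stages to CUDA ranks
    partitioner="my_algo",     # registry key; pass False for a pre-split model
    scheduler="1f1b",          # hypothetical registry key for the 1F1B schedule
    dp=2,                      # two data-parallel replicas of the pipeline
    memory_budget=20 * 2**30,  # per-GPU cap, assumed here to be in bytes
)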
Registries: plug & play algorithms
ELF exposes four global registries in elf.registry:
from elf.registry import SCHEDULERS, COMM_SCHEDULERS, PARTITIONERS, TRACERS
Register a new component by key:
def my_partitioner(graph, times, memories, n_parts):
    ...  # cut `graph` into `n_parts` stages, guided by the profiled times and memories

PARTITIONERS.register("my_algo", my_partitioner, description="Algo from paper ...")
Then simply reference it when building a pipeline: Pipeline(..., partitioner="my_algo").
The signatures of the functions expected by each registry are detailed in elf/registry.py.
Process topologies
Placement.default(scheduler, pp) gives a good default mapping, but you can pass any explicit list, enabling exotic layouts such as:
placement = [0,1,2,3, 3,2,1,0] # bidirectional pipeline for Hanayo / ZBV
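Such a list is passed straight to the Pipeline constructor; here the eight stages fold back across four GPUs:

pipe = Pipeline(model, sample, placement=placement)  # 8 stages folded over 4 GPUs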
Environment variables
- ELF_TIMINGS: enables accurate time measurements in the detailed_stats field of the Pipeline object after an iteration. May affect performance.
- ELF_MEMORY: enables accurate kept and peak memory measurements in the detailed_stats field of the Pipeline object after an iteration. May affect performance.
- ELF_TIMEOUT: number of seconds to wait before shutting down process groups (passed to the NCCL watchdog).
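A minimal sketch, assuming any non-empty value enables a flag (the exact convention is not specified here):

import os

os.environ["ELF_TIMINGS"] = "1"  # assumed truthy flag; set before the iteration runs
os.environ["ELF_MEMORY"] = "1"

y, loss = pipe(inputs, targets, loss_fn)
print(pipe.detailed_stats)  # now carries accurate timing and memory measurements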
Docs
The full documentation can be generated with Sphinx. Go to docs/ and run make html.
Citation
If you use this project, please cite:
@article{aguila2025optimized,
title={Optimized Forward-Backward Rematerialization for Memory-Efficient Pipeline Parallel Training},
author={Aguila--Multner, Adrien and Beaumont, Olivier and Eyraud-Dubois, Lionel and Gusak, Julia},
year={2025}
}
File details
Details for the file elf_pipeline-0.1.0.tar.gz.
File metadata
- Download URL: elf_pipeline-0.1.0.tar.gz
- Upload date:
- Size: 81.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6af9518f9750fa09747a00236e285b2bcfd7624fa791b561ee2e6e231213d169 |
| MD5 | b5b7369823947be51894fb4dd9625e34 |
| BLAKE2b-256 | d629fbb9d7219f0564d0f6b3f15cca04aa1928593d2f61d38995e43c3104351e |
File details
Details for the file elf_pipeline-0.1.0-py3-none-any.whl.
File metadata
- Download URL: elf_pipeline-0.1.0-py3-none-any.whl
- Upload date:
- Size: 88.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 54db7a8ff17f11a2c4414ccc4f353a85a17435eeae9d5d09161bb6e7a4d8b9a9 |
| MD5 | 7062e4e3ed9ae18dc3bf45af12ee806c |
| BLAKE2b-256 | a844da23c8ba600a305011ed0635b3dd14b6ae7d7a4e2ea5bdaa907c9ebcab21 |