A registry-based, multi-GPU framework for reproducible image-unlearning evaluation.
⚡ SUPREME - A Multi-GPU Framework for Reproducible Image Unlearning Method Evaluation
📖 Overview
SUPREME is an open-source framework for evaluating machine unlearning methods on image classification tasks at scale.
Machine unlearning removes the influence of a chosen subset of training data (a class, a sub-class, or a random sample) from an already-trained model, without retraining from scratch. A good unlearned model should behave as if it had never seen the forgotten data while still classifying everything else accurately. Comparing the many proposed methods fairly demands a standardised, repeatable harness, and SUPREME is that harness.
The gap it fills. Existing image-classification unlearning frameworks - MUBox, DeepUnlearn, and ERASURE - run on a single device, which caps how many methods, scenarios, and seeds can be evaluated in reasonable time. SUPREME distributes the entire train → unlearn → evaluate pipeline across multiple GPUs and nodes, removing that bottleneck. It does for image-classification unlearning what Open-Unlearning did for LLM unlearning in the text domain: turn a single-device research problem into a scalable, reproducible benchmark. To our knowledge it is the first multi-GPU framework for the field.
What it offers out of the box:
- A complete, automated pipeline. Train a baseline on the full dataset, unlearn the chosen subset with the selected method, then evaluate the result against a from-scratch retrained reference, all from one command. Re-runs detect and skip work that is already done.
- A broad component library. 5 datasets, 2 model architectures, 2 baselines, 9 unlearning methods, 9 evaluation metrics (covering forgetting, utility, privacy, behavioural/parametric equivalence, and efficiency), and 3 unlearning scenarios (full-class, subclass, random-sample), all selectable through command-line flags.
- Distributed, multi-precision execution. Built on PyTorch and Lightning Fabric. DDP, FSDP, and DeepSpeed ZeRO 1/2/3 apply to all three stages, with mixed precision (fp16 / bf16, FP8, 4-/8-bit) and CUDA / Apple Silicon (MPS) / TPU / CPU back-ends. SLURM helpers fan experiments out across a cluster.
- Statistically honest evaluation. A single random seed misrepresents how an unlearning method really behaves, because randomness enters at three independent points: training (weight initialisation and data shuffling produce different base models), unlearning (the unlearning algorithm itself is stochastic), and evaluation (sampling and metric computation add their own noise). SUPREME varies the seed at each of these three stages separately, so you can see how much of the spread in a result comes from the base model, from the unlearning run, and from measurement, and report the full distribution rather than a single point estimate. The seed count at each stage is configurable per run.
- Extensibility without forking. It is pip-installable (pip install supreme-unlearning) and registry-based: add a dataset, model, method, or metric from your own package by implementing a small interface and registering its module path, with no edits to framework code (see docs/extending.md).
- Efficient reuse. Experiments that share a training configuration train the model once and reuse it, guarded by a file lock so parallel SLURM jobs and concurrent local runs stay consistent.
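To make the three-stage seeding concrete, here is a minimal sketch (not SUPREME's own analysis code) of how a metric logged for every (training seed, unlearning index, evaluation index) triple could be split into per-stage spread. The function name `spread_by_stage`, the dictionary layout, and the toy numbers are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean, pstdev


def spread_by_stage(results):
    """Split the spread of one metric across the three seeded stages.

    `results` maps (training_seed, unlearning_index, evaluation_index) to a
    metric value. This layout is hypothetical, not SUPREME's storage format.
    """
    by_train = defaultdict(list)          # every value from one base model
    by_train_unlearn = defaultdict(list)  # every value from one unlearning run
    for (i, j, _k), value in results.items():
        by_train[i].append(value)
        by_train_unlearn[(i, j)].append(value)

    # Base-model spread: std of the per-training-seed means.
    training = pstdev([mean(v) for v in by_train.values()])

    # Unlearning spread: std of the per-run means inside each base model,
    # averaged over base models.
    run_means = defaultdict(list)
    for (i, _j), values in by_train_unlearn.items():
        run_means[i].append(mean(values))
    unlearning = mean(pstdev(m) for m in run_means.values())

    # Measurement spread: std over evaluation seeds inside each run, averaged.
    evaluation = mean(pstdev(v) for v in by_train_unlearn.values())

    return {"training": training, "unlearning": unlearning, "evaluation": evaluation}


# Toy data: the base model shifts the metric by 10, the unlearning run by 2,
# and the evaluation seed by 0.5.
toy = {
    (i, j, k): base + 2.0 * j + 0.5 * k
    for i, base in ((260, 0.0), (261, 10.0))
    for j in (0, 1)
    for k in (0, 1)
}
spread = spread_by_stage(toy)
print(spread)
```

On the toy data the three spreads come out at roughly 5.0, 1.0, and 0.25, mirroring the size of the effect injected at each stage. A real analysis would more likely use a proper nested variance decomposition; this averaging scheme is only the simplest way to show the idea.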
SUPREME evolved from the codebases of Selective Synaptic Dampening (SSD) and bad-teaching unlearning, generalising them from single-method, single-device scripts into a standardised, distributed evaluation platform.
For the formal pipeline algorithm and mathematical notation (seed formulas, set definitions, operation signatures), see supreme/README.md and docs/notation.md.
🗃️ Available Components
Registry-based components are user-extensible - implement the relevant interface and register the module path, either in-tree or from your own package (runtime API or packaging entry points, no edits to SUPREME). See docs/extending.md. The components provided via Lightning Fabric cover the supported hardware and execution configurations.
Registry-based (user-extensible)
| Component | Available implementations |
|---|---|
| Datasets | CIFAR-10, CIFAR-20, CIFAR-100, PinsFaceRecognition, Caltech-101 |
| Models | ResNet18, Vision Transformer (ViT) |
| Baselines | Retrain, Original |
| Unlearning methods | Fine-Tuning (FT), Bad Teacher (BadT), Random Labels (RL), UNSIR, SSD, LFSSD, ASSD, SCRUB, JIT |
| Evaluation metrics | Accuracy, Loss/Error, ZRF, Activation Distance, JS-Divergence, Layer-wise Distance, Membership Inference Attack, Completeness, Resource Consumption, Time |
| Unlearning scenarios | Full-class, Subclass, Random sample |
Provided via Lightning Fabric
| Component | Available implementations |
|---|---|
| Accelerators | CPU, CUDA, MPS, TPU |
| Precision modes | 64-true, 32-true, 16-mixed, bf16-mixed, 16-true, bf16-true, transformer-engine, transformer-engine-float16 (FP8), nf4, nf4-dq, fp4, fp4-dq, int8, int8-training |
| Distributed strategies | DDP, FSDP, DeepSpeed (ZeRO Stage 1/2/3) |
| Loggers | Weights & Biases, TensorBoard, CSV |
⚡ Quickstart
# 1. Clone
git clone https://github.com/pedroandreou/supreme-unlearning.git
cd supreme-unlearning
# 2. Set up environment - the Makefile is the entry point for local dev: it creates
# the venv (named `unlearning` by default; override with VENV=<name>), installs the
# pinned deps + SUPREME (editable), and enables the git hook. (Prompts if it
# already exists; pass ON_EXISTING=reuse to skip.)
make cuda # NVIDIA GPU (Linux / WSL2). Apple Silicon / CPU: `make mps`
source unlearning/bin/activate
# 3. Configure W&B + HF tokens
cp .env.example .env
# edit .env with your WANDB_KEY, WANDB_USERNAME and HUGGING_FACE_HUB_TOKEN
# 4. Smoke test - one seed, one method, one dataset
bash supreme/run_local.sh \
--gpu 0 --models ViT --training-seeds 260 \
--methods retrain,finetune,ssd \
--strategies random_ --datasets Cifar10 \
--forget-percs 0.01
Full environment setup (Docker Dev Container, MPS prerequisites, etc.) is documented in docs/environment_setup.md. The Docker image is NVIDIA-only (Linux / WSL2); macOS users follow the virtual-env path above.
🧪 Running Experiments
The pipeline runs train → unlearn → evaluate automatically. Re-running is safe: per-stage outputs (training checkpoints, unlearning checkpoints, already-logged W&B results) are detected and skipped.
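The skip-on-rerun behaviour can be pictured with a small sketch. It assumes, purely for illustration, that every stage leaves exactly one output file behind; SUPREME itself also checks already-logged W&B results, which this sketch does not model. `pending_stages`, `STAGE_OUTPUTS`, and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def pending_stages(run_dir, stage_outputs):
    """Return, in pipeline order, the stages whose output file is missing."""
    run_dir = Path(run_dir)
    return [
        stage
        for stage, filename in stage_outputs.items()
        if not (run_dir / filename).exists()
    ]


# Hypothetical output file per stage, listed in pipeline order.
STAGE_OUTPUTS = {
    "train": "train.ckpt",
    "unlearn": "unlearn.ckpt",
    "evaluate": "metrics.json",
}

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "train.ckpt").touch()  # pretend training already finished
    todo = pending_stages(tmp, STAGE_OUTPUTS)
    print(todo)  # ['unlearn', 'evaluate']
```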
Local (workstation, GPU server, interactive cluster node)
# All 10 seeds, all methods, all datasets - defaults
bash supreme/run_local.sh --gpu 0
# Filter the sweep
bash supreme/run_local.sh \
--gpu 0,1 \
--models ViT \
--training-seeds 260,261,262 \
--methods retrain,finetune,bad_teacher,ssd \
--strategies fullclass,random_ \
--datasets PinsFaceRecognition
| Flag | Description | Default |
|---|---|---|
| --gpu | GPU ID(s) - 0 single, 0,1,2,3 multi-GPU | 0 |
| --models | ResNet18, ViT | both |
| --training-seeds | Comma-separated training seeds (outer loop, I) | 260–269 |
| --unlearning-seeds | Space-separated indices for J (e.g. "0 1 2" for J=3) | "0" (matched) |
| --evaluation-seeds | Space-separated indices for K | "0" (matched) |
| --methods | Unlearning methods to run | all 11 (2 baselines + 9 methods) |
| --strategies | fullclass, subclass, random_ | all |
| --datasets | Datasets to use | all 5 |
| --forget-percs | Forget % for random_ strategy | 0.001–0.10 |
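To get a feel for how quickly a sweep grows, the sketch below counts the runs implied by the filtered example above. It assumes the sweep is a plain Cartesian product of the flag values and ignores --forget-percs as well as the unlearning and evaluation seed indices; how SUPREME actually expands the grid may differ, and ./supreme/run_slurm.sh --dry-run remains the authoritative preview.

```python
from itertools import product

# Values taken from the filtered run_local.sh example; the dictionary keys are
# illustrative labels, not SUPREME identifiers.
sweep = {
    "models": ["ViT"],
    "training_seeds": [260, 261, 262],
    "methods": ["retrain", "finetune", "bad_teacher", "ssd"],
    "strategies": ["fullclass", "random_"],
    "datasets": ["PinsFaceRecognition"],
}

# One dictionary per combination, e.g. {"models": "ViT", "training_seeds": 260, ...}
runs = [dict(zip(sweep, combo)) for combo in product(*sweep.values())]

print(len(runs))  # 24 = 1 model * 3 seeds * 4 methods * 2 strategies * 1 dataset
print(runs[0])
```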
SLURM (HPC, login node)
# Preview the grid (no submission)
./supreme/run_slurm.sh --dry-run
# Submit all experiments, max 12 concurrent jobs
./supreme/run_slurm.sh --max-concurrent 12
# Subset
./supreme/run_slurm.sh \
--datasets Cifar10,Cifar20 \
--models ViT \
--training-seeds 260,261,262
# Multi-GPU DDP per job
./supreme/run_slurm.sh --gpus 4
Each submitted job runs one (seed, dataset, model) cell independently; cells run in parallel across the cluster. Distributed-strategy selection (DDP / FSDP / DeepSpeed) is documented in docs/implementation_notes.md → Distributed Strategies.
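Because cells run in parallel, two jobs that share a training configuration could both try to train the same base model. The sketch below shows one common way to guard against that: an atomically created lock file plus a second existence check inside the lock. It is an illustration under stated assumptions, not SUPREME's implementation; `file_lock`, `train_once`, and the checkpoint name are hypothetical, and a crashed holder would leave a stale lock that this sketch does not clean up.

```python
import os
import tempfile
import time
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def file_lock(lock_path, poll_seconds=0.05):
    """Hold an exclusive lock by atomically creating `lock_path`."""
    while True:
        try:
            # O_EXCL makes creation fail if another process holds the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            time.sleep(poll_seconds)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


def train_once(checkpoint, train_fn):
    """Run `train_fn` only if `checkpoint` is missing; return True if it ran."""
    with file_lock(str(checkpoint) + ".lock"):
        if checkpoint.exists():  # another job finished while we waited
            return False
        train_fn(checkpoint)
        return True


with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "base_model.ckpt"
    calls = []

    def fake_train(path):
        calls.append(path)
        path.write_text("weights")  # stand-in for a real checkpoint

    first = train_once(ckpt, fake_train)
    second = train_once(ckpt, fake_train)
    print(first, second, len(calls))  # True False 1
```

Checking for the checkpoint inside the lock rather than before it is the important design choice: a job that waited for the lock sees the finished checkpoint and reuses it instead of retraining.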
🔁 Reproducing the paper
Reproducing the paper's numbers is a two-step process: run the experiment grid on Pins Face Recognition (both architectures, both scenarios, all 10 seeds) and then render the three paper LaTeX tables from the W&B-logged results using supreme/utils/wandb_utils/results_analysis/pins_paper_tables.ipynb. The exact command, the table-rendering workflow, and the troubleshooting notes are documented in docs/reproducing_the_paper.md. For a runnable, step-by-step walkthrough (install → smoke test → full grid → tables → extending), see the notebook notebooks/reproduce_experiments.ipynb.
➕ Extending SUPREME
SUPREME is pip-installable (pip install supreme-unlearning, imported as
supreme) and reusable as a library.
Register your own components from your own package - no edits to framework code:
import supreme
supreme.register_unlearning_method("mymethod", "my_pkg.mymethod")
supreme.register_model("MyNet", "my_pkg.models:MyNet")
supreme.register_dataset("MyDS", "my_pkg.data:MyDS",
                         root="/data/myds", class_dict={"cat": 0, "dog": 1})
supreme.run_unlearning(["-method", "mymethod", "-net", "MyNet",
                        "-dataset", "MyDS", "-seed", "260"])
A runnable, end-to-end walkthrough from an external user's point of view -
pip install supreme-unlearning then register your own method/metric/model/dataset - is in
the notebook notebooks/custom_components.ipynb.
An installed plugin package can equivalently provide components via packaging
entry points (supreme.models, supreme.unlearning_methods, supreme.metrics,
supreme.plugins). The public API is supreme.register_*, supreme.run_training,
supreme.run_unlearning, and supreme.project_config; everything under
supreme.utils.* is internal.
Adding a dataset, model, method, or metric follows a consistent register-and-implement pattern. Walkthroughs and Fabric-integration rules live in docs/extending.md:
| What to add | Walkthrough |
|---|---|
| New dataset | docs/extending.md → Adding a new dataset |
| New model | docs/extending.md → Adding a new model |
| New unlearning method | docs/extending.md → Adding a new unlearning method |
| New evaluation metric | docs/extending.md → Adding a new evaluation metric |
🤝 Contributing
Contributions are welcome - bug reports, new components, and documentation alike.
- Found a bug or want a feature? Open an issue - the bug-report and feature-request templates appear automatically at New issue → choose a template.
- Adding a dataset, model, method, or metric? Most components register from your own package with no framework edits - see docs/extending.md. You can ship it as a pip-installable plugin or upstream it via a pull request.
- Opening a pull request? Run make style then make quality (the same ruff lint + format checks CI runs), and follow the PR template. Full workflow in the contributing guide.
- Share your method and results in community/ and add a row to the leaderboard.
CI (.github/workflows/ci.yml) lints, format-checks,
and validates the package build on every push and PR. A version tag like v0.1.0
triggers .github/workflows/publish.yml to build
and publish the release to PyPI (a manual run targets TestPyPI as a dry-run). The
CUDA images are published to GHCR manually via .github/workflows/docker.yml
(runtime image) and .github/workflows/devcontainer.yml
(prebuilt dev container). Notable changes per release are tracked in CHANGELOG.md.
📚 Documentation
| Document | Covers |
|---|---|
| docs/contributing.md | How to report issues, add components, and open a pull request |
| CHANGELOG.md | Notable changes per release (Keep a Changelog / SemVer) |
| community/ | Community-contributed methods, templates, and the results leaderboard |
| docs/notation.md | Symbol glossary - seeds, datasets, models, indices, counts |
| supreme/README.md | Formal algorithm specification (matched and decoupled protocols) |
| docs/environment_setup.md | Virtual-env and Docker Dev Container setup, .env template, prerequisites |
| docs/reproducing_the_paper.md | Single command for the paper's experiment grid plus the W&B-export-to-LaTeX-tables workflow |
| docs/script_arguments.md | Full argument reference for train_main.py and unlearn_main.py |
| docs/extending.md | How to add new datasets, models, methods, and metrics |
| docs/tooling.md | Debugger, profiler, Fabric callbacks, process tracker, split export, W&B exporter |
| docs/wandb_integration.md | W&B runtime behaviour: rank-0 logging, offline mode, sync workflow, metric synchronisation |
| docs/wandb_fields.md | Paper-to-W&B metric mapping and per-metric field paths |
| docs/implementation_notes.md | Distributed strategies, gradient handling, batch-size scaling, memory, known limitations |
| docs/adding_pinsfacerecognition.md | Manual Kaggle download for the Pins Face Recognition dataset |
| docs/future_work.md | Planned extensions |
📝 Citing this work
@misc{supreme2026,
title = {SUPREME: A Multi-GPU Framework for Reproducible Image Unlearning Method Evaluation},
author = {Petros Andreou and Jamie Lanyon and Axel Finke and Georgina Cosma},
year = {2026},
eprint = {2606.00380},
archivePrefix = {arXiv},
primaryClass = {cs.LG},
url = {https://arxiv.org/abs/2606.00380}
}
This work was conducted at Loughborough University.
📄 License
Released under the MIT License.