supreme-unlearning

A registry-based, multi-GPU framework for reproducible image-unlearning evaluation.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

petros44

These details have not been verified by PyPI

Project description

⚡ SUPREME - Standardised Unlearning Platform for REproducible Method Evaluation

📖 Overview

SUPREME is an open-source framework for evaluating machine unlearning methods on image classification tasks.

What is machine unlearning? Given a model that was trained on some data, machine unlearning removes the influence of a chosen subset of that data (a class, a sub-class, or a random sample of examples) without retraining the model from scratch. Doing this well is hard: a good unlearned model should behave as if it had never seen the forgotten data, while still classifying everything else accurately. Many methods have been proposed, and they need a fair, repeatable way to be compared.

What SUPREME does. It runs the same three-stage pipeline end to end for any registered combination of dataset, model, unlearning method, and evaluation metric:

1. Train a baseline model on the full dataset.
2. Unlearn the chosen subset using the selected unlearning method.
3. Evaluate the unlearned model against a from-scratch retrained baseline (trained only on the data that was kept), using a configurable set of metrics that cover forgetting, utility, privacy, behavioural/parametric equivalence, and efficiency.
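The three stages can be illustrated with a deliberately tiny, framework-free sketch. It is not SUPREME's code: the "model" is a lookup table that memorises (input, label) pairs, and `train`, `accuracy`, `exact_unlearn`, and `run_pipeline` are hypothetical names. What it shows is the comparison the evaluation stage makes: the unlearned model is scored on the forget and retain splits next to a model retrained from scratch on the retain split only.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, int]  # (input id, label)
Model = Dict[str, int]     # toy "model": memorised input id -> label


def train(data: List[Example]) -> Model:
    # Toy stand-in for training: memorise every (input, label) pair.
    return dict(data)


def accuracy(model: Model, data: List[Example]) -> float:
    # Inputs the toy model never saw count as misclassified.
    if not data:
        return 0.0
    return sum(model.get(x) == y for x, y in data) / len(data)


def exact_unlearn(model: Model, forget: List[Example]) -> Model:
    # Toy "unlearning method": drop the memorised forget examples.
    forget_ids = {x for x, _ in forget}
    return {x: y for x, y in model.items() if x not in forget_ids}


def run_pipeline(
    full: List[Example],
    forget: List[Example],
    unlearn: Callable[[Model, List[Example]], Model],
) -> Dict[str, Dict[str, float]]:
    forget_ids = {x for x, _ in forget}
    retain = [ex for ex in full if ex[0] not in forget_ids]
    original = train(full)                 # stage 1: train on the full dataset
    unlearned = unlearn(original, forget)  # stage 2: unlearn the forget subset
    retrained = train(retain)              # reference: retrain from scratch on retain only
    # Stage 3: score both models on each split so they can be compared.
    return {
        split: {"unlearned": accuracy(unlearned, data), "retrained": accuracy(retrained, data)}
        for split, data in (("forget", forget), ("retain", retain))
    }


if __name__ == "__main__":
    full = [("a", 0), ("b", 1), ("c", 0), ("d", 1)]
    print(run_pipeline(full, forget=[("c", 0)], unlearn=exact_unlearn))
```

In this toy the method matches the retrained reference exactly on both splits; real methods only approximate it, and the size of the gap is what the forgetting, utility, and equivalence metrics are meant to capture.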

It ships 5 datasets, 2 model architectures, 2 baselines, 9 unlearning methods, and 9 evaluation metrics (plus loss, which is reported automatically alongside accuracy), all selectable through command-line flags.

What makes SUPREME different:

Reproducible. Recent work has shown that single-seed unlearning results can misrepresent a method's true behaviour. SUPREME repeats each experiment under multiple seeds, varied independently for the training, unlearning, and evaluation stages, so you measure distributions rather than point estimates. The number of seeds at each stage is configurable per run.
Multi-GPU and multi-precision. Built on PyTorch and Lightning Fabric. Distributed execution (DDP, FSDP, DeepSpeed ZeRO stages 1/2/3) applies to all three stages, with mixed precision (fp16 / bf16) and NVIDIA, Apple Silicon, and CPU back-ends.
Registry-based extensibility. Add a dataset, model, unlearning method, or metric by implementing a small interface and registering its module path, with no framework changes required (see docs/extending.md).
Efficient. When several experiments share the same training configuration, the model is trained once and reused across them; a file lock guards this shared training step so that parallel SLURM jobs and concurrent local runs stay consistent.
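A minimal sketch of the multi-seed idea, assuming the three stages are indexed independently as in the `I`, `J`, `K` notation used by the run flags. `seed_grid` and `summarise` are hypothetical helpers, and the sketch enumerates indices only; the formulas that turn indices into actual seeds are defined in docs/notation.md and are not reproduced here.

```python
import itertools
import statistics
from typing import Iterable, List, Sequence, Tuple


def seed_grid(
    training_seeds: Sequence[int],
    unlearning_indices: Sequence[int],
    evaluation_indices: Sequence[int],
) -> List[Tuple[int, int, int]]:
    # Each stage varies independently, so the experiment cells are the
    # Cartesian product of the three seed axes (I x J x K).
    return list(itertools.product(training_seeds, unlearning_indices, evaluation_indices))


def summarise(scores: Iterable[float]) -> Tuple[float, float]:
    # Report a distribution (mean and sample standard deviation) over cells
    # instead of a single-seed point estimate.
    values = list(scores)
    mean = statistics.fmean(values)
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    return mean, std


if __name__ == "__main__":
    grid = seed_grid(range(260, 270), [0, 1, 2], [0])
    print(len(grid), grid[0], grid[-1])
```

With the default `--unlearning-seeds "0"` and `--evaluation-seeds "0"`, `J = K = 1` and the grid reduces to one cell per training seed.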

For the formal pipeline algorithm and mathematical notation (seed formulas, set definitions, operation signatures), see supreme/README.md and docs/notation.md.

🗃️ Available Components

Registry-based components are user-extensible - implement the relevant interface and register the module path, either in-tree or from your own package (runtime API or packaging entry points, no edits to SUPREME). See docs/extending.md. The components provided via Lightning Fabric cover the supported hardware and execution configurations.

Registry-based (user-extensible)

| Component | Available implementations |
|---|---|
| Datasets | CIFAR-10, CIFAR-20, CIFAR-100, PinsFaceRecognition, Caltech-101 |
| Models | ResNet18, Vision Transformer (ViT) |
| Baselines | Retrain, Original |
| Unlearning methods | Fine-Tuning (FT), Bad Teacher (BadT), Random Labels (RL), UNSIR, SSD, LFSSD, ASSD, SCRUB, JIT |
| Evaluation metrics | Accuracy, Loss/Error, ZRF, Activation Distance, JS-Divergence, Layer-wise Distance, Membership Inference Attack, Completeness, Resource Consumption, Time |
| Unlearning scenarios | Full-class, Subclass, Random sample |
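The three unlearning scenarios differ mainly in how the forget set is chosen. The sketch below is illustrative, not SUPREME's data code: `split_forget_retain` is a hypothetical helper, it treats the random-sample amount as a fraction of the training set, and it assumes the subclass scenario forgets one fine-grained label (for example a CIFAR-100 class) while the model is trained on coarser labels (for example CIFAR-20 superclasses).

```python
import random
from typing import List, Optional, Sequence, Tuple


def split_forget_retain(
    labels: Sequence[int],
    scenario: str,
    target: Optional[int] = None,
    fine_labels: Optional[Sequence[int]] = None,
    forget_frac: float = 0.0,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Return (forget indices, retain indices) for one hypothetical scenario."""
    n = len(labels)
    if scenario == "fullclass":
        # Forget every example of one training class.
        forget = [i for i in range(n) if labels[i] == target]
    elif scenario == "subclass":
        # Forget one fine-grained sub-class; its parent class stays in
        # training through the remaining sub-classes.
        if fine_labels is None:
            raise ValueError("subclass scenario needs fine_labels")
        forget = [i for i in range(n) if fine_labels[i] == target]
    elif scenario == "random_":
        # Forget a seeded random sample of the training set.
        k = round(forget_frac * n)
        forget = sorted(random.Random(seed).sample(range(n), k))
    else:
        raise ValueError(f"unknown scenario: {scenario}")
    forget_set = set(forget)
    retain = [i for i in range(n) if i not in forget_set]
    return forget, retain
```

Whatever the scenario, the retain set is simply the complement of the forget set, which is also the data the Retrain baseline is trained on.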

Provided via Lightning Fabric

| Component | Available implementations |
|---|---|
| Accelerators | CPU, CUDA, MPS, TPU |
| Precision modes | 64-true, 32-true, 16-mixed, bf16-mixed, 16-true, bf16-true, transformer-engine, transformer-engine-float16 (FP8), nf4, nf4-dq, fp4, fp4-dq, int8, int8-training |
| Distributed strategies | DDP, FSDP, DeepSpeed (ZeRO Stage 1/2/3) |
| Loggers | Weights & Biases, TensorBoard, CSV |

⚡ Quickstart

```bash
# 1. Clone
git clone https://github.com/pedroandreou/supreme-unlearning.git
cd supreme-unlearning

# 2. Set up environment - the Makefile is the entry point for local dev: it creates
#    the venv (named `unlearning` by default; override with VENV=<name>), installs the
#    pinned deps + SUPREME (editable), and enables the git hook. (If the venv already
#    exists it prompts; pass ON_EXISTING=reuse to skip the prompt.)
make cuda                  # NVIDIA GPU (Linux / WSL2).  Apple Silicon / CPU: `make mps`
source unlearning/bin/activate   # with VENV=<name>, activate <name>/bin/activate instead

# 3. Configure W&B + HF tokens
cp .env.example .env
# edit .env with your WANDB_KEY, WANDB_USERNAME and HUGGING_FACE_HUB_TOKEN

# 4. Smoke test - one training seed, one dataset, one forget percentage, and three
#    methods (the retrain baseline, fine-tuning, and SSD)
bash supreme/run_local.sh \
  --gpu 0 --models ViT --training-seeds 260 \
  --methods retrain,finetune,ssd \
  --strategies random_ --datasets Cifar10 \
  --forget-percs 0.01
```

Full environment setup (Docker Dev Container, MPS prerequisites, etc.) is documented in docs/environment_setup.md. The Docker image is NVIDIA-only (Linux / WSL2); macOS users follow the virtual-env path above.
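Before a long run it can help to confirm that the three tokens are actually set. A minimal, hypothetical pre-flight check (not part of SUPREME) that reads simple `KEY=VALUE` lines from `.env`, ignoring blank lines and `#` comments; unlike full dotenv parsers, it does not handle quoting or `export` prefixes.

```python
from pathlib import Path
from typing import Dict, List

REQUIRED_KEYS = ("WANDB_KEY", "WANDB_USERNAME", "HUGGING_FACE_HUB_TOKEN")


def read_env_file(path: Path) -> Dict[str, str]:
    # Minimal KEY=VALUE reader: skips blank lines and comments, no quote handling.
    values: Dict[str, str] = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(path: Path) -> List[str]:
    # Required keys that are absent or empty in the .env file.
    values = read_env_file(path)
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        missing = missing_keys(env_path)
        print("missing:", ", ".join(missing) if missing else "none")
    else:
        print("no .env file found; copy .env.example first")
```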

🧪 Running Experiments

The pipeline runs train → unlearn → evaluate automatically. Re-running is safe: per-stage outputs (training checkpoints, unlearning checkpoints, already-logged W&B results) are detected and skipped.
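The skip-if-done behaviour, together with the file lock that guards shared training, can be sketched as a double-checked "train once, reuse everywhere" helper. This is an illustration under stated assumptions rather than SUPREME's implementation: `get_or_train` is a hypothetical name, the checkpoint is treated as opaque bytes, and the lock uses POSIX `fcntl.flock`, which is unavailable on Windows and can be unreliable on some network filesystems.

```python
import fcntl
import os
from pathlib import Path
from typing import Callable


def get_or_train(checkpoint: Path, train_fn: Callable[[], bytes]) -> bytes:
    """Return the shared checkpoint, running train_fn only if nobody has produced it yet."""
    if checkpoint.exists():
        # Fast path: an earlier run already finished this stage, so skip it.
        return checkpoint.read_bytes()
    checkpoint.parent.mkdir(parents=True, exist_ok=True)
    lock_path = checkpoint.with_name(checkpoint.name + ".lock")
    with open(lock_path, "w") as lock_file:
        # Block until no other process holds the exclusive lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            if checkpoint.exists():
                # Another job trained the model while this one was waiting.
                return checkpoint.read_bytes()
            data = train_fn()
            tmp_path = checkpoint.with_name(checkpoint.name + ".tmp")
            tmp_path.write_bytes(data)
            # Atomic rename: readers never see a partially written checkpoint.
            os.replace(tmp_path, checkpoint)
            return data
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

The second existence check inside the lock is what keeps concurrent jobs consistent: a job that waited on the lock finds the finished checkpoint and reuses it instead of training again.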

Local (workstation, GPU server, interactive cluster node)

```bash
# All 10 seeds, all methods, all datasets - defaults
bash supreme/run_local.sh --gpu 0

# Filter the sweep
bash supreme/run_local.sh \
  --gpu 0,1 \
  --models ViT \
  --training-seeds 260,261,262 \
  --methods retrain,finetune,bad_teacher,ssd \
  --strategies fullclass,random_ \
  --datasets PinsFaceRecognition
```

| Flag | Description | Default |
|---|---|---|
| `--gpu` | GPU ID(s): `0` for a single GPU, `0,1,2,3` for multi-GPU | `0` |
| `--models` | `ResNet18`, `ViT` | both |
| `--training-seeds` | Comma-separated training seeds (outer loop, `I`) | `260`–`269` |
| `--unlearning-seeds` | Space-separated indices for `J` (e.g. `"0 1 2"` for `J=3`) | `"0"` (matched) |
| `--evaluation-seeds` | Space-separated indices for `K` | `"0"` (matched) |
| `--methods` | Unlearning methods to run | all 11 (2 baselines + 9 methods) |
| `--strategies` | `fullclass`, `subclass`, `random_` | all |
| `--datasets` | Datasets to use | all 5 |
| `--forget-percs` | Forget percentage(s) for the `random_` strategy | `0.001`–`0.10` |
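To estimate how large a filtered sweep will be, the flags can be read as axes of a grid. The sketch assumes every listed model, dataset, strategy, method, and training seed is combined, and that `--forget-percs` only multiplies the `random_` strategy, as the table states; `count_runs` is a hypothetical helper, and the real script may order or skip cells differently (for example, when outputs already exist).

```python
from typing import Sequence


def count_runs(
    models: Sequence[str],
    datasets: Sequence[str],
    strategies: Sequence[str],
    methods: Sequence[str],
    training_seeds: Sequence[int],
    forget_percs: Sequence[float],
    n_unlearning: int = 1,
    n_evaluation: int = 1,
) -> int:
    # Only random_ is expanded over forget percentages; other strategies count once.
    per_strategy = sum(len(forget_percs) if s == "random_" else 1 for s in strategies)
    return (len(models) * len(datasets) * len(methods) * len(training_seeds)
            * per_strategy * n_unlearning * n_evaluation)


if __name__ == "__main__":
    # The filtered example above, with two hypothetical forget percentages for random_.
    print(count_runs(["ViT"], ["PinsFaceRecognition"], ["fullclass", "random_"],
                     ["retrain", "finetune", "bad_teacher", "ssd"], [260, 261, 262],
                     [0.01, 0.10]))
```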

SLURM (HPC, login node)

```bash
# Preview the grid (no submission)
./supreme/run_slurm.sh --dry-run

# Submit all experiments, max 12 concurrent jobs
./supreme/run_slurm.sh --max-concurrent 12

# Subset
./supreme/run_slurm.sh \
  --datasets Cifar10,Cifar20 \
  --models ViT \
  --training-seeds 260,261,262

# Multi-GPU DDP per job
./supreme/run_slurm.sh --gpus 4
```

Each submitted job runs one (seed, dataset, model) cell independently; cells run in parallel across the cluster. Distributed-strategy selection (DDP / FSDP / DeepSpeed) is documented in docs/implementation_notes.md → Distributed Strategies.
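One common way to implement "one (seed, dataset, model) cell per job" is a SLURM job array, where each task picks its cell from the `SLURM_ARRAY_TASK_ID` environment variable and the `%N` suffix of `sbatch --array` caps how many tasks run at once, similar in spirit to `--max-concurrent`. Whether `run_slurm.sh` uses job arrays is not stated here, so treat this as a hypothetical sketch; `build_cells` and `cell_for_task` are illustrative names.

```python
import itertools
import os
from typing import List, Sequence, Tuple

Cell = Tuple[int, str, str]  # (training seed, dataset, model)


def build_cells(seeds: Sequence[int], datasets: Sequence[str], models: Sequence[str]) -> List[Cell]:
    # Fixed, deterministic ordering so every task agrees on the index -> cell mapping.
    return list(itertools.product(seeds, datasets, models))


def cell_for_task(cells: List[Cell], task_id: int) -> Cell:
    # Map one array task index to exactly one independent cell.
    if not 0 <= task_id < len(cells):
        raise IndexError(f"task id {task_id} outside 0..{len(cells) - 1}")
    return cells[task_id]


if __name__ == "__main__":
    cells = build_cells([260, 261, 262], ["Cifar10", "Cifar20"], ["ViT"])
    # Submitted with e.g. `sbatch --array=0-5%12 job.sh`; falls back to task 0 outside SLURM.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    print(cell_for_task(cells, task_id))
```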

🔁 Reproducing the paper

Reproducing the paper's numbers is a two-step process: run the experiment grid on Pins Face Recognition (both architectures, both scenarios, all 10 seeds) and then render the three paper LaTeX tables from the W&B-logged results using supreme/utils/wandb_utils/results_analysis/pins_paper_tables.ipynb. The exact command, the table-rendering workflow, and the troubleshooting notes are documented in docs/reproducing_the_paper.md. For a runnable, step-by-step walkthrough (install → smoke test → full grid → tables → extending), see the notebook notebooks/reproduce_experiments.ipynb.
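The table-rendering step boils down to turning per-seed values into "mean ± standard deviation" cells. A hypothetical sketch of that formatting step, independent of W&B and of the notebook's actual code (`latex_cell` and `latex_row` are illustrative names, and two-decimal rounding is an arbitrary choice):

```python
import statistics
from typing import Dict, Sequence


def latex_cell(values: Sequence[float], digits: int = 2) -> str:
    # Mean and sample standard deviation over seeds, formatted for LaTeX.
    mean = statistics.fmean(values)
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    return f"{mean:.{digits}f} $\\pm$ {std:.{digits}f}"


def latex_row(method: str, metrics: Dict[str, Sequence[float]], columns: Sequence[str]) -> str:
    # One table row: the method name followed by one cell per metric column.
    cells = [latex_cell(metrics[c]) for c in columns]
    return " & ".join([method, *cells]) + r" \\"
```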

➕ Extending SUPREME

SUPREME is pip-installable (pip install supreme-unlearning, imported as supreme) and reusable as a library. Register your own components from your own package - no edits to framework code:

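```python
import supreme

# Register components from your own package by import path
# ("pkg.module" or "pkg.module:Attr"); no edits to framework code.
supreme.register_unlearning_method("mymethod", "my_pkg.mymethod")
supreme.register_model("MyNet", "my_pkg.models:MyNet")
supreme.register_dataset("MyDS", "my_pkg.data:MyDS",
                         root="/data/myds", class_dict={"cat": 0, "dog": 1})

# Launch an unlearning run that uses the registered components.
supreme.run_unlearning(["-method", "mymethod", "-net", "MyNet",
                        "-dataset", "MyDS", "-seed", "260"])
```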

The notebook notebooks/custom_components.ipynb is a runnable, end-to-end walkthrough from an external user's point of view: pip install supreme-unlearning, then register your own method, metric, model, or dataset.

An installed plugin package can equivalently provide components via packaging entry points (supreme.models, supreme.unlearning_methods, supreme.metrics, supreme.plugins). The public API is supreme.register_*, supreme.run_training, supreme.run_unlearning, and supreme.project_config; everything under supreme.utils.* is internal.

Adding a dataset, model, method, or metric follows a consistent register-and-implement pattern. Walkthroughs and Fabric-integration rules live in docs/extending.md:

| What to add | Walkthrough |
|---|---|
| New dataset | `docs/extending.md → Adding a new dataset` |
| New model | `docs/extending.md → Adding a new model` |
| New unlearning method | `docs/extending.md → Adding a new unlearning method` |
| New evaluation metric | `docs/extending.md → Adding a new evaluation metric` |

🤝 Contributing

Contributions are welcome - bug reports, new components, and documentation alike.

Found a bug or want a feature? Open an issue - the bug-report and feature-request templates appear automatically at New issue → choose a template.
Adding a dataset, model, method, or metric? Most components register from your own package with no framework edits - see docs/extending.md. You can ship it as a pip-installable plugin or upstream it via a pull request.
Opening a pull request? Run make style then make quality (the same ruff lint + format checks CI runs), and follow the PR template. Full workflow in the contributing guide.
Share your method and results in community/ and add a row to the leaderboard.

CI (.github/workflows/ci.yml) lints, format-checks, and validates the package build on every push and PR. Pushing a version tag such as v0.1.0 triggers .github/workflows/publish.yml, which builds the release and publishes it to PyPI (a manual run targets TestPyPI as a dry run), and .github/workflows/docker.yml builds the CUDA image and publishes it to GHCR. Notable changes per release are tracked in CHANGELOG.md.

📚 Documentation

| Document | Covers |
|---|---|
| `docs/contributing.md` | How to report issues, add components, and open a pull request |
| `CHANGELOG.md` | Notable changes per release (Keep a Changelog / SemVer) |
| `community/` | Community-contributed methods, templates, and the results leaderboard |
| `docs/notation.md` | Symbol glossary - seeds, datasets, models, indices, counts |
| `supreme/README.md` | Formal algorithm specification (matched and decoupled protocols) |
| `docs/environment_setup.md` | Virtual-env and Docker Dev Container setup, `.env` template, prerequisites |
| `docs/reproducing_the_paper.md` | Single command for the paper's experiment grid plus the W&B-export-to-LaTeX-tables workflow |
| `docs/script_arguments.md` | Full argument reference for `train_main.py` and `unlearn_main.py` |
| `docs/extending.md` | How to add new datasets, models, methods, and metrics |
| `docs/tooling.md` | Debugger, profiler, Fabric callbacks, process tracker, split export, W&B exporter |
| `docs/wandb_integration.md` | W&B runtime behaviour: rank-0 logging, offline mode, sync workflow, metric synchronisation |
| `docs/wandb_fields.md` | Paper-to-W&B metric mapping and per-metric field paths |
| `docs/implementation_notes.md` | Distributed strategies, gradient handling, batch-size scaling, memory, known limitations |
| `docs/adding_pinsfacerecognition.md` | Manual Kaggle download for the Pins Face Recognition dataset |
| `docs/future_work.md` | Planned extensions |

📝 Citing this work

```bibtex
@misc{supreme2026,
  title  = {SUPREME: Standardised Unlearning Platform for REproducible Method Evaluation},
  author = {Petros Andreou and Jamie Lanyon and Axel Finke and Georgina Cosma},
  year   = {2026},
  eprint = {2606.00380},
  archivePrefix = {arXiv},
  primaryClass = {cs.LG},
  url    = {https://arxiv.org/abs/2606.00380}
}
```

This work was conducted at Loughborough University.

📄 License

Released under the MIT License.

Release history

0.1.3 (Jun 4, 2026)

0.1.2 (Jun 2, 2026)

0.1.1 (Jun 2, 2026)

0.1.0 (Jun 2, 2026), this version

Download files


Source Distribution

supreme_unlearning-0.1.0.tar.gz (164.3 kB)

Uploaded Jun 2, 2026 Source

Built Distribution

supreme_unlearning-0.1.0-py3-none-any.whl (182.5 kB)

Uploaded Jun 2, 2026 Python 3

File details

Details for the file supreme_unlearning-0.1.0.tar.gz.

File metadata

Download URL: supreme_unlearning-0.1.0.tar.gz
Upload date: Jun 2, 2026
Size: 164.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for supreme_unlearning-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`89ca29ce4678d559c06af3fdabd3b26186abebf53e08aa76ad090c3c8de62c65`
MD5	`18d5685c952289e787f68b4c38f74878`
BLAKE2b-256	`86efd93f09fcf87129e93f90ee1e4a2fef7eec3a2fba64e882bf4873d7e663de`

Provenance

The following attestation bundles were made for supreme_unlearning-0.1.0.tar.gz:

Publisher: publish.yml on pedroandreou/supreme-unlearning

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: supreme_unlearning-0.1.0.tar.gz
- Subject digest: 89ca29ce4678d559c06af3fdabd3b26186abebf53e08aa76ad090c3c8de62c65
- Sigstore transparency entry: 1702791348
- Sigstore integration time: Jun 2, 2026
Source repository:
- Permalink: pedroandreou/supreme-unlearning@7c1e5215f73f96e40e0318475a1c4a2ff8a5bfda
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/pedroandreou
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@7c1e5215f73f96e40e0318475a1c4a2ff8a5bfda
- Trigger Event: push

File details

Details for the file supreme_unlearning-0.1.0-py3-none-any.whl.

File metadata

Download URL: supreme_unlearning-0.1.0-py3-none-any.whl
Upload date: Jun 2, 2026
Size: 182.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for supreme_unlearning-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f85bd757738cc7e7c7cce9150fdbf92660152ed1a2d38a7af489652c9b50fe28`
MD5	`87b9b31e1231cd7ad8f83c67c81405ae`
BLAKE2b-256	`296d8639def4fc0138981b2be955018e9abdcf07d4bc059ad4cc2dc56e6b0c40`

Provenance

The following attestation bundles were made for supreme_unlearning-0.1.0-py3-none-any.whl:

Publisher: publish.yml on pedroandreou/supreme-unlearning

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: supreme_unlearning-0.1.0-py3-none-any.whl
- Subject digest: f85bd757738cc7e7c7cce9150fdbf92660152ed1a2d38a7af489652c9b50fe28
- Sigstore transparency entry: 1702791390
- Sigstore integration time: Jun 2, 2026
Source repository:
- Permalink: pedroandreou/supreme-unlearning@7c1e5215f73f96e40e0318475a1c4a2ff8a5bfda
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/pedroandreou
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@7c1e5215f73f96e40e0318475a1c4a2ff8a5bfda
- Trigger Event: push
