
Unlearning Algorithms


Unlearn

Unlearn is an open-source Python package designed to streamline the development of unlearning algorithms and establish a standardized evaluation pipeline for diffusion models. It provides researchers and practitioners with tools to implement, evaluate, and extend unlearning algorithms effectively.

Features

  • Comprehensive Algorithm Support: Includes commonly used concept erasing and machine unlearning algorithms tailored for diffusion models. Each algorithm is encapsulated and standardized in terms of input-output formats.

  • Automated Evaluation: Supports automatic evaluation on datasets such as UnlearnCanvas and I2P. Runs both standard and adversarial evaluations and reports the metrics defined in UnlearnCanvas and UnlearnDiffAtk.

  • Extensibility: Designed for easy integration of new unlearning algorithms, attack methods, defense mechanisms, and datasets with minimal modifications.
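As a minimal sketch of the standardized input/output idea, assuming a "config dict in, result dict out" contract, a new algorithm could plug in by subclassing an abstract base class. `BaseAlgorithm`, `run`, `ToyEraser`, and the result keys below are hypothetical; the package's real base class lives in mu/core/base_algorithm.py and its API may differ.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class BaseAlgorithm(ABC):
    """Hypothetical stand-in for the abstract class in mu/core/base_algorithm.py."""

    def __init__(self, config: Dict[str, Any]) -> None:
        # Assumption: every algorithm receives its settings as one dict,
        # e.g. the parsed contents of its train_config.yaml.
        self.config = config

    @abstractmethod
    def run(self) -> Dict[str, Any]:
        """Perform unlearning and return results in a shared format."""


class ToyEraser(BaseAlgorithm):
    """Hypothetical algorithm: records what it would erase, trains nothing."""

    def run(self) -> Dict[str, Any]:
        # A real method would fine-tune the diffusion model here.
        return {
            "algorithm": "toy",
            "erased_concept": self.config["concept"],
            "output_dir": self.config.get("output_dir", "data/results/toy"),
        }


result = ToyEraser({"concept": "Abstractionism"}).run()
print(result["erased_concept"])  # prints Abstractionism
```

Because `run` is abstract, a subclass that forgets to implement it fails at construction time rather than partway through a training job.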

Supported Algorithms

The initial version includes established methods benchmarked in UnlearnCanvas and defensive unlearning techniques:

  • ESD (Erased Stable Diffusion)
  • CA
  • UCE
  • FMN
  • SalUn
  • SEOT
  • SPM
  • EDiff
  • ScissorHands
  • ...and more

For detailed information on each algorithm, please refer to the respective README.md files located inside mu/algorithms.

Project Architecture

The project is organized to facilitate scalability and maintainability.

  • data/: Stores data-related files.

    • processed_data/: Preprocessed data ready for models.
    • raw_data/: Original datasets.
    • results/: Outputs from algorithms.
      • esd/: Results specific to the ESD algorithm.
      • algorithm_2/: Results from other algorithms.
    • images/: Generated or processed images.
    • models/: Saved model checkpoints.
  • docs/: Documentation, including API references and user guides.

  • examples/: Sample code and notebooks demonstrating usage.

  • logs/: Log files for debugging and auditing.

  • models/: Repository of trained models and checkpoints.

  • mu/: Core source code.

    • algorithms/: Implementation of various algorithms. Each algorithm has its own subdirectory containing code and a README.md with detailed documentation.
      • esd/: ESD algorithm components.
        • README.md: Documentation specific to the ESD algorithm.
        • algorithm.py: Core implementation of ESD.
        • configs/: Configuration files for training and generation tasks.
        • constants/const.py: Constant values used across the ESD algorithm.
        • environment.yaml: Environment setup for ESD.
        • model.py: Model architectures specific to ESD.
        • sampler.py: Sampling methods used during training or inference.
        • scripts/train.py: Training script for ESD.
        • trainer.py: Training routines and optimization strategies.
        • utils.py: Utility functions and helpers.
      • ca/: Components for the CA algorithm.
        • README.md: Documentation specific to the CA algorithm.
        • ...and so on for other algorithms
    • core/: Foundational classes and utilities.
      • base_algorithm.py: Abstract base class for algorithm implementations.
      • base_data_handler.py: Base class for data handling.
      • base_model.py: Base class for model definitions.
      • base_sampler.py: Base class for sampling methods.
      • base_trainer.py: Base class for training routines.
    • datasets/: Dataset management and utilities.
      • __init__.py: Initializes the dataset package.
      • dataset.py: Dataset classes and methods.
      • helpers/: Helper functions for data processing.
      • unlearning_canvas_dataset.py: Specific dataset class for unlearning tasks.
    • helpers/: Utility functions and helpers.
      • helper.py: General-purpose helper functions.
      • logger.py: Logging utilities to standardize logging practices.
      • path_setup.py: Path configurations and environment setup.
  • tests/: Test suites for ensuring code reliability.
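To check that a new algorithm directory follows the layout above, a small helper can list which expected entries are missing. This is a sketch that assumes every algorithm mirrors a subset of the ESD directory shown above; `EXPECTED_ENTRIES` and `missing_entries` are hypothetical names, not part of the package.

```python
from pathlib import Path
from typing import List

# Assumption: treat this subset of the ESD directory listing as required
# for every algorithm. Adjust it to match the real conventions.
EXPECTED_ENTRIES = [
    "README.md",
    "algorithm.py",
    "configs",
    "scripts/train.py",
    "trainer.py",
]


def missing_entries(algorithm_dir: str) -> List[str]:
    """Return the expected files/directories that are absent from algorithm_dir."""
    root = Path(algorithm_dir)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]
```

For example, `missing_entries("mu/algorithms/esd")` run from a repository checkout returns an empty list when every expected entry is present.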

Datasets

We use the Quick Canvas benchmark dataset, available here. Currently, the algorithms are trained using 5 images belonging to the themes of Abstractionism and Architectures.

Usage

This section contains the usage guide for the package.

Prerequisites

Ensure conda is installed on your system. You can install either Miniconda or Anaconda.

After installing conda, ensure it is available in your PATH by running:

conda --version
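The same check can be done from Python, for example in a setup script. This sketch uses only the standard library (`shutil.which` and `subprocess.run`); the function name `conda_version` is hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def conda_version() -> Optional[str]:
    """Return the output of `conda --version`, or None if conda is not on PATH."""
    conda = shutil.which("conda")
    if conda is None:
        return None
    completed = subprocess.run(
        [conda, "--version"], capture_output=True, text=True, check=True
    )
    # Recent conda releases print to stdout; fall back to stderr just in case.
    return (completed.stdout or completed.stderr).strip()


version = conda_version()
print(version if version else "conda not found on PATH; install Miniconda or Anaconda first")
```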

Downloading data and models

After installing the package, use the following commands to download datasets and model checkpoints.

  1. Dataset
    <dataset_type> : sample | full
    <dataset_source>: i2p | quick_canvas
download_data <dataset_type> <dataset_source>

e.g. download_data sample i2p

  2. Model
    <model_type> : compvis | diffuser
download_model <model_type>

e.g. download_model compvis

  3. Run training
    Each algorithm has its own training script, and some follow a different process altogether. Follow the README for the algorithm you want to run. You will need to create a train_config and a model_config file to run it.

Here is an example for the Erase_diff algorithm.

  1. train_config
  2. model_config
  3. Usage Link
WANDB_MODE=offline python -m mu.algorithms.erase_diff.scripts.train \
--config_path mu/algorithms/erase_diff/configs/train_config.yaml
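The commands above can be wrapped in a small launcher that rejects unsupported options before anything is downloaded. This is a sketch: the allowed values come from the lists above, but the helper names are hypothetical, and `train_cmd` assumes the `mu/algorithms/<name>/scripts/train.py` plus `configs/train_config.yaml` layout used by Erase_diff, which not every algorithm follows.

```python
import shlex
from typing import List, Tuple

# Allowed values, taken from the usage notes above.
DATASET_TYPES = ("sample", "full")
DATASET_SOURCES = ("i2p", "quick_canvas")
MODEL_TYPES = ("compvis", "diffuser")


def _check(value: str, allowed: Tuple[str, ...], name: str) -> None:
    if value not in allowed:
        raise ValueError(f"{name} must be one of {allowed}, got {value!r}")


def download_data_cmd(dataset_type: str, dataset_source: str) -> List[str]:
    """Build the argv for the download_data command."""
    _check(dataset_type, DATASET_TYPES, "dataset_type")
    _check(dataset_source, DATASET_SOURCES, "dataset_source")
    return ["download_data", dataset_type, dataset_source]


def download_model_cmd(model_type: str) -> List[str]:
    """Build the argv for the download_model command."""
    _check(model_type, MODEL_TYPES, "model_type")
    return ["download_model", model_type]


def train_cmd(algorithm: str) -> List[str]:
    """Build the argv for a training run (assumes the Erase_diff layout)."""
    return [
        "python", "-m", f"mu.algorithms.{algorithm}.scripts.train",
        "--config_path", f"mu/algorithms/{algorithm}/configs/train_config.yaml",
    ]


print(shlex.join(download_data_cmd("sample", "i2p")))  # prints download_data sample i2p
print(shlex.join(train_cmd("erase_diff")))
```

To launch training with W&B logging set to offline, as in the shell example, you could pass the argv to `subprocess.run(train_cmd("erase_diff"), env={**os.environ, "WANDB_MODE": "offline"}, check=True)`.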



Download files


Source Distribution

unlearn_diff-1.0.0.tar.gz (17.8 MB)

Uploaded Source

Built Distribution


unlearn_diff-1.0.0-py3-none-any.whl (17.9 MB)

Uploaded Python 3

File details

Details for the file unlearn_diff-1.0.0.tar.gz.

File metadata

  • Download URL: unlearn_diff-1.0.0.tar.gz
  • Upload date:
  • Size: 17.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.7

File hashes

Hashes for unlearn_diff-1.0.0.tar.gz
Algorithm Hash digest
SHA256 dcb5a1d6fe5ce926d91622c0cfa2a9fa87c77f35126c9b584a991ceb2e6977b3
MD5 c3fcf87b5d5bd1d629c97a2e6b71449e
BLAKE2b-256 2e57744dd8d1747985c9d091f64ef071b75ac777a910f7c87bf91126de556447
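Before installing from the source distribution, you can confirm that the download matches the published SHA256 digest. A minimal sketch using the standard `hashlib` module; the function names are illustrative, and the path is wherever you saved the archive. Alternatively, pip's hash-checking mode (`--require-hashes`) can enforce the digest at install time.

```python
import hashlib

# SHA256 digest published above for unlearn_diff-1.0.0.tar.gz.
EXPECTED_SHA256 = "dcb5a1d6fe5ce926d91622c0cfa2a9fa87c77f35126c9b584a991ceb2e6977b3"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's digest with the expected hex string (case-insensitive)."""
    return sha256_of(path) == expected.lower()


# Usage, assuming the archive was saved in the current directory:
# matches_published_hash("unlearn_diff-1.0.0.tar.gz")
```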


File details

Details for the file unlearn_diff-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: unlearn_diff-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 17.9 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.7

File hashes

Hashes for unlearn_diff-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 018d12f1152b6a5138497dda8acc0134d3adef89cf427623fa04d55484d6a38c
MD5 ac0e384ef7d8f06107a1189c059ed54a
BLAKE2b-256 af54b54ae98663d44241967f00ffc98287e7535faec63996690835bc67c2ae16

