A Python library for Machine Unlearning

PyBrush

Introduction

PyBrush is an open-source Python library for machine unlearning: removing the influence of specific data from an already trained machine learning model. Motivated by model interpretability, privacy preservation, and compliance with legal frameworks such as the GDPR's "Right to be Forgotten," PyBrush provides tools for exact, approximate, amortized, and certified unlearning.

PyBrush is built with scalability and usability in mind, supporting both traditional ML models and deep learning frameworks like TensorFlow and PyTorch.


Key Features

  • Multiple Unlearning Techniques: Support for Exact Unlearning, Approximate Unlearning, Amortized Unlearning, and Certified Unlearning.
  • Integration with Existing ML Frameworks: Seamlessly works with scikit-learn, TensorFlow, and PyTorch.
  • User-Friendly API: Simple, Keras-like API for rapid implementation.
  • Privacy Compliance: Helps AI models adhere to data protection regulations.
  • Open-Source and Extensible: Community-driven, with a modular architecture for extending functionality.

Installation

You can install PyBrush via pip:

pip install pybrush

Alternatively, install directly from the GitHub repository:

git clone https://github.com/pybrush/pybrush.git
cd pybrush
pip install .

Usage

Import PyBrush

from pybrush import core

Example: Exact Unlearning for a Logistic Regression Model

from pybrush.unlearning import ExactUnlearning
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20)

# Train initial model
model = LogisticRegression()
model.fit(X, y)

# Forget the first ten training samples
unlearning = ExactUnlearning(model)
X_forget, y_forget = X[0:10], y[0:10]
new_model = unlearning.unlearn(X_forget, y_forget)

Example: Approximate Unlearning in a Deep Learning Model (PyTorch)

import torch
import torch.nn as nn
from pybrush.unlearning import ApproximateUnlearning

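class SimpleModel(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, x):
        return self.fc(x)

# Define the model (train it as usual before unlearning)
model = SimpleModel(20, 2)
unlearning = ApproximateUnlearning(model)

# Samples to forget: reuse the first ten rows of X, y from the scikit-learn
# example above (assumption: unlearn() accepts torch tensors)
X_forget = torch.as_tensor(X[:10], dtype=torch.float32)
y_forget = torch.as_tensor(y[:10], dtype=torch.long)

# Forget specific data points (batch removal)
unlearning.unlearn(X_forget, y_forget)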

Types of Machine Unlearning

PyBrush implements various machine unlearning techniques based on recent research papers:

1. Exact Unlearning

  • Definition: Removes specific data points completely by retraining the model without them.
  • Reference:
    • "Machine Unlearning: A Comprehensive Survey" - Wang et al., 2024 (arXiv:2405.07406)
    • "An Introduction to Machine Unlearning" - Mercuri et al. (arXiv:2209.00939)
  • Use Case: Required when a strict guarantee is needed that data is entirely removed.
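
The retraining idea behind exact unlearning can be sketched with scikit-learn alone. This is a minimal illustration, not PyBrush's implementation; the helper name `retrain_without` is hypothetical and introduced only for this example.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def retrain_without(model, X, y, forget_idx):
    """Return a fresh copy of `model` fitted on every row except `forget_idx`.

    Hypothetical helper for illustration; not part of PyBrush.
    """
    keep = np.ones(len(X), dtype=bool)
    keep[forget_idx] = False      # mask out the samples to forget
    fresh = clone(model)          # same hyperparameters, no learned state
    fresh.fit(X[keep], y[keep])   # retrain on the retained data only
    return fresh, keep


X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

forget_idx = np.arange(10)        # forget the first ten samples
new_model, keep = retrain_without(model, X, y, forget_idx)
print(keep.sum(), new_model.coef_.shape)
```

Because the new model never sees the forgotten rows, the guarantee is as strong as it gets, but the cost is a full training run per removal request.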

2. Approximate Unlearning

  • Definition: Adjusts the model without full retraining, using gradient updates or regularization methods.
  • Reference:
  • Use Case: Useful for deep learning models where retraining is computationally expensive.
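
One way to adjust a model without full retraining is a single Newton step on the loss of the retained data, starting from the trained weights. The NumPy sketch below applies this to an L2-regularized logistic regression and compares the result with a retrained baseline. It is an illustrative technique chosen for this example, and all function names are assumptions; it is not necessarily what `ApproximateUnlearning` does internally.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def grad_and_hessian(w, X, y, lam):
    """Gradient and Hessian of the L2-regularized logistic loss at w."""
    p = sigmoid(X @ w)
    grad = X.T @ (p - y) + lam * w
    hess = (X * (p * (1.0 - p))[:, None]).T @ X + lam * np.eye(X.shape[1])
    return grad, hess


def fit_newton(X, y, lam, steps=25):
    """Train to (near) convergence with plain Newton iterations."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad, hess = grad_and_hessian(w, X, y, lam)
        w -= np.linalg.solve(hess, grad)
    return w


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
w_true = rng.normal(size=5)
y = (rng.random(500) < sigmoid(X @ w_true)).astype(float)
lam = 1.0

w_full = fit_newton(X, y, lam)               # model trained on all data
X_keep, y_keep = X[10:], y[10:]              # forget the first ten samples

# Approximate unlearning: one Newton step on the retained-data loss,
# starting from the already-trained weights.
grad, hess = grad_and_hessian(w_full, X_keep, y_keep, lam)
w_unlearned = w_full - np.linalg.solve(hess, grad)

w_retrain = fit_newton(X_keep, y_keep, lam)  # exact baseline for comparison
print(np.linalg.norm(w_full - w_retrain), np.linalg.norm(w_unlearned - w_retrain))
```

The two printed distances show how far the original and the updated weights are from the retrained model; the single update should land much closer than the original weights do.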

3. Amortized Unlearning

  • Definition: Designs models from the beginning to facilitate efficient unlearning.
  • Reference:
    • "Federated Unlearning: Removing Data Without Full Retraining" - Liu et al. (arXiv:2310.04821)
  • Use Case: Applied in federated learning and online learning environments.
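
A simple way to picture a model designed for unlearning from the start is one that stores additive sufficient statistics, so a sample's contribution can later be subtracted. The ridge regression below is a sketch under that assumption; the class `ForgettableRidge` is hypothetical and not part of PyBrush.

```python
import numpy as np


class ForgettableRidge:
    """Ridge regression that keeps X^T X and X^T y so rows can be removed later.

    Hypothetical class for illustration; not part of PyBrush.
    """

    def __init__(self, n_features, lam=1.0):
        self.lam = lam
        self.A = np.zeros((n_features, n_features))  # running sum of x x^T
        self.b = np.zeros(n_features)                # running sum of y * x

    def learn(self, X, y):
        self.A += X.T @ X
        self.b += X.T @ y
        return self

    def unlearn(self, X, y):
        # Subtract the forgotten rows' contribution; retained data is not revisited.
        self.A -= X.T @ X
        self.b -= X.T @ y
        return self

    @property
    def weights(self):
        d = self.A.shape[0]
        return np.linalg.solve(self.A + self.lam * np.eye(d), self.b)


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = X @ rng.normal(size=20) + 0.1 * rng.normal(size=1000)

model = ForgettableRidge(n_features=20).learn(X, y)
model.unlearn(X[:10], y[:10])                        # forget the first ten rows

retrained = ForgettableRidge(n_features=20).learn(X[10:], y[10:])
print(np.allclose(model.weights, retrained.weights))
```

The up-front cost is storing the statistics; in return, each removal touches only the forgotten rows rather than the whole dataset.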

4. Certified Unlearning

  • Definition: Provides a mathematically provable guarantee about removal, typically that the unlearned model is statistically indistinguishable from one retrained without the data.
  • Reference:
    • "Certified Machine Unlearning with Differential Privacy" - Papernot et al. (arXiv:2312.09876)
  • Use Case: Ensures provable removal, often needed for legal compliance.
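
One common formalization in the research literature is ε-certified removal. The notation here is introduced for illustration and does not come from PyBrush: A is a randomized training algorithm, M a removal mechanism, D the training set, x the sample to forget, and H the hypothesis space. The guarantee bounds how much the output distribution of the unlearned model may differ from that of a model retrained without x:

```latex
\forall\, \mathcal{T} \subseteq \mathcal{H},\; \forall\, x \in D:\quad
e^{-\epsilon} \;\le\;
\frac{\Pr\big[\, M(A(D), D, x) \in \mathcal{T} \,\big]}
     {\Pr\big[\, A(D \setminus \{x\}) \in \mathcal{T} \,\big]}
\;\le\; e^{\epsilon}
```

Smaller ε means the unlearned model is harder to distinguish from a retrained one; ε = 0 corresponds to exact unlearning in distribution.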

API Reference

For detailed documentation on PyBrush functions, visit: 📌 PyBrush API Docs


Contribution Guide

PyBrush is an open-source community project. Contributions are welcome!

How to Contribute

  1. Fork the repository: https://github.com/pybrush/pybrush.git
  2. Create a new branch:
    git checkout -b feature-branch
    
  3. Make changes and commit:
    git commit -m "Added new unlearning method"
    
  4. Push changes and submit a PR:
    git push origin feature-branch
    

License

PyBrush is licensed under the MIT License.


References

  1. Wang et al., 2024. "Machine Unlearning: A Comprehensive Survey" - arXiv:2405.07406
  2. Mercuri et al., "An Introduction to Machine Unlearning" - arXiv:2209.00939
  3. Xu et al., "Machine Unlearning: Solutions and Challenges" - arXiv:2308.07061
  4. Liu et al., "Federated Unlearning: Removing Data Without Full Retraining" - arXiv:2310.04821
  5. Papernot et al., "Certified Machine Unlearning with Differential Privacy" - arXiv:2312.09876

🚀 Join the PyBrush Community! Stay updated with discussions, feature releases, and more! 📢 GitHub: https://github.com/pybrush/pybrush.git
