A Python library for Machine Unlearning
PyBrush
Introduction
PyBrush is an open-source Python library designed for Machine Unlearning, enabling efficient removal of specific data influences from trained machine learning models. Inspired by the principles of model interpretability, privacy preservation, and compliance with legal frameworks like GDPR's "Right to be Forgotten," PyBrush provides state-of-the-art tools for exact, approximate, amortized, and certified unlearning.
PyBrush is built with scalability and usability in mind, supporting both traditional ML models and deep learning frameworks like TensorFlow and PyTorch.
Key Features
- Multiple Unlearning Techniques: Support for Exact Unlearning, Approximate Unlearning, Amortized Unlearning, and Certified Unlearning.
- Integration with Existing ML Frameworks: Seamlessly works with scikit-learn, TensorFlow, and PyTorch.
- User-Friendly API: Simple, Keras-like API for rapid implementation.
- Privacy Compliance: Helps AI models adhere to data protection regulations.
- Open-Source and Extensible: Community-driven, with a modular architecture for extending functionality.
Installation
You can install PyBrush via pip:
```shell
pip install pybrush
```
Alternatively, install directly from the GitHub repository:
```shell
git clone https://github.com/pybrush/pybrush.git
cd pybrush
pip install .
```
Usage
Import PyBrush
```python
from pybrush import core
```
Example: Exact Unlearning for a Logistic Regression Model
```python
from pybrush.unlearning import ExactUnlearning
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

# Generate a synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20)

# Train the initial model
model = LogisticRegression()
model.fit(X, y)

# Forget the first ten training points
unlearning = ExactUnlearning(model)
X_forget, y_forget = X[0:10], y[0:10]
new_model = unlearning.unlearn(X_forget, y_forget)
```
Example: Approximate Unlearning in a Deep Learning Model (PyTorch)
```python
import torch
import torch.nn as nn
from pybrush.unlearning import ApproximateUnlearning

class SimpleModel(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, x):
        return self.fc(x)

# Define the model and wrap it for unlearning
model = SimpleModel(20, 2)
unlearning = ApproximateUnlearning(model)

# Forget specific data points (batch removal)
X_forget = torch.randn(10, 20)
y_forget = torch.randint(0, 2, (10,))
unlearning.unlearn(X_forget, y_forget)
```
Types of Machine Unlearning
PyBrush implements various machine unlearning techniques based on recent research papers:
1. Exact Unlearning
- Definition: Removes specific data points completely by retraining the model without them.
- Reference:
- "Machine Unlearning: A Comprehensive Survey" - Wang et al., 2024 (arXiv:2405.07406)
- "An Introduction to Machine Unlearning" - Mercuri et al. (arXiv:2209.00939)
- Use Case: Required when a strict guarantee is needed that data is entirely removed.
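Conceptually, exact unlearning is equivalent to retraining on the retained data only. The sketch below (plain NumPy, independent of the PyBrush API) makes the guarantee concrete for a linear model: the unlearned parameters are identical to those of a model that never saw the forgotten rows.

```python
# Exact unlearning sketch: retrain on the retained data only (generic
# NumPy least squares, not the PyBrush API).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5)

def fit(X, y):
    # Closed-form ordinary least squares.
    return np.linalg.lstsq(X, y, rcond=None)[0]

forget = np.arange(10)                          # indices to forget
keep = np.setdiff1d(np.arange(len(X)), forget)  # retained indices

w_unlearned = fit(X[keep], y[keep])             # the "unlearned" model
w_scratch = fit(np.delete(X, forget, axis=0), np.delete(y, forget))

assert np.allclose(w_unlearned, w_scratch)      # identical by construction
```

The strict guarantee comes precisely from this equivalence: there is no residual influence to bound, because the forgotten rows were never part of the final fit.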
2. Approximate Unlearning
- Definition: Adjusts the model without full retraining, using gradient updates or regularization methods.
- Reference:
- "Machine Unlearning: Solutions and Challenges" - Xu et al. (arXiv:2308.07061)
- Use Case: Useful for deep learning models where retraining is computationally expensive.
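One common family of approximate methods takes a few gradient *ascent* steps on the forget set's loss, nudging the model away from the points to be removed without retraining. A minimal NumPy logistic-regression sketch of that idea (not the PyBrush API):

```python
# Gradient-based approximate unlearning sketch (generic NumPy logistic
# regression, not the PyBrush API): ascent steps on the forget set's
# loss push the model away from the points being removed.
import numpy as np

def loss(w, X, y):
    # Numerically stable mean logistic loss.
    z = X @ w
    return np.mean(np.logaddexp(0.0, z) - y * z)

def grad(w, X, y):
    # Gradient of the mean logistic loss w.r.t. w.
    z = X @ w
    return X.T @ (1.0 / (1.0 + np.exp(-z)) - y) / len(y)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = (X @ np.array([1.0, -1.0, 0.5, 0.0]) > 0).astype(float)

w = np.zeros(4)
for _ in range(200):                  # ordinary training: gradient descent
    w -= 0.1 * grad(w, X, y)

X_f, y_f = X[:20], y[:20]             # points to forget
loss_before = loss(w, X_f, y_f)

for _ in range(10):                   # unlearning: ascent on the forget loss
    w += 0.1 * grad(w, X_f, y_f)

loss_after = loss(w, X_f, y_f)
assert loss_after > loss_before       # the model now fits the forgotten points worse
```

Unlike exact unlearning, this only weakens the forgotten points' influence; how much residual influence remains is exactly the challenge the approximate-unlearning literature studies.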
3. Amortized Unlearning
- Definition: Designs models from the beginning to facilitate efficient unlearning.
- Reference:
- "Federated Unlearning: Removing Data Without Full Retraining" - Liu et al. (arXiv:2310.04821)
- Use Case: Applied in federated learning and online learning environments.
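A well-known amortized design is sharded training (the SISA approach of Bourtoule et al.): split the data into shards, train one sub-model per shard, and aggregate predictions. Forgetting a point then retrains only the shard that contained it. A schematic NumPy version (not the PyBrush API):

```python
# SISA-style sharding sketch (plain NumPy, not the PyBrush API): one
# sub-model per shard, ensemble prediction, per-shard retraining on delete.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 3))
y = X @ np.array([2.0, -1.0, 0.5])

# Split the sample indices into four shards of 30.
shards = list(np.array_split(np.arange(len(X)), 4))

def fit(idx):
    # Least-squares sub-model trained on a single shard.
    return np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]

models = [fit(idx) for idx in shards]

def predict(x):
    # Ensemble prediction: average the shard models' outputs.
    return float(np.mean([x @ w for w in models]))

# Forget sample 7: only its shard is retrained, the rest are untouched.
i = next(s for s, idx in enumerate(shards) if 7 in idx)
shards[i] = shards[i][shards[i] != 7]
models[i] = fit(shards[i])
```

The cost of deletion is amortized at training time: instead of retraining on all 119 remaining samples, only the one affected shard (29 samples here) is refit.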
4. Certified Unlearning
- Definition: Uses mathematical proofs to verify complete removal of data influence.
- Reference:
- "Certified Machine Unlearning with Differential Privacy" - Papernot et al. (arXiv:2312.09876)
- Use Case: Ensures provable removal, often needed for legal compliance.
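Many certified schemes build on a closed-form deletion update whose residual error can be bounded (and, in differentially private variants, masked with calibrated noise). For ridge regression the update is exact: a Sherman-Morrison rank-one downdate removes one point's influence without refitting. A NumPy sketch of that building block (not the PyBrush API):

```python
# Closed-form deletion for ridge regression via a Sherman-Morrison
# rank-one downdate (generic NumPy, not the PyBrush API).
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 4))
y = rng.normal(size=50)
lam = 0.1

A = X.T @ X + lam * np.eye(4)    # regularized Gram matrix
b = X.T @ y
A_inv = np.linalg.inv(A)
w = A_inv @ b                    # ridge solution on the full data

# Remove row i without refactorizing:
# (A - x x^T)^{-1} = A^{-1} + A^{-1} x x^T A^{-1} / (1 - x^T A^{-1} x)
i = 0
x_i, y_i = X[i], y[i]
u = A_inv @ x_i
A_inv_new = A_inv + np.outer(u, u) / (1.0 - x_i @ u)
w_removed = A_inv_new @ (b - y_i * x_i)

# Sanity check: identical to retraining from scratch without row i.
X_r, y_r = np.delete(X, i, axis=0), np.delete(y, i)
w_scratch = np.linalg.solve(X_r.T @ X_r + lam * np.eye(4), X_r.T @ y_r)
assert np.allclose(w_removed, w_scratch)
```

For models where the update is only approximate, certified methods bound the gap between the updated and the retrained model and inject noise so the two are statistically indistinguishable, which is what makes the removal provable.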
API Reference
For detailed documentation on PyBrush functions, visit: 📌 PyBrush API Docs
Contribution Guide
PyBrush is an open-source community project. Contributions are welcome!
How to Contribute
1. Fork the repository: GitHub Repo
2. Create a new branch: `git checkout -b feature-branch`
3. Make changes and commit: `git commit -m "Added new unlearning method"`
4. Push changes and submit a PR: `git push origin feature-branch`
License
PyBrush is licensed under the MIT License.
References
- Wang et al., 2024. "Machine Unlearning: A Comprehensive Survey" - arXiv:2405.07406
- Mercuri et al., "An Introduction to Machine Unlearning" - arXiv:2209.00939
- Xu et al., "Machine Unlearning: Solutions and Challenges" - arXiv:2308.07061
- Liu et al., "Federated Unlearning: Removing Data Without Full Retraining" - arXiv:2310.04821
- Papernot et al., "Certified Machine Unlearning with Differential Privacy" - arXiv:2312.09876
🚀 Join the PyBrush Community! Stay updated with discussions, feature releases, and more! 📢 GitHub: https://github.com/pybrush/pybrush.git
File details
Details for the file pybrush-0.1.0.tar.gz.
File metadata
- Download URL: pybrush-0.1.0.tar.gz
- Upload date:
- Size: 8.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `f4e2776bc2db42c91cfdda1e67df9c510ea386f25d296fb389cbc1c7c41309a1` |
| MD5 | `df934a14df1e4ca23b19051feb3b2970` |
| BLAKE2b-256 | `34d30b40f2f81155365d283d62f7827568ac862533b19d5f4950c86455d5fb2b` |
File details
Details for the file pybrush-0.1.0-py3-none-any.whl.
File metadata
- Download URL: pybrush-0.1.0-py3-none-any.whl
- Upload date:
- Size: 8.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `66062b2e2abf7a68defc35e9a1fd6829efec0e1c0056cf3b59b585bc5c4e00f3` |
| MD5 | `491a57a7b77ce8df9f1cca161190563e` |
| BLAKE2b-256 | `e6a08c8a277bfa2d99bfdcef30f3587a32668d2ea04b71ca4732b106fd48d042` |