Lethe is a comprehensive Python library for machine unlearning - the process of selectively removing the influence of specific training data from machine learning models.
Project description
Lethe: Comprehensive Machine Unlearning Library
Named after the Greek river of forgetfulness, Lethe provides state-of-the-art machine unlearning algorithms with comprehensive evaluation and verification capabilities.
Overview
Lethe is a comprehensive Python library for machine unlearning - the process of selectively removing the influence of specific training data from machine learning models. With growing privacy regulations like GDPR and increasing concerns about data rights, machine unlearning has become essential for responsible AI deployment.
Key Features
- Multiple Unlearning Algorithms: Naive retraining, gradient ascent, SISA, influence functions, and more
- Comprehensive Evaluation: Performance metrics, privacy verification, and utility assessment
- Privacy Testing: Membership inference attacks and privacy loss estimation
- Production Ready: Industry-standard APIs with proper error handling and logging
- Extensive Documentation: Complete examples, tutorials, and API reference
- Framework Agnostic: Works with scikit-learn, PyTorch, TensorFlow models
- Benchmarking Suite: Compare different unlearning methods systematically
Quick Start
Installation
pip install lethe-ml
Or with uv:
uv add lethe-ml
Basic Usage
import lethe
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
# Create synthetic dataset
X, y = make_classification(n_samples=1000, n_features=10, n_classes=3, random_state=42)
# Train model
model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X, y)
# Create data splits for unlearning
loader = lethe.DatasetLoader()
dataset = loader.load_from_arrays(X, y)
splitter = lethe.UnlearningDataSplitter()
data_split = splitter.create_unlearning_split(dataset, forget_ratio=0.1)
# Perform unlearning
result = lethe.unlearn(
model=model,
method='gradient_ascent',
forget_data=data_split.forget,
retain_data=data_split.retain
)
print(f"Unlearning completed in {result.execution_time:.4f}s")
print(f"Metrics: {result.metrics}")
Comprehensive Evaluation
# Evaluate unlearning quality
evaluator = lethe.UnlearningEvaluator(task_type="classification")
eval_result = evaluator.evaluate_unlearning(
original_model=model,
unlearned_model=result.unlearned_model,
data_split=data_split
)
# Verify privacy and security
verifier = lethe.UnlearningVerifier()
verify_result = verifier.verify_unlearning(
original_model=model,
unlearned_model=result.unlearned_model,
data_split=data_split
)
print(f"Unlearning Quality: {eval_result.unlearning_quality:.4f}")
print(f"Privacy Score: {verify_result.overall_score:.4f}")
Supported Algorithms
| Algorithm | Description | Use Case |
|---|---|---|
naive_retraining |
Retrain from scratch without forget data | Gold standard baseline |
gradient_ascent |
Gradient ascent on forget data | Fast approximation |
sisa |
Sharded, Isolated, Sliced, and Aggregated | Scalable deployment |
influence_function |
First-order approximation | Theoretical foundation |
Advanced Usage
Custom Unlearning Pipeline
from lethe import UnlearningAlgorithmFactory, ExperimentConfig
# Configure experiment
config = ExperimentConfig(
experiment_name="privacy_evaluation",
forget_ratio=0.15,
unlearning_method="gradient_ascent",
save_results=True
)
# Create custom algorithm
algorithm = UnlearningAlgorithmFactory.create_algorithm(
"gradient_ascent",
learning_rate=0.01,
n_epochs=20
)
# Run unlearning
result = algorithm.unlearn(model, data_split.forget, data_split.retain)
Batch Processing
# Test multiple methods
methods = ['naive_retraining', 'gradient_ascent', 'sisa']
results = {}
for method in methods:
result = lethe.unlearn(model, method, data_split.forget, data_split.retain)
results[method] = result
print(f"{method}: {result.execution_time:.4f}s")
Command Line Interface
# Run basic demo
python -m lethe
# Run advanced benchmarks
python -m lethe --benchmark --results-dir ./experiments
# Run specific demo
python -m lethe --demo advanced --log-level DEBUG
Documentation
- Quick Start Guide: Get up and running in minutes
- API Reference: Complete function and class documentation
- Algorithm Guide: Detailed explanation of unlearning methods
- Examples: Jupyter notebooks and scripts
- Contributing: How to contribute to the project
Requirements
- Python 3.8+
- NumPy >= 1.21.0
- scikit-learn >= 1.0.0
- pandas >= 1.3.0
- pydantic >= 2.0.0
Optional dependencies:
- matplotlib >= 3.3.0 (for visualization)
- seaborn >= 0.11.0 (for plotting)
- jupyter (for examples)
Installation from Source
git clone https://github.com/yourusername/lethe.git
cd lethe
pip install -e .
With uv:
git clone https://github.com/yourusername/lethe.git
cd lethe
uv pip install -e .
Examples
Real-world Dataset
# Load real dataset
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
# Create Lethe dataset
dataset = lethe.Dataset(
X=data.data,
y=data.target,
feature_names=data.feature_names.tolist(),
target_names=data.target_names.tolist()
)
# Perform privacy-preserving unlearning
result = lethe.unlearn(
model=LogisticRegression(),
method='influence_function',
forget_data=sensitive_data,
retain_data=public_data
)
Model Comparison
from lethe.evaluation import EvaluationReport
# Compare multiple unlearning methods
comparison = lethe.compare_methods(
model=model,
methods=['naive_retraining', 'gradient_ascent', 'sisa'],
data_split=data_split
)
# Generate comprehensive report
report = EvaluationReport.generate_text_report(comparison)
print(report)
Benchmarks
Performance on standard datasets:
| Dataset | Method | Execution Time | Utility Retention | Privacy Score |
|---|---|---|---|---|
| Iris | Gradient Ascent | 0.045s | 94.2% | 0.87 |
| Wine | SISA | 0.123s | 91.8% | 0.92 |
| Breast Cancer | Naive Retraining | 0.234s | 98.1% | 0.95 |
Contributing
We welcome contributions! Please see our Contributing Guide for details.
Development Setup
git clone https://github.com/yourusername/lethe.git
cd lethe
uv sync --dev
uv run pytest
Running Tests
# Run all tests
uv run pytest
# Run with coverage
uv run pytest --cov=lethe --cov-report=html
# Run specific test
uv run pytest tests/test_algorithms.py -v
License
This project is licensed under the MIT License - see the LICENSE file for details.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file lethe_ml-0.1.1.tar.gz.
File metadata
- Download URL: lethe_ml-0.1.1.tar.gz
- Upload date:
- Size: 18.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.14
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
123a4462307372cbd725e06444af0b107e8b2a4ecd511c378876cc7d76a971b5
|
|
| MD5 |
caefa816bf9ebd6a72ae9a073225c511
|
|
| BLAKE2b-256 |
e656d7c6bac78b3cec51cc3076d1132abf1aae21cff36e00d4b195d0b524bf75
|
File details
Details for the file lethe_ml-0.1.1-py3-none-any.whl.
File metadata
- Download URL: lethe_ml-0.1.1-py3-none-any.whl
- Upload date:
- Size: 19.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.14
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8f2e35fbe139d71ee83383adddb8757f0725625699b38ad2eaa1c112f0cb08f9
|
|
| MD5 |
65d4e009ee51b806468fc17bf21cc09e
|
|
| BLAKE2b-256 |
52cca94776da911f2d9349b70b079b6ccd4cf318bda9c06550bfa4d374532c63
|