Lethe: Comprehensive Machine Unlearning Library

Named after the river of forgetfulness in Greek mythology, Lethe bundles several machine unlearning algorithms with tools for evaluating and verifying the result.

Overview

Lethe is a Python library for machine unlearning: selectively removing the influence of specific training data from a trained machine learning model. Privacy regulations such as GDPR, along with growing concern about data rights, make unlearning an increasingly important part of responsible AI deployment.
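
To make "removing the influence of specific training data" concrete, here is a minimal sketch of the simplest exact approach: retrain the model without the forget set. It uses plain scikit-learn rather than Lethe's API, and the 10% forget fraction, the model choice, and all variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Pick 10% of the training rows to forget (illustrative choice).
rng = np.random.default_rng(42)
forget_idx = rng.choice(len(X), size=100, replace=False)
retain_mask = np.ones(len(X), dtype=bool)
retain_mask[forget_idx] = False

# Original model: trained on every row, including the forget set.
original = LogisticRegression(max_iter=1000).fit(X, y)

# Retrained model: never sees the forget rows, so they cannot influence it.
retrained = LogisticRegression(max_iter=1000).fit(X[retain_mask], y[retain_mask])

forget_acc_before = original.score(X[forget_idx], y[forget_idx])
forget_acc_after = retrained.score(X[forget_idx], y[forget_idx])
print(f"Forget-set accuracy before: {forget_acc_before:.3f}, after: {forget_acc_after:.3f}")
```

Retraining is exact but costs a full training run per deletion request, which is why approximate methods are usually compared against it.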

Key Features

  • Multiple Unlearning Algorithms: Naive retraining, gradient ascent, SISA, influence functions, and more
  • Comprehensive Evaluation: Performance metrics, privacy verification, and utility assessment
  • Privacy Testing: Membership inference attacks and privacy loss estimation
  • Production Ready: Industry-standard APIs with proper error handling and logging
  • Extensive Documentation: Complete examples, tutorials, and API reference
  • Framework Agnostic: Works with scikit-learn, PyTorch, and TensorFlow models
  • Benchmarking Suite: Compare different unlearning methods systematically

Quick Start

Installation

pip install lethe-ml

Or with uv:

uv add lethe-ml

Basic Usage

import lethe
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

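# Create synthetic dataset. n_informative=4 is needed because scikit-learn
# requires n_classes * n_clusters_per_class <= 2**n_informative.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=4, n_classes=3, random_state=42)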

# Train model
model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X, y)

# Create data splits for unlearning
loader = lethe.DatasetLoader()
dataset = loader.load_from_arrays(X, y)
splitter = lethe.UnlearningDataSplitter()
data_split = splitter.create_unlearning_split(dataset, forget_ratio=0.1)

# Perform unlearning
result = lethe.unlearn(
    model=model,
    method='gradient_ascent',
    forget_data=data_split.forget,
    retain_data=data_split.retain
)

print(f"Unlearning completed in {result.execution_time:.4f}s")
print(f"Metrics: {result.metrics}")

Comprehensive Evaluation

# Evaluate unlearning quality
evaluator = lethe.UnlearningEvaluator(task_type="classification")
eval_result = evaluator.evaluate_unlearning(
    original_model=model,
    unlearned_model=result.unlearned_model,
    data_split=data_split
)

# Verify privacy and security
verifier = lethe.UnlearningVerifier()
verify_result = verifier.verify_unlearning(
    original_model=model,
    unlearned_model=result.unlearned_model,
    data_split=data_split
)

print(f"Unlearning Quality: {eval_result.unlearning_quality:.4f}")
print(f"Privacy Score: {verify_result.overall_score:.4f}")
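
For intuition about what a privacy check can measure, the following sketch runs a simple confidence-based membership inference test with plain scikit-learn. It is not Lethe's verifier: the helper names `true_label_confidence` and `membership_auc`, the label noise level, and the use of full retraining as the "unlearned" model are all assumptions made for illustration. An attack AUC near 0.5 means forget samples are hard to tell apart from unseen data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def true_label_confidence(model, X, y):
    """Probability the model assigns to each sample's true label."""
    proba = model.predict_proba(X)
    cols = np.searchsorted(model.classes_, y)  # map labels to probability columns
    return proba[np.arange(len(y)), cols]


def membership_auc(model, X_member, y_member, X_nonmember, y_nonmember):
    """AUC of an attacker who guesses 'member' when confidence is high."""
    scores = np.concatenate([
        true_label_confidence(model, X_member, y_member),
        true_label_confidence(model, X_nonmember, y_nonmember),
    ])
    labels = np.concatenate([np.ones(len(y_member)), np.zeros(len(y_nonmember))])
    return roc_auc_score(labels, scores)


# Label noise (flip_y) makes memorization of training rows easier to see.
X, y = make_classification(n_samples=1000, n_features=10, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
X_forget, y_forget = X_train[:100], y_train[:100]
X_retain, y_retain = X_train[100:], y_train[100:]

original = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
# Stand-in for an unlearned model: retrained without the forget rows.
retrained = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_retain, y_retain)

auc_before = membership_auc(original, X_forget, y_forget, X_test, y_test)
auc_after = membership_auc(retrained, X_forget, y_forget, X_test, y_test)
print(f"Attack AUC before unlearning: {auc_before:.3f}, after: {auc_after:.3f}")
```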

Supported Algorithms

| Algorithm | Description | Use Case |
|---|---|---|
| naive_retraining | Retrain from scratch without forget data | Gold standard baseline |
| gradient_ascent | Gradient ascent on forget data | Fast approximation |
| sisa | Sharded, Isolated, Sliced, and Aggregated | Scalable deployment |
| influence_function | First-order approximation | Theoretical foundation |
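
As an illustration of the idea behind the sisa entry, the sketch below implements a simplified version with plain scikit-learn: the data is split into shards, one model is trained per shard, predictions are aggregated, and a deletion retrains only the shards that held the deleted rows. The class name `ShardedEnsemble` is hypothetical, the slicing and checkpointing part of SISA is omitted, and none of this reflects how Lethe implements the method.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


class ShardedEnsemble:
    """Simplified SISA: sharding and aggregation only, no slicing."""

    def __init__(self, n_shards=5, seed=0):
        self.n_shards = n_shards
        self.seed = seed

    def fit(self, X, y):
        self.X, self.y = X, y
        rng = np.random.default_rng(self.seed)
        # Each row belongs to exactly one shard.
        self.shard_of = rng.integers(0, self.n_shards, size=len(X))
        self.active = np.ones(len(X), dtype=bool)
        self.models = [self._fit_shard(s) for s in range(self.n_shards)]
        return self

    def _fit_shard(self, s):
        rows = (self.shard_of == s) & self.active
        return LogisticRegression(max_iter=1000).fit(self.X[rows], self.y[rows])

    def forget(self, idx):
        """Drop rows and retrain only the shards that contained them."""
        self.active[idx] = False
        affected = np.unique(self.shard_of[idx])
        for s in affected:
            self.models[s] = self._fit_shard(s)
        return affected

    def predict(self, X):
        # Aggregate by averaging the per-shard class probabilities.
        proba = np.mean([m.predict_proba(X) for m in self.models], axis=0)
        return self.models[0].classes_[proba.argmax(axis=1)]


X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
ensemble = ShardedEnsemble(n_shards=5).fit(X, y)
models_before = list(ensemble.models)

affected = ensemble.forget(np.array([0, 1, 2]))
untouched = [s for s in range(5) if ensemble.models[s] is models_before[s]]
print(f"Retrained shards: {affected.tolist()}, untouched shards: {untouched}")
```

The cost of a deletion scales with the size of the affected shards rather than the whole dataset, at the price of each model seeing less data.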

Advanced Usage

Custom Unlearning Pipeline

from lethe import UnlearningAlgorithmFactory, ExperimentConfig

# Configure experiment
config = ExperimentConfig(
    experiment_name="privacy_evaluation",
    forget_ratio=0.15,
    unlearning_method="gradient_ascent",
    save_results=True
)

# Create custom algorithm
algorithm = UnlearningAlgorithmFactory.create_algorithm(
    "gradient_ascent",
    learning_rate=0.01,
    n_epochs=20
)

# Run unlearning
result = algorithm.unlearn(model, data_split.forget, data_split.retain)
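
To show what learning_rate and n_epochs control in a gradient ascent method, here is a self-contained NumPy sketch on a logistic regression model. It is an assumption about the general technique, not Lethe's implementation: the model is trained by gradient descent, then a few ascent steps are taken on the forget-set loss so the model fits those samples less well. All function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def log_loss(w, X, y):
    """Mean logistic loss for labels in {0, 1}."""
    z = X @ w
    return float(np.mean(np.logaddexp(0.0, z) - y * z))


def grad(w, X, y):
    """Gradient of the mean logistic loss with respect to w."""
    return X.T @ (sigmoid(X @ w) - y) / len(y)


def train(X, y, learning_rate=0.1, n_epochs=200):
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        w -= learning_rate * grad(w, X, y)  # descent: reduce the loss
    return w


def gradient_ascent_unlearn(w, X_forget, y_forget, learning_rate=0.01, n_epochs=20):
    w = w.copy()
    for _ in range(n_epochs):
        w += learning_rate * grad(w, X_forget, y_forget)  # ascent: raise forget loss
    return w


rng = np.random.default_rng(0)
X = np.hstack([rng.normal(size=(500, 4)), np.ones((500, 1))])  # last column is a bias term
w_true = np.array([1.5, -2.0, 0.5, 0.0, 0.3])
y = (rng.random(500) < sigmoid(X @ w_true)).astype(float)
X_forget, y_forget = X[:50], y[:50]

w = train(X, y)
w_unlearned = gradient_ascent_unlearn(w, X_forget, y_forget, learning_rate=0.01, n_epochs=20)

loss_before = log_loss(w, X_forget, y_forget)
loss_after = log_loss(w_unlearned, X_forget, y_forget)
print(f"Forget-set loss before: {loss_before:.4f}, after: {loss_after:.4f}")
```

Larger learning rates or more epochs forget more aggressively but also damage accuracy on the retained data, so in practice the retain-set loss should be monitored alongside the forget-set loss.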

Batch Processing

# Test multiple methods
methods = ['naive_retraining', 'gradient_ascent', 'sisa']
results = {}

for method in methods:
    result = lethe.unlearn(
        model=model,
        method=method,
        forget_data=data_split.forget,
        retain_data=data_split.retain
    )
    results[method] = result
    print(f"{method}: {result.execution_time:.4f}s")

Command Line Interface

# Run basic demo
python -m lethe

# Run advanced benchmarks
python -m lethe --benchmark --results-dir ./experiments

# Run specific demo
python -m lethe --demo advanced --log-level DEBUG

Requirements

  • Python 3.8+
  • NumPy >= 1.21.0
  • scikit-learn >= 1.0.0
  • pandas >= 1.3.0
  • pydantic >= 2.0.0

Optional dependencies:

  • matplotlib >= 3.3.0 (for visualization)
  • seaborn >= 0.11.0 (for plotting)
  • jupyter (for examples)

Installation from Source

git clone https://github.com/yourusername/lethe.git
cd lethe
pip install -e .

With uv:

git clone https://github.com/yourusername/lethe.git
cd lethe
uv pip install -e .

Examples

Real-world Dataset

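import lethe
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

# Load real dataset
data = load_breast_cancer()

# Create Lethe dataset
dataset = lethe.Dataset(
    X=data.data,
    y=data.target,
    feature_names=data.feature_names.tolist(),
    target_names=data.target_names.tolist()
)

# Train the model that will be unlearned
model = LogisticRegression(max_iter=10000)
model.fit(data.data, data.target)

# Perform privacy-preserving unlearning.
# sensitive_data and public_data are placeholders for the forget and retain
# subsets of the dataset; they are not defined in this snippet.
result = lethe.unlearn(
    model=model,
    method='influence_function',
    forget_data=sensitive_data,
    retain_data=public_data
)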
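
The idea behind influence-function unlearning can be sketched in NumPy for L2-regularized logistic regression. This is an illustration under stated assumptions, not Lethe's implementation: starting from parameters fitted on all data, a single Newton-style correction using the retain-set Hessian and the forget-set gradient approximates the parameters that retraining would give. The function names and the regularization strength `lam` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def gradient(w, X, y, lam):
    """Gradient of the summed logistic loss plus (lam / 2) * ||w||^2."""
    return X.T @ (sigmoid(X @ w) - y) + lam * w


def hessian(w, X, lam):
    p = sigmoid(X @ w)
    return (X * (p * (1.0 - p))[:, None]).T @ X + lam * np.eye(X.shape[1])


def fit_newton(X, y, lam=1.0, n_iter=25):
    """Fit to convergence with Newton's method."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w -= np.linalg.solve(hessian(w, X, lam), gradient(w, X, y, lam))
    return w


def influence_unlearn(w, X_forget, y_forget, X_retain, lam=1.0):
    """Approximate removal of the forget rows from a model fitted on all rows."""
    forget_grad = X_forget.T @ (sigmoid(X_forget @ w) - y_forget)
    return w + np.linalg.solve(hessian(w, X_retain, lam), forget_grad)


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
w_true = np.array([1.5, -2.0, 0.5, 0.0, 1.0])
y = (rng.random(500) < sigmoid(X @ w_true)).astype(float)
X_forget, y_forget = X[:20], y[:20]
X_retain, y_retain = X[20:], y[20:]

w_full = fit_newton(X, y)
w_retrained = fit_newton(X_retain, y_retain)  # exact reference
w_unlearned = influence_unlearn(w_full, X_forget, y_forget, X_retain)

err_before = np.linalg.norm(w_full - w_retrained)
err_after = np.linalg.norm(w_unlearned - w_retrained)
print(f"Distance to retrained weights: {err_before:.5f} -> {err_after:.5f}")
```

The correction is cheap compared with retraining, but it relies on a convex, twice-differentiable loss and a model trained to convergence; for deep networks it is only a rough approximation.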

Model Comparison

from lethe.evaluation import EvaluationReport

# Compare multiple unlearning methods
comparison = lethe.compare_methods(
    model=model,
    methods=['naive_retraining', 'gradient_ascent', 'sisa'],
    data_split=data_split
)

# Generate comprehensive report
report = EvaluationReport.generate_text_report(comparison)
print(report)

Benchmarks

Performance on standard datasets:

| Dataset | Method | Execution Time | Utility Retention | Privacy Score |
|---|---|---|---|---|
| Iris | Gradient Ascent | 0.045s | 94.2% | 0.87 |
| Wine | SISA | 0.123s | 91.8% | 0.92 |
| Breast Cancer | Naive Retraining | 0.234s | 98.1% | 0.95 |
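
The table does not define its columns, so the sketch below shows one plausible way such numbers could be measured, using plain scikit-learn. It assumes that execution time is the wall-clock time of the unlearning step and that utility retention is the unlearned model's test accuracy divided by the original model's test accuracy. Both definitions are assumptions, and the values it prints are not meant to reproduce the table.

```python
import time

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

n_forget = 10  # forget the first 10 training rows (illustrative choice)
original = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Time the unlearning step; naive retraining stands in for any method here.
start = time.perf_counter()
unlearned = LogisticRegression(max_iter=1000).fit(X_train[n_forget:], y_train[n_forget:])
execution_time = time.perf_counter() - start

# Assumed definition: fraction of the original test accuracy that survives.
utility_retention = unlearned.score(X_test, y_test) / original.score(X_test, y_test)

print(f"Execution time: {execution_time:.4f}s")
print(f"Utility retention: {utility_retention:.1%}")
```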

Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Setup

git clone https://github.com/yourusername/lethe.git
cd lethe
uv sync --dev
uv run pytest

Running Tests

# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=lethe --cov-report=html

# Run specific test
uv run pytest tests/test_algorithms.py -v

License

This project is licensed under the MIT License - see the LICENSE file for details.
