
Implementation of a correlated noise mechanism with streaming and multi-epoch settings to enable differentially private deep learning


Correlated Noise Mechanism


Overview

Correlated Noise Mechanism is an open-source library for differentially private training of deep learning models. It extends the Opacus privacy engine with correlated noise mechanisms in both streaming and multi-epoch settings.
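
One standard way to frame what correlated noise buys (the matrix-factorization view common in the DP literature; the notation below is illustrative, not this library's API): private first-order training releases noisy prefix sums $s_t = \sum_{i \le t} g_i$ of per-step gradients. Writing the prefix-sum workload as the lower-triangular all-ones matrix $A$ and factorizing $A = BC$, the mechanism publishes

$$\hat{s} = B\,(Cg + z), \qquad z \sim \mathcal{N}(0, \sigma^2 I),$$

so the injected noise $Bz$ is correlated across steps. The error $\hat{s} - Ag = Bz$ can be minimized (in RMSE or max-error, under a participation pattern) while the privacy cost depends only on $C$; mechanisms such as BLT parameterize this factorization compactly.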

Key Features

  • 🚀 High Performance: achieves performance comparable to established baselines while preserving privacy
  • 🔧 Easy Integration: requires minimal changes to existing PyTorch training code
  • 📊 Multiple Algorithms: provides streaming and multi-epoch correlated noise mechanisms, plus DP-SGD from Opacus with an improved accountant
  • 🔬 Research-Grade: suitable for benchmarking differential privacy algorithms
  • 🐍 PyTorch/NumPy Compatible: built on PyTorch and interoperates with NumPy


Installation

Prerequisites

  • Python 3.9 or higher (older interpreters cannot satisfy numpy >= 1.26.4)
  • numpy >= 1.26.4
  • torch >= 2.3.0
  • torchvision >= 0.18.0
  • opacus >= 1.5.2

Install from PyPI

pip install correlated-noise-mechanism

Install from Source

git clone https://github.com/yourusername/correlated_noise_mechanism.git
cd correlated_noise_mechanism
pip install -e .

Development Installation

git clone https://github.com/yourusername/correlated_noise_mechanism.git
cd correlated_noise_mechanism
pip install -e ".[dev]"

Quick Start

Here's a minimal example to get you started:

from correlated_noise_mechanism.privacy_engine import CNMEngine

# model, optimizer, and train_loader are your usual PyTorch training objects;
# EPOCHS, epsilon, delta, and grad_norm are your training and privacy settings.
privacy_engine = CNMEngine()
model, optimizer, train_loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    epochs=EPOCHS,
    target_epsilon=epsilon,
    target_delta=delta,
    max_grad_norm=grad_norm,
    mode="BLT",
    participation="streaming",
    error_type="rmse",
    d=4,
    b=5,
    k=8,
)
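
The objects returned by make_private_with_epsilon behave like their non-private counterparts, so the training loop itself is unchanged. A minimal sketch of such a loop (plain PyTorch; the model, optimizer, and loader here are stand-ins for the privatized objects returned above):

```python
import torch
import torch.nn as nn

def train_one_epoch(model, optimizer, train_loader, criterion=nn.CrossEntropyLoss()):
    """Standard PyTorch loop; with the privatized objects, per-sample clipping
    and (correlated) noise addition happen inside optimizer.step()."""
    model.train()
    total_loss = 0.0
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss / len(train_loader)
```

Because the privacy machinery lives in the wrapped optimizer and loader, existing training code usually needs no further modification.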

Documentation

Examples

Explore the example scripts in the repository's examples/ directory (e.g. examples/basic_usage.py and examples/model_zoo.py).

Performance

Benchmarks

FashionMNIST with the CNN from examples/model_zoo.py, batch size 1024, max_grad_norm = 1. SGD baselines use lr=1 and 50 epochs (matching the historical setup); CNM-AdamBC uses Adam with lr=1e-2 and 50 epochs.

| Method | $\epsilon$ | $\delta$ | Accuracy (%) |
|---|---|---|---|
| BLT (Multi Epoch) | 8 | $N^{-1.1}$ | 82.4758 |
| BLT (Streaming) | 8 | $N^{-1.1}$ | 76.3740 |
| DP-SGD | 8 | $N^{-1.1}$ | 82.0090 |
| CNM-AdamBC (Multi Epoch) | 8 | $N^{-1.1}$ | 83.7265 |
| CNM-AdamBC (Streaming) | 8 | $N^{-1.1}$ | 79.2377 |
| Non-Private | $\infty$ | $0$ | 87.8557 |
| Non-Private (Adam) | $\infty$ | $0$ | 89.3236 |
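
The $\delta = N^{-1.1}$ convention above ties delta to the training-set size $N$; for FashionMNIST ($N = 60{,}000$ training images) this works out to roughly $5.5 \times 10^{-6}$:

```python
# delta = N**(-1.1) with N the FashionMNIST training-set size
N = 60_000
delta = N ** -1.1
print(f"{delta:.3e}")  # 5.547e-06
```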

The two CNM-AdamBC rows extend the table with the time-varying-bias-corrected Adam optimizer (mode="BLT-Adam" / mode="Multi-Epoch-BLT-Adam") introduced in v0.4. Reproduce any row by editing examples/basic_usage.py to set the relevant mode / participation / inner optimizer + lr:

python examples/basic_usage.py

  • mode="BLT" + participation="streaming" reproduces BLT (Streaming).
  • mode="Multi-Epoch-BLT" + participation="minSep" reproduces BLT (Multi Epoch).
  • mode="BLT-Adam" / mode="Multi-Epoch-BLT-Adam" with the same participation settings (and an Adam inner optimizer at lr=1e-2) reproduce the two CNM-AdamBC rows.
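
The row-to-kwargs correspondence described above can be summarized as a small mapping (row names as in the benchmark table; the dictionary name here is illustrative, not part of the library):

```python
# Benchmark row -> make_private_with_epsilon kwargs, per the text above.
BENCHMARK_SETTINGS = {
    "BLT (Streaming)":          dict(mode="BLT", participation="streaming"),
    "BLT (Multi Epoch)":        dict(mode="Multi-Epoch-BLT", participation="minSep"),
    "CNM-AdamBC (Streaming)":   dict(mode="BLT-Adam", participation="streaming"),
    "CNM-AdamBC (Multi Epoch)": dict(mode="Multi-Epoch-BLT-Adam", participation="minSep"),
}
# The two -Adam rows additionally use an Adam inner optimizer with lr=1e-2.
```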

Core Classes

CNMEngine

class CNMEngine:
    """
    A privacy engine that extends Opacus's PrivacyEngine to provide correlated noise mechanism
    for differentially private training of deep learning models. This engine supports both
    streaming and multi-epoch settings with various noise correlation patterns.

    Parameters
    ----------
    module : torch.nn.Module
        PyTorch module to be used for training
    optimizer : torch.optim.Optimizer
        Optimizer to be used for training
    data_loader : torch.utils.data.DataLoader
        DataLoader to be used for training
    target_epsilon : float
        Target privacy budget epsilon, bounding the privacy loss incurred by changing a single data point
    target_delta : float
        Target delta: the probability with which the epsilon guarantee is allowed to fail
    epochs : int
        Number of training epochs
    max_grad_norm : Union[float, List[float]]
        The maximum norm of the per-sample gradients. Any gradient with norm higher than this will be clipped
    mode : str
        Mode of operation: 'DP-SGD-BASE', 'DP-SGD-AMPLIFIED', 'BLT',
        'Multi-Epoch-BLT', 'Single Parameter', 'BLT-Adam', or
        'Multi-Epoch-BLT-Adam'. The two ``-Adam`` variants use
        ``CNMAdamOptimizer`` (Adam with time-varying bias correction).
    participation : str
        Participation pattern: 'streaming', 'cyclic', or 'minSep'
    error_type : str
        Type of error to minimize: 'rmse' or 'max'
    d : int
        Number of parameters/buffers for BLT mode
    b : int
        Minimum separation parameter for BLT mode
    k : int
        Number of columns to consider in sensitivity for BLT mode
    gamma : Optional[float]
        A scalar for Single Parameter mode
    batch_first : bool, default=True
        Flag to indicate if the input tensor has the first dimension representing the batch
    loss_reduction : str, default='mean'
        Indicates if the loss reduction is a sum or mean operation ('sum' or 'mean')
    poisson_sampling : bool, default=True
        Whether to use Poisson sampling, which is required for the stated DP guarantees
    clipping : str, default='flat'
        Per sample gradient clipping mechanism ('flat', 'per_layer', or 'adaptive')
    noise_generator : Optional[torch.Generator]
        Generator for noise
    grad_sample_mode : str, default='hooks'
        Mode for computing per sample gradients
    """
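
The max_grad_norm parameter applies the standard per-sample clipping rule, $g \mapsto g \cdot \min(1, C / \lVert g \rVert_2)$. A minimal pure-Python sketch of that rule (illustrative only; the library delegates clipping to Opacus internally):

```python
import math

def clip_gradient(grad, max_grad_norm):
    """Scale a per-sample gradient so its L2 norm is at most max_grad_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_grad_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

print(clip_gradient([3.0, 4.0], 1.0))  # norm 5 -> scaled to ~[0.6, 0.8]
```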

BLTDifferentiableLossOptimizer

class BLTDifferentiableLossOptimizer:
    """
    An optimizer that implements the Banded Linear Transformation (BLT) mechanism with differentiable loss
    for optimizing noise correlation parameters. This optimizer is used internally by CNMEngine to
    find optimal parameters for the BLT mechanism that minimize error while maintaining privacy guarantees.

    Parameters
    ----------
    n : int
        Number of rounds (size of the matrix)
    d : int
        Number of buffers/parameters
    b : int, default=5
        Minimum separation parameter
    k : int, default=10
        Maximum participations
    participation_pattern : str, default='minSep'
        Pattern of participation: 'minSep', 'cyclic', or 'streaming'
    error_type : str, default='rmse'
        Type of error to minimize: 'rmse' or 'max'
    lambda_penalty : float, default=1e-7
        Penalty strength for log-barrier optimization
    device : str, default='cuda' if available else 'cpu'
        Computation device
    """

BLTOptimizer

class BLTOptimizer:
    """
    An optimizer that implements the Banded Linear Transformation (BLT) mechanism for differentially
    private training. This optimizer provides an alternative implementation of the BLT mechanism
    that focuses on optimizing the noise correlation parameters using closed-form expressions
    and gradient-based optimization.

    Parameters
    ----------
    n : int
        Size of the matrix (number of steps)
    d : int
        Number of parameters
    b : int, default=5
        Minimum separation parameter
    k : int, default=10
        Number of columns to consider in sensitivity
    error_type : str, default='rmse'
        Type of error to minimize: 'rmse' or 'max'
    participation : str, default='minSep'
        Participation pattern: 'minSep', 'cyclic', or 'single'
    device : str, default='cuda' if available else 'cpu'
        Computation device
    """

Key Methods

make_private()

def make_private(
    self,
    *,
    module: nn.Module,
    optimizer: optim.Optimizer,
    criterion=nn.CrossEntropyLoss(),
    data_loader: DataLoader,
    noise_multiplier: float,
    max_grad_norm: Union[float, List[float]],
    mode: str,
    a: Optional[Union[float, torch.Tensor]] = None,
    lamda: Optional[Union[float, torch.Tensor]] = None,
    gamma: Optional[float] = None,
    batch_first: bool = True,
    loss_reduction: str = "mean",
    poisson_sampling: bool = True,
    clipping: str = "flat",
    noise_generator=None,
    grad_sample_mode: str = "hooks",
    **kwargs,
) -> Tuple[GradSampleModule, CNMOptimizer, DataLoader]:
    """
    Add privacy-related responsibilities to the main PyTorch training objects:
    model, optimizer, and the data loader.

    All of the returned objects act just like their non-private counterparts
    passed as arguments, but with added DP tasks.

    - Model is wrapped to also compute per sample gradients.
    - Optimizer is now responsible for gradient clipping and adding noise to the gradients.
    - DataLoader is updated to perform Poisson sampling.

    Notes:
        Using any other models, optimizers, or data sources during training
        will invalidate stated privacy guarantees.

    Parameters
    ----------
    module : torch.nn.Module
        PyTorch module to be used for training
    optimizer : torch.optim.Optimizer
        Optimizer to be used for training
    criterion : torch.nn.Module, default=nn.CrossEntropyLoss()
        Loss function to be used for training
    data_loader : torch.utils.data.DataLoader
        DataLoader to be used for training
    noise_multiplier : float
        The ratio of the standard deviation of the Gaussian noise to the L2 sensitivity
        of the function to which the noise is added (i.e., how much noise to add)
    max_grad_norm : Union[float, List[float]]
        The maximum norm of the per-sample gradients. Any gradient with norm higher than
        this will be clipped to this value.
    mode : str
        Mode of operation: 'DP-SGD-BASE', 'DP-SGD-AMPLIFIED', 'BLT',
        'Multi-Epoch-BLT', 'Single Parameter', 'BLT-Adam', or
        'Multi-Epoch-BLT-Adam'. The two ``-Adam`` variants use
        ``CNMAdamOptimizer`` (Adam with time-varying bias correction).
    a : Optional[Union[float, torch.Tensor]], default=None
        Parameters for BLT mode
    lamda : Optional[Union[float, torch.Tensor]], default=None
        Parameters for BLT mode
    gamma : Optional[float], default=None
        A scalar for Single Parameter mode
    batch_first : bool, default=True
        Flag to indicate if the input tensor has the first dimension representing the batch
    loss_reduction : str, default='mean'
        Indicates if the loss reduction is a sum or mean operation ('sum' or 'mean')
    poisson_sampling : bool, default=True
        Whether to use Poisson sampling, which is required for the stated DP guarantees
    clipping : str, default='flat'
        Per sample gradient clipping mechanism ('flat', 'per_layer', or 'adaptive')
    noise_generator : Optional[torch.Generator], default=None
        Generator for noise
    grad_sample_mode : str, default='hooks'
        Mode for computing per sample gradients

    Returns
    -------
    Tuple[GradSampleModule, CNMOptimizer, DataLoader]
        Tuple of (model, optimizer, data_loader) with added privacy guarantees
    """
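
Under the standard DP-SGD recipe (as in Opacus), the Gaussian noise added to the summed clipped gradients has standard deviation noise_multiplier × max_grad_norm, so with loss_reduction="mean" the noise on the averaged gradient scales inversely with the expected batch size. A back-of-envelope helper (illustrative; not a function exposed by the library):

```python
def effective_noise_std(noise_multiplier, max_grad_norm, expected_batch_size):
    """Std of the Gaussian noise on the *averaged* gradient, assuming
    N(0, (sigma * C)^2) is added to the clipped sum and then divided
    by the expected batch size."""
    return noise_multiplier * max_grad_norm / expected_batch_size

# e.g. sigma = 1.1, C = 1.0, batch size 1024
print(effective_noise_std(1.1, 1.0, 1024))  # ~0.00107
```

This is why larger batch sizes generally improve the privacy-utility trade-off at a fixed noise multiplier.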

make_private_with_epsilon()

def make_private_with_epsilon(
    self,
    *,
    module: nn.Module,
    optimizer: optim.Optimizer,
    criterion=nn.CrossEntropyLoss(),
    data_loader: DataLoader,
    target_epsilon: float,
    target_delta: float,
    epochs: int,
    max_grad_norm: Union[float, List[float]],
    mode: str,
    a: Optional[Union[float, torch.Tensor]] = None,
    lamda: Optional[Union[float, torch.Tensor]] = None,
    gamma: Optional[float] = None,
    batch_first: bool = True,
    loss_reduction: str = "mean",
    poisson_sampling: bool = True,
    clipping: str = "flat",
    noise_generator=None,
    grad_sample_mode: str = "hooks",
    **kwargs,
) -> Tuple[GradSampleModule, CNMOptimizer, DataLoader]:
    """
    Version of make_private that calculates privacy parameters based on a given privacy budget.
    This is the recommended method for most use cases as it automatically handles noise
    multiplier calculations.

    Parameters
    ----------
    module : torch.nn.Module
        PyTorch module to be used for training
    optimizer : torch.optim.Optimizer
        Optimizer to be used for training
    criterion : torch.nn.Module, default=nn.CrossEntropyLoss()
        Loss function to be used for training
    data_loader : torch.utils.data.DataLoader
        DataLoader to be used for training
    target_epsilon : float
        Target privacy budget epsilon, bounding the privacy loss incurred by changing a single data point
    target_delta : float
        Target delta: the probability with which the epsilon guarantee is allowed to fail
    epochs : int
        Number of training epochs you intend to perform; noise_multiplier relies on this
        to calculate an appropriate sigma to ensure privacy budget of (target_epsilon,
        target_delta) at the end of epochs
    max_grad_norm : Union[float, List[float]]
        The maximum norm of the per-sample gradients. Any gradient with norm higher than
        this will be clipped to this value
    mode : str
        Mode of operation: 'DP-SGD-BASE', 'DP-SGD-AMPLIFIED', 'BLT',
        'Multi-Epoch-BLT', 'Single Parameter', 'BLT-Adam', or
        'Multi-Epoch-BLT-Adam'. The two ``-Adam`` variants use
        ``CNMAdamOptimizer`` (Adam with time-varying bias correction).
    a : Optional[Union[float, torch.Tensor]], default=None
        Parameters for BLT mode
    lamda : Optional[Union[float, torch.Tensor]], default=None
        Parameters for BLT mode
    gamma : Optional[float], default=None
        A scalar for Single Parameter mode
    batch_first : bool, default=True
        Flag to indicate if the input tensor has the first dimension representing the batch
    loss_reduction : str, default='mean'
        Indicates if the loss reduction is a sum or mean operation ('sum' or 'mean')
    poisson_sampling : bool, default=True
        Whether to use Poisson sampling, which is required for the stated DP guarantees
    clipping : str, default='flat'
        Per sample gradient clipping mechanism ('flat', 'per_layer', or 'adaptive')
    noise_generator : Optional[torch.Generator], default=None
        Generator for noise
    grad_sample_mode : str, default='hooks'
        Mode for computing per sample gradients

    Returns
    -------
    Tuple[GradSampleModule, CNMOptimizer, DataLoader]
        Tuple of (model, optimizer, data_loader) with added privacy guarantees
    """

For complete API documentation, see docs/api.md (not yet available).

Contributing

Contributions are welcome! Please send an email to the address listed in the Contact section below to collaborate.

Citation

If you use this library in your research, please cite:

@software{correlated_noise_mechanism,
  author = {Ashish Srivastava},
  title = {Correlated Noise Mechanism: Extending Opacus to enable BLT Mechanisms},
  year = {2025},
  url = {https://github.com/grim-hitman0XX/correlated_noise_mechanism},
  version = {v0.2.0}
}

See the open issues for a full list of proposed features and known issues.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • This project was developed as a course project at IIT Madras for DA7450: Topics in Privacy
  • Much of this work is inspired by the tutorials and course materials of Prof. Krishna Pillutla
  • I'd like to thank my advisor, Prof. Krishna Pillutla, for enabling this project and for his guidance throughout

Related Projects

  • Google's Differential Privacy - Google's open-source differential privacy library that provides tools for building differentially private applications.
  • Opacus - PyTorch library for training deep learning models with differential privacy, which this library extends to support correlated noise mechanisms.

Contact


Made with ❤️ by Ashish Srivastava
