
Random Allocation for Differential Privacy

This package provides tools for analyzing and comparing different random allocation schemes in the context of differential privacy.

Installation

You can install the package using pip:

pip install random-allocation

For PLD-based features, you'll also need:

pip install "PLD_subsampling>=0.1.2"

Usage

This package implements the privacy mechanisms and analyses described in the paper "Privacy Amplification by Random Allocation". The following example shows how to compare several of the schemes described in the paper:

from random_allocation.comparisons.definitions import PrivacyParams, SchemeConfig, Direction
from random_allocation.comparisons.definitions import (
    ALLOCATION_DIRECT, ALLOCATION_DECOMPOSITION, POISSON_PLD, LOCAL
)
from random_allocation.comparisons.experiments import run_experiment, PlotType
import numpy as np

# Define privacy parameters
params = {
    'x_var': 'sigma',            # Variable to vary on x-axis
    'y_var': 'epsilon',          # Variable to compute on y-axis
    'sigma': np.linspace(0.1, 2, 20),  # Range of sigma values to test
    'delta': 1e-6,               # Target delta
    'num_steps': 10000,          # Number of steps per epoch
    'num_selected': 1,           # Number of times each element is used
    'num_epochs': 1              # Number of epochs
}

# Define scheme configuration
config = SchemeConfig(
    allocation_direct_alpha_orders=np.arange(2, 61)  # integer alpha orders 2, ..., 60
)

# Define methods to compare
methods = [LOCAL, POISSON_PLD, ALLOCATION_DIRECT, ALLOCATION_DECOMPOSITION]

# Define visualization configuration
visualization_config = {
    'log_x_axis': True, 
    'log_y_axis': True,
    'format_x': lambda x, _: f'{x:.2f}'
}

# Run the experiment
results = run_experiment(
    params_dict=params,
    config=config, 
    methods=methods,
    visualization_config=visualization_config,
    experiment_name='epsilon_vs_sigma',
    plot_type=PlotType.COMBINED,
    read_data=False,           # Whether to try reading existing data first
    save_data=True,            # Whether to save computed data
    save_plots=True,           # Whether to save plots to files
    show_plots=True,           # Whether to display plots interactively
    direction=Direction.BOTH   # Analyze both add and remove directions
)

Structure Overview

Privacy Parameters

The PrivacyParams class encapsulates all parameters needed for privacy calculations:

example_params = PrivacyParams(
    sigma               = 1.0,   # The Gaussian mechanism's noise scale
    num_steps           = 1_000, # The number of steps in each epoch of the scheme
    num_selected        = 1,     # The number of times each element is used in an epoch of the random allocation scheme
    num_epochs          = 1,     # The number of epochs the scheme is run
    epsilon             = None,  # The target epsilon.
    delta               = 1e-5,  # The target delta.
                                 # Exactly one value in {delta, epsilon} should be set to None
)

Scheme Configuration

The SchemeConfig class provides configuration options for various privacy schemes:

example_config = SchemeConfig(
    discretization = 1e-4,                             # The resolution used in the Gaussian's PLD numerical calculation.
                                                       # Default value = 1e-4.
    allocation_direct_alpha_orders = np.arange(2, 61), # The range of integer alpha orders used in the direct method for calculating the remove direction.
                                                       # Default value = None; it must be set when the direct method is used.
    allocation_RDP_DCO_alpha_orders = None,            # The range of alpha orders used in the RDP DCO method.
                                                       # Default value = None, in which case default range of values is used.
    Poisson_alpha_orders = None,                       # The range of alpha orders used in the Poisson RDP method.
                                                       # Default value = None, in which case default range of values is used.
    print_alpha = False,                               # Indicates whether the alpha value used to set the optimal epsilon/delta value will be printed.
                                                       # Default value = False.
    delta_tolerance = 1e-15,                           # The desired resolution of the delta estimation, when using optimization based estimation methods.
                                                       # Default value = 1e-15.
    epsilon_tolerance = 1e-3,                          # The desired resolution of the epsilon estimation, when using optimization based estimation methods.
                                                       # Default value = 1e-3.
    epsilon_upper_bound = 100.0,                       # An upper bound on the search range of the epsilon estimation, when using binary search estimation methods.
                                                       # Default value = 100.0.
    MC_use_order_stats = True,                         # Indicates whether the order statistic technique should be used in the Monte Carlo simulation.
                                                       # Default value = True.
    MC_use_mean = False,                               # Indicates whether the Monte Carlo simulation method reports the mean or an upper bound.
                                                       # Default value = False, that is - report an upper bound.
    MC_conf_level = 0.99,                              # The probability with which the upper bound provided by the Monte Carlo simulation method is correct.
                                                       # Default value = 0.99.
    MC_sample_size = 500_000,                          # The sample size used in the Monte Carlo estimation.
                                                       # Default value = 500_000.
    shuffle_step = -1.0,                               # If negative, the analytic shuffle wrapper uses max(1, log(sqrt(num_steps / 2))).
                                                       # Default value = -1.
    shuffle_search_iterations = 60                     # Binary-search iterations used to invert the shuffle delta bound.
                                                       # Default value = 60.
)

Direction Enum

The Direction enum specifies the direction of privacy analysis:

from random_allocation.comparisons.definitions import Direction

# Available directions
Direction.ADD     # Add direction (adding a user to the dataset)
Direction.REMOVE  # Remove direction (removing a user from the dataset)
Direction.BOTH    # Both directions (consider both add and remove)

Available Methods

The package includes several methods for privacy analysis:

Local Methods

  • LOCAL: Basic local mechanism with no randomization in the sampling.

Poisson Methods

  • POISSON_PLD: Poisson subsampling with Privacy Loss Distribution (PLD) analysis.
  • POISSON_RDP: Poisson subsampling with Rényi Differential Privacy (RDP) analysis.

Shuffle Methods

  • SHUFFLE: Shuffle model of differential privacy.
  • SHUFFLE_LOWER_BOUND: Lower bound for the shuffle scheme.

Random Allocation Methods

  • ALLOCATION_ANALYTIC: Analytic method for the random allocation scheme.
  • ALLOCATION_DIRECT: Direct computation method using Rényi Differential Privacy.
  • ALLOCATION_RDP_DCO: RDP analysis following the approach of DCO25.
  • ALLOCATION_DECOMPOSITION: Decomposition method for random allocation scheme.
  • ALLOCATION_COMBINED: Combined approach leveraging multiple methods.
  • ALLOCATION_RECURSIVE: Recursive computation approach.
  • ALLOCATION_MONTE_CARLO: Monte Carlo simulation-based analysis (CGHKKLMSZ24).
  • ALLOCATION_LOWER_BOUND: Lower bound for the random allocation scheme (CGHKKLMSZ24).

Data Handling

The run_experiment function provides control over how data is read and saved:

run_experiment(
    # Required parameters
    params_dict=params,            # Dictionary of privacy parameters
    config=config,                 # SchemeConfig object
    methods=methods,               # List of methods to compare
    
    # Optional parameters
    visualization_config=viz_config,  # Visualization settings
    experiment_name='my_experiment',  # Name used for saved files
    plot_type=PlotType.COMPARISON,    # COMPARISON or COMBINED
    
    # Data handling options
    read_data=False,        # Try reading existing data before calculating
    save_data=True,         # Save computed data to CSV files
    save_plots=True,        # Save plots to files
    show_plots=True,        # Display plots interactively
    
    # Privacy analysis direction
    direction=Direction.BOTH  # ADD, REMOVE, or BOTH
)

Using named parameters for all function arguments is recommended to avoid confusion about parameter order.

When read_data=True, the system will first attempt to load previously saved data from the examples/data/ directory. If the data doesn't exist or read_data=False, new calculations will be performed.
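This read-or-compute behavior follows a standard caching pattern. The sketch below illustrates the idea in self-contained Python; the function name, file path, and CSV layout are illustrative only and do not reflect the package's internals:

```python
import csv
import os
import tempfile

def load_or_compute(path, compute, read_data=True, save_data=True):
    """Illustrative read-or-compute cache (not the package's internals):
    read an existing CSV if allowed and present, otherwise compute and save."""
    if read_data and os.path.exists(path):
        with open(path, newline='') as f:
            return [row for row in csv.reader(f)]
    rows = compute()
    if save_data:
        with open(path, 'w', newline='') as f:
            csv.writer(f).writerows(rows)
    return rows

# First call computes and saves; a second call with read_data=True reads the cached CSV.
path = os.path.join(tempfile.gettempdir(), 'demo_results.csv')
rows = load_or_compute(path, lambda: [['sigma', 'epsilon'], ['1.0', '0.5']])
```

In the package, the cached files live under examples/data/ and are keyed by experiment_name, so re-running an experiment with read_data=True skips the (potentially expensive) privacy calculations.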

Plotting

The package supports two types of plots:

  • PlotType.COMPARISON: For comparing different methods.
  • PlotType.COMBINED: For showing combined results.

Visualization configuration options include:

  • log_x_axis: Whether to use logarithmic scale for x-axis.
  • log_y_axis: Whether to use logarithmic scale for y-axis.
  • format_x: Function to format x-axis labels.
  • format_y: Function to format y-axis labels.
  • legend_position: Optional custom position for the legend (e.g., 'upper right', 'lower left').
  • num_y_ticks: Optional number of y-axis tick marks to display (if not specified, matplotlib's default is used).

The visualization module includes intelligent legend positioning that avoids overlap with data points: if legend_position is not specified, it analyzes the data distribution and places the legend in the region with the fewest data points.

When using logarithmic y-axis scaling (log_y_axis=True), the system automatically adjusts tick formatting for privacy parameters:

  • For epsilon (ε) and delta (δ) parameters, small values (< 0.01) are displayed in scientific notation
  • For other values, regular decimal formatting is used
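That formatting rule can be expressed as a matplotlib-style formatter taking (value, tick position) and returning a label. The function below is a sketch of the behavior described above, not the package's exact implementation:

```python
def privacy_tick_formatter(value, _pos=None):
    """Sketch of the described rule: scientific notation below 0.01,
    plain two-decimal formatting otherwise (illustrative only)."""
    if 0 < value < 0.01:
        return f'{value:.0e}'
    return f'{value:.2f}'

print(privacy_tick_formatter(1e-6))  # '1e-06'
print(privacy_tick_formatter(0.5))   # '0.50'
```

A function with this signature could also be supplied explicitly as format_y in visualization_config to override the automatic behavior.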

For multi-plot figures, use the plot_multiple_data function:

from random_allocation.comparisons.visualization import plot_multiple_data
import matplotlib.pyplot as plt  # needed for suptitle/tight_layout/show below

# Create a multi-subplot figure with 3 plots in a row
fig = plot_multiple_data(
    data_list=[data1, data2, data3],  # List of data dictionaries
    titles=["Plot 1", "Plot 2", "Plot 3"],  # Optional titles for each subplot
    log_x_axis=True,
    log_y_axis=False,
    format_x=lambda x, _: f'{x:.2f}',
    plot_type='combined',  # 'combined' or 'comparison'
    figsize=(20, 6),  # Width, height in inches
    grid_layout=(1, 3),  # 1 row, 3 columns
    legend_position='upper right',  # Optional: specify legend position
    num_y_ticks=5  # Optional: control number of y-axis ticks
)

# Add a super title and adjust layout
plt.suptitle("My Multiple Plot Analysis", fontsize=20)
plt.tight_layout(rect=[0, 0, 1, 0.95])
plt.show()

Testing

The project includes a comprehensive test suite with 924 tests organized in a four-tier hierarchy to ensure mathematical correctness, robustness, and research reproducibility.

Quick Start

# Activate environment
conda activate random_allocation

# Run release-level tests (recommended for validation)
python tests/run_tests.py release

# Run basic tests (fast development feedback)
python tests/run_tests.py basic

# Run full tests (comprehensive development testing)
python tests/run_tests.py full

# Run all tests including paper experiments
python tests/run_tests.py paper

# Individual suite runners (recommended for focused testing)
python tests/run_basic_suite.py     # Basic tests only
python tests/run_full_suite.py      # Full tests only
python tests/run_release_suite.py   # Release tests only

Four-Tier Test Structure (2025 Modernized)

  • BASIC Tests (10 tests): Core functionality validation (~1-5 seconds)
  • FULL Tests (28 tests): Comprehensive testing including mathematical properties (~5-30 seconds)
  • RELEASE Tests (872 tests): Integration and comprehensive validation (~50+ seconds)
    • Type Annotations: 26 tests - Complete type coverage validation
    • Monotonicity: 370 tests - Mathematical property validation
    • Edge Cases: 476 tests - Mathematically precise boundary condition testing
  • PAPER Tests (Variable): Research reproducibility and experiment validation

Modernization Highlights

  • Mathematical Precision: Edge cases use only valid epsilon-delta relationships
  • Function Validation: Only existing functions tested (no missing function skips)
  • 28% test reduction: Eliminated 190 invalid test combinations
  • 60% skip reduction: Eliminated meaningless skipped tests

Individual Test Execution

# Run specific test directories
pytest tests/basic/ -v
pytest tests/full/ -v
pytest tests/release/ -v

# Run specific test files
pytest tests/release/test_release_03_edge_cases.py -v    # Edge cases
pytest tests/release/test_release_02_monotonicity.py -v # Monotonicity
pytest tests/full/test_full_03_utility_functions.py -v  # Utility functions

Total Test Runtime: ~1-2 minutes for release-level tests (excluding paper experiments)

Current Status: All core tests passing with comprehensive coverage of mathematical properties, edge cases, and type annotations.

For comprehensive documentation of the test suite, including detailed test descriptions and mathematical properties validated, see docs/test_documentation.md.

Documentation

Examples

The examples directory contains several reference implementations:

  • paper_experiments.py: Reproduces experiments from the paper "Privacy Amplification by Random Allocation". You can run this script to generate all the plots from the paper:
    python -m random_allocation.examples.paper_experiments
    

License

This project is licensed under the MIT License - see the LICENSE file for details.

External Code

This project includes code from external sources for comparison purposes:

  1. Shuffle Model Implementation:

    • Located in random_allocation/external_sources/shuffle_external.py
    • Source: ml-shuffling-amplification
    • Used to compare shuffle model performance with other privacy mechanisms
    • Original license: See accompanying LICENSE file from Apple Inc.
  2. Monte Carlo Estimator:

    • Located in random_allocation/external_sources/Monte_Carlo_external.py
    • Source: google-research/dpsgd_batch_sampler_accounting
    • Includes code from files: dpsgd_bounds, monte_carlo_estimator, and balls_and_bins
    • Used for comparison in our privacy analysis plots
    • Original license: Apache License, Version 2.0

These external implementations are used solely for comparative analysis and are not part of the core functionality of this package.
