
PLD_accounting

Tight numerical privacy accounting for random allocation and subsampling using Privacy Loss Distributions (PLDs).

This library provides end-to-end DP accounting for federated learning and other privacy-preserving systems using the random allocation subsampling scheme. It supports both Gaussian mechanisms and custom PLD realizations, with adaptive resolution refinement for accuracy/performance tradeoffs.


Quick Start

Install from PyPI:

pip install PLD_accounting

Compute privacy guarantees with automatic resolution tuning:

from PLD_accounting import gaussian_allocation_epsilon_range

epsilon_upper, epsilon_lower = gaussian_allocation_epsilon_range(
    sigma=3.0,          # Gaussian noise scale
    num_steps=100,      # Total training steps
    num_selected=10,    # Clients selected per step
    delta=1e-6,         # Target delta
)
print(f"ε ∈ [{epsilon_lower:.4f}, {epsilon_upper:.4f}]")

What This Library Provides

  • Random allocation accounting: Tight bounds for selecting k out of n clients per round
  • Adaptive resolution: Automatic grid refinement to balance accuracy and runtime
  • Two input modes:
    • Gaussian: Specify sigma, num_steps, num_selected → get ε or δ
    • Realization: Provide explicit PLD → compose under random allocation
  • Direction-aware bounds: Upper/lower bounds for REMOVE, ADD, or BOTH directions
  • Subsampling amplification: Direct PLD-based amplification for PREAMBLE-style workflows (DOMINATES only)
  • Efficient convolution: FFT for linear grids, geometric for multiplicative grids
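As context for the convolution bullet, here is a minimal numpy sketch of FFT-based self-composition of a PMF on a linear grid. This is a generic illustration of the technique, not this library's internal implementation; `fft_self_convolve` is a hypothetical helper name.

```python
import numpy as np

def fft_self_convolve(pmf: np.ndarray, k: int) -> np.ndarray:
    """Compose a PMF with itself k times via FFT (k-fold linear convolution)."""
    out_len = k * (len(pmf) - 1) + 1      # length of the k-fold convolution
    n = 1
    while n < out_len:                    # zero-pad to a power of two >= out_len
        n *= 2                            # so circular convolution == linear
    spec = np.fft.rfft(pmf, n)
    composed = np.fft.irfft(spec ** k, n)[:out_len]
    return np.clip(composed, 0.0, None)   # clip tiny negative FFT round-off

pmf = np.array([0.3, 0.25, 0.2, 0.15, 0.1])
composed = fft_self_convolve(pmf, 3)      # matches repeated np.convolve
```

For a grid of m points composed k times, this costs O(km log(km)) instead of the O(k m^2) of repeated direct convolution, which is why FFT is the natural choice on linear grids.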

When to Use Each Path

Gaussian Path (Most Common)

Use when: your mechanism adds Gaussian noise with a known σ

Example:

from PLD_accounting import (
    gaussian_allocation_epsilon_extended,
    PrivacyParams,
    AllocationSchemeConfig,
)

params = PrivacyParams(sigma=3.0, num_steps=100, num_selected=10, delta=1e-6)
config = AllocationSchemeConfig(loss_discretization=0.02, tail_truncation=1e-8)

epsilon = gaussian_allocation_epsilon_extended(params, config)

Adaptive variant (recommended for exploratory analysis):

epsilon_upper, epsilon_lower = gaussian_allocation_epsilon_range(
    sigma=3.0, num_steps=100, num_selected=10, delta=1e-6
)

Realization Path (Advanced)

Use when: you have an explicit PLD realization or a non-Gaussian mechanism

Example:

import numpy as np
from PLD_accounting import general_allocation_PLD, PLDRealization, AllocationSchemeConfig

# Define your mechanism's privacy loss distribution on a linear grid
remove_realization = PLDRealization(
    x_min=0.0,
    x_gap=0.1,
    PMF_array=np.array([0.3, 0.25, 0.2, 0.15, 0.1]),
)
add_realization = PLDRealization(
    x_min=0.0,
    x_gap=0.1,
    PMF_array=np.array([0.3, 0.25, 0.2, 0.15, 0.1]),
)

pld = general_allocation_PLD(
    num_steps=10,
    num_selected=5,
    num_epochs=1,
    config=AllocationSchemeConfig(),
    remove_realization=remove_realization,
    add_realization=add_realization,
)

epsilon = pld.get_epsilon_for_delta(1e-6)

Requirements for realizations:

  • Linear grid structure (uniform spacing)
  • Valid PLD per Definition 3.1: E[exp(-L)] ≤ 1, no mass at L = -∞
  • Total probability mass = 1
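The requirements above can be checked numerically before handing a realization to the library. The sketch below is a hypothetical validator (`check_pld_realization` is not part of this library's API), assuming a finite linear-grid PMF as in the example above:

```python
import numpy as np

def check_pld_realization(x_min: float, x_gap: float, pmf: np.ndarray,
                          tol: float = 1e-9) -> None:
    """Sanity-check the stated requirements for a finite linear-grid PLD."""
    losses = x_min + x_gap * np.arange(len(pmf))   # uniform (linear) grid
    assert np.all(pmf >= 0), "PMF entries must be non-negative"
    assert abs(pmf.sum() - 1.0) <= tol, "total probability mass must be 1"
    # Definition 3.1: E[exp(-L)] <= 1 over the finite support
    assert np.sum(pmf * np.exp(-losses)) <= 1.0 + tol, "E[exp(-L)] must be <= 1"

check_pld_realization(0.0, 0.1, np.array([0.3, 0.25, 0.2, 0.15, 0.1]))
```

Running such a check up front turns a subtle downstream accounting error into an immediate, descriptive failure.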

Key Concepts

Random Allocation

Selecting k clients from n candidates provides privacy amplification compared to full-batch training. This library accounts for the composition of:

  • num_steps total allocation steps
  • num_selected clients chosen per step
  • num_epochs passes through the data

For both Gaussian and realization paths, composition semantics are:

  • inner per-round composition count: floor(num_steps / num_selected)
  • outer composition count: num_selected * num_epochs
  • therefore num_steps must satisfy num_steps >= num_selected
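The composition counts above are simple integer arithmetic; the following sketch (a hypothetical helper, not a library function) makes them concrete:

```python
def allocation_composition_counts(num_steps: int, num_selected: int,
                                  num_epochs: int = 1) -> tuple[int, int]:
    """Inner/outer composition counts for the random allocation scheme."""
    if num_steps < num_selected:
        raise ValueError("num_steps must be >= num_selected")
    inner = num_steps // num_selected      # per-round composition count
    outer = num_selected * num_epochs      # outer composition count
    return inner, outer

# e.g. 100 steps, 10 selected per step, 1 epoch -> inner = 10, outer = 10
inner, outer = allocation_composition_counts(100, 10)
```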

Directions

  • REMOVE: Privacy loss when removing a data record
  • ADD: Privacy loss when adding a data record
  • BOTH: Analyze both directions (most conservative)

Bound Types

  • DOMINATES: Upper bound (pessimistic, safe for privacy proofs)
  • IS_DOMINATED: Lower bound (optimistic, for tightness evaluation)

PLD Dual

For a PLD realization L, the dual D(L) is the PLD in the reversed privacy direction (L_{Q,P}). It reflects the support (l -> -l), reweights finite mass by exp(-l), and places the residual mass at +∞. In remove-direction internals, we often use -D(L), obtained explicitly by negating D(L).
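The dual construction described above can be sketched in numpy for a finite linear-grid realization. This is an illustrative sketch of the definition, not the library's internal representation; `pld_dual` is a hypothetical name:

```python
import numpy as np

def pld_dual(x_min: float, x_gap: float, pmf: np.ndarray):
    """Dual of a finite linear-grid PLD: reflect the support (l -> -l),
    reweight finite mass by exp(-l), put the residual mass at +infinity."""
    losses = x_min + x_gap * np.arange(len(pmf))
    dual_losses = -losses[::-1]              # reflected support, ascending order
    dual_pmf = (pmf * np.exp(-losses))[::-1] # mass at -l is pmf(l) * exp(-l)
    mass_at_inf = 1.0 - dual_pmf.sum()       # residual mass placed at +inf
    return dual_losses, dual_pmf, mass_at_inf
```

Note that the finite dual mass sums to E[exp(-L)] ≤ 1, which is exactly why the residual mass at +∞ is well defined.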

Adaptive Resolution

The *_range() functions iteratively refine the discretization until a target accuracy is met:

  • Start from Poisson-subsampled estimate
  • Refine grid spacing and truncation
  • Track best upper/lower bounds
  • Stop when gap meets target (or after 10 iterations)
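The overall shape of that loop can be sketched generically. This toy version (with a made-up `estimate` callback) shows the control flow only, not the library's actual refinement heuristics:

```python
def refine_until(target_gap: float, estimate, max_iters: int = 10):
    """Toy adaptive loop: tighten the resolution until the upper/lower
    gap meets the target or the iteration budget is spent."""
    resolution = 1e-2
    best_upper, best_lower = float("inf"), 0.0
    for _ in range(max_iters):
        upper, lower = estimate(resolution)     # bounds at this resolution
        best_upper = min(best_upper, upper)     # track best bounds seen so far
        best_lower = max(best_lower, lower)
        if best_upper - best_lower <= target_gap:
            break
        resolution /= 2                         # refine the grid spacing
    return best_upper, best_lower

# Toy estimator whose bounds tighten as the resolution shrinks:
up, lo = refine_until(0.01, lambda r: (1.0 + r, 1.0 - r))
```

Because only the best bounds are kept, the returned interval can never widen as the loop refines.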

Common Workflows

1. Simple ε Query

from PLD_accounting import gaussian_allocation_epsilon_range

eps_upper, eps_lower = gaussian_allocation_epsilon_range(
    sigma=5.0, num_steps=500, num_selected=8, delta=1e-5
)

2. PLD for Multiple Queries

from PLD_accounting import gaussian_allocation_PLD, PrivacyParams, AllocationSchemeConfig

pld = gaussian_allocation_PLD(
    params=PrivacyParams(sigma=3.0, num_steps=100, num_selected=10),
    config=AllocationSchemeConfig(),
)

# Query multiple (ε, δ) pairs efficiently
for delta in [1e-4, 1e-5, 1e-6]:
    eps = pld.get_epsilon_for_delta(delta)
    print(f"δ={delta:.0e} → ε={eps:.4f}")

3. Subsampling + Composition

from PLD_accounting import gaussian_allocation_PLD, subsample_PLD

# One training round
base_pld = gaussian_allocation_PLD(...)

# Apply subsampling amplification
subsampled = subsample_PLD(base_pld, sampling_probability=0.1)

# Compose across rounds
final_pld = subsampled.self_compose(num_rounds=20)
epsilon = final_pld.get_epsilon_for_delta(1e-6)

subsample_PLD() / subsample_PMF() are DOMINATES-only utilities; they do not accept a bound-type argument.

See usage_example.py for complete runnable examples including PREAMBLE-style workflows.


Configuration Parameters

PrivacyParams

  • sigma: Gaussian noise scale (higher = more privacy)
  • num_steps: Total allocation steps
  • num_selected: Clients per step (k in the paper)
  • num_epochs: Training epochs (default 1)
  • delta or epsilon: Query target

AllocationSchemeConfig

  • loss_discretization: Grid spacing (smaller = tighter, slower). Default: 1e-3
  • tail_truncation: Truncate probability mass below this. Default: 1e-10
  • max_grid_FFT: FFT grid size limit. Default: 2,000,000
  • max_grid_mult: Geometric grid size limit. Default: 0 (unlimited)
  • convolution_method: FFT, GEOM, BEST_OF_TWO, or COMBINED

Tradeoff: Smaller loss_discretization and tail_truncation → tighter bounds but higher memory/runtime.


Requirements

  • Python ≥ 3.10
  • numpy ≥ 1.23
  • scipy ≥ 1.10
  • numba ≥ 0.58
  • dp-accounting ≥ 0.4.3

All dependencies install automatically with the package.


Development Setup

From source:

git clone https://github.com/moshenfeld/PLD_accounting.git
cd PLD_accounting
pip install -e ".[dev]"

Run tests:

pytest -q

With coverage:

./tests/run_tests.sh --coverage

Build distribution:

python -m build

API Reference

Gaussian Path

  • gaussian_allocation_epsilon_range() - adaptive upper/lower ε bounds
  • gaussian_allocation_delta_range() - adaptive upper/lower δ bounds
  • gaussian_allocation_epsilon_extended() - single ε value with fixed config
  • gaussian_allocation_delta_extended() - single δ value with fixed config
  • gaussian_allocation_PLD() - build PLD object for repeated queries

Realization Path

  • general_allocation_PLD() - build PLD from explicit realizations
  • general_allocation_epsilon() - compute ε from realizations
  • general_allocation_delta() - compute δ from realizations
  • PLDRealization - linear-grid PLD realization type

Composition

  • subsample_PLD() - apply subsampling amplification to a PLD

Project Structure

PLD_accounting/
├── discrete_dist.py              # Distribution types (Linear, Geometric, PLDRealization)
├── random_allocation_api.py      # Public API surface
├── random_allocation_accounting.py  # Shared composition logic
├── random_allocation_gaussian.py    # Gaussian-specific implementation
├── adaptive_random_allocation.py    # Adaptive resolution refinement
├── geometric_convolution.py      # Multiplicative-grid convolution
├── FFT_convolution.py           # Linear-grid convolution
├── subsample_PLD.py             # Subsampling amplification
└── ...

tests/
├── unit/                        # Type checks, validations, edge cases
└── integration/                 # End-to-end workflows, comparisons

usage_example.py                 # Runnable examples

See IMPLEMENTATION_OVERVIEW.md for architectural details.


Citation

If you use this library in your research, please cite:

[Paper citation pending]

License

[License information]


Support

For questions or issues, please open an issue on the GitHub repository.
