PLD_accounting
Tight numerical privacy accounting for random allocation and subsampling using Privacy Loss Distributions (PLDs).
This library provides end-to-end DP accounting for federated learning and other privacy-preserving systems using the random allocation subsampling scheme. It supports both Gaussian mechanisms and custom PLD realizations, with adaptive resolution refinement for accuracy/performance tradeoffs.
Quick Start
Install from PyPI:
```shell
pip install PLD_accounting
```
Compute privacy guarantees with automatic resolution tuning:
```python
from PLD_accounting import gaussian_allocation_epsilon_range

epsilon_upper, epsilon_lower = gaussian_allocation_epsilon_range(
    sigma=3.0,        # Gaussian noise scale
    num_steps=100,    # Total training steps
    num_selected=10,  # Clients selected per step
    delta=1e-6,       # Target delta
)
print(f"ε ∈ [{epsilon_lower:.4f}, {epsilon_upper:.4f}]")
```
What This Library Provides
- Random allocation accounting: Tight bounds for selecting `k` out of `n` clients per round
- Adaptive resolution: Automatic grid refinement to balance accuracy and runtime
- Two input modes:
  - Gaussian: Specify `sigma`, `num_steps`, `num_selected` → get ε or δ
  - Realization: Provide explicit PLD → compose under random allocation
- Direction-aware bounds: Upper/lower bounds for REMOVE, ADD, or BOTH directions
- Subsampling amplification: Direct PLD-based amplification for PREAMBLE-style workflows (DOMINATES only)
- Efficient convolution: FFT for linear grids, geometric for multiplicative grids
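To illustrate the linear-grid case (a standalone numpy sketch, not the library's internal `FFT_convolution` code): on a uniform grid, self-composing a PMF reduces to discrete convolution, which becomes a pointwise power in the frequency domain.

```python
import numpy as np

def fft_self_convolve(pmf: np.ndarray, times: int) -> np.ndarray:
    """Convolve a PMF on a uniform grid with itself `times` times via FFT."""
    out_len = times * (len(pmf) - 1) + 1      # support size of the k-fold convolution
    n = 1 << (out_len - 1).bit_length()       # next power of two for the FFT
    spectrum = np.fft.rfft(pmf, n) ** times   # convolution = pointwise power in frequency domain
    result = np.fft.irfft(spectrum, n)[:out_len]
    return np.clip(result, 0.0, None)         # clip tiny negative FFT round-off

# Two-fold composition of a fair-coin PMF yields the binomial weights ≈ [0.25, 0.5, 0.25]
print(fft_self_convolve(np.array([0.5, 0.5]), 2))
```

This is the O(n log n) route the FFT path takes for linear grids; geometric (multiplicative) grids need a different convolution because their support is not equally spaced.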
When to Use Each Path
Gaussian Path (Most Common)
Use when: Your mechanism adds Gaussian noise with known σ
Example:
```python
from PLD_accounting import (
    gaussian_allocation_epsilon_extended,
    PrivacyParams,
    AllocationSchemeConfig,
)

params = PrivacyParams(sigma=3.0, num_steps=100, num_selected=10, delta=1e-6)
config = AllocationSchemeConfig(loss_discretization=0.02, tail_truncation=1e-8)
epsilon = gaussian_allocation_epsilon_extended(params, config)
```
Adaptive variant (recommended for exploratory analysis):
```python
epsilon_upper, epsilon_lower = gaussian_allocation_epsilon_range(
    sigma=3.0, num_steps=100, num_selected=10, delta=1e-6
)
```
Realization Path (Advanced)
Use when: You have an explicit PLD realization or a non-Gaussian mechanism
Example:
```python
import numpy as np
from PLD_accounting import general_allocation_PLD, PLDRealization, AllocationSchemeConfig

# Define your mechanism's privacy loss distribution on a linear grid
remove_realization = PLDRealization(
    x_min=0.0,
    x_gap=0.1,
    PMF_array=np.array([0.3, 0.25, 0.2, 0.15, 0.1]),
)
add_realization = PLDRealization(
    x_min=0.0,
    x_gap=0.1,
    PMF_array=np.array([0.3, 0.25, 0.2, 0.15, 0.1]),
)

pld = general_allocation_PLD(
    num_steps=10,
    num_selected=5,
    num_epochs=1,
    config=AllocationSchemeConfig(),
    remove_realization=remove_realization,
    add_realization=add_realization,
)
epsilon = pld.get_epsilon_for_delta(1e-6)
```
Requirements for realizations:
- Linear grid structure (uniform spacing)
- Valid PLD per Definition 3.1: `E[exp(-L)] ≤ 1`, no mass at `L = -∞`
- Total probability mass = 1
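These requirements can be sanity-checked with plain numpy before building a realization; `check_realization` below is an illustrative helper, not part of the library's API.

```python
import numpy as np

def check_realization(x_min: float, x_gap: float, pmf: np.ndarray,
                      tol: float = 1e-9) -> None:
    """Validate a linear-grid PLD realization against the stated requirements."""
    losses = x_min + x_gap * np.arange(len(pmf))  # uniform grid, so no mass at L = -inf
    assert np.all(pmf >= 0), "PMF entries must be non-negative"
    assert abs(pmf.sum() - 1.0) <= tol, "total probability mass must be 1"
    # Definition 3.1: E[exp(-L)] <= 1 for a valid PLD
    assert np.dot(pmf, np.exp(-losses)) <= 1.0 + tol, "E[exp(-L)] must be <= 1"

# The realization from the example above passes all three checks
check_realization(0.0, 0.1, np.array([0.3, 0.25, 0.2, 0.15, 0.1]))
```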
Key Concepts
Random Allocation
Selecting k clients from n candidates provides privacy amplification compared to full-batch training. This library accounts for the composition of:
- `num_steps`: total allocation steps
- `num_selected`: clients chosen per step
- `num_epochs`: passes through the data
For both Gaussian and realization paths, composition semantics are:
- Inner per-round composition count: `floor(num_steps / num_selected)`
- Outer composition count: `num_selected * num_epochs`
- Therefore `num_steps` must satisfy `num_steps >= num_selected`
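For example, with the Quick Start parameters the composition counts work out as follows (a plain-arithmetic restatement of the rules above):

```python
num_steps, num_selected, num_epochs = 100, 10, 1
assert num_steps >= num_selected        # required by the scheme

inner = num_steps // num_selected       # inner per-round composition count
outer = num_selected * num_epochs       # outer composition count
print(inner, outer)                     # → 10 10
```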
Directions
- REMOVE: Privacy loss when removing a data record
- ADD: Privacy loss when adding a data record
- BOTH: Analyze both directions (most conservative)
Bound Types
- DOMINATES: Upper bound (pessimistic, safe for privacy proofs)
- IS_DOMINATED: Lower bound (optimistic, for tightness evaluation)
PLD Dual
For a PLD realization L, the dual D(L) is the PLD in the reversed privacy direction (L_{Q,P}).
It reflects the support (l -> -l), reweights finite mass by exp(-l), and places the residual mass at +∞.
In remove-direction internals, we often use -D(L), obtained explicitly by negating D(L).
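For a finite realization, the dual can be computed concretely as below (an illustrative sketch on plain numpy arrays, not the library's `PLDRealization` type): reflect each loss value, reweight its mass by exp(-l), and assign the leftover probability to +∞.

```python
import numpy as np

def pld_dual(losses: np.ndarray, pmf: np.ndarray):
    """Dual of a finite PLD realization: support l -> -l, mass p -> p*exp(-l),
    with the residual probability placed at +inf."""
    dual_losses = -losses[::-1]               # reflect support, keep the grid increasing
    dual_pmf = (pmf * np.exp(-losses))[::-1]  # reweight finite mass by exp(-l)
    mass_at_inf = 1.0 - dual_pmf.sum()        # residual mass goes to +inf
    return dual_losses, dual_pmf, mass_at_inf

d_losses, d_pmf, p_inf = pld_dual(np.array([0.0, 0.1, 0.2]),
                                  np.array([0.5, 0.3, 0.2]))
```

Note that `E[exp(-L)] ≤ 1` is exactly what guarantees the mass at +∞ is non-negative, and negating `d_losses` gives the `-D(L)` form used in the remove-direction internals.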
Adaptive Resolution
The *_range() functions iteratively refine discretization to achieve target accuracy:
- Start from Poisson-subsampled estimate
- Refine grid spacing and truncation
- Track best upper/lower bounds
- Stop when gap meets target (or after 10 iterations)
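The control flow above can be sketched as follows; `estimate_bounds` is a hypothetical stand-in for the library's internal bound computation at a given resolution, shown only to illustrate the refinement loop, not its actual implementation.

```python
def adaptive_epsilon_range(estimate_bounds, target_gap=0.05, max_iters=10):
    """Refine discretization until the upper/lower ε gap meets the target."""
    spacing, truncation = 1e-2, 1e-8          # illustrative starting resolution
    best_upper, best_lower = float("inf"), 0.0
    for _ in range(max_iters):
        upper, lower = estimate_bounds(spacing, truncation)
        best_upper = min(best_upper, upper)   # track best bounds seen so far
        best_lower = max(best_lower, lower)
        if best_upper - best_lower <= target_gap:
            break                             # gap meets target
        spacing /= 2.0                        # refine grid spacing ...
        truncation /= 10.0                    # ... and tail truncation
    return best_upper, best_lower

# Toy bound oracle whose gap shrinks with the grid spacing
upper, lower = adaptive_epsilon_range(lambda s, t: (1.0 + s, 1.0 - s))
```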
Common Workflows
1. Simple ε Query
```python
from PLD_accounting import gaussian_allocation_epsilon_range

eps_upper, eps_lower = gaussian_allocation_epsilon_range(
    sigma=5.0, num_steps=500, num_selected=8, delta=1e-5
)
```
2. PLD for Multiple Queries
```python
from PLD_accounting import gaussian_allocation_PLD, PrivacyParams, AllocationSchemeConfig

pld = gaussian_allocation_PLD(
    params=PrivacyParams(sigma=3.0, num_steps=100, num_selected=10),
    config=AllocationSchemeConfig(),
)

# Query multiple (ε, δ) pairs efficiently
for delta in [1e-4, 1e-5, 1e-6]:
    eps = pld.get_epsilon_for_delta(delta)
    print(f"δ={delta:.0e} → ε={eps:.4f}")
```
3. Subsampling + Composition
```python
from PLD_accounting import gaussian_allocation_PLD, subsample_PLD

# One training round
base_pld = gaussian_allocation_PLD(...)

# Apply subsampling amplification
subsampled = subsample_PLD(base_pld, sampling_probability=0.1)

# Compose across rounds
final_pld = subsampled.self_compose(num_rounds=20)
epsilon = final_pld.get_epsilon_for_delta(1e-6)
```
`subsample_PLD()` / `subsample_PMF()` are DOMINATES-only utilities; they do not accept a bound-type argument.
See usage_example.py for complete runnable examples including PREAMBLE-style workflows.
Configuration Parameters
PrivacyParams
- `sigma`: Gaussian noise scale (higher = more privacy)
- `num_steps`: Total allocation steps
- `num_selected`: Clients per step (`k` in the paper)
- `num_epochs`: Training epochs (default 1)
- `delta` or `epsilon`: Query target
AllocationSchemeConfig
- `loss_discretization`: Grid spacing (smaller = tighter, slower). Default: 1e-3
- `tail_truncation`: Truncate probability mass below this. Default: 1e-10
- `max_grid_FFT`: FFT grid size limit. Default: 2,000,000
- `max_grid_mult`: Geometric grid size limit. Default: 0 (unlimited)
- `convolution_method`: FFT, GEOM, BEST_OF_TWO, or COMBINED
Tradeoff: Smaller loss_discretization and tail_truncation → tighter bounds but higher memory/runtime.
Requirements
- Python ≥ 3.10
- numpy ≥ 1.23
- scipy ≥ 1.10
- numba ≥ 0.58
- dp-accounting ≥ 0.4.3
All dependencies install automatically with the package.
Development Setup
From source:

```shell
git clone https://github.com/moshenfeld/PLD_accounting.git
cd PLD_accounting
pip install -e ".[dev]"
```

Run tests:

```shell
pytest -q
```

With coverage:

```shell
./tests/run_tests.sh --coverage
```

Build distribution:

```shell
python -m build
```
API Reference
Gaussian Path
- `gaussian_allocation_epsilon_range()` - adaptive upper/lower ε bounds
- `gaussian_allocation_delta_range()` - adaptive upper/lower δ bounds
- `gaussian_allocation_epsilon_extended()` - single ε value with fixed config
- `gaussian_allocation_delta_extended()` - single δ value with fixed config
- `gaussian_allocation_PLD()` - build PLD object for repeated queries
Realization Path
- `general_allocation_PLD()` - build PLD from explicit realizations
- `general_allocation_epsilon()` - compute ε from realizations
- `general_allocation_delta()` - compute δ from realizations
- `PLDRealization` - linear-grid PLD realization type
Composition
- `subsample_PLD()` - apply subsampling amplification to a PLD
Project Structure
```
PLD_accounting/
├── discrete_dist.py                 # Distribution types (Linear, Geometric, PLDRealization)
├── random_allocation_api.py         # Public API surface
├── random_allocation_accounting.py  # Shared composition logic
├── random_allocation_gaussian.py    # Gaussian-specific implementation
├── adaptive_random_allocation.py    # Adaptive resolution refinement
├── geometric_convolution.py         # Multiplicative-grid convolution
├── FFT_convolution.py               # Linear-grid convolution
├── subsample_PLD.py                 # Subsampling amplification
└── ...
tests/
├── unit/                            # Type checks, validations, edge cases
└── integration/                     # End-to-end workflows, comparisons
usage_example.py                     # Runnable examples
```
See IMPLEMENTATION_OVERVIEW.md for architectural details.
Citation
If you use this library in your research, please cite:
[Paper citation pending]
License
[License information]
Support
For questions or issues:
- GitHub Issues: https://github.com/moshenfeld/PLD_accounting/issues
- Documentation: See `usage_example.py` and docstrings