Multiple Frequency Estimation Under Local Differential Privacy in Python
Project description
Multi-Freq-LDPy
Multi-Freq-LDPy is a Python library for performing multiple frequency estimation tasks (multidimensional, longitudinal, and both) under local differential privacy (LDP) guarantees. The main goal is to provide an easy-to-use and fast execution toolkit to benchmark and experiment with state-of-the-art solutions and LDP protocols.
Installation
Please use the package manager pip to install multi-freq-ldpy.
pip install multi-freq-ldpy
To ensure you use the latest version.
pip install multi-freq-ldpy --upgrade
Multi-Freq-LDPy requires the following Python packages.
numpy
numba
xxhash
Content
Multi-Freq-LDPy covers the following tasks:
-
Single Frequency Estimation -- The best-performing frequency oracles from Locally Differentially Private Protocols for Frequency Estimation, namely:
- Generalized Randomized Response (GRR):
multi_freq_ldpy.pure_frequency_oracles.GRR
- Symmetric/Optimized Unary Encoding (UE):
multi_freq_ldpy.pure_frequency_oracles.UE
- Binary/Optimized Local Hashing (LH):
multi_freq_ldpy.pure_frequency_oracles.LH
- Adaptive (ADP) protocol (i.e., either GRR or Optimized UE) depending on variance value:
multi_freq_ldpy.pure_frequency_oracles.ADP
- Generalized Randomized Response (GRR):
-
Multidimensional Frequency Estimation -- Three solutions for frequency estimation of multiple attributes from Random Sampling Plus Fake Data: Multidimensional Frequency Estimates With Local Differential Privacy with their respective frequency oracles (GRR, UE-based, and ADP), namely:
- Splitting (SPL) the privacy budget:
multi_freq_ldpy.mdim_freq_est.SPL_solution
- Random Sampling (SMP) a single attribute:
multi_freq_ldpy.mdim_freq_est.SMP_solution
- Random Sampling + Fake Data (RS+FD) that samples a single attribute but also generates fake data for each non-sampled attribute:
multi_freq_ldpy.mdim_freq_est.RSpFD_solution
- Splitting (SPL) the privacy budget:
-
Longitudinal Single Frequency Estimation -- All longitudinal LDP protocols from Improving the Utility of Locally Differentially Private Protocols for Longitudinal and Multidimensional Frequency Estimates following the memoization-based framework from RAPPOR, namely:
- Longitudinal GRR (L-GRR):
multi_freq_ldpy.long_freq_est.L_GRR
- Longitudinal OUE (L-OUE):
multi_freq_ldpy.long_freq_est.L_OUE
- Longitudinal OUE-SUE (L-OSUE):
multi_freq_ldpy.long_freq_est.L_OSUE
- Longitudinal SUE (L-SUE):
multi_freq_ldpy.long_freq_est.L_SUE
- Longitudinal SUE-OUE (L-SOUE):
multi_freq_ldpy.long_freq_est.L_SOUE
- Longitudinal ADP (L-ADP), i.e., L-GRR or L-OSUE:
multi_freq_ldpy.long_freq_est.L_ADP
- Longitudinal GRR (L-GRR):
-
Longitudinal Multidimensional Frequency Estimation -- Both SPL and SMP solutions with all longitudinal protocols from previous point 3, namely:
- Longitudinal SPL (L_SPL):
multi_freq_ldpy.long_mdim_freq_est.L_SPL
- Longitudinal SMP (L_SMP)
multi_freq_ldpy.long_mdim_freq_est.L_SMP
- Longitudinal SPL (L_SPL):
Usage
This is a function-based package that simulates the LDP data collection pipeline of users and the server. For each functionality, there is always a Client
and an Aggregator
function. For more details, please refer to the tutorials folder, which covers all 1--4 tasks with real-world open datasets (Adult, Nursery, MS-FIMU).
# Common libraries
import numpy as np
import matplotlib.pyplot as plt
# Multi-Freq-LDPy functions for L-SUE protocol (a.k.a. Basic RAPPOR)
from multi_freq_ldpy.long_freq_est.L_SUE import L_SUE_Client, L_SUE_Aggregator
# Parameters for simulation
epsilon_perm = 2 # longitudinal privacy guarantee, i.e., upper bound (infinity reports)
epsilon_1 = 0.5 * epsilon_perm # single report privacy guarantee, i.e., lower bound
n = int(1e6) # number of users
k = 5 # attribute's domain size
# Simulation dataset where every user has a number between [0-5) with 1,000,000 users
data = [np.random.randint(k) for _ in range(n)]
# Simulation of client-side
l_sue_reports = [L_SUE_Client(input_data, k, epsilon_perm, epsilon_1) for input_data in data]
# Simulation of server-side aggregation
l_sue_est_freq = L_SUE_Aggregator(l_sue_reports, epsilon_perm, epsilon_1)
# Real frequency
real_freq = np.unique(data, return_counts=True)[-1] / n
# Visualizing results
barwidth = 0.45
x_axis = np.arange(k)
plt.bar(x_axis - barwidth/2, real_freq, label='Real Freq', width=barwidth)
plt.bar(x_axis + barwidth/2 , l_sue_est_freq, label='Est Freq: L-SUE', width=barwidth)
plt.ylabel('Normalized Frequency')
plt.xlabel('Domain values')
plt.xticks(np.arange(0, k, 1), np.arange(1, k+1, 1))
plt.legend(loc='upper right', bbox_to_anchor=(1.015, 1.15))
plt.show();
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Contact
For any question, please contact Héber H. Arcolezi: heber.hwang-arcolezi [at] inria.fr
Acknowledgments
The Local Hashing (LH) functions were adapted from the pure-LDP package, which covers a wider range of frequency oracles for single-frequency estimation.
License
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.