Skip to main content

Multiple Frequency Estimation Under Local Differential Privacy in Python

Project description

Multi-Freq-LDPy

Multi-Freq-LDPy is a Python library for performing multiple frequency estimation tasks (multidimensional, longitudinal, and both) under local differential privacy (LDP) guarantees. The main goal is to provide an easy-to-use and fast execution toolkit to benchmark and experiment with state-of-the-art solutions and LDP protocols.

Installation

Please use the package manager pip to install multi-freq-ldpy.

pip install multi-freq-ldpy

To ensure you use the latest version.

pip install multi-freq-ldpy --upgrade

Multi-Freq-LDPy requires the following Python packages.

numpy
numba
xxhash

Content

Multi-Freq-LDPy covers the following tasks:

  1. Single Frequency Estimation -- The best-performing frequency oracles from Locally Differentially Private Protocols for Frequency Estimation, namely:

    • Generalized Randomized Response (GRR): multi_freq_ldpy.pure_frequency_oracles.GRR
    • Symmetric/Optimized Unary Encoding (UE): multi_freq_ldpy.pure_frequency_oracles.UE
    • Binary/Optimized Local Hashing (LH): multi_freq_ldpy.pure_frequency_oracles.LH
    • Adaptive (ADP) protocol, i.e., GRR or Optimized UE: multi_freq_ldpy.pure_frequency_oracles.ADP
  2. Multidimensional Frequency Estimation -- Three solutions for frequency estimation of multiple attributes from Random Sampling Plus Fake Data: Multidimensional Frequency Estimates With Local Differential Privacy with their respective frequency oracles (GRR, UE-based, and ADP), namely:

    • Splitting (SPL) the privacy budget: multi_freq_ldpy.mdim_freq_est.SPL_solution
    • Random Sampling (SMP) a single attribute: multi_freq_ldpy.mdim_freq_est.SMP_solution
    • Random Sampling + Fake Data (RS+FD) that samples a single attribute but also generates fake data for each non-sampled attribute: multi_freq_ldpy.mdim_freq_est.RSpFD_solution
  3. Longitudinal Single Frequency Estimation -- All longitudinal LDP protocols from Improving the Utility of Locally Differentially Private Protocols for Longitudinal and Multidimensional Frequency Estimates following the memoization-based framework from RAPPOR, namely:

    • Longitudinal GRR (L-GRR): multi_freq_ldpy.long_freq_est.L_GRR
    • Longitudinal OUE (L-OUE): multi_freq_ldpy.long_freq_est.L_OUE
    • Longitudinal OUE-SUE (L-OSUE): multi_freq_ldpy.long_freq_est.L_OSUE
    • Longitudinal SUE (L-SUE): multi_freq_ldpy.long_freq_est.L_SUE
    • Longitudinal SUE-OUE (L-SOUE): multi_freq_ldpy.long_freq_est.L_SOUE
    • Longitudinal ADP (L-ADP), i.e., L-GRR or L-OSUE: multi_freq_ldpy.long_freq_est.L_ADP
  4. Longitudinal Multidimensional Frequency Estimation -- Both SPL and SMP solutions with all longitudinal protocols from previous point 3, namely:

    • Longitudinal SPL (L_SPL): multi_freq_ldpy.long_mdim_freq_est.L_SPL
    • Longitudinal SMP (L_SMP): multi_freq_ldpy.long_mdim_freq_est.L_SMP

Usage

This is a function-based package that simulates the LDP data collection pipeline of users and the server. For each functionality, there is always a Client and an Aggregator function. For more details, please refer to the tutorials folder, which covers all 1--4 tasks with real-world open datasets (Adult, Nursery, MS-FIMU).

# Common libraries
import numpy as np
import matplotlib.pyplot as plt

# Multi-Freq-LDPy functions for L-SUE protocol (a.k.a. Basic RAPPOR)
from multi_freq_ldpy.long_freq_est.L_SUE import L_SUE_Client, L_SUE_Aggregator

# Parameters for simulation
epsilon_perm = 2 # longitudinal privacy guarantee, i.e., upper bound (infinity reports)
epsilon_1 = 0.5 * epsilon_perm # single report privacy guarantee, i.e., lower bound
n = int(1e6) # number of users
k = 5 # attribute's domain size

# Simulation dataset where every user has a number between [0-5) with 1,000,000 users
data = [np.random.randint(k) for _ in range(n)]

# Simulation of client-side
l_sue_reports = [L_SUE_Client(input_data, k, epsilon_perm, epsilon_1) for input_data in data]

# Simulation of server-side aggregation
l_sue_est_freq = L_SUE_Aggregator(l_sue_reports, epsilon_perm, epsilon_1)

# Real frequency 
real_freq = np.unique(data, return_counts=True)[-1] / n

# Visualizing results
barwidth = 0.45
x_axis = np.arange(k)

plt.bar(x_axis - barwidth/2, real_freq, label='Real Freq', width=barwidth)
plt.bar(x_axis + barwidth/2 , l_sue_est_freq, label='Est Freq: L-SUE', width=barwidth)
plt.ylabel('Normalized Frequency')
plt.xlabel('Domain values')
plt.xticks(np.arange(0, k, 1), np.arange(1, k+1, 1))
plt.legend(loc='upper right', bbox_to_anchor=(1.015, 1.15))
plt.show();

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Contact

For any question, please contact Heber H. Arcolezi: heber.hwang-arcolezi [at] inria.fr

Acknowledgments

  • The Local Hashing (LH) functions were adapted from the pure-LDP package, which covers a wider range of frequency oracles for single-frequency estimation.
  • Some codes were previously developed/adapted from the ldp-protocols-mobility-cdrs repository.

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

multi-freq-ldpy-0.2.3.tar.gz (17.1 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page