Skip to main content

Multiple Frequency Estimation Under Local Differential Privacy in Python

Project description

Multi-Freq-LDPy: Multiple Frequency Estimation Under Local Differential Privacy in Python

Multi-Freq-LDPy is a Python library for performing multiple frequency estimation tasks (multidimensional, longitudinal, and both) under local differential privacy (LDP) guarantees. The main goal is to provide an easy-to-use and fast execution toolkit to benchmark and experiment with state-of-the-art solutions and LDP protocols.

Here's an introductory Video_Presentation, Slide_Presentation, and arXived Demonstration Paper of our package.

If our codes and work are useful to you, we would appreciate a reference to:

@article{arcolezi2022multi,
  title={Multi-Freq-LDPy: Multiple Frequency Estimation Under Local Differential Privacy in Python},
  author={Arcolezi, H{\'e}ber H and Couchot, Jean-Fran{\c{c}}ois and Gambs, S{\'e}bastien and Palamidessi, Catuscia and Zolfaghari, Majid},
  journal={arXiv preprint arXiv:2205.02648},
  year={2022}
}

Installation

Please use the package manager pip to install multi-freq-ldpy.

pip install multi-freq-ldpy

To ensure you use the latest version.

pip install multi-freq-ldpy --upgrade

Multi-Freq-LDPy requires the following Python packages.

numpy
numba
xxhash

Content

Multi-Freq-LDPy and its modules are structured as follows.

multi-freq-ldpy package
|
|- pure_frequency_oracles (Single Frequency Estimation)
|  |- GRR (Generalized Randomized Response[1,2] a.k.a. k-RR or Direct Encoding)
|  |- UE (Unary Encoding)
|  |  |- SUE (Symmetric UE[3] a.k.a. Basic One-Time RAPPOR[11])
|  |  |- OUE (Optimized UE[3])
|  |- LH (Local Hashing)
|  |  |- BLH (Binary LH[3,4])
|  |  |- OLH (Optimized LH[3])
|  |- SS (Subset Selection[5,6])
|  |- ADP (Adaptive, i.e., GRR or OUE)
|
|- mdim_freq_est (Multidimensional Frequency Estimation)
|  |- SPL_solution (Splitting solution[7,8]): Splits the privacy budget and sanitizes using pure_frequency_oracles LDP protocols
|  |  |- SPL_GRR, SPL_SUE, SPL_OUE, SPL_BLH, SPL_OLH, SPL_SS, SPL_ADP
|  |- SMP_solution (Random Sampling solution[7,8]): Samples a single attribute and sanitizes using pure_frequency_oracles LDP protocols
|  |  |- SMP_GRR, SMP_SUE, SMP_OUE, SMP_BLH, SMP_OLH, SMP_SS, SMP_ADP
|  |- RSpFD_solution (Random Sampling + Fake Data solution[9]): Samples a single attribute to sanitize but also generates fake data for each non-sampled attribute
|  |  |- RSpFD_GRR (fake data generated following domain size)
|  |  |- RSpFD_SUE_zero (fake data generated with SUE applied to a zero-vector)
|  |  |- RSpFD_SUE_rnd (fake data generated with SUE applied to a random bit-vector)
|  |  |- RSpFD_OUE_zero (fake data generated with OUE applied to a zero-vector)
|  |  |- RSpFD_OUE_rnd (fake data generated with OUE applied to a random bit-vector)
|  |  |- RSpFD_ADP (RSpFD_GRR or RSpFD_OUE_z)
|
|- long_freq_est (Longitudinal Single Frequency Estimation)
|  |- L_GRR (Longitudinal GRR[10])
|  |- L_OUE (Longitudinal OUE[10])
|  |- L_OSUE (Longitudinal OUE-SUE[10])
|  |- L_SUE (Longitudinal SUE[10], a.k.a. Basic RAPPOR[11])
|  |- L_SOUE (Longitudinal SUE-OUE[10])
|  |- L_ADP (Longitudinal ADP[10], i.e., L-GRR or L-OSUE)
|  |- dBitFlipPM[12]
|
|- long_mdim_freq_est (Longitudinal Multidimensional Frequency Estimation)
|  |- Longitudinal SPL (L_SPL_Solution[10]): Splits the privacy budget and sanitizes using long_freq_est LDP protocols
|  |  |- SPL_L_GRR, SPL_L_OUE, SPL_L_OSUE, SPL_L_SUE, SPL_L_SOUE, SPL_L_ADP, SPL_dBitFlipPM
|  |- Longitudinal SMP (L_SMP_Solution[10]): Samples a single attribute and sanitizes using long_freq_est LDP protocols
|  |  |- SMP_L_GRR, SMP_L_OUE, SMP_L_OSUE, SMP_L_SUE, SMP_L_SOUE, SMP_L_ADP, SMP_dBitFlipPM

Usage

This is a function-based package that simulates the LDP data collection pipeline of users and the server. For each functionality, there is always a Client and an Aggregator function. For more details, please refer to the tutorials folder, which covers all 1--4 tasks with real-world open datasets (Adult, Nursery, MS-FIMU).

# Common libraries
import numpy as np
import matplotlib.pyplot as plt

# Multi-Freq-LDPy functions for L-SUE protocol (a.k.a. Basic RAPPOR[11])
from multi_freq_ldpy.long_freq_est.L_SUE import L_SUE_Client, L_SUE_Aggregator

# Parameters for simulation
epsilon_perm = 2 # longitudinal privacy guarantee, i.e., upper bound (infinity reports)
epsilon_1 = 0.5 * epsilon_perm # single report privacy guarantee, i.e., lower bound
n = int(1e6) # number of users
k = 5 # attribute's domain size

# Simulation dataset where every user has a number between [0-k) with n users
data = np.random.randint(k, size=n)

# Simulation of client-side
l_sue_reports = [L_SUE_Client(input_data, k, epsilon_perm, epsilon_1) for input_data in data]

# Simulation of server-side aggregation
l_sue_est_freq = L_SUE_Aggregator(l_sue_reports, epsilon_perm, epsilon_1)

# Real frequency 
real_freq = np.unique(data, return_counts=True)[-1] / n

# Visualizing results
barwidth = 0.45
x_axis = np.arange(k)

plt.bar(x_axis - barwidth/2, real_freq, label='Real Freq', width=barwidth)
plt.bar(x_axis + barwidth/2 , l_sue_est_freq, label='Est Freq: L-SUE', width=barwidth)
plt.ylabel('Normalized Frequency')
plt.xlabel('Domain values')
plt.legend(loc='upper right', bbox_to_anchor=(1.015, 1.15))
plt.show();

Contributing

Multi-Freq-LDPy is a work in progress, and we expect to release new versions frequently, incorporating feedback and code contributions from the community. Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Contact

For any question, please contact Heber H. Arcolezi: heber.hwang-arcolezi [at] inria.fr

Acknowledgments

  • The Local Hashing (LH) functions were adapted from the pure-LDP package, which covers a wider range of frequency oracles for single-frequency estimation.
  • Some codes were adapted from our ldp-protocols-mobility-cdrs repository.

License

MIT

References

  • [1] Kairouz, Peter, Keith Bonawitz, and Daniel Ramage. "Discrete distribution estimation under local privacy." International Conference on Machine Learning. PMLR, 2016.
  • [2] Kairouz, Peter, Sewoong Oh, and Pramod Viswanath. "Extremal mechanisms for local differential privacy." Advances in neural information processing systems 27 (2014).
  • [3] Wang, Tianhao, et al. "Locally differentially private protocols for frequency estimation." 26th USENIX Security Symposium (USENIX Security 17). 2017.
  • [4] Bassily, Raef, and Adam Smith. "Local, private, efficient protocols for succinct histograms." Proceedings of the forty-seventh annual ACM symposium on Theory of computing. 2015.
  • [5] Ye, Min, and Alexander Barg. "Optimal schemes for discrete distribution estimation under locally differential privacy." IEEE Transactions on Information Theory 64.8 (2018): 5662-5676.
  • [6] Wang, Shaowei, et al. "Mutual information optimally local private discrete distribution estimation." arXiv preprint arXiv:1607.08025 (2016).
  • [7] Nguyên, Thông T., et al. "Collecting and analyzing data from smart device users with local differential privacy." arXiv preprint arXiv:1606.05053 (2016).
  • [8] Wang, Ning, et al. "Collecting and analyzing multidimensional data with local differential privacy." 2019 IEEE 35th International Conference on Data Engineering (ICDE). IEEE, 2019.
  • [9] Arcolezi, Héber H., et al. "Random sampling plus fake data: Multidimensional frequency estimates with local differential privacy." Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 2021.
  • [10] Arcolezi, Héber H., et al. "Improving the utility of locally differentially private protocols for longitudinal and multidimensional frequency estimates." Digital Communications and Networks (2022).
  • [11] Erlingsson, Úlfar, Vasyl Pihur, and Aleksandra Korolova. "Rappor: Randomized aggregatable privacy-preserving ordinal response." Proceedings of the 2014 ACM SIGSAC conference on computer and communications security. 2014.
  • [12] Ding, Bolin, Janardhan Kulkarni, and Sergey Yekhanin. "Collecting telemetry data privately." Advances in Neural Information Processing Systems 30 (2017).

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

multi-freq-ldpy-0.2.4.tar.gz (21.1 kB view details)

Uploaded Source

File details

Details for the file multi-freq-ldpy-0.2.4.tar.gz.

File metadata

  • Download URL: multi-freq-ldpy-0.2.4.tar.gz
  • Upload date:
  • Size: 21.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.8.8

File hashes

Hashes for multi-freq-ldpy-0.2.4.tar.gz
Algorithm Hash digest
SHA256 44385193e2a9c4ad2f1f3fad802753b048aac4181bab0f7fae57e754e4831deb
MD5 e1de2151ca05e1b7e68bbaa868466f80
BLAKE2b-256 971841e395dd099fd43bd0ff725479f948b03da2d1d2cd7c06d1f059272ab949

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page