Simple pure LDP frequency oracle implementations
Pure-LDP
pure-LDP is a Python package that provides simple implementations of various state-of-the-art LDP algorithms (both Frequency Oracles and Heavy Hitters) with the main goal of providing a single, simple interface to benchmark and experiment with these algorithms.
If pure-LDP is useful to you and has been used in your work in any way we would appreciate a reference to:
Installation
Use the package manager pip to install.
pip install pure-ldp
To upgrade to the latest version:
pip install pure-ldp --upgrade
pure-LDP requires the following Python modules:
xxhash
numpy
scipy
bitstring
bitarray
matplotlib
seaborn
statsmodels
sklearn
Outline
The package implements all three main frequency oracles detailed in the paper "Locally Differentially Private Protocols for Frequency Estimation" by Wang et al., which are:
- (Optimal) Unary Encoding - Under pure_ldp.frequency_oracles.unary_encoding
- (Summation/Thresholding) Histogram encoding - Under pure_ldp.frequency_oracles.histogram_encoding
- (Optimal) Local Hashing - Under pure_ldp.frequency_oracles.local_hashing
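To give a feel for what a frequency oracle of this kind does, here is a minimal standard-library sketch of Optimal Unary Encoding as described by Wang et al. It is an illustration only and not the code in pure_ldp.frequency_oracles.unary_encoding; the function names `oue_privatise` and `oue_estimate` are hypothetical, and items are assumed to be 0-indexed.

```python
import math
import random


def oue_privatise(item, d, epsilon, rng):
    """Perturb the one-hot encoding of `item` (0-indexed) bit by bit."""
    p = 0.5                             # a 1-bit is kept with probability 1/2
    q = 1.0 / (math.exp(epsilon) + 1)   # a 0-bit is flipped to 1 with probability q
    return [1 if rng.random() < (p if j == item else q) else 0 for j in range(d)]


def oue_estimate(reports, d, epsilon):
    """Debias the per-position sums to obtain unbiased count estimates."""
    n = len(reports)
    p = 0.5
    q = 1.0 / (math.exp(epsilon) + 1)
    sums = [sum(r[j] for r in reports) for j in range(d)]
    return [(s - n * q) / (p - q) for s in sums]


rng = random.Random(0)                  # seeded so the run is repeatable
epsilon, d = 3.0, 4
data = [0] * 4000 + [1] * 3000 + [2] * 2000 + [3] * 1000
reports = [oue_privatise(x, d, epsilon, rng) for x in data]
estimates = oue_estimate(reports, d, epsilon)
print([round(e) for e in estimates])    # noisy versions of 4000, 3000, 2000, 1000
```

Each user sends d bits, so the communication cost grows with the domain size; local hashing trades this for a single hashed value per user.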
The package also includes an implementation of the heavy hitter algorithm Prefix Extending Method (PEM) under pure_ldp.heavy_hitters.prefix_extending
Over time it has evolved to include many more implementations of other LDP frequency estimation algorithms:
- Apple's Count Mean Sketch (CMS / HCMS) algorithm - Under pure_ldp.frequency_oracles.apple_cms
- Google's RAPPOR, i.e. DE combined with Bloom filters - Under pure_ldp.frequency_oracles.rappor
- Hadamard Response (HR) - Under pure_ldp.frequency_oracles.hadamard_response. The code for this is simply a pure-LDP wrapper of the repo hadamard_response
- Hadamard Mechanism (HM) - Under pure_ldp.frequency_oracles.hadamard_mechanism
- Direct Encoding (DE) / Generalised Randomised Response - Under pure_ldp.frequency_oracles.direct_encoding
- Fast Local Hashing (FLH), a heuristic variant of OLH - Under pure_ldp.frequency_oracles.local_hashing
- Generic private sketching protocols (SketchResponse) - Under pure_ldp.frequency_oracles.sketch_response
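Direct Encoding is the simplest of these protocols, so it makes a convenient reference point. The sketch below is a standard-library illustration of Generalised Randomised Response, not the package's implementation; `de_privatise` and `de_estimate` are hypothetical names, and items are assumed to be integers in 0..d-1.

```python
import math
import random
from collections import Counter


def de_privatise(item, d, epsilon, rng):
    """Report the true item with probability p, otherwise a different one."""
    p = math.exp(epsilon) / (math.exp(epsilon) + d - 1)
    if rng.random() < p:
        return item
    # Pick uniformly among the other d - 1 values, skipping `item` itself.
    other = rng.randrange(d - 1)
    return other if other < item else other + 1


def de_estimate(reports, d, epsilon):
    """Debias the raw report counts into unbiased count estimates."""
    n = len(reports)
    e = math.exp(epsilon)
    p = e / (e + d - 1)      # probability of reporting the true value
    q = 1.0 / (e + d - 1)    # probability of reporting any one specific other value
    counts = Counter(reports)
    return [(counts[v] - n * q) / (p - q) for v in range(d)]


rng = random.Random(1)       # seeded so the run is repeatable
epsilon, d = 3.0, 4
data = [0] * 4000 + [1] * 3000 + [2] * 2000 + [3] * 1000
reports = [de_privatise(x, d, epsilon, rng) for x in data]
estimates = de_estimate(reports, d, epsilon)
print([round(v) for v in estimates])
```

Because p and q both depend on d, the accuracy of Direct Encoding degrades as the domain grows, which is the usual motivation for the encoding- and hashing-based oracles listed above.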
The library also includes implementations of other Heavy Hitter (HH) algorithms:
- Apple's Sequence Fragment Puzzle (SFP) algorithm - Under pure_ldp.frequency_oracles.apple_sfp
- TreeHistogram (by Bassily et al.) - Under pure_ldp.frequency_oracles.treehistogram
Basic Usage
import numpy as np
from pure_ldp.frequency_oracles.local_hashing import LHClient, LHServer
# Using Optimal Local Hashing (OLH)
epsilon = 3 # Privacy budget of 3
d = 4 # For simplicity, we use a dataset with 4 possible data items
client_olh = LHClient(epsilon=epsilon, d=d, use_olh=True)
server_olh = LHServer(epsilon=epsilon, d=d, use_olh=True)
# Test dataset, every user has a number between 1-4, 10,000 users total
data = np.concatenate(([1]*4000, [2]*3000, [3]*2000, [4]*1000))
for item in data:
    # Simulate client-side privatisation
    priv_data = client_olh.privatise(item)
    # Simulate server-side aggregation
    server_olh.aggregate(priv_data)
# Simulate server-side estimation
print(server_olh.estimate(1)) # Should be approximately 4000 +- 200
See example.py for more examples.
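The "+- 200" tolerance in the comment above can be sanity-checked with a little arithmetic. The sketch below computes the standard deviation of a debiased count for a pure LDP protocol, assuming the ideal OLH parameters from Wang et al. (p* = 1/2 and q* = 1/(e^epsilon + 1)); the package may round the hash range to an integer, so treat the result as approximate. The helper name `estimator_std` is hypothetical.

```python
import math


def estimator_std(n, count, p_star, q_star):
    """Std. dev. of the debiased count (support - n * q*) / (p* - q*)."""
    # The support is a sum of independent Bernoulli variables: `count` of them
    # succeed with probability p*, the remaining n - count with probability q*.
    support_var = (count * p_star * (1 - p_star)
                   + (n - count) * q_star * (1 - q_star))
    return math.sqrt(support_var) / (p_star - q_star)


epsilon, n, count = 3.0, 10_000, 4_000
p_star = 0.5                             # assumed: ideal OLH, g = e^epsilon + 1
q_star = 1.0 / (math.exp(epsilon) + 1)   # assumed: q* = 1 / g

std = estimator_std(n, count, p_star, q_star)
print(f"std = {std:.1f}; +-200 is about {200 / std:.1f} standard deviations")
```

Under these assumptions the standard deviation comes out at roughly 79, so an estimate within 200 of the true count is expected in the large majority of runs, but not in every run.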
Simulation Framework
This is currently a work in progress, but there is already significant code under pure_ldp.simulations that allows you to build experiments comparing frequency oracles and heavy hitters under various conditions. Generic helpers to run experiments for FOs and HHs are included under pure_ldp.simulations.helpers. See pure_ldp.simulations.paper_experiments.py for examples.
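As a rough idea of what such an experiment looks like, the sketch below measures the mean squared error of any client/server pair that follows the privatise / aggregate / estimate interface shown in Basic Usage. It does not use pure_ldp.simulations.helpers, whose API is not described here; `mean_squared_error`, `IdentityClient` and `CountingServer` are hypothetical names, and the two stub classes are non-private stand-ins so that the example runs without the package installed.

```python
from collections import Counter


def mean_squared_error(client, server, data, domain):
    """Run one experiment and return the MSE over all items in `domain`."""
    for item in data:
        server.aggregate(client.privatise(item))
    true_counts = Counter(data)
    errors = [(server.estimate(v) - true_counts[v]) ** 2 for v in domain]
    return sum(errors) / len(errors)


class IdentityClient:
    """Stub client that applies no privatisation (baseline only)."""

    def privatise(self, item):
        return item


class CountingServer:
    """Stub server that counts the reports it receives exactly."""

    def __init__(self):
        self.counts = Counter()

    def aggregate(self, report):
        self.counts[report] += 1

    def estimate(self, item):
        return self.counts[item]


data = [1] * 4000 + [2] * 3000 + [3] * 2000 + [4] * 1000
mse = mean_squared_error(IdentityClient(), CountingServer(), data, domain=[1, 2, 3, 4])
print(mse)  # the non-private baseline has zero error
```

Swapping in a real pair such as the LHClient and LHServer from Basic Usage, and repeating over several values of epsilon, gives a simple accuracy-versus-privacy comparison.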
TODO
- Better documentation!
Acknowledgements
- Some OLH code is based on the implementation by Tianhao Wang: repo
- The Hadamard Response code is just a wrapper of the k2khadamard.py code in the repo hadamard_response by Ziteng Sun
Contributing
If you feel like this package could be improved in any way, open an issue or make a pull request!
License