
Simple pure LDP frequency oracle implementations

Pure-LDP

pure-LDP is a Python package that provides simple implementations of various state-of-the-art local differential privacy (LDP) algorithms, covering both frequency oracles and heavy hitters. Its main goal is to provide a single, simple interface for benchmarking and experimenting with these algorithms.

If pure-LDP has been useful in your work in any way, we would appreciate a reference to:

Installation

Use the package manager pip to install pure-LDP:

pip install pure-ldp

To upgrade to the latest version:

pip install pure-ldp --upgrade

pure-LDP requires the following Python modules:

xxhash
numpy
scipy
bitstring
bitarray
matplotlib
seaborn
statsmodels
sklearn (installed via pip as scikit-learn)

Outline

The package implements all three main frequency oracles detailed in the paper "Locally Differentially Private Protocols for Frequency Estimation" by Wang et al., which are:

  1. (Optimal) Unary Encoding - Under pure_ldp.frequency_oracles.unary_encoding
  2. (Summation/Thresholding) Histogram Encoding - Under pure_ldp.frequency_oracles.histogram_encoding
  3. (Optimal) Local Hashing - Under pure_ldp.frequency_oracles.local_hashing
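As a rough illustration of the first of these, here is a minimal, dependency-free sketch of Optimal Unary Encoding (OUE) following the Wang et al. description: each user one-hot encodes their item, keeps the true bit at 1 with probability p = 1/2, sets every other bit to 1 with probability q = 1/(e^epsilon + 1), and the server debiases each per-item bit count c with (c - n*q)/(p - q). This is not pure-LDP's own code; the function names oue_privatise and oue_estimate are hypothetical, and items are assumed to be 0-indexed (the Basic Usage example below uses items 1 to d).

```python
import math
import random

def oue_privatise(value, d, epsilon, rng):
    """Client side: perturb the one-hot encoding of `value` (0-indexed) using OUE."""
    p = 0.5                              # probability the true bit stays 1
    q = 1.0 / (math.exp(epsilon) + 1)    # probability each other bit becomes 1
    return [1 if rng.random() < (p if i == value else q) else 0 for i in range(d)]

def oue_estimate(reports, epsilon):
    """Server side: unbiased count estimate for every item from the bit vectors."""
    n, d = len(reports), len(reports[0])
    p = 0.5
    q = 1.0 / (math.exp(epsilon) + 1)
    support = [sum(r[i] for r in reports) for i in range(d)]
    # Remove the expected number of noise bits, then rescale by (p - q).
    return [(c - n * q) / (p - q) for c in support]

rng = random.Random(0)
data = [0] * 4000 + [1] * 3000 + [2] * 2000 + [3] * 1000
reports = [oue_privatise(v, d=4, epsilon=3.0, rng=rng) for v in data]
estimates = oue_estimate(reports, epsilon=3.0)
print([round(e) for e in estimates])
```

With epsilon = 3 and 10,000 users, each estimate should land within a few hundred of the true counts 4000, 3000, 2000 and 1000.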

The package also includes an implementation of the heavy hitter algorithm Prefix Extending Method (PEM), under pure_ldp.heavy_hitters.prefix_extending.

Over time it has evolved to include many more implementations of other LDP frequency estimation algorithms:

  1. Apple's Count Mean Sketch (CMS / HCMS) Algorithm - This is under pure_ldp.frequency_oracles.apple_cms
  2. Google's RAPPOR, i.e. DE combined with Bloom filters, under pure_ldp.frequency_oracles.rappor
  3. Hadamard Response (HR) - This is under pure_ldp.frequency_oracles.hadamard_response; the code is simply a pure-LDP wrapper of the hadamard_response repo
  4. Hadamard Mechanism (HM) under pure_ldp.frequency_oracles.hadamard_mechanism
  5. Direct Encoding (DE) / Generalised Randomised Response under pure_ldp.frequency_oracles.direct_encoding
  6. Fast Local Hashing (FLH) a heuristic variant of OLH under pure_ldp.frequency_oracles.local_hashing
  7. Generic private sketching protocols (SketchResponse) under pure_ldp.frequency_oracles.sketch_response
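Direct Encoding (item 5) is the simplest of these protocols. The sketch below is written from the Wang et al. description rather than copied from pure_ldp.frequency_oracles.direct_encoding, and assumes 0-indexed items: the client reports its true item with probability p = e^epsilon / (e^epsilon + d - 1) and otherwise a uniformly random different item, so each wrong item is reported with probability q = 1 / (e^epsilon + d - 1). The server applies the same (c - n*q)/(p - q) correction as the other oracles. The function names are hypothetical.

```python
import math
import random

def grr_privatise(value, d, epsilon, rng):
    """Client side: report the true value w.p. p, else a uniformly random other value."""
    p = math.exp(epsilon) / (math.exp(epsilon) + d - 1)
    if rng.random() < p:
        return value
    other = rng.randrange(d - 1)          # index among the d - 1 other values
    return other if other < value else other + 1   # skip over the true value

def grr_estimate(reports, d, epsilon):
    """Server side: unbiased count estimates from the reported values."""
    n = len(reports)
    p = math.exp(epsilon) / (math.exp(epsilon) + d - 1)
    q = 1.0 / (math.exp(epsilon) + d - 1)
    counts = [0] * d
    for r in reports:
        counts[r] += 1
    return [(c - n * q) / (p - q) for c in counts]

rng = random.Random(1)
data = [0] * 4000 + [1] * 3000 + [2] * 2000 + [3] * 1000
reports = [grr_privatise(v, d=4, epsilon=3.0, rng=rng) for v in data]
estimates = grr_estimate(reports, d=4, epsilon=3.0)
print([round(e) for e in estimates])
```

Each report is a single value in [0, d), so communication is minimal, but q shrinks and the estimator's variance grows as d increases, which is why the unary-encoding and hashing oracles are usually preferred for large domains.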

The library also includes implementations of other Heavy Hitter (HH) algorithms:

  1. Apple's Sequence Fragment Puzzle (SFP) algorithm under pure_ldp.frequency_oracles.apple_sfp
  2. TreeHistogram (by Bassily et al) under pure_ldp.frequency_oracles.treehistogram

Basic Usage

import numpy as np
from pure_ldp.frequency_oracles.local_hashing import LHClient, LHServer

# Using Optimal Local Hashing (OLH)

epsilon = 3 # Privacy budget of 3
d = 4 # For simplicity, we use a dataset with 4 possible data items

client_olh = LHClient(epsilon=epsilon, d=d, use_olh=True)
server_olh = LHServer(epsilon=epsilon, d=d, use_olh=True)

# Test dataset, every user has a number between 1-4, 10,000 users total
data = np.concatenate(([1]*4000, [2]*3000, [3]*2000, [4]*1000))

for item in data:
    # Simulate client-side privatisation
    priv_data = client_olh.privatise(item)

    # Simulate server-side aggregation
    server_olh.aggregate(priv_data)

# Simulate server-side estimation
print(server_olh.estimate(1)) # Should be approximately 4000 +- 200

See example.py for more examples.
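The Basic Usage example delegates hashing and aggregation to LHClient and LHServer. The following is a hypothetical, standalone approximation of what Optimal Local Hashing does according to Wang et al.: each user draws a random hash function into g = e^epsilon + 1 buckets (rounded to an integer), runs generalised randomised response over the bucket index, and the server counts, for each item, how many reports its hash maps into the reported bucket. It substitutes Python's hashlib.blake2b for the xxhash dependency and uses 0-indexed items, so it should not be read as the library's implementation.

```python
import hashlib
import math
import random

def _hash(seed, value, g):
    """Hash `value` into [0, g) under a per-user seed (blake2b stands in for xxhash)."""
    digest = hashlib.blake2b(f"{seed}:{value}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % g

def olh_privatise(value, epsilon, rng):
    """Client side: hash the value, then apply GRR over the g hash buckets."""
    g = round(math.exp(epsilon)) + 1            # OLH's choice of hash range
    p = math.exp(epsilon) / (math.exp(epsilon) + g - 1)
    seed = rng.getrandbits(32)                  # each user picks their own hash function
    h = _hash(seed, value, g)
    if rng.random() >= p:                       # with prob 1 - p report another bucket
        other = rng.randrange(g - 1)
        h = other if other < h else other + 1
    return seed, h

def olh_estimate(reports, d, epsilon):
    """Server side: for each item, count reports whose hash sends it to the reported bucket."""
    n = len(reports)
    g = round(math.exp(epsilon)) + 1
    p = math.exp(epsilon) / (math.exp(epsilon) + g - 1)
    support = [0] * d
    for seed, h in reports:
        for v in range(d):
            if _hash(seed, v, g) == h:
                support[v] += 1
    # A user holding a different item still matches with probability 1/g.
    return [(c - n / g) / (p - 1.0 / g) for c in support]

rng = random.Random(2)
data = [0] * 4000 + [1] * 3000 + [2] * 2000 + [3] * 1000
reports = [olh_privatise(v, epsilon=3.0, rng=rng) for v in data]
estimates = olh_estimate(reports, d=4, epsilon=3.0)
print([round(e) for e in estimates])
```

Note that the server loop evaluates n·d hashes, so decoding cost grows with both the number of users and the domain size.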

Simulation Framework

This is currently a work in progress, but there is already significant code under pure_ldp.simulations that allows you to build experiments comparing various frequency oracles and heavy hitters under various conditions. Generic helpers for running FO and HH experiments are included under pure_ldp.simulations.helpers. See pure_ldp.simulations.paper_experiments.py for examples.
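The simulation helpers are not reproduced here, but a quick analytic comparison can help decide which oracles are worth simulating. The sketch below uses the approximate variance expressions from Wang et al. (dominant noise term only): n(d - 2 + e^epsilon)/(e^epsilon - 1)^2 for Direct Encoding and 4n·e^epsilon/(e^epsilon - 1)^2 for OUE and OLH. The function names are hypothetical and nothing here calls pure_ldp.simulations.

```python
import math

def grr_variance(n, d, epsilon):
    """Approximate variance of a Direct Encoding (GRR) count estimate."""
    e = math.exp(epsilon)
    return n * (d - 2 + e) / (e - 1) ** 2

def oue_olh_variance(n, epsilon):
    """Approximate variance of an OUE or OLH count estimate; independent of d."""
    e = math.exp(epsilon)
    return n * 4 * e / (e - 1) ** 2

n, epsilon = 10_000, 3.0
for d in (2, 4, 16, 64, 1024):
    grr, oue = grr_variance(n, d, epsilon), oue_olh_variance(n, epsilon)
    better = "GRR" if grr < oue else "OUE/OLH"
    print(f"d={d:5d}  GRR={grr:10.1f}  OUE/OLH={oue:8.1f}  lower: {better}")
```

The two expressions cross at d = 3e^epsilon + 2, roughly 62 for epsilon = 3: Direct Encoding has the lower variance on small domains and OUE/OLH win beyond that, which is the kind of trade-off the simulation helpers let you measure empirically.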

TODO

  1. Better documentation!

Acknowledgements

  1. Some OLH code is based on the implementation by Tianhao Wang: repo
  2. The Hadamard Response code is just a wrapper of the k2khadamard.py code in the repo hadamard_response by Ziteng Sun

Contributing

If you feel like this package could be improved in any way, open an issue or make a pull request!

License

MIT

Download files

Source Distribution

pure-ldp-1.2.0.tar.gz (49.9 kB)

Uploaded Source

Built Distribution

pure_ldp-1.2.0-py3-none-any.whl (79.2 kB)

Uploaded Python 3

File details

Details for the file pure-ldp-1.2.0.tar.gz.

File metadata

  • Download URL: pure-ldp-1.2.0.tar.gz
  • Upload date:
  • Size: 49.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.5

File hashes

Hashes for pure-ldp-1.2.0.tar.gz
Algorithm Hash digest
SHA256 071763d0a9fbef4ca6dd6da1d8e3b0a712e34f967eb0401cc130fa634b116b6d
MD5 14cf8d2207db549aa5e9dd69e8de3fcd
BLAKE2b-256 1f54c54c0e96c0961c7d441be28ab42d9ea4006e879a7fabe0dd04721d5963a1

File details

Details for the file pure_ldp-1.2.0-py3-none-any.whl.

File metadata

  • Download URL: pure_ldp-1.2.0-py3-none-any.whl
  • Upload date:
  • Size: 79.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.5

File hashes

Hashes for pure_ldp-1.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 0f2220f12c66bd0b69a8ec06f17158ef40550ac5f069467359ed1b3473ecf9eb
MD5 e89ba655f0147cd5c616f00b14572c98
BLAKE2b-256 8b1d713f7d029c3b8c4fe3722e6d61b36e767b86a0137993d3b384e0235918f2
