DPLab: Benchmarking Differential Privacy Aggregation Operations

This repo aims to provide a unified interface for accessing and evaluating the same aggregation functionality across different open-source differential privacy (DP) libraries. With a simple CLI, one can choose the library, the aggregation function, and other experimental parameters, and apply the specified DP measurement to data stored in a .csv file. The repo also provides both synthetic and real-world example datasets for evaluation. Evaluation results are stored in a .json file, and summary metrics are reported for repeated experiments. A second CLI tool generates configuration groups for larger-scale comparison experiments.
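The layout of the result .json file is not documented on this page. As a rough illustration of what metrics over repeated experiments can look like, the sketch below summarizes the error of several noisy releases of the same query against the non-private answer. The function name `summarize_repeats` and every metric key are hypothetical; they are not DPLab's output schema.

```python
import json
import math
import statistics


def summarize_repeats(true_value, dp_outputs):
    """Summarize repeated DP releases of one query against the exact answer.

    Hypothetical helper: the keys below are illustrative, not DPLab's schema.
    """
    errors = [out - true_value for out in dp_outputs]
    summary = {
        "repeat": len(dp_outputs),
        "true_value": true_value,
        "mean_output": statistics.fmean(dp_outputs),
        # Average magnitude of the noise that was actually added.
        "mean_abs_error": statistics.fmean(abs(e) for e in errors),
        # Root-mean-square error penalizes occasional large deviations more.
        "rmse": math.sqrt(statistics.fmean(e * e for e in errors)),
    }
    # Relative error is undefined when the exact answer is zero.
    if true_value != 0:
        summary["mean_rel_error"] = summary["mean_abs_error"] / abs(true_value)
    return summary


if __name__ == "__main__":
    # Five made-up noisy COUNT results for an exact count of 100.
    outputs = [101.3, 98.9, 100.4, 97.2, 102.2]
    print(json.dumps(summarize_repeats(100, outputs), indent=2))
```

Reporting both the mean absolute error and the RMSE is a common choice because the two react differently to heavy-tailed noise.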

(Figure: DPLab architecture)

Currently supported aggregation operations:

  • COUNT
  • SUM
  • MEAN
  • VAR
  • MEDIAN
  • QUANTILE
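Each of these operations is answered by a randomized mechanism inside the chosen library. As a minimal sketch of the idea, the code below implements COUNT, SUM, and MEAN with the Laplace mechanism, assuming add/remove-one neighboring datasets and values clipped to a caller-supplied range [lower, upper]. This is not how PyDP, tmlt, or chorus implement these operations (production libraries use more careful noise samplers), and all function names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def clip(values, lower, upper):
    # Bound each value so a single row has a known maximum influence.
    return [min(max(v, lower), upper) for v in values]


def dp_count(values, epsilon, rng):
    # Adding or removing one row changes the count by at most 1.
    return len(values) + laplace_noise(1.0 / epsilon, rng)


def dp_sum(values, epsilon, lower, upper, rng):
    # After clipping, one row changes the sum by at most max(|lower|, |upper|).
    sensitivity = max(abs(lower), abs(upper))
    return sum(clip(values, lower, upper)) + laplace_noise(sensitivity / epsilon, rng)


def dp_mean(values, epsilon, lower, upper, rng):
    # Split epsilon evenly between a noisy sum and a noisy count (sequential composition).
    noisy_sum = dp_sum(values, epsilon / 2, lower, upper, rng)
    # Clamping the denominator is post-processing, so it costs no extra privacy.
    noisy_count = max(1.0, dp_count(values, epsilon / 2, rng))
    return noisy_sum / noisy_count


if __name__ == "__main__":
    rng = random.Random(0)
    data = [3.0, 7.5, 12.0, 4.2]
    print(dp_count(data, 1.0, rng))
    print(dp_sum(data, 1.0, 0.0, 10.0, rng))
    print(dp_mean(data, 1.0, 0.0, 10.0, rng))
```

The noise scale of SUM grows with the clipping bound, so a tight [lower, upper] range matters for accuracy; clipping too tightly, on the other hand, biases the result.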

Currently supported libraries:

Installation

Clone the repo, change into its directory, and install the package along with its dependencies:

git clone git@github.com:camelop/dp_lab.git
cd dp_lab
pip install -e .

To use tmlt (Tumult Analytics), also run:

export PYSPARK_PYTHON=/usr/bin/python3
sudo apt install openjdk-8-jre-headless
pip3 install -i https://d3p0voevd56kj6.cloudfront.net python-flint
pip3 install tmlt.analytics

To use chorus, make sure a Java runtime is installed.

How to run experiments in the benchmark

Generate the experiment commands. This writes an ./exp.db.json file to the working directory (use --location to choose a different location):

dplab_exp plan --repeat 100 --group_num 100

Queue the experiments for execution

dplab_exp launch --debug
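Conceptually, a plan of configuration groups is a grid of experimental settings, each run a fixed number of times. The sketch below enumerates such a grid as the Cartesian product of library, operation, and epsilon. The library and operation names appear on this page, but the epsilon values, the dictionary keys, and the `plan_grid` helper are hypothetical and do not reflect the actual format of exp.db.json or the options behind --group_num.

```python
import itertools
import json


def plan_grid(libraries, operations, epsilons, repeat):
    """Enumerate one hypothetical config per (library, operation, epsilon) triple."""
    plan = []
    for library, operation, epsilon in itertools.product(libraries, operations, epsilons):
        plan.append({
            "library": library,
            "operation": operation,
            "epsilon": epsilon,
            "repeat": repeat,  # every config is evaluated this many times
        })
    return plan


if __name__ == "__main__":
    configs = plan_grid(["pydp", "tmlt", "chorus"], ["count", "sum"], [0.1, 1.0], repeat=100)
    print(len(configs))  # prints 12
    print(json.dumps(configs[0]))
```

Keeping the grid as plain dictionaries makes it easy to serialize to JSON and to split into groups for parallel execution.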

How to run DP libraries in the benchmark

Run a specific library with the CLI

dplab_run <library> <operation> <input_file> <output_file> <other options>

For example:

dplab_run pydp sum data/1.csv data/1.json -f -r 1000

Other options include:

  • mode: Evaluation mode; one of "plain" (no timing/memory measurement), "internal" (internal measurement), or "external" (external tracking).
  • epsilon: The DP privacy parameter ε.
  • quant: Quantile value for the QUANTILE operation, a float between 0 and 1.
  • repeat: Number of times to repeat the evaluation.
  • force: Overwrite the output file if it already exists.
  • debug: Include debugging information in the output file.
  • python_command: Python command used to run the script in the external mode.
  • external_sample_interval: Sampling interval for timing/memory measurements in the external mode.
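As an illustration of what an "internal" measurement can look like (an assumption, not DPLab's implementation), the sketch below times each call with `time.perf_counter` and records the peak Python-heap allocation with `tracemalloc`, repeating the call `repeat` times. `measure_internal` is a hypothetical helper. Note that `tracemalloc` only traces allocations made through Python's memory allocator, so memory used by native or JVM backends would be missed; sampling the process from outside, as the external mode does, is one plausible way around that.

```python
import time
import tracemalloc


def measure_internal(func, *args, repeat=1):
    """Time each call and record the peak Python-heap allocation during it."""
    results = []
    for _ in range(repeat):
        tracemalloc.start()
        start = time.perf_counter()
        value = func(*args)
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes since start().
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        results.append({"value": value, "seconds": elapsed, "peak_bytes": peak})
    return results


if __name__ == "__main__":
    data = list(range(100_000))
    for record in measure_internal(sum, data, repeat=3):
        print(record)
```

Restarting tracing on every repetition keeps each peak measurement independent of the previous run.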

For more information, please check the main entry file.

Generating synthetic data

# Make sure you are in the root directory of the repo
# Data will be generated in the ./data/ directory
# The procedure will generate about 28GB of data
# To avoid running out of disk space, you can comment out the performance-test entries (lines 26-27) of SYN_TARGETS in the script
python3 scripts/gen_data.py
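For a quick smoke test that does not need the full data set, a small file can be generated by hand. The sketch below writes uniformly distributed values to a one-column CSV with a `value` header; this layout and the `write_synthetic_csv` helper are assumptions, so check scripts/gen_data.py for the format dplab_run actually expects.

```python
import csv
import random
from pathlib import Path


def write_synthetic_csv(path, n_rows, low=0.0, high=100.0, seed=0):
    """Write n_rows uniform random values to a one-column CSV (hypothetical layout)."""
    rng = random.Random(seed)  # fixed seed keeps the file reproducible
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["value"])
        for _ in range(n_rows):
            writer.writerow([rng.uniform(low, high)])
    return path
```

For example, `write_synthetic_csv("data/small_test.csv", 1000)` creates a small input file in the same directory the script uses.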

Download files

Source Distribution

dplab-0.0.1.tar.gz (16.7 kB)

Uploaded Source

Built Distribution

dplab-0.0.1-py3-none-any.whl (14.9 kB)

Uploaded Python 3

File details

Details for the file dplab-0.0.1.tar.gz.

File metadata

  • Download URL: dplab-0.0.1.tar.gz
  • Upload date:
  • Size: 16.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.10

File hashes

Hashes for dplab-0.0.1.tar.gz
Algorithm Hash digest
SHA256 e0274a5640a73d0cd161692af580a979f2008e34743b9d5ca90029b7ed62d914
MD5 ee892e4022a8bc640865573b9a80eef0
BLAKE2b-256 bfc4c01d51d5258323ed4ccc0aec1ead11f7897bb2d5dc68c40cfd8c4a3c98b6

File details

Details for the file dplab-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: dplab-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 14.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.10

File hashes

Hashes for dplab-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 67acdb967cea8ac79babb32e344c04ffe4810aeb80752e734dffa1e32adaa3c4
MD5 d3406897d19e51a558f4b4aec9d112d4
BLAKE2b-256 6e02c190c3505b681d4f4ff58cdceeaa96d8e93a234f33f3d3d07f28995974ec
