DPLab: Benchmarking Differential Privacy Aggregation Operations
This repo aims to provide a unified interface for accessing and evaluating the same aggregation functions across different open-source differential privacy (DP) libraries. With a simple CLI, one can choose the library, the aggregation function, and other experimental parameters, and apply the specified DP measurement to data stored in a .csv file. The repo also provides both synthetic and real-world example datasets for evaluation. Evaluation results are stored in a .json file, and metrics are reported for repeated experiments. The repo also provides a CLI tool that generates configuration groups for larger-scale comparison experiments.
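This page does not say which metrics dplab reports for repeated experiments. As a hedged illustration only, the sketch below summarizes repeated DP outputs against the non-private answer with three common error measures. The function name `error_metrics` and the metric keys are hypothetical and are not dplab's output schema.

```python
import math
import statistics


def error_metrics(true_value, dp_outputs):
    """Summarize repeated DP outputs against the exact (non-private) answer.

    Hypothetical helper: the metric names are illustrative, not dplab's.
    """
    errors = [out - true_value for out in dp_outputs]
    return {
        # Average magnitude of the error across repeats.
        "mean_abs_error": statistics.fmean(abs(e) for e in errors),
        # Root-mean-square error; penalizes large deviations more heavily.
        "rmse": math.sqrt(statistics.fmean(e * e for e in errors)),
        # Signed average error; near 0 for unbiased mechanisms such as Laplace noise.
        "bias": statistics.fmean(errors),
    }
```

Averaging over many repeats (e.g. `-r 1000`) matters because a single DP output is a random draw; these summaries only stabilize across many runs.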
1-Min Tutorial
Get hands-on in 1 minute with our tutorial notebook.
Currently supported aggregation operations:
- COUNT
- SUM
- MEAN
- VAR
- MEDIAN
- QUANTILE
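MEDIAN and QUANTILE are commonly made differentially private with the exponential mechanism. The sketch below shows one textbook variant, assuming the values are clipped to known bounds [lb, ub] with lb < ub. It is not the code used by dplab or by any of the libraries listed below, each of which may implement these operations differently.

```python
import math
import random


def dp_quantile(values, q, epsilon, lb, ub, rng=None):
    """Exponential-mechanism quantile over [lb, ub] (textbook sketch)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    if not lb < ub:
        raise ValueError("require lb < ub")
    rng = rng or random.Random()
    # Clip to the bounds and sort; lb and ub act as sentinel endpoints.
    xs = sorted(min(max(v, lb), ub) for v in values)
    z = [lb] + xs + [ub]
    n = len(xs)
    target = q * n
    log_w = []
    for i in range(n + 1):
        # Interval i is [z[i], z[i+1]]; exactly i data points lie below it.
        width = z[i + 1] - z[i]
        if width <= 0:
            log_w.append(float("-inf"))  # empty interval, never selected
        else:
            # Utility -|i - q*n| has sensitivity 1, hence the epsilon / 2 factor.
            log_w.append(math.log(width) - epsilon * abs(i - target) / 2)
    # Subtract the max log-weight before exponentiating to avoid underflow.
    m = max(log_w)
    weights = [math.exp(w - m) for w in log_w]
    i = rng.choices(range(n + 1), weights=weights)[0]
    # Return a uniform point inside the chosen interval.
    return rng.uniform(z[i], z[i + 1])
```

Weighting each interval by its width means the output is a continuous value in [lb, ub] rather than one of the input records, which is what lets the mechanism avoid revealing an exact data point.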
Currently supported libraries:
- diffprivlib 0.5.2 [Homepage] [Example Usage]
- python-dp 1.1.1 [Homepage] [Example Usage]
- opendp 0.6.1 [Homepage] [Example Usage]
- tmlt.analytics 0.4.1 [Homepage] [Example Usage]
- chorus 0.1.3 [Homepage] [Example Usage]
Installation
To install dplab from PyPI:
pip install dplab
Or install from source: clone the repo, change into its directory, and install the dependencies:
git clone git@github.com:camelop/dp_lab.git
cd dp_lab
pip install -e .
To use tmlt.analytics:
export PYSPARK_PYTHON=/usr/bin/python3
sudo apt install openjdk-8-jre-headless
pip3 install -i https://d3p0voevd56kj6.cloudfront.net python-flint
pip3 install tmlt.analytics
To use chorus, make sure a Java runtime is installed. (If you have already set up tmlt.analytics as above, Java is already installed.)
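After installation, a quick way to see which backends are usable is to check whether their Python packages can be found and whether `java` is on the PATH. This sketch uses only the standard library. The mapping from library name to import name (notably `pydp` for python-dp) is an assumption to verify against each library's documentation; chorus is covered only by the Java check.

```python
import importlib.util
import shutil

# Assumed mapping from library name (as listed above) to its Python import name.
MODULES = {
    "diffprivlib": "diffprivlib",
    "python-dp": "pydp",
    "opendp": "opendp",
    "tmlt.analytics": "tmlt.analytics",
}


def is_importable(module_name):
    """Return True if the module can be located without fully importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec imports parent packages, so a missing "tmlt" raises here.
        return False


def check_backends():
    status = {lib: is_importable(mod) for lib, mod in MODULES.items()}
    # chorus and tmlt.analytics both need a Java runtime.
    status["java (chorus, tmlt.analytics)"] = shutil.which("java") is not None
    return status


if __name__ == "__main__":
    for lib, ok in check_backends().items():
        print(f"{lib:32s} {'ok' if ok else 'missing'}")
```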
How to run dp libraries in the benchmark
Run a specific library with the CLI
dplab_run <library> <operation> <input_file> <output_file> <other options>
For example:
dplab_run pydp sum data/1.csv data/1.json -f -r 1000
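To run the same library over several operations, the CLI can be driven from Python. This is a hypothetical wrapper around the command form shown above; only `pydp`, `sum`, `-f`, and `-r` appear in the example, so other library and operation names passed in are assumptions to check against the main entry file. For full experiment grids, the `dplab_exp` tool described later is the intended route.

```python
import shutil
import subprocess


def build_command(library, operation, input_file, output_file, repeat=1000, force=True):
    """Assemble a dplab_run argv in the form shown in the example above."""
    cmd = ["dplab_run", library, operation, input_file, output_file]
    if force:
        cmd.append("-f")  # overwrite an existing output file
    cmd += ["-r", str(repeat)]  # number of repetitions
    return cmd


def run_sweep(library, operations, input_file, repeat=1000):
    """Hypothetical helper: one dplab_run call per operation."""
    if shutil.which("dplab_run") is None:
        raise RuntimeError("dplab_run not found on PATH; install dplab first")
    for op in operations:
        # e.g. data/1.csv -> data/1_sum.json
        output_file = input_file.rsplit(".", 1)[0] + f"_{op}.json"
        subprocess.run(build_command(library, op, input_file, output_file, repeat), check=True)
```

Example call, assuming the other operation names follow the lowercase form of `sum`: `run_sweep("pydp", ["count", "sum", "mean"], "data/1.csv")`.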
Other options include:
- mode: Evaluation mode; choose from "plain" (no timing/memory measurement), "internal" (internal measurement), or "external" (external tracking).
- epsilon: DP privacy parameter; the default is 1.
- quant: Quantile value for the QUANTILE operation, a float between 0 and 1.
- lb: Optional lower-bound estimate of the values, used by certain DP aggregations.
- ub: Optional upper-bound estimate of the values, used by certain DP aggregations.
- repeat: Number of times the evaluation is repeated.
- force: Overwrite the output file.
- debug: Include debugging information in the output file.
- python_command: Python command used to run the script in external mode.
- external_sample_interval: Sampling interval for timing/memory measurement in external mode.
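The lb, ub, and epsilon options interact predictably for bounded aggregations such as SUM. The sketch below shows the textbook Laplace mechanism with clipping, assuming add/remove-one-record neighboring datasets. It illustrates the role of these parameters and is not the implementation used by dplab or any of the supported libraries.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_sum(values, epsilon=1.0, lb=0.0, ub=1.0, rng=None):
    """Laplace-mechanism SUM with values clipped to [lb, ub] (sketch)."""
    if epsilon <= 0 or lb > ub:
        raise ValueError("require epsilon > 0 and lb <= ub")
    rng = rng or random.Random()
    clipped = [min(max(v, lb), ub) for v in values]
    # Adding or removing one record changes the clipped sum by at most max(|lb|, |ub|).
    # (Under replace-one neighbors the sensitivity would be ub - lb instead.)
    sensitivity = max(abs(lb), abs(ub))
    if sensitivity == 0:
        return sum(clipped)  # every value clips to 0, nothing to protect
    return sum(clipped) + laplace_noise(sensitivity / epsilon, rng)
```

Tighter bounds reduce the noise scale `sensitivity / epsilon` but bias the result when real values fall outside [lb, ub], which is why the bound estimates are exposed as parameters.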
For more information, please check the main entry file.
Generating synthetic data
# Make sure you are in the root directory of the repo
# Data will be generated in the ./data/ directory
# The procedure will generate about 28GB of data
# To avoid running out of disk space, you can comment out the performance-test entries (lines 26-27) of SYN_TARGETS defined in the script
python3 scripts/gen_data.py
How to run experiments in the benchmark
Generate the experiment commands. This creates an ./exp.db.json file in the working directory (use --location to specify a different location).
dplab_exp plan --repeat 100 --group_num 100
Queue the experiments for execution:
dplab_exp launch --debug
The command writes the results to exp.db.json.
The results can then be viewed with:
python3 scripts/view_exp_db.py
Download files
File details
Details for the file dplab-0.0.7.tar.gz.
File metadata
- Download URL: dplab-0.0.7.tar.gz
- Size: 18.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.8.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | d1b81e4ba11d8d134f112a46b592ba8aa63ef7a83c3cb0063d55f87993522830
MD5 | 0660b1525a91088c11ba8db6fdaf8b76
BLAKE2b-256 | 22438a733c452a148291b18673b00c36fe57d1c7691aa4449a698197d44cbfdf
File details
Details for the file dplab-0.0.7-py3-none-any.whl.
File metadata
- Download URL: dplab-0.0.7-py3-none-any.whl
- Size: 16.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.8.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | bb016dbc1cf7aa9ad75017eb96bbc000d50cf4d4bf0fd4ba9905379608783630
MD5 | d4ccc06d77107d689f3e25c9dc041e55
BLAKE2b-256 | 832d313c78769fcdc05ec43f7feddbfed3181c29352dfe3ef1d081976a1ff108
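To check a downloaded distribution against the SHA256 digests listed above, Python's standard `hashlib` module is sufficient. The helper names below are illustrative; the file is read in chunks so large files do not need to fit in memory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 16):
    """Compute the hex SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected_hex):
    """Compare a file's SHA-256 against a published digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.lower()
```

For example, `verify("dplab-0.0.7.tar.gz", "d1b81e4ba11d8d134f112a46b592ba8aa63ef7a83c3cb0063d55f87993522830")` should return True for an intact source distribution.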