Evaluation of differentially private tabular data
SmartNoise Evaluator
The SmartNoise Evaluator is designed to help assess the privacy and accuracy of differentially private queries. It includes:
- Analyze: Analyzes a dataset and reports cardinality, data types, independencies, and other information that is useful for creating a privacy pipeline
- Evaluate: Compares the privatized results to the true results and reports accuracy and bias
These tools currently require PySpark.
Analyze
Analyze provides metrics about a single dataset.
- Percent of all dimension combinations that are unique, and that have counts k < 5 and k < 10 (combinations are counted up to a configurable “reporting length”)
- Report which columns are “most linkable”
- Marginal histograms up to n-way, with defaults of reasonable size (e.g. 10 per marginal, and up to 20 marginals) that can be overridden. Labels are trimmed and encoded.
- Number of rows
- Number of distinct rows
- Count, Mean, Variance, Min, Max, Median, Percentiles for each marginal
- Classification AUC
- Individual Cardinalities
- Dimensionality, Sparsity
- Independencies
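As an illustration of the first few metrics in this list, here is a minimal plain-Python sketch. It is not the SmartNoise Evaluator API and does not use PySpark; the function name `combination_rarity`, its parameters, and the sample rows are hypothetical. It also assumes that a dimension combination is "unique" when its values appear in exactly one row, and that "k < 5" means the combination appears in fewer than 5 rows.

```python
from collections import Counter
from itertools import combinations


def combination_rarity(rows, columns, reporting_length=2, thresholds=(5, 10)):
    """Count every value tuple for each combination of up to `reporting_length`
    columns, then report the share of tuples that are unique or rare."""
    counts = Counter()
    for width in range(1, reporting_length + 1):
        for cols in combinations(columns, width):
            for row in rows:
                counts[(cols, tuple(row[c] for c in cols))] += 1

    total = len(counts)
    report = {"combinations": total}
    if total == 0:
        return report  # nothing to measure on an empty dataset

    # Assumption: "unique" means the combination occurs in exactly one row.
    unique = sum(1 for n in counts.values() if n == 1)
    report["pct_unique"] = 100.0 * unique / total
    for k in thresholds:
        rare = sum(1 for n in counts.values() if n < k)
        report[f"pct_k_lt_{k}"] = 100.0 * rare / total
    return report


# Hypothetical sample data, for illustration only.
rows = [
    {"age": "30-39", "sex": "F", "state": "WA"},
    {"age": "30-39", "sex": "F", "state": "WA"},
    {"age": "40-49", "sex": "M", "state": "OR"},
    {"age": "30-39", "sex": "M", "state": "WA"},
]

summary = {
    "rows": len(rows),
    "distinct_rows": len({tuple(sorted(r.items())) for r in rows}),
    **combination_rarity(rows, ["age", "sex", "state"], reporting_length=2),
}
print(summary)
```

On a real dataset the same counting would more likely be expressed as grouped aggregations in PySpark, since the number of column combinations grows quickly with the reporting length.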
Evaluate
Evaluate compares an original data file with one or more comparison files. It can compare any of the single-file metrics computed in Analyze, as well as a number of metrics that involve two datasets. When more than one comparison dataset is provided, Evaluate can provide each of the two-way comparisons with the original, and the consumer can combine these measures (e.g. by averaging over all datasets).
- How many dimension combinations are suppressed
- How many dimension combinations are fabricated
- How many redacted rows (fully redacted vs. partly redacted)
- Mean absolute error by 1-way, 2-way, etc. up to reporting length
  - Also computed for user-specified dimension combinations
  - Reported by bin size (e.g., < 1000, >= 1000)
- Mean proportional error by 1-way, 2-way, etc.
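The two-dataset metrics above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the SmartNoise Evaluator API, and it does not use PySpark. The helper names `combo_counts` and `compare` and the sample rows are hypothetical. It assumes that a combination is "suppressed" when it appears in the original but not in the comparison, "fabricated" in the opposite case, and that errors are averaged over the combinations present in the original.

```python
from collections import Counter
from itertools import combinations


def combo_counts(rows, columns, width):
    """Count each value tuple for every `width`-way combination of columns."""
    counts = Counter()
    for cols in combinations(columns, width):
        for row in rows:
            counts[(cols, tuple(row[c] for c in cols))] += 1
    return counts


def compare(original, comparison, columns, width):
    """Compare `width`-way combination counts between two datasets."""
    orig = combo_counts(original, columns, width)
    comp = combo_counts(comparison, columns, width)

    # Assumption: suppressed = present in original, missing from comparison.
    suppressed = sum(1 for key in orig if key not in comp)
    # Assumption: fabricated = present in comparison, absent from original.
    fabricated = sum(1 for key in comp if key not in orig)

    # Assumption: errors are measured over combinations found in the original.
    abs_errors = [abs(comp.get(key, 0) - n) for key, n in orig.items()]
    prop_errors = [abs(comp.get(key, 0) - n) / n for key, n in orig.items()]
    size = len(orig)
    return {
        "suppressed": suppressed,
        "fabricated": fabricated,
        "mean_absolute_error": sum(abs_errors) / size if size else 0.0,
        "mean_proportional_error": sum(prop_errors) / size if size else 0.0,
    }


# Hypothetical sample data, for illustration only.
original = [
    {"age": "30-39", "sex": "F"},
    {"age": "30-39", "sex": "F"},
    {"age": "40-49", "sex": "M"},
    {"age": "30-39", "sex": "M"},
]
comparison = [
    {"age": "30-39", "sex": "F"},
    {"age": "30-39", "sex": "F"},
    {"age": "30-39", "sex": "F"},
    {"age": "30-39", "sex": "M"},
    {"age": "50-59", "sex": "F"},
]

# One report per marginal width: 1-way and 2-way.
report = {w: compare(original, comparison, ["age", "sex"], w) for w in (1, 2)}
for width, metrics in report.items():
    print(width, metrics)
```

With several comparison datasets, `compare` could be called once per dataset and the resulting metrics averaged by the consumer.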
Hashes for smartnoise_eval-0.3.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 42d91244660989cc20fe268218909021a52a4e21b7743bba7d82fc5234a742ec
MD5 | 1a89f382f07aee2690154002f2efb1b2
BLAKE2b-256 | 752b7beecfff0019f9c5a8e3213dcfb715be0ba74be7a64de4a6b89d18dde9f4