DOMIAS is a density-based membership inference attack (MIA) model that infers membership by targeting local overfitting of the generative model.
DOMIAS: Membership Inference Attacks against Synthetic Data through Overfitting Detection
Installation
The library can be installed from PyPI using
$ pip install domias
or from source, using
$ pip install .
API
The main API call is
from domias.evaluator import evaluate_performance
evaluate_performance expects as input a generator which implements the domias.models.generator.GeneratorInterface interface, and an evaluation dataset.
The supported arguments for evaluate_performance are:
generator: GeneratorInterface
Generator with the `fit` and `generate` methods. The generator MUST not be fitted.
dataset: np.ndarray
The evaluation dataset, used to derive the training and test datasets.
mem_set_size: int
The split for the training dataset out of `dataset`
reference_set_size: int
The split for the reference dataset out of `dataset`.
training_epochs: int
Training epochs
synthetic_sizes: List[int]
The numbers of synthetic samples for which to test the attacks.
density_estimator: str, default = "prior"
Which density to use. Available options:
* prior
* bnaf
* kde
seed: int
Random seed
device: PyTorch device
CPU or CUDA
shifted_column: Optional[int]
Shift a column
zero_quantile: float
Threshold for shifting the column.
reference_kept_p: float
Held-out dataset parameter
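How mem_set_size and reference_set_size carve up the dataset can be pictured with a small sketch. The slicing order below (members first, an equally sized non-member block next, reference rows taken from the end) is an assumption made for illustration, not a statement of what evaluate_performance does internally, and split_dataset is a hypothetical helper.

```python
from typing import List, Sequence, Tuple

Row = Sequence[float]


def split_dataset(
    dataset: Sequence[Row],
    mem_set_size: int,
    reference_set_size: int,
) -> Tuple[List[Row], List[Row], List[Row]]:
    """Hypothetical helper: split rows into member, non-member and reference sets."""
    # Assumed layout: members, then equally many non-members, reference at the end.
    needed = 2 * mem_set_size + reference_set_size
    if len(dataset) < needed:
        raise ValueError(f"need at least {needed} rows, got {len(dataset)}")
    members = list(dataset[:mem_set_size])
    non_members = list(dataset[mem_set_size : 2 * mem_set_size])
    reference = list(dataset[len(dataset) - reference_set_size :])
    return members, non_members, reference


# Toy data: ten single-feature rows.
rows = [[float(i)] for i in range(10)]
members, non_members, reference = split_dataset(
    rows, mem_set_size=3, reference_set_size=4
)
print(len(members), len(non_members), len(reference))
```

Under this assumed layout the three sets never overlap as long as the dataset holds at least 2 * mem_set_size + reference_set_size rows, which is a useful sanity check before starting a long training run.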
The output is a dictionary with a key for each of the synthetic_sizes values.
For each synthetic_sizes value, the dictionary contains the keys:
- MIA_performance: accuracy and AUCROC for each attack
- MIA_scores: output scores for each attack
- data: the evaluation data
For both MIA_performance and MIA_scores, the following attacks are evaluated:
- "ablated_eq1" (Eq.1 (KDE))
- "ablated_eq2" (DOMIAS (KDE))
- "LOGAN_D1"
- "MC"
- "gan_leaks"
- "gan_leaks_cal"
- "LOGAN_0"
- "eq1" (Eq. 1 (BNAF))
- "domias"
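The density-based entries in this list can be pictured with a toy example. The sketch below is one-dimensional and uses a hand-written Gaussian KDE from the standard library. It scores a candidate point by the ratio of the synthetic-data density to the reference-data density, so a point that the generator produces more often than the reference population would suggest receives a high score. Reading the DOMIAS entries as a density ratio of this kind is an assumption here; the function names, bandwidth, and data are all made up for illustration, and this is not the library's implementation.

```python
import math
from typing import Sequence


def gaussian_kde(samples: Sequence[float], x: float, bandwidth: float = 0.5) -> float:
    """Evaluate a 1-D Gaussian kernel density estimate at x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)


def density_ratio_score(
    x: float,
    synthetic: Sequence[float],
    reference: Sequence[float],
    bandwidth: float = 0.5,
    eps: float = 1e-12,
) -> float:
    """Illustrative score: synthetic density divided by reference density."""
    p_synth = gaussian_kde(synthetic, x, bandwidth)
    p_ref = gaussian_kde(reference, x, bandwidth)
    return p_synth / (p_ref + eps)  # eps avoids division by zero


# Invented data: the reference population is centred on 0,
# while the toy "generator" piles up extra mass near 3.
reference = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
synthetic = [-1.0, 0.0, 1.0, 3.0, 3.0, 3.1]

score_overfit = density_ratio_score(3.0, synthetic, reference)
score_typical = density_ratio_score(0.0, synthetic, reference)
print(score_overfit > score_typical)
```

Dividing by the reference density is what makes the score local: a point in a dense region of the population is not flagged merely because the generator also produces many samples there.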
Sample usage
Example of using evaluate_performance:
# third party
import pandas as pd
from sdv.tabular import TVAE
# domias absolute
from domias.evaluator import evaluate_performance
from domias.models.generator import GeneratorInterface
def get_generator(
    epochs: int = 1000,
    seed: int = 0,
) -> GeneratorInterface:
    class LocalGenerator(GeneratorInterface):
        def __init__(self) -> None:
            self.model = TVAE(epochs=epochs)

        def fit(self, data: pd.DataFrame) -> "LocalGenerator":
            self.model.fit(data)
            return self

        def generate(self, count: int) -> pd.DataFrame:
            return self.model.sample(count)

    return LocalGenerator()
dataset = ... # Load your dataset as numpy array
mem_set_size = 1000
reference_set_size = 1000
training_epochs = 2000
synthetic_sizes = [100]
density_estimator = "prior" # prior, kde, bnaf
generator = get_generator(
    epochs=training_epochs,
)
perf = evaluate_performance(
    generator,
    dataset,
    mem_set_size,
    reference_set_size,
    training_epochs=training_epochs,
    synthetic_sizes=synthetic_sizes,
    density_estimator=density_estimator,
)
assert 100 in perf
results = perf[100]
assert "MIA_performance" in results
assert "MIA_scores" in results
print(results["MIA_performance"])
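A returned dictionary can be flattened into a per-attack ranking for printing or logging. The sketch below works on a hand-made stand-in for perf. The inner key names accuracy and aucroc and all the numbers are assumptions chosen for illustration, and rank_attacks is a hypothetical helper, so check the structure against real output before relying on it.

```python
from typing import Dict, List, Tuple

# Hand-made stand-in for the output of evaluate_performance.
# All values are invented placeholders, not measured results.
fake_perf: Dict[int, dict] = {
    100: {
        "MIA_performance": {
            "domias": {"accuracy": 0.61, "aucroc": 0.66},
            "LOGAN_0": {"accuracy": 0.52, "aucroc": 0.53},
            "MC": {"accuracy": 0.50, "aucroc": 0.51},
        },
        "MIA_scores": {},
    }
}


def rank_attacks(
    perf: Dict[int, dict],
    synthetic_size: int,
    metric: str = "aucroc",  # assumed key name
) -> List[Tuple[str, float]]:
    """Hypothetical helper: sort attacks by a metric, best first."""
    performance = perf[synthetic_size]["MIA_performance"]
    pairs = [(name, float(values[metric])) for name, values in performance.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


for name, value in rank_attacks(fake_perf, 100):
    print(f"{name:>10s}  aucroc={value:.2f}")
```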
Experiments
- Experiments main paper
To reproduce results for DOMIAS, baselines, and ablated models, run
cd experiments
python3 domias_main.py --seed 0 --gan_method TVAE --dataset housing --mem_set_size_list 30 50 100 300 500 1000 --reference_set_size_list 10000 --synthetic_sizes 10000 --training_epoch_list 2000
Change the arguments mem_set_size_list, reference_set_size_list, synthetic_sizes, and training_epoch_list to run specific experiments over ranges (Experiments 5.1 and 5.2; see Appendix A for details), and gan_method to select the generative model of interest.
Or, equivalently, run
cd experiments && bash run_tabular.sh
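To repeat the run over several seeds or generators, the command line can be assembled programmatically. The sketch below only builds the argument list from the flags shown in the command above and does not launch anything. build_command is a hypothetical helper, and its default values simply mirror that example command.

```python
from typing import List, Sequence


def build_command(
    seed: int,
    gan_method: str = "TVAE",
    dataset: str = "housing",
    mem_set_sizes: Sequence[int] = (30, 50, 100, 300, 500, 1000),
    reference_set_sizes: Sequence[int] = (10000,),
    synthetic_sizes: Sequence[int] = (10000,),
    training_epochs: Sequence[int] = (2000,),
) -> List[str]:
    """Hypothetical helper: build the argv list for domias_main.py."""
    cmd = ["python3", "domias_main.py"]
    cmd += ["--seed", str(seed)]
    cmd += ["--gan_method", gan_method]
    cmd += ["--dataset", dataset]
    # List-valued flags take their values as separate arguments.
    cmd += ["--mem_set_size_list", *map(str, mem_set_sizes)]
    cmd += ["--reference_set_size_list", *map(str, reference_set_sizes)]
    cmd += ["--synthetic_sizes", *map(str, synthetic_sizes)]
    cmd += ["--training_epoch_list", *map(str, training_epochs)]
    return cmd


# Print one command per seed instead of running it.
for seed in range(3):
    print(" ".join(build_command(seed)))
```

Each list could then be handed to subprocess.run(cmd, cwd="experiments", check=True) to launch the run from the experiments folder.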
- Experiments no prior knowledge (Appendix D)
To use prior knowledge instead of a reference dataset (i.e., the setting with no reference dataset), add
--density_estimator prior
- Experiment images (Appendix B.3)
Note: The CelebA dataset must be available in the experiments/data folder.
To run the experiment with the CelebA dataset, first run
cd experiments && python3 celeba_gen.py --seed 0 --mem_set_size 4000
and then
cd experiments && python3 celeba_eval.py --seed 0 --mem_set_size 4000
Tests
Install the testing dependencies using
pip install .[testing]
The tests can be executed using
pytest -vsx
Citing
TODO