
DOMIAS: a density-based membership inference attack (MIA) that infers membership by targeting local overfitting of the generative model.

DOMIAS: Membership Inference Attacks against Synthetic Data through Overfitting Detection

Requires Python 3.7+.

Installation

The library can be installed from PyPI using

$ pip install domias

or from source (from the repository root), using

$ pip install .

API

The main API call is

from domias.evaluator import evaluate_performance

evaluate_performance expects as input a generator which implements the domias.models.generator.GeneratorInterface interface, and an evaluation dataset.
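For a quick smoke test without SDV, a generator only needs `fit` and `generate`. The sketch below is a hypothetical toy generator (`GaussianGenerator` is not part of domias) that fits a single multivariate Gaussian to numeric columns. It assumes, as in the sample usage, that `fit` receives a `pd.DataFrame` and `generate` must return one. In real use the class should subclass `domias.models.generator.GeneratorInterface`; that base class is omitted here only so the snippet runs on its own.

```python
import numpy as np
import pandas as pd


class GaussianGenerator:
    """Hypothetical toy generator: one multivariate Gaussian over numeric columns.

    For use with evaluate_performance, subclass
    domias.models.generator.GeneratorInterface instead of a plain class.
    """

    def __init__(self, seed: int = 0) -> None:
        self.rng = np.random.default_rng(seed)
        self.columns: list = []
        self.mean = np.empty(0)
        self.cov = np.empty((0, 0))

    def fit(self, data: pd.DataFrame) -> "GaussianGenerator":
        values = data.to_numpy(dtype=float)
        self.columns = list(data.columns)
        self.mean = values.mean(axis=0)
        # rowvar=False: columns are variables, rows are observations.
        # atleast_2d keeps the single-column case as a (1, 1) matrix.
        self.cov = np.atleast_2d(np.cov(values, rowvar=False))
        return self

    def generate(self, count: int) -> pd.DataFrame:
        samples = self.rng.multivariate_normal(self.mean, self.cov, size=count)
        return pd.DataFrame(samples, columns=self.columns)
```

Because it has no real capacity to memorize individual records, a generator like this is mainly useful for checking that the evaluation pipeline runs end to end, not for measuring attack strength.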

The supported arguments for evaluate_performance are:

  generator: GeneratorInterface
      Generator with the `fit` and `generate` methods. The generator must NOT be fitted yet; it is fitted inside evaluate_performance.
  dataset: np.ndarray
      The evaluation dataset, used to derive the training (membership), reference, and test datasets.
  mem_set_size: int
      Number of records from `dataset` used as the membership (training) set.
  reference_set_size: int
      Number of records from `dataset` used as the reference set.
  training_epochs: int
      Number of training epochs for the generator.
  synthetic_sizes: List[int]
      Numbers of synthetic samples to generate; the attacks are evaluated once for each value.
  density_estimator: str, default = "prior"
      Which density estimator to use. Available options:
          * prior
          * bnaf
          * kde
  seed: int
      Random seed.
  device: PyTorch device
      Device to run on, e.g. CPU or CUDA.
  shifted_column: Optional[int]
      Index of a column to shift (optional).
  zero_quantile: float
      Quantile threshold used when shifting the column.
  reference_kept_p: float
      Held-out dataset parameter.
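To make the roles of `mem_set_size` and `reference_set_size` concrete, the sketch below shows one plausible way to carve disjoint member, non-member, and reference sets out of a single array. It is an illustrative assumption, not the splitting logic inside evaluate_performance; `split_for_mia` is a hypothetical helper, and using a non-member set of the same size as the member set is an assumption made to keep the attack test set balanced.

```python
import numpy as np


def split_for_mia(
    dataset: np.ndarray,
    mem_set_size: int,
    reference_set_size: int,
    seed: int = 0,
) -> tuple:
    """Hypothetical disjoint split into member, non-member and reference sets.

    Illustrative only: this is not domias' own splitting code.
    """
    needed = 2 * mem_set_size + reference_set_size
    if len(dataset) < needed:
        raise ValueError(f"need at least {needed} rows, got {len(dataset)}")
    rng = np.random.default_rng(seed)
    shuffled = dataset[rng.permutation(len(dataset))]
    # Members: records the generator is trained on.
    mem_set = shuffled[:mem_set_size]
    # Non-members: the same number of unseen records (balanced attack test set).
    non_mem_set = shuffled[mem_set_size : 2 * mem_set_size]
    # Reference: extra real records available to the attacker, never used in training.
    reference_set = shuffled[2 * mem_set_size : needed]
    return mem_set, non_mem_set, reference_set
```

The key property is disjointness: if reference records leaked into the training set, a density-ratio attack would partially cancel the very overfitting it is trying to detect.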

The output is a dictionary with one key for each value in synthetic_sizes.

For each synthetic_sizes value, the corresponding entry is a dictionary with the keys:

  • MIA_performance: accuracy and AUC-ROC for each attack
  • MIA_scores: output scores for each attack
  • data: the evaluation data
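When several synthetic sizes are evaluated, it can help to flatten the nested output into one table. The sketch below is a hypothetical helper (`summarize_performance` is not part of domias). It assumes that `perf[size]["MIA_performance"][attack]` is itself a dictionary mapping metric names to values; the metric column names are whatever the library returns.

```python
import pandas as pd


def summarize_performance(perf: dict) -> pd.DataFrame:
    """Flatten evaluate_performance output to one row per (synthetic size, attack).

    Assumes perf[size]["MIA_performance"][attack] is a mapping of metric -> value.
    """
    rows = []
    for size, results in perf.items():
        for attack, metrics in results["MIA_performance"].items():
            # Keep the synthetic size and attack name next to every metric.
            rows.append({"synthetic_size": size, "attack": attack, **metrics})
    return pd.DataFrame(rows)
```

Calling `summarize_performance(perf)` on the `perf` object from the Sample usage section gives a DataFrame that can be sorted or pivoted by any metric column.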

For both MIA_performance and MIA_scores, the following attacks are evaluated:

  • "ablated_eq1" (Eq. 1 (KDE))
  • "ablated_eq2" (DOMIAS (KDE))
  • "LOGAN_D1"
  • "MC"
  • "gan_leaks"
  • "gan_leaks_cal"
  • "LOGAN_0"
  • "eq1" (Eq. 1 (BNAF))
  • "domias"
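As described in the DOMIAS paper, the Eq. 1 attacks score a record x by the density of the synthetic data at x, p_G(x), while DOMIAS divides that by the density of the reference data, p_G(x) / p_R(x), so that records in regions where the generator has locally overfit stand out. The sketch below illustrates both scores with SciPy's `gaussian_kde`, in the spirit of the KDE variants (ablated_eq1, ablated_eq2). It is a simplified illustration, not the library's implementation; `kde_mia_scores` is a hypothetical helper and the small `eps` guard is an added assumption.

```python
import numpy as np
from scipy.stats import gaussian_kde


def kde_mia_scores(
    synthetic: np.ndarray, reference: np.ndarray, candidates: np.ndarray
) -> tuple:
    """Illustrative KDE membership scores (hypothetical helper).

    Returns (eq1_scores, domias_scores), one value per candidate row:
      eq1_scores    = p_G(x), synthetic-data density at x
      domias_scores = p_G(x) / p_R(x), ratio against the reference density
    """
    # gaussian_kde expects arrays shaped (n_features, n_samples).
    p_g = gaussian_kde(synthetic.T)
    p_r = gaussian_kde(reference.T)
    dens_g = p_g(candidates.T)
    dens_r = p_r(candidates.T)
    eps = 1e-12  # avoid division by zero far from the reference data
    return dens_g, dens_g / (dens_r + eps)
```

Higher scores indicate "more likely a member". The ratio matters because p_G(x) alone is also high for any record in a naturally dense region; dividing by p_R(x) discounts that and keeps only the excess density attributable to the generator.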

Sample usage

Example for using evaluate_performance:

# third party
import pandas as pd
from sdv.tabular import TVAE

# domias absolute
from domias.evaluator import evaluate_performance
from domias.models.generator import GeneratorInterface


def get_generator(
    epochs: int = 1000,
    seed: int = 0,
) -> GeneratorInterface:
    class LocalGenerator(GeneratorInterface):
        def __init__(self) -> None:
            self.model = TVAE(epochs=epochs)

        def fit(self, data: pd.DataFrame) -> "LocalGenerator":
            self.model.fit(data)
            return self

        def generate(self, count: int) -> pd.DataFrame:
            return self.model.sample(count)

    return LocalGenerator()


dataset = ...  # Load your dataset as numpy array

mem_set_size = 1000
reference_set_size = 1000
training_epochs = 2000
synthetic_sizes = [100]
density_estimator = "prior"  # prior, kde, bnaf

generator = get_generator(
    epochs=training_epochs,
)

perf = evaluate_performance(
    generator,
    dataset,
    mem_set_size,
    reference_set_size,
    training_epochs=training_epochs,
    synthetic_sizes=synthetic_sizes,  # the output has one key per entry (here: 100)
    density_estimator=density_estimator,
)

assert 100 in perf
results = perf[100]

assert "MIA_performance" in results
assert "MIA_scores" in results

print(results["MIA_performance"])

Experiments

  1. Experiments from the main paper

To reproduce results for DOMIAS, baselines, and ablated models, run

cd experiments
python3 domias_main.py --seed 0 --gan_method TVAE --dataset housing --mem_set_size_list 30 50 100 300 500 1000 --reference_set_size_list 10000 --synthetic_sizes 10000 --training_epoch_list 2000

Change the arguments mem_set_size_list, reference_set_size_list, synthetic_sizes, and training_epoch_list to run the specific experiments over ranges of values (Experiments 5.1 and 5.2; see Appendix A for details), and change gan_method to select the generative model of interest.

or equivalently, run

cd experiments && bash run_tabular.sh

  2. Experiments with no prior knowledge (Appendix D)

To use prior knowledge (i.e., the setting without a reference dataset), add the following argument to the command above:

--density_estimator prior
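To repeat the main-paper run over several seeds from Python instead of editing the command by hand, a small driver can build the same command lines. This is a hypothetical helper, not shipped with domias (run_tabular.sh remains the provided route). It reuses only the flags shown above, varies only --seed, and assumes it is launched from the repository root so that `experiments/` is the working directory for each run.

```python
import subprocess


def build_main_commands(seeds, use_prior: bool = False) -> list:
    """Build domias_main.py invocations for several seeds (hypothetical helper).

    Flags are copied from the README command; only --seed varies.
    """
    commands = []
    for seed in seeds:
        cmd = [
            "python3", "domias_main.py",
            "--seed", str(seed),
            "--gan_method", "TVAE",
            "--dataset", "housing",
            "--mem_set_size_list", "30", "50", "100", "300", "500", "1000",
            "--reference_set_size_list", "10000",
            "--synthetic_sizes", "10000",
            "--training_epoch_list", "2000",
        ]
        if use_prior:
            # Setting without a reference dataset, as described above.
            cmd += ["--density_estimator", "prior"]
        commands.append(cmd)
    return commands


def run_all(commands, workdir: str = "experiments") -> None:
    """Run each command in turn; stop on the first failure."""
    for cmd in commands:
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, cwd=workdir, check=True)
```

For example, `run_all(build_main_commands(range(3), use_prior=True))` would run seeds 0, 1, and 2 in the prior-knowledge setting. Passing argument lists (rather than one shell string) avoids quoting issues.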

  3. Experiments with images (Appendix B.3)

Note: The CelebA dataset must be available in the experiments/data folder.

To run the experiment with the CelebA dataset, first run

cd experiments && python3 celeba_gen.py --seed 0 --mem_set_size 4000

and then

cd experiments && python3 celeba_eval.py --seed 0 --mem_set_size 4000

Tests

Install the testing dependencies using

pip install .[testing]

The tests can be executed using

pytest -vsx

Citing

TODO

Download files

Source distributions: none available for this release.

Built distributions (release 0.0.5):

  • domias-0.0.5-py3-none-macosx_10_14_x86_64.whl (25.8 kB): Python 3, macOS 10.14+ x86-64
  • domias-0.0.5-py3-none-any.whl (26.0 kB): Python 3
