
DOMIAS is a density-based membership inference attack (MIA) model that infers membership by targeting local overfitting of the generative model.


DOMIAS: Membership Inference Attacks against Synthetic Data through Overfitting Detection


Installation

The library can be installed from PyPI using

$ pip install domias

or from source, using

$ pip install .

API

The main API call is

from domias.evaluator import evaluate_performance

evaluate_performance expects as input a generator implementing the domias.models.generator.GeneratorInterface interface and an evaluation dataset.
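Concretely, the contract is a `fit` method that trains the model on data and a `generate` method that returns a requested number of synthetic samples. A minimal numpy-only stand-in to illustrate the shape of that contract (this toy class is hypothetical; real generators should implement GeneratorInterface, as in the Sample usage section):

```python
import numpy as np


class GaussianToyGenerator:
    """Toy generator following the fit/generate contract expected by
    evaluate_performance (illustrative stand-in, not a domias class)."""

    def fit(self, data: np.ndarray) -> "GaussianToyGenerator":
        # Estimate per-feature mean and std from the training split.
        self.mean = data.mean(axis=0)
        self.std = data.std(axis=0) + 1e-8
        return self

    def generate(self, count: int) -> np.ndarray:
        # Sample synthetic rows from the fitted per-feature Gaussian.
        rng = np.random.default_rng(0)
        return rng.normal(self.mean, self.std, size=(count, len(self.mean)))
```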

The supported arguments for evaluate_performance are:

  generator: GeneratorInterface
      Generator with the `fit` and `generate` methods. The generator must NOT already be fitted.
  dataset: np.ndarray
      The evaluation dataset, used to derive the training and test datasets.
  mem_set_size: int
      The size of the training (member) split drawn from `dataset`.
  reference_set_size: int
      The size of the reference split drawn from `dataset`.
  training_epochs: int
      Training epochs
  synthetic_sizes: List[int]
      The synthetic dataset sizes at which to evaluate the attacks.
  density_estimator: str, default = "prior"
      Which density to use. Available options:
          * prior
          * bnaf
          * kde
  seed: int
      Random seed
  device: PyTorch device
      CPU or CUDA
  shifted_column: Optional[int]
      Shift a column
  zero_quantile: float
      Threshold for shifting the column.
  reference_kept_p: float
      Held-out dataset parameter
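The density_estimator choice controls how the densities behind the DOMIAS score are estimated. As a rough illustration of the underlying idea (a numpy-only sketch with a hand-rolled fixed-bandwidth Gaussian KDE; this is NOT the library's implementation, and the data here is random):

```python
import numpy as np


def gaussian_kde(data: np.ndarray, bandwidth: float = 0.5):
    """Return a function estimating the density of `data` at query points
    (simple fixed-bandwidth Gaussian KDE; illustrative only)."""
    d = data.shape[1]
    norm = (2 * np.pi * bandwidth**2) ** (d / 2) * len(data)

    def density(x: np.ndarray) -> np.ndarray:
        # Pairwise squared distances between queries and data: (n_query, n_data)
        sq = ((x[:, None, :] - data[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * bandwidth**2)).sum(axis=1) / norm

    return density


rng = np.random.default_rng(0)
synthetic = rng.normal(0.0, 1.0, size=(500, 2))   # stands in for generator output
reference = rng.normal(0.5, 1.5, size=(500, 2))   # stands in for reference data
queries = rng.normal(0.0, 1.0, size=(100, 2))     # candidate records to score

# DOMIAS-style score: density under synthetic data over density under
# reference data; a higher ratio suggests local overfitting to the record.
score = gaussian_kde(synthetic)(queries) / gaussian_kde(reference)(queries)
```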

The output consists of a dictionary with a key for each of the `synthetic_sizes` values.

For each synthetic_sizes value, the dictionary contains the keys:

  • MIA_performance : accuracy and AUCROC for each attack
  • MIA_scores: output scores for each attack
  • data: the evaluation data

For both MIA_performance and MIA_scores, the following attacks are evaluated:

  • "ablated_eq1" (Eq.1 (KDE))
  • "ablated_eq2" (DOMIAS (KDE))
  • "LOGAN_D1"
  • "MC"
  • "gan_leaks"
  • "gan_leaks_cal"
  • "LOGAN_0"
  • "eq1" (Eq. 1 (BNAF))
  • "domias"
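The per-attack results can then be ranked for comparison. A hedged sketch on a hypothetical MIA_performance payload (the metric keys `accuracy` and `aucroc` and the values shown are assumptions, not the library's exact output):

```python
# Hypothetical MIA_performance payload for two of the attacks above.
mia_performance = {
    "domias": {"accuracy": 0.71, "aucroc": 0.78},
    "gan_leaks": {"accuracy": 0.58, "aucroc": 0.61},
}

# Rank attacks by AUCROC, best first.
ranked = sorted(mia_performance.items(), key=lambda kv: kv[1]["aucroc"], reverse=True)
for name, metrics in ranked:
    print(f"{name}: AUCROC={metrics['aucroc']:.2f}, accuracy={metrics['accuracy']:.2f}")
```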

Sample usage

Example for using evaluate_performance:

# third party
import pandas as pd
from sdv.tabular import TVAE

# domias absolute
from domias.evaluator import evaluate_performance
from domias.models.generator import GeneratorInterface


def get_generator(
    epochs: int = 1000,
    seed: int = 0,
) -> GeneratorInterface:
    class LocalGenerator(GeneratorInterface):
        def __init__(self) -> None:
            self.model = TVAE(epochs=epochs)

        def fit(self, data: pd.DataFrame) -> "LocalGenerator":
            self.model.fit(data)
            return self

        def generate(self, count: int) -> pd.DataFrame:
            return self.model.sample(count)

    return LocalGenerator()


dataset = ...  # Load your dataset as a numpy array

mem_set_size = 1000
reference_set_size = 1000
training_epochs = 2000
synthetic_sizes = [1000]
density_estimator = "prior"  # prior, kde, bnaf

generator = get_generator(
    epochs=training_epochs,
)

perf = evaluate_performance(
    generator,
    dataset,
    mem_set_size,
    reference_set_size,
    training_epochs=training_epochs,
    synthetic_sizes=synthetic_sizes,
    density_estimator=density_estimator,
)

assert 1000 in perf
results = perf[1000]

assert "MIA_performance" in results
assert "MIA_scores" in results

print(results["MIA_performance"])

Experiments

  1. Main paper experiments

To reproduce results for DOMIAS, baselines, and ablated models, run

cd experiments
python3 domias_main.py --seed 0 --gan_method TVAE --dataset housing --mem_set_size_list 30 50 100 300 500 1000 --reference_set_size_list 10000 --synthetic_sizes 10000 --training_epoch_list 2000

changing the mem_set_size_list, reference_set_size_list, synthetic_sizes, and training_epoch_list arguments to sweep ranges for specific experiments (Experiments 5.1 and 5.2; see Appendix A for details), and gan_method to select the generative model of interest,

or equivalently, run

cd experiments && bash run_tabular.sh
  2. Experiments with no prior knowledge (Appendix D)

For the setting without a reference dataset (i.e., relying on the prior), add

--density_estimator prior
  3. Image experiments (Appendix B.3)

Note: The CelebA dataset must be available in the experiments/data folder.

To run the experiment with the CelebA dataset, first run

cd experiments && python3 celeba_gen.py --seed 0 --mem_set_size 4000

and then

cd experiments && python3 celeba_eval.py --seed 0 --mem_set_size 4000

Tests

Install the testing dependencies using

pip install .[testing]

The tests can be executed using

pytest -vsx

Citing

TODO
