Algorithms for generating synthetic data

Synthetic Data Generation

A repository consisting of algorithms for synthetic data generation.

All algorithms are differentially private (DP), so the user can specify the desired privacy level.

We support the following synthesizers:

  • MarginalSynthesizer: synthesizes each column independently from its DP marginal distribution.
  • ContingencySynthesizer: synthesizes data from a DP contingency table over all attributes.
  • PrivBayes (Zhang et al., 2017): approximates the data with a Bayesian network of DP conditional distributions.
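
To make the marginal approach concrete, here is a minimal standard-library sketch of synthesizing one categorical column from a noisy histogram. It is not the package's implementation: the function names `dp_marginal` and `sample_column`, the use of the Laplace mechanism, the L1 sensitivity of 1 (add or remove one record), and the publicly known `domain` are all assumptions made for illustration.

```python
import random
from collections import Counter


def dp_marginal(values, domain, epsilon, rng):
    """Return a noisy, normalised histogram of one column (hypothetical helper)."""
    counts = Counter(values)
    # Assumption: L1 sensitivity 1, so the Laplace scale is 1 / epsilon.
    scale = 1.0 / epsilon
    noisy = []
    for category in domain:
        # The difference of two exponentials with mean `scale` is Laplace(0, scale).
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        # Clip negative counts to zero so they can be used as weights.
        noisy.append(max(counts.get(category, 0) + noise, 0.0))
    total = sum(noisy)
    if total == 0:
        # Fall back to a uniform distribution if every count was clipped.
        return [1.0 / len(domain)] * len(domain)
    return [count / total for count in noisy]


def sample_column(domain, probabilities, n, rng):
    """Draw n synthetic values from the noisy marginal (hypothetical helper)."""
    return rng.choices(domain, weights=probabilities, k=n)


rng = random.Random(0)
column = ["a"] * 60 + ["b"] * 30 + ["c"] * 10
domain = ["a", "b", "c"]
probs = dp_marginal(column, domain, epsilon=1.0, rng=rng)
synthetic = sample_column(domain, probs, n=5, rng=rng)
```

Sampling every column this way preserves the single-column distributions but discards correlations between columns, which is the gap the contingency-table and Bayesian-network approaches aim to close.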

Usage

# import desired synthesis algorithm, e.g. PrivBayes
from synthesis.synthesizers.privbayes import PrivBayes
import pandas as pd

# load data
df = pd.read_csv('examples/data/input/adult_9c.csv')

# set the privacy budget epsilon (smaller = stronger privacy); default = 1
epsilon = 0.1

# instantiate and fit synthesizer
pb = PrivBayes(epsilon=epsilon)
pb.fit(df)

# synthesize data
df_synth = pb.sample()

Alternatively, check out the example notebooks under examples/tutorials.
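
As a rough guide to choosing epsilon, the sketch below shows how the noise grows as epsilon shrinks under the Laplace mechanism. Whether the synthesizers use exactly this mechanism internally is an assumption; `laplace_scale` is a hypothetical helper, and the sensitivity of 1 is chosen only for illustration.

```python
import math


def laplace_scale(epsilon, sensitivity=1.0):
    """Scale b of Laplace noise for a query with the given L1 sensitivity."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


for eps in (0.1, 1.0, 10.0):
    scale = laplace_scale(eps)
    # The standard deviation of a Laplace(0, b) variable is sqrt(2) * b.
    std = math.sqrt(2) * scale
    print(f"epsilon={eps}: scale={scale:.2f}, std={std:.2f}")
```

With epsilon = 0.1, as in the usage example, each noisy count under these assumptions is perturbed ten times more strongly than with the default of 1, so small datasets may yield noticeably less faithful synthetic data.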
