Synthetic Data Generation

Algorithms for generating synthetic data.

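All algorithms are differentially private (DP), so the user can specify the desired privacy level through the privacy budget epsilon. A smaller epsilon gives stronger privacy at the cost of noisier synthetic data.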

We support the following synthesizers:

  • MarginalSynthesizer: synthesize each column independently via DP marginal distributions.
  • UniformSynthesizer: synthesize each column independently via uniform distributions.
  • ContingencySynthesizer: synthesize data via a DP contingency table of all attributes.
  • PrivBayes (Zhang et al., 2017): synthesize data via a Bayesian network with DP conditional distributions.
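
To give a feel for what "synthesize each column independently via DP marginal distributions" can mean, here is a minimal sketch using only numpy and pandas. It is not the package's implementation: the function names `dp_marginal` and `sample_independent` are hypothetical, the columns are assumed to be categorical with a publicly known domain, and the Laplace mechanism with an even split of epsilon across columns is one common choice among several.

```python
import numpy as np
import pandas as pd


def dp_marginal(column: pd.Series, epsilon: float, rng: np.random.Generator) -> pd.Series:
    """Return a noisy, normalized marginal distribution for one categorical column."""
    counts = column.value_counts()
    # Laplace mechanism: adding or removing one record changes a single count by 1,
    # so the sensitivity is assumed to be 1 and the noise scale is 1 / epsilon.
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon, size=len(counts))
    noisy = np.clip(counts.to_numpy(dtype=float) + noise, 0.0, None)
    total = noisy.sum()
    if total == 0:
        # All counts were clipped to zero: fall back to a uniform distribution.
        probs = np.full(len(counts), 1.0 / len(counts))
    else:
        probs = noisy / total
    return pd.Series(probs, index=counts.index)


def sample_independent(df: pd.DataFrame, epsilon: float, n_records: int, seed: int = 0) -> pd.DataFrame:
    """Sample every column independently from its own noisy marginal."""
    rng = np.random.default_rng(seed)
    # Sequential composition: split the total budget evenly over the columns.
    epsilon_per_column = epsilon / df.shape[1]
    synth = {}
    for name in df.columns:
        marginal = dp_marginal(df[name], epsilon_per_column, rng)
        synth[name] = rng.choice(marginal.index.to_numpy(), size=n_records, p=marginal.to_numpy())
    return pd.DataFrame(synth)
```

Because each column is sampled on its own, correlations between attributes are lost. The contingency-table and Bayesian-network approaches listed above exist to keep some of that joint structure.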

Install

Using pip:

pip install synthetic-data-generation

Usage

# import pandas and desired synthesis algorithm, e.g. PrivBayes
import pandas as pd
from synthesis.synthesizers import PrivBayes

# load data
df = pd.read_csv('examples/data/original/adult.csv')

# set the privacy budget epsilon (smaller = stronger privacy) - default = 1
epsilon = 0.1

# instantiate and fit synthesizer
pb = PrivBayes(epsilon=epsilon)
pb.fit(df)

# synthesize data
df_synth = pb.sample()

Alternatively, check out the example notebooks under examples/tutorials.
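
After sampling, a quick sanity check is to compare the per-column distributions of the original and synthetic data. The sketch below is not part of the package: `total_variation_distance` and `compare_marginals` are hypothetical helpers built on pandas only, and they assume that the synthetic DataFrame has the same columns as the original and that the columns are categorical (or already discretized).

```python
import pandas as pd


def total_variation_distance(original: pd.Series, synthetic: pd.Series) -> float:
    """Total variation distance between the empirical distributions of two columns."""
    p = original.value_counts(normalize=True)
    q = synthetic.value_counts(normalize=True)
    # Align on the union of observed values; a value missing on one side gets probability 0.
    p, q = p.align(q, fill_value=0.0)
    return 0.5 * float((p - q).abs().sum())


def compare_marginals(df: pd.DataFrame, df_synth: pd.DataFrame) -> pd.Series:
    """Distance per column: 0 means identical marginals, 1 means disjoint support."""
    return pd.Series(
        {col: total_variation_distance(df[col], df_synth[col]) for col in df.columns}
    )
```

With the variables from the usage example, `compare_marginals(df, df_synth)` would return one distance per column. Lower values of epsilon can be expected to produce larger distances.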
