Synthetic Data Generation
Algorithms for generating synthetic data.
All algorithms are differentially private (DP), which lets the user specify the desired privacy level.
We support the following synthesizers:
- MarginalSynthesizer: synthesize each column independently via DP marginal distributions.
- UniformSynthesizer: synthesize each column independently via uniform distributions.
- ContingencySynthesizer: synthesize data via a DP contingency table of all attributes.
- PrivBayes (Zhang et al., 2017): synthesize data via a Bayesian network with DP conditional distributions.
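To make the idea behind the simplest of these approaches concrete, here is a minimal sketch of synthesizing each column independently from DP marginals, using the Laplace mechanism on per-column counts. It is an illustration only, not the package's implementation: the names `dp_marginal` and `synthesize_independent` are hypothetical, the even split of epsilon across columns is an assumption, and the sketch treats the categories observed in the data as a publicly known domain.

```python
import numpy as np
import pandas as pd


def dp_marginal(column: pd.Series, epsilon: float, rng: np.random.Generator) -> pd.Series:
    """Return a noisy, normalised marginal distribution for one categorical column."""
    counts = column.value_counts().sort_index().astype(float)
    # Laplace mechanism: adding or removing one record changes a single count by 1,
    # so the L1 sensitivity is 1 and the noise scale is 1 / epsilon.
    noisy = counts + rng.laplace(loc=0.0, scale=1.0 / epsilon, size=len(counts))
    # Post-processing (does not weaken the DP guarantee): clip negatives, normalise.
    noisy = noisy.clip(lower=0.0)
    total = noisy.sum()
    if total == 0:
        # Degenerate case: every noisy count was clipped, fall back to uniform.
        return pd.Series(1.0 / len(noisy), index=noisy.index)
    return noisy / total


def synthesize_independent(df: pd.DataFrame, epsilon: float, n_records: int, seed=None) -> pd.DataFrame:
    """Sample every column independently from its own DP marginal."""
    rng = np.random.default_rng(seed)
    # Assumption: split the privacy budget evenly over columns (sequential composition).
    eps_per_column = epsilon / df.shape[1]
    synth = {}
    for name in df.columns:
        probs = dp_marginal(df[name], eps_per_column, rng)
        synth[name] = rng.choice(probs.index.to_numpy(), size=n_records, p=probs.to_numpy())
    return pd.DataFrame(synth)
```

Because columns are sampled independently, any correlation between attributes is lost; capturing such dependencies is what the contingency-table and Bayesian-network approaches in the list are for.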
Install
Using pip:
pip install synthetic-data-generation
Usage
# import pandas and desired synthesis algorithm, e.g. PrivBayes
import pandas as pd
from synthesis.synthesizers import PrivBayes
# load data
df = pd.read_csv('examples/data/original/adult.csv')
# set the privacy budget epsilon (smaller = stronger privacy, more noise); default is 1
epsilon = 0.1
# instantiate and fit synthesizer
pb = PrivBayes(epsilon=epsilon)
pb.fit(df)
# synthesize data
df_synth = pb.sample()
Alternatively, check out the example notebooks under examples/tutorials.
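After sampling, one simple sanity check is to compare the per-column distributions of the original and synthetic data. The sketch below computes the total variation distance between the empirical marginals of each column using plain pandas. The helper names `marginal_tvd` and `compare_marginals` are hypothetical and are not part of the package, and the check assumes categorical (or discretised) columns.

```python
import pandas as pd


def marginal_tvd(original: pd.Series, synthetic: pd.Series) -> float:
    """Total variation distance between two empirical categorical distributions."""
    p = original.value_counts(normalize=True)
    q = synthetic.value_counts(normalize=True)
    # Align on the union of categories; a category missing on one side gets probability 0.
    p, q = p.align(q, fill_value=0.0)
    return 0.5 * float((p - q).abs().sum())


def compare_marginals(df: pd.DataFrame, df_synth: pd.DataFrame) -> pd.Series:
    """Per-column distance, largest first (0 = identical marginals, 1 = disjoint)."""
    distances = {name: marginal_tvd(df[name], df_synth[name]) for name in df.columns}
    return pd.Series(distances).sort_values(ascending=False)
```

With `df` and `df_synth` from the usage example, `compare_marginals(df, df_synth)` lists the columns whose marginals drifted most. Smaller values of `epsilon` add more noise, so larger distances are to be expected.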