Algorithms for generating synthetic data
Synthetic Data Generation
A repository of algorithms for synthetic data generation.
All algorithms are differentially private (DP), so the user can specify the desired privacy level through the privacy budget epsilon (smaller values give stronger privacy).
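To illustrate how the privacy level trades off against accuracy, here is a minimal sketch of the Laplace mechanism applied to a single count. It is not taken from this repository: the helper names `laplace_noise` and `dp_count` are hypothetical, and the sketch assumes a sensitivity of 1 (adding or removing one record changes the count by at most 1).

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count under epsilon-DP, assuming the count has sensitivity 1."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # Noise scale = sensitivity / epsilon; smaller epsilon means more noise.
    return true_count + laplace_noise(scale=1.0 / epsilon, rng=rng)


rng = random.Random(0)
for eps in (0.1, 1.0, 10.0):
    noisy = [dp_count(1000, eps, rng) for _ in range(5)]
    print(eps, [round(x, 1) for x in noisy])
```

With a small epsilon such as 0.1 the released counts scatter widely around the true value, while a large epsilon keeps them close to it at the cost of weaker privacy.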
We support the following synthesizers:
- MarginalSynthesizer: synthesizes each column independently from its DP marginal distribution.
- ContingencySynthesizer: synthesizes data from a DP contingency table over all attributes.
- PrivBayes (Zhang et al., 2017): approximates the data with a Bayesian network whose conditional distributions are DP.
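The simplest of these ideas, sampling each column independently from a noisy marginal, can be sketched in a few lines. This is an illustrative sketch only, not the code of MarginalSynthesizer: the function names are hypothetical, the value domain of each column is assumed to be known in advance (public), and the privacy budget is assumed to be split equally across columns.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_marginal(values, domain, epsilon, rng):
    """Return a noisy, normalised histogram of one column over `domain`."""
    counts = Counter(values)
    # Add Laplace noise to every bin (histogram sensitivity assumed to be 1),
    # then clip negatives to zero; clipping is post-processing.
    noisy = {
        v: max(counts.get(v, 0) + laplace_noise(1.0 / epsilon, rng), 0.0)
        for v in domain
    }
    total = sum(noisy.values())
    if total == 0:
        # All bins were clipped to zero: fall back to a uniform distribution.
        return {v: 1.0 / len(domain) for v in domain}
    return {v: c / total for v, c in noisy.items()}


def synthesize_independent(columns, domains, epsilon, n_records, seed=0):
    """Sample every column independently from its DP marginal."""
    rng = random.Random(seed)
    # Split the budget equally over the columns (sequential composition).
    eps_per_column = epsilon / len(columns)
    synthetic = {}
    for name, values in columns.items():
        probs = dp_marginal(values, domains[name], eps_per_column, rng)
        synthetic[name] = rng.choices(
            list(probs), weights=list(probs.values()), k=n_records
        )
    return synthetic


# Toy data, made up for this example
columns = {
    "sex": ["F", "M", "F", "F", "M", "F"] * 50,
    "smoker": ["no", "no", "yes", "no", "no", "no"] * 50,
}
domains = {"sex": ["F", "M"], "smoker": ["no", "yes"]}
synth = synthesize_independent(columns, domains, epsilon=1.0, n_records=10)
print(synth)
```

Because the columns are sampled independently, any correlation between them is lost. A joint contingency table or a Bayesian network, as in the other two synthesizers, is aimed at preserving such dependencies.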
Usage
```python
# Import the desired synthesis algorithm, e.g. PrivBayes
from synthesis.synthesizers.privbayes import PrivBayes
import pandas as pd

# Load data
df = pd.read_csv('examples/data/input/adult_9c.csv')

# Set the desired privacy level (differential privacy); default = 1
epsilon = 0.1

# Instantiate and fit the synthesizer
pb = PrivBayes(epsilon=epsilon)
pb.fit(df)

# Synthesize data
df_synth = pb.sample()
```
Alternatively, check out the example notebooks under examples/tutorials.
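Once synthetic data has been generated, a quick sanity check is to compare each column's distribution in the original and synthetic data. The sketch below is not part of this package: `total_variation_distance` is a hypothetical helper that computes the total variation distance between two categorical columns, where 0 means identical distributions and 1 means disjoint ones.

```python
from collections import Counter


def total_variation_distance(original, synthetic):
    """Total variation distance between two categorical columns.

    Both arguments can be any iterable of hashable values, for example
    a list or a pandas Series.
    """
    p = Counter(original)
    q = Counter(synthetic)
    n_p, n_q = sum(p.values()), sum(q.values())
    if n_p == 0 or n_q == 0:
        raise ValueError("both columns must be non-empty")
    categories = set(p) | set(q)
    # Counter returns 0 for categories missing from one of the columns.
    return 0.5 * sum(abs(p[c] / n_p - q[c] / n_q) for c in categories)


print(total_variation_distance(["a", "a", "b", "b"], ["a", "b", "b", "b"]))  # 0.25
```

Assuming the synthetic DataFrame has the same columns as the input, the check could be run per column as `total_variation_distance(df[col], df_synth[col])`. Lower epsilon values would be expected to give larger distances.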
Hashes for synthetic-data-generation-0.1.0.tar.gz
Algorithm | Hash digest
---|---
SHA256 | c8a9680bd1f1323c64491938270906490a56ddcb44df88daffe9de1897aa30da
MD5 | b0e66a1d3b263ee146fd553bdd0f752b
BLAKE2b-256 | 444e4d2a18d70d4be00095594dd007e06e5ba901efa47f343e4d71c311e1ba05
Hashes for synthetic_data_generation-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 64a631d974d86b9779bc05d185a05b34b3ed7b198d54c34efaf2473b8a4d9567
MD5 | fd27f3da1a82fe19f0bba97feec3ec05
BLAKE2b-256 | e54543cb1ac518015c72ae9323a713a16f0bb8c82f19473fdfa0ec9bc7b805c5