DataSynthesizer

DataSynthesizer generates synthetic data that simulates a given dataset.

It aims to facilitate collaboration between data scientists and owners of sensitive data. It applies differential privacy techniques to achieve a strong privacy guarantee.

For more details, please refer to DataSynthesizer: Privacy-Preserving Synthetic Datasets

Install DataSynthesizer

pip install DataSynthesizer

Usage

Assumptions for the Input Dataset

  1. The input dataset is a table in first normal form (1NF).
  2. When implementing differential privacy, DataSynthesizer injects noise into statistics computed over the active domain, that is, the values that actually appear in the table.
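
The second assumption can be illustrated with a small sketch. This is not DataSynthesizer's own code: the function names `laplace_noise` and `noisy_distribution`, the sensitivity of 1 per count, and the clipping and renormalisation steps are all assumptions made for illustration. It shows the general idea of adding Laplace noise to counts over the active domain only, so values that never appear in the column never receive any probability mass.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponentials with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_distribution(values, epsilon, seed=None):
    """Return a noisy probability distribution over the active domain.

    Hypothetical helper for illustration; assumes each record changes
    one count by at most 1 (sensitivity 1).
    """
    if not values:
        raise ValueError("values must not be empty")
    rng = random.Random(seed)
    counts = Counter(values)   # active domain = values present in the data
    scale = 1.0 / epsilon      # assumed sensitivity / epsilon
    # Add noise to each count and clip negatives to zero.
    noisy = {v: max(c + laplace_noise(scale, rng), 0.0) for v, c in counts.items()}
    total = sum(noisy.values())
    if total == 0:             # every count was clipped: fall back to uniform
        return {v: 1.0 / len(noisy) for v in noisy}
    return {v: n / total for v, n in noisy.items()}


column = ["A", "B", "A", "C", "A", "B"]
dist = noisy_distribution(column, epsilon=1.0, seed=0)
for value in sorted(dist):
    print(value, round(dist[value], 3))
```

A smaller `epsilon` gives a larger noise scale and therefore a distribution that deviates more from the true frequencies. A value such as "D", which is absent from the column, is never part of the result.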

Use Jupyter Notebooks

# install jupyter first
pip install jupyter

Demo notebooks are available in ./notebooks/

Use webUI

DataSynthesizer can also be run through a web-based UI.

# install django
pip install django

# go to the directory for webUI
cd DataSynthesizer/webUI/

# run the server
python manage.py runserver

Then open a browser and visit http://127.0.0.1:8000/synthesizer/

History

0.1.0 - 2020-06-11

  • First release on PyPI.

0.1.1 - 2020-07-05

Bugs Fixed

  • Numpy error when synthesising data with unique identifiers. - Issue #23 by @raids

0.1.2 - 2020-07-19

Bugs Fixed

  • infer_distribution() for string attributes fails to sort index of varying types. - Issue #24 by @raids

0.1.3 - 2020-09-13

Bugs Fixed

  • The dataframes are not appended into the full space in get_noisy_distribution_of_attributes(). - Issue #26 by @zjroth

0.1.4 - 2021-01-14

Bugs Fixed

  • Fix a bug in candidate key identification.

0.1.5 - 2021-03-11

What's New

  • Downgrade required Python from >=3.8 to >=3.7.

0.1.6 - 2021-03-11

What's New

  • Update example notebooks.

0.1.7 - 2021-03-31

Bugs Fixed

  • Fixed an error in Laplace noise parameter. - Issue #34 by @ganevgv

