
Generate synthetic data that simulate a given dataset.


[![PyPi Shield](https://img.shields.io/pypi/v/DataSynthesizer.svg)](https://pypi.python.org/pypi/DataSynthesizer)
[![Travis CI Shield](https://img.shields.io/travis/DataResponsibly/DataSynthesizer.svg?branch=master)](https://travis-ci.com/DataResponsibly/DataSynthesizer)

DataSynthesizer


Usage

DataSynthesizer generates a synthetic dataset from a sensitive one so that the synthetic version can be released to the public. It is written in Python 3 and depends on third-party modules such as numpy, pandas, and python-dateutil.

Its usage is demonstrated in the following Jupyter notebooks:

  • ./notebooks/DataSynthesizer in random mode.ipynb
  • ./notebooks/DataSynthesizer in independent attribute mode.ipynb
  • ./notebooks/DataSynthesizer in correlated attribute mode.ipynb
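To make the idea behind the independent attribute mode concrete, here is a minimal sketch in plain Python. It does not use DataSynthesizer's API. The functions `fit_marginals` and `sample_independent` are hypothetical helpers written for this illustration. The sketch assumes that "independent attribute mode" means each column is modeled by its own marginal distribution and sampled without regard to the other columns. It also treats every distinct value as a category; DataSynthesizer itself bins numeric attributes and can add differentially private noise, and neither step is reproduced here.

```python
import random
from collections import Counter


def fit_marginals(rows, columns):
    """Hypothetical helper: estimate each column's empirical distribution on its own."""
    marginals = {}
    for col in columns:
        counts = Counter(row[col] for row in rows)
        total = sum(counts.values())
        values = sorted(counts, key=str)  # deterministic order, tolerant of mixed types
        marginals[col] = (values, [counts[v] / total for v in values])
    return marginals


def sample_independent(marginals, n, seed=None):
    """Hypothetical helper: draw n synthetic rows, sampling every column independently."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n):
        row = {}
        for col, (values, probs) in marginals.items():
            # Each column is drawn from its own marginal, so cross-column correlations are lost.
            row[col] = rng.choices(values, weights=probs, k=1)[0]
        synthetic.append(row)
    return synthetic


# Toy "sensitive" table, invented for this example.
rows = [
    {"age": 34, "sex": "F"},
    {"age": 51, "sex": "M"},
    {"age": 34, "sex": "M"},
    {"age": 29, "sex": "F"},
]
marginals = fit_marginals(rows, ["age", "sex"])
synthetic = sample_independent(marginals, n=10, seed=0)
```

Sampling columns independently preserves each attribute's distribution but discards correlations between attributes. The correlated attribute mode is meant to address that limitation.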

Assumptions for the Input Dataset

  1. The input dataset is a table in first normal form (1NF).
  2. When applying differential privacy, DataSynthesizer injects noise into the statistics within the active domain, that is, only over the values that actually appear in the table.
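The sketch below illustrates both assumptions. It is not DataSynthesizer's code. `is_first_normal_form` is a hypothetical, heuristic check that flags cells holding containers (lists, tuples, sets, dicts) instead of single atomic values. `noisy_histogram` is a hypothetical helper that adds Laplace noise to value counts. Because it builds the histogram only over values present in the data (the active domain), values that never occur in the table can never appear in its output. The noise scale `1 / epsilon` is the textbook choice for a single count query with sensitivity 1. It is an assumption here, not necessarily the scale DataSynthesizer uses.

```python
import random
from collections import Counter


def is_first_normal_form(rows):
    """Heuristic 1NF check: every cell holds one atomic value, not a container."""
    return all(
        not isinstance(value, (list, tuple, set, dict))
        for row in rows
        for value in row.values()
    )


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_histogram(values, epsilon, seed=None):
    """Laplace-noised, renormalised histogram over the active domain only."""
    rng = random.Random(seed)
    counts = Counter(values)  # keys are exactly the values present in the data
    noisy = {v: max(c + laplace_noise(1.0 / epsilon, rng), 0.0) for v, c in counts.items()}
    total = sum(noisy.values())
    if total == 0:
        # Every noisy count was clipped to zero; fall back to a uniform distribution.
        return {v: 1.0 / len(noisy) for v in noisy}
    return {v: c / total for v, c in noisy.items()}


# Toy column, invented for this example.
sexes = ["F", "M", "M", "F", "M"]
hist = noisy_histogram(sexes, epsilon=1.0, seed=1)
```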

Install DataSynthesizer

pip install DataSynthesizer

Run webUI

DataSynthesizer can also be run through a web-based UI.

cd DataSynthesizer/webUI/
python manage.py runserver

Then open http://127.0.0.1:8000/synthesizer/ in a browser.

History

0.1.0 - 2020-06-11

  • First release on PyPI.


Download files


Source Distribution

DataSynthesizer-0.1.0.tar.gz (54.1 kB)

Built Distribution

DataSynthesizer-0.1.0-py2.py3-none-any.whl (23.3 kB), for Python 2 and Python 3
