Generate synthetic data that simulate a given dataset.
[![PyPi Shield](https://img.shields.io/pypi/v/DataSynthesizer.svg)](https://pypi.python.org/pypi/DataSynthesizer)
[![Travis CI Shield](https://img.shields.io/travis/DataResponsibly/DataSynthesizer.svg?branch=master)](https://travis-ci.com/DataResponsibly/DataSynthesizer)
DataSynthesizer
Usage
DataSynthesizer can generate a synthetic dataset from a sensitive one for release to the public. It is developed in Python 3 and requires some third-party modules, such as numpy, pandas, and python-dateutil.
Its usage is presented in the following Jupyter Notebooks:
./notebooks/DataSynthesizer in random mode.ipynb
./notebooks/DataSynthesizer in independent attribute mode.ipynb
./notebooks/DataSynthesizer in correlated attribute mode.ipynb
Assumptions for Input Dataset
- The input dataset is a table in first normal form (1NF).
- When implementing differential privacy, DataSynthesizer injects noise into the statistics within the active domain, that is, the set of values that actually appear in the table.
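The second assumption can be illustrated with a minimal sketch. It is not DataSynthesizer's own implementation: the function name `noisy_active_domain_distribution` is hypothetical, and the sketch assumes Laplace noise with sensitivity 1 (one row added or removed changes one count by 1). It only shows what restricting noisy statistics to the active domain means for a single column.

```python
import numpy as np


def noisy_active_domain_distribution(values, epsilon, rng=None):
    """Return {value: probability} over the values present in `values`,
    after adding Laplace noise to each count (hypothetical helper)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = np.random.default_rng() if rng is None else rng

    # Active domain: only the distinct values that occur in the column.
    domain, counts = np.unique(np.asarray(values), return_counts=True)
    if len(domain) == 0:
        raise ValueError("values must not be empty")

    # Assumption: sensitivity 1, so the Laplace scale is 1 / epsilon.
    noisy = counts + rng.laplace(loc=0.0, scale=1.0 / epsilon, size=counts.shape)

    # Noise can push counts below zero; clip, then renormalize.
    noisy = np.clip(noisy, 0.0, None)
    total = noisy.sum()
    if total == 0:
        # Every noisy count was clipped to zero: fall back to uniform.
        probs = np.full(len(domain), 1.0 / len(domain))
    else:
        probs = noisy / total
    return dict(zip(domain.tolist(), probs.tolist()))
```

For a column such as `["a", "b", "a", "c", "a"]`, the result has keys `"a"`, `"b"`, and `"c"` only. A value that never occurs in the column, such as `"d"`, receives no probability no matter how much noise is added, which is the practical consequence of working within the active domain.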
Install DataSynthesizer
pip install DataSynthesizer
Run webUI
DataSynthesizer can also be run through a web-based UI.
cd DataSynthesizer/webUI/
python manage.py runserver
Visit http://127.0.0.1:8000/synthesizer/
History
0.1.0 - 2020-06-11
- First release on PyPI.
Source Distribution
DataSynthesizer-0.1.0.tar.gz (54.1 kB)
Built Distribution
Hashes for DataSynthesizer-0.1.0-py2.py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 5919876360c9182f8ac6084b673560ac03f3c033e5726ef2189bc33063798259 |
MD5 | 0001966a79c9fad42739e5bf37ffb646 |
BLAKE2b-256 | 67065307c4f2ba4e659bee04923f15d9c56d211b77db2b337f76a7c41c281780 |
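A downloaded file can be checked against the SHA256 digest listed above. The following is a small sketch using only the Python standard library. The helper name `sha256_matches` is hypothetical, and the local file path is whatever location the file was saved to.

```python
import hashlib


def sha256_matches(path, expected_hex, chunk_size=65536):
    """Return True if the SHA256 digest of the file at `path`
    equals `expected_hex` (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

Calling `sha256_matches` with the path to the downloaded `DataSynthesizer-0.1.0-py2.py3-none-any.whl` and the SHA256 value from the table should return `True` for an intact download.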