
Synthetic Data Generation with optional Differential Privacy


Gretel Synthetics


This code has been developed and tested on Python 3.6, 3.7, and 3.8.

This code is developed for TensorFlow 2.3.X and above.

This package lets developers get started quickly with synthetic data generation using neural networks. The more complex pieces of working with libraries such as TensorFlow and differential privacy are bundled into friendly Python classes and functions.

For example usage, launch the example Jupyter Notebook and step through the config, train, and generation examples.

NOTE: The settings in our Jupyter Notebook examples are optimized to run on a GPU, which you can experiment with for free in Google Colaboratory. If you're running on a CPU, you might want to grab a cup of coffee, or lower max_lines and epochs to 5000 and 10, respectively.
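As a rough illustration of the note above, the CPU fallback can be written as a small helper. This is only a sketch: `training_overrides` is a hypothetical name and not part of gretel-synthetics, and it returns a plain dict rather than touching any real configuration object. Only the setting names `max_lines` and `epochs` and the values 5000 and 10 come from the note.

```python
def training_overrides(has_gpu: bool) -> dict:
    """Return suggested overrides for the notebook settings.

    Hypothetical helper for illustration; it only builds a plain dict.
    """
    if has_gpu:
        # Keep the notebook's GPU-tuned values unchanged.
        return {}
    # On a CPU, shrink the workload as the note suggests.
    return {"max_lines": 5000, "epochs": 10}


if __name__ == "__main__":
    print(training_overrides(has_gpu=False))
```

How these values are passed on depends on the notebook you are running, so treat the dict as a reminder of what to change rather than as a drop-in call.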

Roadmap

Pre 0.14.X

Prior to the 0.14.x versions of Gretel Synthetics, we noticed that the differential privacy library we use (tensorflow-privacy) may not be called properly depending on the version of TensorFlow in use, particularly TF 2.1+. This means that with the dp option enabled on versions before 0.14.X, training may not have applied the DP optimizers properly when producing synthetic data. We are currently working with the TensorFlow Privacy team on an update to resolve this situation.

0.14.X

This release series will continue to operate as prior versions did, and we will continue to add new functionality that makes training more automated and user friendly. Planned enhancements include using Keras features to stop model training early based on observed loss or accuracy, and ensuring that the best version of each model is stored. This will remove the need to guess an optimal number of training epochs and help train the best model sooner.
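The early-stopping idea can be sketched in plain Python, without TensorFlow. Keras ships callbacks for this purpose (`tf.keras.callbacks.EarlyStopping` and `tf.keras.callbacks.ModelCheckpoint`), but the class below is only an illustrative stand-in: `EarlyStopper`, `run`, `patience`, and `min_delta` are names chosen for this sketch, and nothing here describes how gretel-synthetics wires the feature internally.

```python
from typing import List, Optional


class EarlyStopper:
    """Tracks the best loss seen so far and signals when to stop."""

    def __init__(self, patience: int = 2, min_delta: float = 0.0) -> None:
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # smallest change that counts as improvement
        self.best_loss: Optional[float] = None
        self.best_epoch: Optional[int] = None
        self._stale_epochs = 0

    def update(self, epoch: int, loss: float) -> bool:
        """Record one epoch's loss; return True when training should stop."""
        improved = self.best_loss is None or loss < self.best_loss - self.min_delta
        if improved:
            self.best_loss = loss
            self.best_epoch = epoch   # a real trainer would save a checkpoint here
            self._stale_epochs = 0
            return False
        self._stale_epochs += 1
        return self._stale_epochs >= self.patience


def run(losses: List[float], patience: int = 2) -> EarlyStopper:
    """Feed a sequence of per-epoch losses through an EarlyStopper."""
    stopper = EarlyStopper(patience=patience)
    for epoch, loss in enumerate(losses):
        if stopper.update(epoch, loss):
            break
    return stopper


if __name__ == "__main__":
    result = run([1.0, 0.8, 0.9, 0.85, 0.7])
    print(result.best_epoch, result.best_loss)
```

With a patience of 2, the loop above stops after two epochs without improvement and remembers the epoch with the lowest loss, which is the point at which a real trainer would keep its stored model.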

One temporary change in this release series is that a RuntimeError is raised when the dp option is enabled. We are doing this for two reasons:

  1. We want to reduce the risk that DP is not applied properly to your data. The dp option has always been disabled by default, and this will remain the case.

  2. We did not want to drastically change the signature of the configuration object. Removing these options would surface as a TypeError about unexpected parameters, which is more ambiguous than a RuntimeError with a detailed explanation of why the option is temporarily unavailable.
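The difference between the two failure modes can be shown with a small sketch. Both classes below are hypothetical stand-ins written for this illustration; they are not the library's configuration object, and the error message text is invented.

```python
class SketchConfig:
    """Hypothetical config that keeps the dp parameter but rejects it."""

    def __init__(self, epochs: int = 10, dp: bool = False) -> None:
        if dp:
            # Keeping the parameter lets us explain exactly what is going on.
            raise RuntimeError(
                "The dp option is temporarily unavailable in this sketch "
                "because DP optimizers may not be applied correctly."
            )
        self.epochs = epochs
        self.dp = dp


class SketchConfigWithoutDp:
    """Hypothetical config with the dp parameter removed entirely."""

    def __init__(self, epochs: int = 10) -> None:
        self.epochs = epochs


def describe_failure(config_class) -> str:
    """Try to enable dp and report which exception type is raised."""
    try:
        config_class(dp=True)
    except (RuntimeError, TypeError) as err:
        return type(err).__name__
    return "no error"


if __name__ == "__main__":
    print(describe_failure(SketchConfig))
    print(describe_failure(SketchConfigWithoutDp))
```

The first class fails with a RuntimeError that carries an explanation, while the second fails with Python's generic TypeError for an unexpected keyword argument, which says nothing about why the option is gone.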

0.15.X

We are currently working to ensure that our differentially private optimizers are called correctly when enabled, and plan to introduce them in this release series. To correctly subclass the standard non-differentially private optimizers in a future-proof way, we are leveraging the Keras V2 optimizer interfaces introduced in TensorFlow 2.4.x. Additionally, we will do a significant amount of hyperparameter optimization and provide default optimizers and hyperparameters for both non-DP and DP training.

In this release series, expect an interface change to the configuration object. We are exploring an optimizer parameter that accepts an optional Optimizer() or DPOptimizer() instance that you create yourself and pass to the configuration. This will allow you to explore multiple optimizers with your data. We will continue to provide the dp boolean option, which, if used, will default to Optimizer() or DPOptimizer() objects tuned through our hyperparameter testing; these defaults should work well for a variety of general synthetic data use cases.
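One way to picture the interface being explored is the resolution logic below. Everything in it is an assumption made for illustration: `Optimizer` and `DPOptimizer` are empty stand-in classes that borrow their names from the paragraph above, `resolve_optimizer` is a hypothetical function, and the rule that an explicitly supplied optimizer takes precedence over the dp flag is a guess, since the roadmap does not say how the two interact.

```python
from typing import Optional


class Optimizer:
    """Stand-in for a standard (non-DP) optimizer."""


class DPOptimizer(Optimizer):
    """Stand-in for a differentially private optimizer subclass."""


def resolve_optimizer(
    optimizer: Optional[Optimizer] = None, dp: bool = False
) -> Optimizer:
    """Pick the optimizer to train with.

    Assumption for this sketch: a user-supplied optimizer wins over
    the dp flag; otherwise the flag selects a default.
    """
    if optimizer is not None:
        return optimizer
    return DPOptimizer() if dp else Optimizer()


if __name__ == "__main__":
    print(type(resolve_optimizer()).__name__)
    print(type(resolve_optimizer(dp=True)).__name__)
```

The final design may differ, so this is best read as a summary of the paragraph above rather than a preview of the released API.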

Getting Started

By default, we do not install TensorFlow via pip, as many developers and cloud services such as Google Colab run customized versions for their hardware. If you wish to pip install TensorFlow along with gretel-synthetics, use the [tf] commands below instead.

pip install -U .                     # Do not install TensorFlow by default (assuming you have built a distro for your hardware)
pip install -U -e ".[tf]"            # Install a pinned version of TensorFlow

or

pip install gretel-synthetics          # Do not install TensorFlow by default (assuming you have built a distro for your hardware)
pip install "gretel-synthetics[tf]"    # Install a pinned version of TensorFlow (quoted so shells such as zsh do not expand the brackets)

then...

$ pip install jupyter
$ jupyter notebook

When the UI launches in your browser, navigate to examples/synthetic_records.ipynb and get generating!
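Because TensorFlow is not installed by default, it can be worth checking what is present before opening the notebook. The sketch below uses only the standard library plus TensorFlow's own `__version__` attribute; the helper names (`parse_major_minor`, `meets_minimum`, `tensorflow_status`) are hypothetical, and the 2.3 threshold is taken from the requirement stated near the top of this page.

```python
import importlib.util
from typing import Tuple

MINIMUM_TF = (2, 3)  # from "TensorFlow 2.3.X and above"


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.4.0rc1'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def meets_minimum(version: str, minimum: Tuple[int, int] = MINIMUM_TF) -> bool:
    """Return True if the version is at least the minimum (major, minor)."""
    return parse_major_minor(version) >= minimum


def tensorflow_status() -> str:
    """Describe the installed TensorFlow, if any."""
    if importlib.util.find_spec("tensorflow") is None:
        return "TensorFlow is not installed"
    import tensorflow as tf  # imported lazily because it is slow to load

    verdict = "ok" if meets_minimum(tf.__version__) else "older than 2.3"
    return "TensorFlow {} ({})".format(tf.__version__, verdict)


if __name__ == "__main__":
    print(tensorflow_status())
```

Comparing (major, minor) tuples rather than raw strings avoids the trap where "2.10" sorts before "2.3" alphabetically.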


Download files

Source Distribution

gretel-synthetics-0.14.0.tar.gz (953.9 kB)

Built Distribution

gretel_synthetics-0.14.0-py3-none-any.whl (32.6 kB, Python 3)
