Conditional GAN for Tabular Data

An open source project from the Data to AI Lab at MIT.

CTGAN

Implementation of our NeurIPS paper Modeling Tabular data using Conditional GAN.

CTGAN is a GAN-based data synthesizer that can generate synthetic tabular data with high fidelity.

Overview

Based on previous work (TGAN) on synthetic data generation, we develop a new model called CTGAN. Several major differences make CTGAN outperform TGAN.

  • Preprocessing: CTGAN uses a more sophisticated variational Gaussian mixture model to detect the modes of continuous columns.
  • Network structure: TGAN uses an LSTM to generate synthetic data column by column. CTGAN uses fully connected networks, which are more efficient.
  • Features to prevent mode collapse: We design a conditional generator and resample the training data to prevent mode collapse on discrete columns. We use WGAN-GP and PacGAN to stabilize GAN training.
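
The resampling idea in the last point can be illustrated with a minimal sketch. This is not the library's code: the function name `sample_condition` is hypothetical, and the log(count + 1) weighting is an assumption chosen to show how rare categories can be drawn more often than their raw frequency would allow.

```python
import math
import random
from collections import Counter
from typing import Sequence, Tuple


def sample_condition(rows: Sequence[Sequence[str]],
                     discrete_columns: Sequence[int],
                     rng: random.Random) -> Tuple[int, str, Sequence[str]]:
    """Pick a (column, value) condition, then a training row that satisfies it."""
    # Every discrete column is equally likely to be chosen.
    column = rng.choice(list(discrete_columns))
    counts = Counter(row[column] for row in rows)
    values = sorted(counts)
    # Assumption: weight each value by log(count + 1) instead of its raw count,
    # which flattens the distribution in favour of rare values.
    weights = [math.log(counts[value] + 1) for value in values]
    value = rng.choices(values, weights=weights, k=1)[0]
    # Resample the training data: draw a real row that matches the condition.
    matching = [row for row in rows if row[column] == value]
    return column, value, rng.choice(matching)


# Toy data: "B" makes up only 2% of the single discrete column.
rows = [("A",)] * 98 + [("B",)] * 2
rng = random.Random(0)
draws = [sample_condition(rows, [0], rng)[1] for _ in range(2000)]
print("share of B among sampled conditions:", draws.count("B") / len(draws))
```

In this toy setting the rare value is selected far more often than its 2% share of the data, which is the property that lets a generator see enough examples of minority categories.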

Install

Requirements

CTGAN has been developed and tested on Python 3.5, 3.6 and 3.7.

Install from PyPI

The recommended way to install CTGAN is using pip:

pip install ctgan

This will pull and install the latest stable release from PyPI.

Install from source

Alternatively, you can clone the repository and install it from source by running make install on the stable branch:

git clone git@github.com:DAI-Lab/CTGAN.git
cd CTGAN
git checkout stable
make install

Install for Development

If you want to contribute to the project, a few more steps are required to make the project ready for development.

Please head to the Contributing Guide for more details about this process.

Quickstart

In this short tutorial we will guide you through a series of steps that will help you get started with CTGAN.

Data format

The data is a space- (or tab-) separated file. For example:

100        A        True
200        B        False
105        A        True
120        C        False
...        ...        ...

The metafile describes each column on its own line. A C or D at the beginning of each line marks the column as continuous or discrete, respectively. For a continuous column, the two numbers that follow give the range of the column. For a discrete column, the strings that follow list all possible values in the column. For example:

C    0    500
D    A    B    C
D    True     False
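
To make the two formats concrete, here is a minimal sketch that parses a metafile and checks a data row against it. The names `Column`, `parse_meta` and `row_matches` are hypothetical helpers, not part of CTGAN, and treating the continuous range as inclusive on both ends is an assumption.

```python
from typing import List, NamedTuple, Optional, Tuple


class Column(NamedTuple):
    """One metafile line (hypothetical helper type, not part of CTGAN)."""
    kind: str                                   # "C" (continuous) or "D" (discrete)
    value_range: Optional[Tuple[float, float]]  # set only for continuous columns
    categories: Optional[List[str]]             # set only for discrete columns


def parse_meta(text: str) -> List[Column]:
    """Parse metafile text into one Column per non-empty line."""
    columns = []
    for line in text.splitlines():
        fields = line.split()  # splits on any run of spaces or tabs
        if not fields:
            continue
        kind, rest = fields[0], fields[1:]
        if kind == "C" and len(rest) == 2:
            columns.append(Column("C", (float(rest[0]), float(rest[1])), None))
        elif kind == "D" and rest:
            columns.append(Column("D", None, rest))
        else:
            raise ValueError("unrecognized metafile line: %r" % line)
    return columns


def row_matches(row: str, columns: List[Column]) -> bool:
    """Check that one data row is consistent with the parsed metafile."""
    fields = row.split()
    if len(fields) != len(columns):
        return False
    for field, column in zip(fields, columns):
        if column.kind == "C":
            try:
                value = float(field)
            except ValueError:
                return False
            low, high = column.value_range
            # Assumption: the range is inclusive on both ends.
            if not low <= value <= high:
                return False
        elif field not in column.categories:
            return False
    return True


columns = parse_meta("C    0    500\nD    A    B    C\nD    True     False\n")
print(row_matches("100        A        True", columns))
print(row_matches("900        A        True", columns))
```

The first row comes from the data example above; the second uses a value outside the declared range to show a row that such a check would reject.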

Run model

USAGE:
    python3 ctgan/cli.py [flags]
flags:
  --data: Filename of training data.
    (default: '')
  --max_epoch: Epoches to train.
    (default: '100')
    (an integer)
  --meta: Filename of meta data.
    (default: '')
  --model_dir: Path to save model.
    (default: '')
  --output: Output filename.
    (default: '')
  --sample: Number of rows to generate.
    (default: '1000')
    (an integer)
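
When driving the command line from a script, the flags above can be assembled programmatically. The sketch below uses only the flags listed in the usage text. The helper name `build_command` and the file names are placeholders, and passing each value as a separate argument after its flag is assumed to be accepted by the CLI.

```python
from typing import List, Optional


def build_command(data: str, meta: str, max_epoch: int = 100, sample: int = 1000,
                  output: Optional[str] = None,
                  model_dir: Optional[str] = None) -> List[str]:
    """Assemble the argument list for ctgan/cli.py (hypothetical helper)."""
    command = [
        "python3", "ctgan/cli.py",
        "--data", data,                 # filename of training data
        "--meta", meta,                 # filename of meta data
        "--max_epoch", str(max_epoch),  # epochs to train
        "--sample", str(sample),        # number of rows to generate
    ]
    # Optional flags are added only when a value is supplied.
    if output:
        command += ["--output", output]
    if model_dir:
        command += ["--model_dir", model_dir]
    return command


# Placeholder file names, for illustration only.
command = build_command("train.dat", "train.meta", max_epoch=50, output="synthetic.dat")
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the root of a cloned repository, since the script path is relative.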

Example

It's easy to try our model using example datasets.

git clone https://github.com/DAI-Lab/ctgan
cd ctgan
python3 -m ctgan.cli --data examples/adult.dat --meta examples/adult.meta

What's next?

For more details about CTGAN and all its possibilities and features, please check the documentation site.

Citing CTGAN

If you use CTGAN, please cite the following work:

  • Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, Kalyan Veeramachaneni. Modeling Tabular data using Conditional GAN. NeurIPS, 2019.
@inproceedings{xu2019modeling,
  title={Modeling Tabular data using Conditional GAN},
  author={Xu, Lei and Skoularidou, Maria and Cuesta-Infante, Alfredo and Veeramachaneni, Kalyan},
  booktitle={Advances in Neural Information Processing Systems},
  year={2019}
}
