Create tabular synthetic data using a conditional GAN

This repository is part of The Synthetic Data Vault Project, a project from DataCebo.

Overview

CTGAN is a collection of deep-learning-based synthetic data generators for single-table data. They learn from real data and generate synthetic data with high fidelity.

Important Links
:computer: Website	Check out the SDV Website for more information about our overall synthetic data ecosystem.
:orange_book: Blog	A deeper look at open source, synthetic data creation and evaluation.
:book: Documentation	Quickstarts, User and Development Guides, and API Reference.
:octocat: Repository	The link to the GitHub repository of this library.
:keyboard: Development Status	This software is in its Pre-Alpha stage.
Community	Join our Slack Workspace for announcements and discussions.

Currently, this library implements the CTGAN and TVAE models described in the paper Modeling Tabular data using Conditional GAN, presented at NeurIPS 2019.

Install

Use CTGAN through the SDV library

:warning: If you're just getting started with synthetic data, we recommend installing the SDV library, which provides user-friendly APIs for accessing CTGAN. :warning:

The SDV library provides wrappers for preprocessing your data as well as additional usability features like constraints. See the SDV documentation to get started.

Use the CTGAN standalone library

Alternatively, you can install and use CTGAN directly as a standalone library:

Using pip:

pip install ctgan

Using conda:

conda install -c pytorch -c conda-forge ctgan

When using the CTGAN library directly, you may need to manually preprocess your data into the correct format, for example:

Continuous data must be represented as floats
Discrete data must be represented as ints or strings
The data should not contain any missing values
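The three requirements above can be met with a few lines of pandas before fitting. The following is a minimal sketch, not part of the CTGAN API: the helper name `prepare_for_ctgan` and the toy table are hypothetical, and dropping rows with missing values is only one option (imputing them may suit your data better).

```python
import pandas as pd


def prepare_for_ctgan(data, discrete_columns):
    """Hypothetical helper: return a copy of `data` in the format listed above."""
    # Drop every row that contains a missing value (assumption: losing rows is acceptable).
    prepared = data.dropna().reset_index(drop=True)
    for column in prepared.columns:
        if column in discrete_columns:
            # Discrete data is represented as strings.
            prepared[column] = prepared[column].astype(str)
        else:
            # Continuous data is represented as floats.
            prepared[column] = prepared[column].astype(float)
    return prepared


# Toy table for illustration only.
raw = pd.DataFrame({
    'age': [39, 50, None, 28],
    'workclass': ['Private', 'State-gov', 'Private', None],
})
clean = prepare_for_ctgan(raw, discrete_columns=['workclass'])
print(clean.dtypes)
```

The returned table could then be passed to `fit` together with the same list of discrete column names.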

Usage Example

In this example, we load the Adult Census Dataset*, a built-in demo dataset. We then use CTGAN to learn from the real data and generate synthetic data.

from ctgan import CTGAN
from ctgan import load_demo

real_data = load_demo()

# Names of the columns that are discrete
discrete_columns = [
    'workclass',
    'education',
    'marital-status',
    'occupation',
    'relationship',
    'race',
    'sex',
    'native-country',
    'income'
]

ctgan = CTGAN(epochs=10)
ctgan.fit(real_data, discrete_columns)

# Create synthetic data
synthetic_data = ctgan.sample(1000)

*For more information about the dataset see: Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
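After sampling, a quick sanity check is to compare how often each category appears in the real and synthetic tables. The sketch below uses only pandas; the function name `compare_category_frequencies` and the two toy tables are hypothetical stand-ins for `real_data` and `synthetic_data`, and this check is far coarser than a dedicated quality evaluation.

```python
import pandas as pd


def compare_category_frequencies(real, synthetic, column):
    """Hypothetical helper: per-category proportions in both tables, largest gap first."""
    comparison = pd.DataFrame({
        'real': real[column].value_counts(normalize=True),
        'synthetic': synthetic[column].value_counts(normalize=True),
    }).fillna(0.0)  # a category missing from one table gets proportion 0
    comparison['abs_diff'] = (comparison['real'] - comparison['synthetic']).abs()
    return comparison.sort_values('abs_diff', ascending=False)


# Toy stand-ins for the real and synthetic tables.
real_toy = pd.DataFrame({'sex': ['Male', 'Male', 'Female', 'Female']})
synthetic_toy = pd.DataFrame({'sex': ['Male', 'Male', 'Male', 'Female']})
report = compare_category_frequencies(real_toy, synthetic_toy, 'sex')
print(report)
```

Large values in `abs_diff` suggest the model has not yet captured that column well, for instance after very few training epochs.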

Join our community

Join our Slack channel to discuss CTGAN and synthetic data. If you find a bug or have a feature request, you can also open an issue on our GitHub repository.

Interested in contributing to CTGAN? Read our Contribution Guide to get started.

Citing CTGAN

If you use CTGAN, please cite the following work:

Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, Kalyan Veeramachaneni. Modeling Tabular data using Conditional GAN. NeurIPS, 2019.

@inproceedings{ctgan,
  title={Modeling Tabular data using Conditional GAN},
  author={Xu, Lei and Skoularidou, Maria and Cuesta-Infante, Alfredo and Veeramachaneni, Kalyan},
  booktitle={Advances in Neural Information Processing Systems},
  year={2019}
}

Related Projects

Please note that these projects are external to the SDV Ecosystem. They are not affiliated with or maintained by DataCebo.

R Interface for CTGAN: A wrapper around CTGAN that brings the functionalities to R users. More details can be found in the corresponding repository: https://github.com/kasaai/ctgan
CTGAN Server CLI: A package to easily deploy CTGAN onto a remote server. Created by Timothy Pillow @oregonpillow at: https://github.com/oregonpillow/ctgan-server-cli

The Synthetic Data Vault Project was first created at MIT's Data to AI Lab in 2016. After 4 years of research and traction with enterprise, we created DataCebo in 2020 with the goal of growing the project. Today, DataCebo is the proud developer of SDV, the largest ecosystem for synthetic data generation & evaluation. It is home to multiple libraries that support synthetic data, including:

🔄 Data discovery & transformation. Reverse the transforms to reproduce realistic data.
🧠 Multiple machine learning models -- ranging from Copulas to Deep Learning -- to create tabular, multi table and time series data.
📊 Measuring quality and privacy of synthetic data, and comparing different synthetic data generation models.

Get started using the SDV package -- a fully integrated solution and your one-stop shop for synthetic data. Or, use the standalone libraries for specific needs.

Release history

Version	Type	Release date
0.11.0	release (this version)	Feb 26, 2025
0.11.0.dev0	pre-release	Feb 25, 2025
0.10.2	release	Oct 22, 2024
0.10.2.dev0	pre-release	Oct 18, 2024
0.10.1	release	May 13, 2024
0.10.1.dev0	pre-release	May 10, 2024
0.10.0	release	Apr 11, 2024
0.10.0.dev0	pre-release	Apr 10, 2024
0.9.1	release	Mar 14, 2024
0.9.1.dev0	pre-release	Mar 14, 2024
0.9.0	release	Feb 13, 2024
0.9.0.dev0	pre-release	Feb 13, 2024
0.8.0	release	Nov 13, 2023
0.8.0.dev0	pre-release	Nov 13, 2023
0.7.5	release	Oct 5, 2023
0.7.5.dev0	pre-release	Oct 5, 2023
0.7.4	release	Jul 25, 2023
0.7.4.dev0	pre-release	Jul 24, 2023
0.7.3	release	May 25, 2023
0.7.3.dev0	pre-release	May 25, 2023
0.7.2	release	May 9, 2023
0.7.2.dev1	pre-release	May 9, 2023
0.7.2.dev0	pre-release	May 4, 2023
0.7.1	release	Feb 23, 2023
0.7.1.dev0	pre-release	Feb 23, 2023
0.7.0	release	Jan 20, 2023
0.7.0.dev0	pre-release	Jan 19, 2023
0.6.0	release	Oct 7, 2022
0.6.0.dev0	pre-release	Oct 6, 2022
0.5.3.dev0	pre-release	Sep 27, 2022
0.5.2	release	Aug 19, 2022
0.5.2.dev1	pre-release	Aug 18, 2022
0.5.2.dev0	pre-release	Jul 10, 2022
0.5.1	release	Feb 25, 2022
0.5.1.dev3	pre-release	Feb 25, 2022
0.5.1.dev2	pre-release	Feb 23, 2022
0.5.1.dev1	pre-release	Feb 23, 2022
0.5.1.dev0	pre-release	Feb 18, 2022
0.5.0	release	Nov 18, 2021
0.5.0.dev1	pre-release	Nov 16, 2021
0.5.0.dev0	pre-release	Nov 10, 2021
0.4.4.dev0	pre-release	Nov 4, 2021
0.4.3	release	Jul 12, 2021
0.4.3.dev1	pre-release	Jul 7, 2021
0.4.3.dev0	pre-release	Jul 2, 2021
0.4.2	release	Apr 27, 2021
0.4.2.dev0	pre-release	Apr 27, 2021
0.4.1	release	Mar 30, 2021
0.4.1.dev1	pre-release	Mar 29, 2021
0.4.1.dev0	pre-release	Mar 8, 2021
0.4.0	release	Feb 24, 2021
0.4.0.dev1	pre-release	Feb 23, 2021
0.4.0.dev0	pre-release	Feb 23, 2021
0.3.2.dev0	pre-release	Feb 22, 2021
0.3.1	release	Jan 27, 2021
0.3.1.dev2	pre-release	Jan 27, 2021
0.3.1.dev1	pre-release	Jan 27, 2021
0.3.1.dev0	pre-release	Dec 23, 2020
0.3.0	release	Dec 18, 2020
0.3.0.dev1	pre-release	Dec 18, 2020
0.3.0.dev0	pre-release	Dec 16, 2020
0.2.2	release	Nov 13, 2020
0.2.2.dev3	pre-release	Nov 13, 2020
0.2.2.dev2	pre-release	Nov 10, 2020
0.2.2.dev1	pre-release	Aug 7, 2020
0.2.2.dev0	pre-release	Jul 9, 2020
0.2.1	release	Jan 27, 2020
0.2.1.dev0	pre-release	Jan 27, 2020
0.2.0	release	Dec 18, 2019
0.2.0.dev0	pre-release	Dec 18, 2019
0.1.0	release	Nov 8, 2019
0.1.0.dev0	pre-release	Nov 8, 2019

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ctgan-0.11.0.tar.gz (26.1 kB)

Uploaded Feb 26, 2025 Source

Built Distribution

ctgan-0.11.0-py3-none-any.whl (24.4 kB)

Uploaded Feb 26, 2025 Python 3

File details

Details for the file ctgan-0.11.0.tar.gz.

File metadata

Download URL: ctgan-0.11.0.tar.gz
Upload date: Feb 26, 2025
Size: 26.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.8.0 pkginfo/1.9.6 readme-renderer/42.0 requests/2.31.0 requests-toolbelt/1.0.0 urllib3/2.0.6 tqdm/4.66.1 importlib-metadata/6.8.0 keyring/24.2.0 rfc3986/2.0.0 colorama/0.4.6 CPython/3.10.13

File hashes

Hashes for ctgan-0.11.0.tar.gz
Algorithm	Hash digest
SHA256	`dd08b02370d375663f282f020d1729ee80e4b16bf27a61c82156bfe690c2092b`
MD5	`777a77f45f647eb710a68aa659ce9da0`
BLAKE2b-256	`89959ddfd01c8f668fc85048eaa6caf49009fc4cda2395848347261eb67c64cc`

See more details on using hashes here.

File details

Details for the file ctgan-0.11.0-py3-none-any.whl.

File metadata

Download URL: ctgan-0.11.0-py3-none-any.whl
Upload date: Feb 26, 2025
Size: 24.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.8.0 pkginfo/1.9.6 readme-renderer/42.0 requests/2.31.0 requests-toolbelt/1.0.0 urllib3/2.0.6 tqdm/4.66.1 importlib-metadata/6.8.0 keyring/24.2.0 rfc3986/2.0.0 colorama/0.4.6 CPython/3.10.13

File hashes

Hashes for ctgan-0.11.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ae84b28ae0d131b729c7bd3439db871941c2ffc92755a106f811a085f013b656`
MD5	`e7276861b3c4bfb029a28a4b617cdff5`
BLAKE2b-256	`900cdb2b3039762226ba93004aa4a104e3471c4ae596fbecbb3db236900663bf`

See more details on using hashes here.
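The published digests can be used to check a downloaded file before installing it. The sketch below relies only on Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the commented usage assumes the archive has already been downloaded to the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        # Read fixed-size chunks so large files are never loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_sha256):
    """Return True if the file's SHA256 digest equals the published one."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


# Usage, assuming ctgan-0.11.0.tar.gz sits in the current directory:
# expected = 'dd08b02370d375663f282f020d1729ee80e4b16bf27a61c82156bfe690c2092b'
# print(matches_published_hash('ctgan-0.11.0.tar.gz', expected))
```

A mismatch means the file differs from the one that was uploaded and should not be installed.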
