ForestDiffusion

Generating and Imputing Tabular Data via Diffusion and Flow XGBoost Models

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 3 - Alpha
Intended Audience
- Education
Operating System
- MacOS :: MacOS X
- Microsoft :: Windows
Programming Language
- Python :: 2
- Python :: 3

Project description

Tabular data is hard to acquire and is subject to missing values. This paper proposes a novel approach to generate and impute mixed-type (continuous and categorical) tabular data using score-based diffusion and conditional flow matching. Contrary to previous work that relies on neural networks as function approximators, we instead utilize XGBoost, a popular Gradient-Boosted Tree (GBT) method. In addition to being elegant, our method is shown empirically on various datasets to i) generate highly realistic synthetic data when the training dataset is either clean or tainted by missing data and ii) generate diverse plausible data imputations. Our method often outperforms deep-learning generation methods and can be trained in parallel using CPUs without the need for a GPU. To make it easily accessible, we release our code through a Python library and an R package <arXiv:2309.09968>.
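To make the conditional flow matching idea in the description more concrete, here is a minimal NumPy sketch of how regression inputs and targets could be built for one noise level. It is an illustration under stated assumptions, not the package's code: the function name `make_flow_matching_pairs`, the straight-line path from Gaussian noise to data, and the grid of noise levels are all hypothetical choices made for this example, and no XGBoost model is fit here.

```python
import numpy as np


def make_flow_matching_pairs(X, t, rng):
    """Build regression inputs and targets for one noise level t in [0, 1].

    Assumption for this sketch: a straight path from Gaussian noise (t=0)
    to the data (t=1). The helper name is hypothetical.
    """
    X = np.asarray(X, dtype=float)
    noise = rng.standard_normal(X.shape)  # starting point: pure Gaussian noise
    x_t = (1.0 - t) * noise + t * X       # point on the straight noise-to-data path
    target = X - noise                    # velocity along that path (constant in t)
    return x_t, target


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))               # toy stand-in for a numeric table
levels = np.linspace(0.0, 1.0, 4)         # small, arbitrary grid of noise levels

# One (inputs, targets) pair per noise level; a tree-based regressor such as
# XGBoost could then be fit on each pair in place of a neural network.
pairs = {float(t): make_flow_matching_pairs(X, t, rng) for t in levels}
```

Because each noise level yields an independent regression problem, the fits can in principle run in parallel on CPU cores, which is consistent with the claim in the description.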

Release history

- 1.0.6: Jun 6, 2024
- 1.0.5: Dec 15, 2023
- 1.0.4: Oct 3, 2023
- 1.0.3: Sep 28, 2023 (this version)
- 1.0.2: Sep 28, 2023
- 1.0.1: Sep 19, 2023
- 1.0.0: Sep 19, 2023

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

ForestDiffusion-1.0.3-py3-none-any.whl (11.2 kB)

Uploaded Sep 28, 2023 Python 3

File details

Details for the file ForestDiffusion-1.0.3-py3-none-any.whl.

File metadata

Download URL: ForestDiffusion-1.0.3-py3-none-any.whl
Upload date: Sep 28, 2023
Size: 11.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.8.2

File hashes

Hashes for ForestDiffusion-1.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5695ca196d12939d81aabc62939b85952e92f1b4b0de43063ac80b31ee3cbe55`
MD5	`e088b4b8a5983bf5bba2f2b2f862c7fc`
BLAKE2b-256	`5e30500a6bbed22173b464bd87ecbc3cb4298ca21d9dee3325374f370e904bff`
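
The digests above can be used to check a downloaded wheel before installing it. The following is a small sketch using only Python's standard `hashlib`; the helper names `sha256_of_file` and `matches_expected` are hypothetical, and the local path of the wheel is whatever your download step produced.

```python
import hashlib

# SHA256 digest listed above for ForestDiffusion-1.0.3-py3-none-any.whl
EXPECTED_SHA256 = "5695ca196d12939d81aabc62939b85952e92f1b4b0de43063ac80b31ee3cbe55"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals `expected`."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("ForestDiffusion-1.0.3-py3-none-any.whl")` should return `True` only if the file on disk is byte-for-byte the one described in the file details.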