A synthetic tabular data generation library.
GenTab
Synthetic Tabular Data Generation Library
Overview
GenTab is a Python library for generating synthetic tabular data. It offers a range of statistical, Machine Learning (ML) and Deep Learning (DL) methods that capture the patterns in real datasets and replicate them in synthetic data. Applications include pre-processing of tabular datasets, data balancing and resampling.
Features
- Pre-process your data.
- State-of-the-art models.
- Easy to use and customize.
Install
The gentab library is available on PyPI and can be installed with pip. We recommend installing it in a virtual environment to avoid conflicts with other software on your machine.
pip install gentab
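When several environments exist on one machine, it can help to confirm that the package is visible to the interpreter you are actually running. The following is a minimal sketch using only the standard library; the helper name `installed_version` is illustrative and is not part of gentab.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("gentab")
    if found is None:
        print("gentab is not installed in this environment")
    else:
        print(f"gentab {found} is installed")
```

Run it with the same interpreter that `pip` installed into, for example from inside the activated virtual environment.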
Available Generators
Below is the list of the generators currently available in the library.
Linear
| Model | Example | Paper |
|---|---|---|
| Random Over-Sampling | link | |
| SMOTE | link | |
| ADASYN | link | |

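For readers unfamiliar with the over-sampling methods in this group, the core idea behind SMOTE is to create new minority-class rows by interpolating between a row and one of its nearest neighbours of the same class. The sketch below illustrates that idea with the standard library only. It is not GenTab's implementation, and the function name `smote_like_samples` and its parameters are assumptions made for this example.

```python
import math
import random
from typing import List, Sequence

Row = Sequence[float]


def smote_like_samples(minority: List[Row], n_new: int, k: int = 3,
                       seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbours (the core SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other rows by Euclidean distance to the base row.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: math.dist(base, row))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_rows = smote_like_samples(minority, n_new=5)
print(len(new_rows))  # 5
```

Because every synthetic row lies on a segment between two real minority rows, the new points stay inside the region the minority class already occupies. Random over-sampling, by contrast, only duplicates existing rows.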
| Model | Example | Paper |
|---|---|---|
| Gaussian Copula | link | |
AE
| Model | Example | Paper |
|---|---|---|
| TVAE | link | |
GAN
| Model | Example | Paper |
|---|---|---|
| CTGAN | link | |
| CTAB-GAN | link | |
| CTAB-GAN+ | link | |
Diffusion
| Model | Example | Paper |
|---|---|---|
| ForestDiffusion | link | |
LLM
| Model | Example | Paper |
|---|---|---|
| GReaT | link | |
| Tabula | link | |
Hybrid
| Model | Example | Papers |
|---|---|---|
| Copula GAN | link link | |
| AutoDiffusion | link | |
Examples
Generation
from gentab.generators import AutoDiffusion
from gentab.evaluators import MLP
from gentab.data import Config, Dataset
from gentab.utils import console

# Load the dataset described by the JSON configuration file.
config = Config("configs/playnet.json")
dataset = Dataset(config)

# Reduce the size of the listed classes (one fraction per class).
dataset.reduce_size(
    {
        "left_attack": 0.97,
        "right_attack": 0.97,
        "right_transition": 0.9,
        "left_transition": 0.9,
        "time_out": 0.8,
        "left_penal": 0.5,
        "right_penal": 0.5,
    }
)

# Merge the left/right variants of each class into a single class.
dataset.merge_classes(
    {
        "attack": ["left_attack", "right_attack"],
        "transition": ["left_transition", "right_transition"],
        "penalty": ["left_penal", "right_penal"],
    }
)

dataset.reduce_mem()
console.print(dataset.class_counts(), dataset.row_count())

# Generate synthetic data and report the resulting counts.
generator = AutoDiffusion(dataset)
generator.generate()
console.print(dataset.generated_class_counts(), dataset.generated_row_count())

# Evaluate the generated data with the MLP evaluator, then save to disk.
evaluator = MLP(generator)
evaluator.evaluate()
dataset.save_to_disk(generator)
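To make the two pre-processing calls above more concrete, here is a small stand-alone sketch of what per-class size reduction and class merging can look like on a plain list of labels. These functions are hypothetical stand-ins written for illustration, not GenTab's code. In particular, reading each fraction as "share of rows to remove" is an assumption, since the example does not state how `reduce_size` interprets its values.

```python
import random
from collections import Counter
from typing import Dict, List


def merge_classes(labels: List[str], mapping: Dict[str, List[str]]) -> List[str]:
    """Relabel every class listed in mapping[new_name] as new_name."""
    lookup = {old: new for new, olds in mapping.items() for old in olds}
    return [lookup.get(label, label) for label in labels]


def reduce_size(labels: List[str], drop_fraction: Dict[str, float],
                seed: int = 0) -> List[str]:
    """Randomly drop roughly drop_fraction[c] of the rows of each class c.

    ASSUMPTION: fractions are read as 'share of rows to remove'.
    Classes without an entry are kept in full.
    """
    rng = random.Random(seed)
    return [label for label in labels
            if rng.random() >= drop_fraction.get(label, 0.0)]


labels = ["left_attack"] * 1000 + ["right_attack"] * 1000 + ["time_out"] * 100
reduced = reduce_size(labels, {"left_attack": 0.97, "right_attack": 0.97})
merged = merge_classes(reduced, {"attack": ["left_attack", "right_attack"]})
print(Counter(merged))
```

Reducing before merging, as in the example above, lets each original class be thinned at its own rate before the variants are combined.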
Tuning
from gentab.generators import AutoDiffusion
from gentab.evaluators import LightGBM
from gentab.tuners import AutoDiffusionTuner
from gentab.data import Config, Dataset

config = Config("configs/adult.json")
dataset = Dataset(config)

# Fold the labels that carry a trailing period into their plain counterparts.
dataset.merge_classes({
    "<=50K": ["<=50K."], ">50K": [">50K."]
})
dataset.reduce_mem()

generator = AutoDiffusion(dataset)
evaluator = LightGBM(generator)

# Tuning budget: 10 trials, or 8 hours (in seconds).
trials = 10
timeout_seconds = 60 * 60 * 8

tuner = AutoDiffusionTuner(evaluator, trials, timeout=timeout_seconds)
tuner.tune()
tuner.save_to_disk()
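The tuner above is given two budgets, a number of trials and a timeout in seconds. As a generic illustration of how such a pair of budgets can interact, the sketch below runs a plain random search that stops at whichever limit is reached first. It is not GenTab's tuner, which may use a different search strategy; the search space, the parameter names and the toy objective are all assumptions made for this example.

```python
import random
import time
from typing import Callable, Dict, Optional, Tuple


def random_search(objective: Callable[[Dict[str, float]], float],
                  trials: int, timeout: float,
                  seed: int = 0) -> Tuple[Optional[Dict[str, float]], float]:
    """Try up to `trials` random configurations, stopping early once
    `timeout` seconds have elapsed. Returns the best (params, score)."""
    rng = random.Random(seed)
    deadline = time.monotonic() + timeout
    best_params, best_score = None, float("-inf")
    for _ in range(trials):
        if time.monotonic() >= deadline:
            break  # time budget exhausted before the trial budget
        # Hypothetical search space: one log-uniform, one uniform parameter.
        params = {
            "learning_rate": 10 ** rng.uniform(-4, -1),
            "dropout": rng.uniform(0.0, 0.5),
        }
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params: Dict[str, float]) -> float:
    # Stand-in for "train the generator, then score it with an evaluator".
    return -abs(params["dropout"] - 0.2)


best, score = random_search(toy_objective, trials=10, timeout=60 * 60 * 8)
print(best, score)
```

With a cheap objective the trial budget ends the search; with an expensive one, such as training a generative model per trial, the timeout is more likely to be the limit that applies.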
Loading Stored Synthetic Datasets
from gentab.generators import AutoDiffusion
from gentab.evaluators import LightGBM
from gentab.tuners import AutoDiffusionTuner
from gentab.data import Config, Dataset
config = Config("configs/adult.json")
dataset = Dataset(config)
dataset.merge_classes({
    "<=50K": ["<=50K."], ">50K": [">50K."]
})
dataset.reduce_mem()
# Load previously saved dataset...
generator = AutoDiffusion(dataset)
generator.load_from_disk()
# Do work with previously generated but not tuned dataset...
evaluator = LightGBM(generator)
evaluator.evaluate()
evaluator.evaluate_baseline()
# Load previously tuned and saved dataset...
tuner = AutoDiffusionTuner(evaluator, 0)
tuner.load_from_disk()
# Do work with previously tuned dataset...
evaluator.evaluate()
evaluator.evaluate_baseline()
File details
Details for the file gentab-0.1.2.tar.gz.
File metadata
- Download URL: gentab-0.1.2.tar.gz
- Upload date:
- Size: 105.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.9.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7036bc206158455bf8fec342b6f5fc4a0fdd032407bed736a8f65189f6cb127b |
| MD5 | a2595f4506057819ab423756a6db3e1c |
| BLAKE2b-256 | 6d56c39995680f2aef83dd4b69cafa9e0c2739d9afbcaa74cb8666f892572b1d |
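A published digest is only useful if it is compared against the file that was actually downloaded. The sketch below shows one way to do that with Python's standard `hashlib` module. The helper names and the assumed local path of the archive are illustrative; the expected value is the SHA256 digest listed for gentab-0.1.2.tar.gz.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 with a published digest, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    archive = Path("gentab-0.1.2.tar.gz")  # assumed local download location
    expected = "7036bc206158455bf8fec342b6f5fc4a0fdd032407bed736a8f65189f6cb127b"
    if archive.exists():
        print("digest matches:", matches_published_digest(str(archive), expected))
    else:
        print(f"{archive} not found; download it first")
```

Reading the file in chunks keeps memory use flat regardless of archive size.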
File details
Details for the file gentab-0.1.2-py3-none-any.whl.
File metadata
- Download URL: gentab-0.1.2-py3-none-any.whl
- Upload date:
- Size: 133.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.9.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5e0da06342304f20e469b685f4ce1a3d870dbeae99c6e26d4a4348aaeb044999 |
| MD5 | ff567a6ec347638894b0ed9999876156 |
| BLAKE2b-256 | 946d43666d9314ea9950a73fe7d0f83890921c15f6d59ce5b3c77c120d7e404e |