Implementing easy-to-use methods for classical and novel tabular data augmentation and synthesis.
Description
tabular_augmentation
collects classical and novel data augmentation methods and makes tabular data
augmentation easier, especially in few-shot learning settings.
Usage
SMOTE-based methods
from tabular_augmentation import smote_augmentation
method = 'SVMSMOTE'
x_synthesis, y_synthesis = smote_augmentation(
    x_few_train, y_few_train, method, seed=seed,
    oversample_num=100, positive_ratio=None,
    knn_neighbors=3,
)
tabular_model_test(x_synthesis, y_synthesis, x_test, y_test, model_name='xgb')
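For intuition about what SMOTE-style oversampling does, the following is a minimal sketch of plain SMOTE interpolation using only the standard library. It is not the implementation behind smote_augmentation, and it is not SVMSMOTE, which additionally uses an SVM to decide which minority samples to interpolate from. The function name smote_sketch and its parameters are hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between minority-class neighbours.

    Assumes `minority` holds at least two numeric rows of equal length.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority rows (Euclidean distance)
        others = [row for row in minority if row is not base]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        neighbour = rng.choice(neighbours)
        # Random position on the segment from `base` to `neighbour`
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


minority_rows = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_rows = smote_sketch(minority_rows, n_new=5, k=3, seed=42)
```

Every synthetic row lies on a line segment between two real minority rows, so it stays inside the region the minority class already occupies.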
Mixup-based methods
from tabular_augmentation import mixup_augmentation_with_weight
method = 'vanilla'
x_synthesis, y_synthesis, sample_weight = mixup_augmentation_with_weight(
    x_few_train, y_few_train, oversample_num=200, alpha=1, beta=1,
    mixup_type=method, seed=seed, rebalanced_ita=1,
)
tabular_model_test(
    x_synthesis, y_synthesis, x_test, y_test,
    model_name='xgb', sample_weight=sample_weight,
)
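The idea behind vanilla mixup can be sketched in a few lines: draw a coefficient from a Beta(alpha, beta) distribution and take a convex combination of two training rows. How mixup_augmentation_with_weight actually derives sample_weight is not stated here, so the weighting below is only one possible scheme for learners that need hard labels: each mixed row is emitted once per source label and weighted by that label's share. The name mixup_sketch is hypothetical.

```python
import random


def mixup_sketch(x, y, n_new, alpha=1.0, beta=1.0, seed=0):
    """Return mixed rows, hard labels, and per-row weights (illustrative only)."""
    rng = random.Random(seed)
    x_mix, y_mix, weights = [], [], []
    for _ in range(n_new):
        i, j = rng.randrange(len(x)), rng.randrange(len(x))
        lam = rng.betavariate(alpha, beta)  # mixing coefficient in [0, 1]
        mixed = [lam * a + (1.0 - lam) * b for a, b in zip(x[i], x[j])]
        # Assumption: keep hard labels by emitting the mixed row twice,
        # once per source label, weighted by lam and 1 - lam.
        x_mix.extend([mixed, mixed])
        y_mix.extend([y[i], y[j]])
        weights.extend([lam, 1.0 - lam])
    return x_mix, y_mix, weights


x_demo = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
y_demo = [0, 1, 1]
x_mix, y_mix, w = mixup_sketch(x_demo, y_demo, n_new=4, seed=1)
```

With alpha=1 and beta=1, as in the usage above, the Beta distribution is uniform on [0, 1], so every mixing ratio is equally likely.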
CTGAN/TVAE-based methods
The CTGAN, TVAE, DeltaTVAE, and DiffTVAE methods generate synthetic data with the sdv_synthesis
function; ConditionalTVAE uses the sdv_synthesis_cvae
function instead.
from tabular_augmentation import sdv_synthesis, sdv_synthesis_cvae
method = 'CTGAN'
x_synthesis, y_synthesis = sdv_synthesis(
    x_few_train, y_few_train, method, oversample_num=5000,
    seed=seed, init_synthesizer=True, positive_ratio=0.5,
)
tabular_model_test(x_synthesis, y_synthesis, x_test, y_test, model_name='xgb')
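When looping over several methods, the mapping from method name to synthesis function described above can be written down explicitly. This is a small sketch under assumptions: only 'CTGAN' is confirmed as a method string by the usage above, so the other strings are taken from the method names as written in this section, and the helper synthesis_function_name is hypothetical rather than part of tabular_augmentation.

```python
# Method names as listed in this section; exact accepted strings are an assumption.
SDV_SYNTHESIS_METHODS = {"CTGAN", "TVAE", "DeltaTVAE", "DiffTVAE"}
CVAE_METHODS = {"ConditionalTVAE"}


def synthesis_function_name(method):
    """Return the name of the function this section says to use for `method`."""
    if method in SDV_SYNTHESIS_METHODS:
        return "sdv_synthesis"
    if method in CVAE_METHODS:
        return "sdv_synthesis_cvae"
    raise ValueError(f"Unknown CTGAN/TVAE-based method: {method!r}")


chosen = synthesis_function_name("CTGAN")
```

Raising on unknown names keeps a typo in the method string from silently selecting the wrong generator.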
TabDDPM-based methods
from tabular_augmentation import ddpm_synthesis
method = "DDPM"
x_synthesis, y_synthesis = ddpm_synthesis(
    x_few_train, y_few_train, method, oversample_num=5000, seed=seed,
    init_synthesizer=True, positive_ratio=None, train_steps=10000,
)
tabular_model_test(x_synthesis, y_synthesis, x_test, y_test, model_name='xgb')
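As background, diffusion models such as TabDDPM learn to reverse a process that gradually adds noise to the data. The sketch below shows only the Gaussian forward (noising) step for numerical features, using the standard library. It is not the code behind ddpm_synthesis: the linear schedule and its endpoint values are illustrative assumptions, all function names are hypothetical, and categorical features (handled in TabDDPM by multinomial diffusion) are omitted.

```python
import math
import random


def linear_betas(steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances; requires steps >= 2."""
    span = beta_end - beta_start
    return [beta_start + span * i / (steps - 1) for i in range(steps)]


def alpha_bar(t, betas):
    """Fraction of the original signal variance kept after steps 0..t."""
    return math.prod(1.0 - b for b in betas[: t + 1])


def forward_noise(x0, t, betas, rng):
    """Sample x_t from N(sqrt(alpha_bar) * x0, (1 - alpha_bar) * I)."""
    a = alpha_bar(t, betas)
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]


betas = linear_betas(1000)
rng = random.Random(0)
row = [0.5, -1.2, 3.0]
slightly_noisy = forward_noise(row, t=0, betas=betas, rng=rng)
almost_pure_noise = forward_noise(row, t=999, betas=betas, rng=rng)
```

Early steps keep the row nearly intact, while the last steps leave almost pure Gaussian noise; the trained model generates new rows by running this process in reverse.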
Example
For details, please refer to example.ipynb
Cite
SMOTE
MIXUP
[ICLR'18] mixup: Beyond Empirical Risk Minimization (Mixup)
[ICLR'22] Noisy Feature Mixup (NoisyMixup)
[ECCV'20] Remix: Rebalanced Mixup
CTGAN/TVAE
[NeurIPS'19] Modeling Tabular data using Conditional GAN (CTGAN)
TabDDPM
[ICML'23] TabDDPM: Modelling Tabular Data with Diffusion Models (TabDDPM)