Skip to main content

Reinforced Data Sampling for Model Diversification

Project description

RDS

Implementation of Reinforced Data Sampling for Model Diversification.

Requirements

  • numpy
  • torch
  • scikit-learn
  • pandas
  • tqdm

Machine Learning Tasks

This repository supports multiple machine learning tasks on multivariate, textual and visual data:

  • Binary Classification
  • Multi-Class Classification
  • Regression

Real-World Use Cases

Please contact us if you want to be listed here for real-world competitions or use cases.

Experiment Results

Experiments have been conducted on four datasets as the following.

Dataset Task Challenge Size of Data Evaluation Year
MADELON Binary Classification NIPS 2013 Feature Selection Challenge 2,600 x 500 (multivariate) AUC 2003
DR Regression Drug Reviews (Kaggle Hackathon) 215,063 x 6 (multivariate, text) R^2 2018
MNIST Multiclass Classification Hand Written Digit Recognition 70,000 x 28 x 28 (image) Micro-F1 1998
KLP Binary Classification Kalapa Credit Scoring Challenge 50,000 x 64 (multivariate, text) AUC 2020

MADELON - Results

Sampling #Sample Class Ratio LR RF MLP Ensemble Public
Train Test Train Test
Preset 2000 600 1.0000 1.0000 .6019 .8106 .5590 .6783 .9063
Random 2000 600 .9920 1.0270 .5742 .7729 .5774 .6453 .9002
Stratified 2000 600 1.0000 1.0000 .5673 .7470 .6153 .6360 .8828
RDS^{DET} 2001 599 1.0375 .9137 .6192 .8050 .6228 .6973 .8915
RDS^{STO} 2021 579 1.0010 .9966 .6192 .8050 .6050 .6947 .9106

DR - Results

Sampling Train Test Ridge MLP CNN Ensemble Public
Preset 161,297 53,766 .4580 .5787 .7282 .6660 .7637
Random 161,297 53,766 .4597 .4179 .7353 .6485 .7503
RDS^{DET} 162,070 52,993 .4646 .5776 .7355 .6692 .7649
RDS^{STO} 161,944 53,119 .4647 .5370 .7509 .6562 .7600

MNIST - Results

Sampling #Sample Class Ratio LR RF CNN Ensemble Public
Train Test Train Test
Preset 60000 10000 .8571 .1429 .9647 .9524 .9824 .9819 .9917
Random 59500 10500 .8500 .1500 .9603 .9465 .9779 .9768 .9914
Stratified 59500 10500 .8500 .1500 .9625 .9510 .9795 .9792 .9901
RDS^{DET} 59938 10062 .8562 .1438 .9495 .9382 .9757 .9769 .9927
RDS^{STO} 59496 10504 .8499 .1501 .9583 .9486 .9851 .9830 .9931

KLP - Results

Sampling #Sample Class Ratio LR RF MLP Ensemble Public
Train Test Train Test
Preset 30000 20000 .0165 .0186 .5799 .5517 .5635 .5723 .5953
Simple 30000 20000 .0169 .0179 .5886 .5374 .5914 .5856 .6042
Stratified 30000 20000 .0173 .0173 .5952 .5608 .5780 .5983 .6014
RDS^{DET} 29999 20001 .0180 .0163 .6045 .5350 .5802 .6057 .5362
RDS^{STO} 30031 19969 .0172 .0174 .5997 .5491 .6354 .6072 .6096

Demos

Madelon - Binary Classification

Binary Classification with Deterministic Ensemble

python rds.py --data datasets/madelon.csv --target 0 -id MDL_DET --learning deterministic --sampling-ratio 0.7695 --envs models.MDL_RF models.MDL_MLP models.MDL_LR

Binary Classification with Stochastic Choice

python rds.py --data datasets/madelon.csv --target 0 -id MDL_STO --learning stochastic --sampling-ratio 0.7695 --envs models.MDL_RF models.MDL_MLP models.MDL_LR

Evaluating with Public Benchmarking

python evaluator.py --data datasets/madelon.csv --target 0 --sample outputs/MDL_DET.npy --task classification --measure auc --envs models.MDL_PS
python evaluator.py --data datasets/madelon.csv --target 0 --sample outputs/MDL_STO.npy --task classification --measure auc --envs models.MDL_PS

Boston Housing - Regression

Regression with Deterministic Ensemble

python rds.py --data datasets/boston.csv --target 0 -id BOS_DET --task regression --learning deterministic --envs models.BOS_MLP models.BOS_Ridge models.BOS_SVM

Regression with Stochastic Choice

python rds.py --data datasets/boston.csv --target 0 -id BOS_STO --task regression --learning stochastic --envs models.BOS_MLP models.BOS_Ridge models.BOS_SVM

Evaluating with Ensemble Benchmarking

python evaluator.py --data datasets/boston.csv --target 0 --sample outputs/BOS_DET.npy --task regression --measure auc --envs models.BOS_MLP models.BOS_Ridge models.BOS_SVM
python evaluator.py --data datasets/boston.csv --target 0 --sample outputs/BOS_STO.npy --task regression --measure auc --envs models.BOS_MLP models.BOS_Ridge models.BOS_SVM

MNIST - Multi-class Classification

Regression with Deterministic Ensemble

python rds.py --data-loader datasets.MNIST -id MNIST_DET --task classification --learning deterministic --sampling-ratio 0.8572 --measure f1_micro --envs models.MNIST_CNN models.MNIST_RF models.MNIST_LR

Regression with Stochastic Choice

python rds.py --data-loader datasets.MNIST -id MNIST_STO --task classification --learning stochastic --sampling-ratio 0.8572 --measure f1_micro --envs models.MNIST_CNN models.MNIST_RF models.MNIST_LR

Evaluating with Ensemble Benchmarking

python evaluator.py --data-loader datasets.MNIST --sample outputs/MNIST_DET.npy --task classification --measure f1_micro --envs models.MNIST_CNN models.MNIST_RF models.MNIST_LR
python evaluator.py --data-loader datasets.MNIST --sample outputs/MNIST_STO.npy --task classification --measure f1_micro --envs models.MNIST_CNN models.MNIST_RF models.MNIST_LR

Citing this work

Please consider citing us if this work is useful in your research:

@misc{nguyen2020reinforced,
    title={Reinforced Data Sampling for Model Diversification},
    author={Hoang D. Nguyen and Xuan-Son Vu and Quoc-Tuan Truong and Duc-Trong Le},
    year={2020},
    eprint={2006.07100},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

References

  • Lee, S., Prakash, S.P.S., Cogswell, M., Ranjan, V., Crandall, D. and Batra, D., 2016. Stochastic multiple choice learning for training diverse deep ensembles. In Advances in Neural Information Processing Systems (pp. 2119-2127).
  • Peng, M., Zhang, Q., Xing, X., Gui, T., Huang, X., Jiang, Y.G., Ding, K. and Chen, Z., 2019, July. Trainable undersampling for class-imbalance learning. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 33, pp. 4707-4714).
  • Gong, Z., Zhong, P. and Hu, W., 2019. Diversity in machine learning. IEEE Access, 7, pp.64323-64350.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

torchRDS-0.2.tar.gz (12.3 kB view details)

Uploaded Source

Built Distribution

torchRDS-0.2-py3-none-any.whl (9.6 kB view details)

Uploaded Python 3

File details

Details for the file torchRDS-0.2.tar.gz.

File metadata

  • Download URL: torchRDS-0.2.tar.gz
  • Upload date:
  • Size: 12.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/47.3.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.3

File hashes

Hashes for torchRDS-0.2.tar.gz
Algorithm Hash digest
SHA256 68c01c227be20cbdd77d03675b0c45e3215d6a3cdb017f47d223314217fe1d59
MD5 410d6b7f425c235d5d1272dd5ad42a86
BLAKE2b-256 6a95acd1f217cc144ac1cce58d99d6b1cde69923fcee87236f931d792c67955a

See more details on using hashes here.

File details

Details for the file torchRDS-0.2-py3-none-any.whl.

File metadata

  • Download URL: torchRDS-0.2-py3-none-any.whl
  • Upload date:
  • Size: 9.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/47.3.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.3

File hashes

Hashes for torchRDS-0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 32c275639fa1ffc5634fc2392f1ccc7c388960fde944401ef29339c7f8aaa94c
MD5 199498424658566bccc3bbe586a36304
BLAKE2b-256 79cc9ec119ff9877f9d83a4aa3e4668f130f5f3d6d976ffb01a10447fee0b443

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page