RDS: Reinforced Data Sampling for Model Diversification
Implementation of Reinforced Data Sampling (RDS) for model diversification in PyTorch.
Requirements
- numpy
- torch
- scikit-learn
- pandas
- tqdm
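The package is distributed on PyPI as torchRDS (see the file details at the end of this page), so it can typically be installed with `pip install torchRDS`; otherwise, the requirements above can be installed manually with pip.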
Machine Learning Tasks
This repository supports multiple machine learning tasks on multivariate, textual and visual data:
- Binary Classification
- Multi-Class Classification
- Regression
Real-World Use Cases
Please contact us if you would like your real-world competition or use case to be listed here.
Experiment Results
Experiments were conducted on the following four datasets.
Dataset | Task | Challenge | Size of Data | Evaluation | Year |
---|---|---|---|---|---|
MADELON | Binary Classification | NIPS 2003 Feature Selection Challenge | 2,600 x 500 (multivariate) | AUC | 2003 |
DR | Regression | Drug Reviews (Kaggle Hackathon) | 215,063 x 6 (multivariate, text) | R^2 | 2018 |
MNIST | Multi-class Classification | Handwritten Digit Recognition | 70,000 x 28 x 28 (image) | Micro-F1 | 1998 |
KLP | Binary Classification | Kalapa Credit Scoring Challenge | 50,000 x 64 (multivariate, text) | AUC | 2020 |
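The evaluation measures listed above are standard metrics. Purely as an illustration (this is not code from the package), they can be computed with scikit-learn, which is already a listed requirement; the arrays below are toy placeholders:

```python
# Illustration only: the evaluation measures in the table above, computed
# with scikit-learn on small toy arrays.
import numpy as np
from sklearn.metrics import roc_auc_score, r2_score, f1_score

# AUC for binary classification (MADELON, KLP): scores are class-1 probabilities.
y_true_bin = np.array([0, 1, 1, 0, 1])
y_score = np.array([0.2, 0.8, 0.6, 0.4, 0.9])
print("AUC:", roc_auc_score(y_true_bin, y_score))

# R^2 for regression (DR): predictions are real-valued.
y_true_reg = np.array([3.0, 7.5, 9.0, 1.0])
y_pred_reg = np.array([2.8, 7.0, 8.5, 2.0])
print("R^2:", r2_score(y_true_reg, y_pred_reg))

# Micro-F1 for multi-class classification (MNIST): hard label predictions.
y_true_mc = np.array([0, 1, 2, 2, 1])
y_pred_mc = np.array([0, 2, 2, 2, 1])
print("Micro-F1:", f1_score(y_true_mc, y_pred_mc, average="micro"))
```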
MADELON - Results
Sampling | #Sample (Train) | #Sample (Test) | Class Ratio (Train) | Class Ratio (Test) | LR | RF | MLP | Ensemble | Public
---|---|---|---|---|---|---|---|---|---
Preset | 2000 | 600 | 1.0000 | 1.0000 | .6019 | .8106 | .5590 | .6783 | .9063
Random | 2000 | 600 | .9920 | 1.0270 | .5742 | .7729 | .5774 | .6453 | .9002
Stratified | 2000 | 600 | 1.0000 | 1.0000 | .5673 | .7470 | .6153 | .6360 | .8828
RDS^{DET} | 2001 | 599 | 1.0375 | .9137 | .6192 | .8050 | .6228 | .6973 | .8915
RDS^{STO} | 2021 | 579 | 1.0010 | .9966 | .6192 | .8050 | .6050 | .6947 | .9106
DR - Results
Sampling | #Sample (Train) | #Sample (Test) | Ridge | MLP | CNN | Ensemble | Public
---|---|---|---|---|---|---|---
Preset | 161,297 | 53,766 | .4580 | .5787 | .7282 | .6660 | .7637 |
Random | 161,297 | 53,766 | .4597 | .4179 | .7353 | .6485 | .7503 |
RDS^{DET} | 162,070 | 52,993 | .4646 | .5776 | .7355 | .6692 | .7649 |
RDS^{STO} | 161,944 | 53,119 | .4647 | .5370 | .7509 | .6562 | .7600 |
MNIST - Results
Sampling | #Sample (Train) | #Sample (Test) | Class Ratio (Train) | Class Ratio (Test) | LR | RF | CNN | Ensemble | Public
---|---|---|---|---|---|---|---|---|---
Preset | 60000 | 10000 | .8571 | .1429 | .9647 | .9524 | .9824 | .9819 | .9917
Random | 59500 | 10500 | .8500 | .1500 | .9603 | .9465 | .9779 | .9768 | .9914
Stratified | 59500 | 10500 | .8500 | .1500 | .9625 | .9510 | .9795 | .9792 | .9901
RDS^{DET} | 59938 | 10062 | .8562 | .1438 | .9495 | .9382 | .9757 | .9769 | .9927
RDS^{STO} | 59496 | 10504 | .8499 | .1501 | .9583 | .9486 | .9851 | .9830 | .9931
KLP - Results
Sampling | #Sample (Train) | #Sample (Test) | Class Ratio (Train) | Class Ratio (Test) | LR | RF | MLP | Ensemble | Public
---|---|---|---|---|---|---|---|---|---
Preset | 30000 | 20000 | .0165 | .0186 | .5799 | .5517 | .5635 | .5723 | .5953
Simple | 30000 | 20000 | .0169 | .0179 | .5886 | .5374 | .5914 | .5856 | .6042
Stratified | 30000 | 20000 | .0173 | .0173 | .5952 | .5608 | .5780 | .5983 | .6014
RDS^{DET} | 29999 | 20001 | .0180 | .0163 | .6045 | .5350 | .5802 | .6057 | .5362
RDS^{STO} | 30031 | 19969 | .0172 | .0174 | .5997 | .5491 | .6354 | .6072 | .6096
Demos
Madelon - Binary Classification
Binary Classification with Deterministic Ensemble
```
python rds.py --data datasets/madelon.csv --target 0 -id MDL_DET --learning deterministic --sampling-ratio 0.7695 --envs models.MDL_RF models.MDL_MLP models.MDL_LR
```
Binary Classification with Stochastic Choice
```
python rds.py --data datasets/madelon.csv --target 0 -id MDL_STO --learning stochastic --sampling-ratio 0.7695 --envs models.MDL_RF models.MDL_MLP models.MDL_LR
```
Evaluating with Public Benchmarking
```
python evaluator.py --data datasets/madelon.csv --target 0 --sample outputs/MDL_DET.npy --task classification --measure auc --envs models.MDL_PS
python evaluator.py --data datasets/madelon.csv --target 0 --sample outputs/MDL_STO.npy --task classification --measure auc --envs models.MDL_PS
```
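The --sample argument points evaluator.py at the sampling decision that rds.py writes as a NumPy .npy file. As a rough sketch, assuming the file stores one train/test indicator per row of the dataset (an assumption; inspect the array in your own run, since the exact format may differ), the resulting split can be examined like this:

```python
# Sketch only: inspect a sampling decision saved by rds.py.
# Assumption: outputs/MDL_DET.npy holds one train/test indicator per data row;
# verify the shape and dtype in your own run before relying on this.
import numpy as np
import pandas as pd

sample = np.load("outputs/MDL_DET.npy")
print("shape:", sample.shape, "dtype:", sample.dtype)

data = pd.read_csv("datasets/madelon.csv")
if sample.ndim == 1 and sample.shape[0] == len(data):
    mask = sample.astype(bool)
    print("train rows:", int(mask.sum()), "test rows:", int((~mask).sum()))
```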
Boston Housing - Regression
Regression with Deterministic Ensemble
```
python rds.py --data datasets/boston.csv --target 0 -id BOS_DET --task regression --learning deterministic --envs models.BOS_MLP models.BOS_Ridge models.BOS_SVM
```
Regression with Stochastic Choice
```
python rds.py --data datasets/boston.csv --target 0 -id BOS_STO --task regression --learning stochastic --envs models.BOS_MLP models.BOS_Ridge models.BOS_SVM
```
Evaluating with Ensemble Benchmarking
```
python evaluator.py --data datasets/boston.csv --target 0 --sample outputs/BOS_DET.npy --task regression --measure auc --envs models.BOS_MLP models.BOS_Ridge models.BOS_SVM
python evaluator.py --data datasets/boston.csv --target 0 --sample outputs/BOS_STO.npy --task regression --measure auc --envs models.BOS_MLP models.BOS_Ridge models.BOS_SVM
```
MNIST - Multi-class Classification
Multi-class Classification with Deterministic Ensemble
```
python rds.py --data-loader datasets.MNIST -id MNIST_DET --task classification --learning deterministic --sampling-ratio 0.8572 --measure f1_micro --envs models.MNIST_CNN models.MNIST_RF models.MNIST_LR
```
Multi-class Classification with Stochastic Choice
```
python rds.py --data-loader datasets.MNIST -id MNIST_STO --task classification --learning stochastic --sampling-ratio 0.8572 --measure f1_micro --envs models.MNIST_CNN models.MNIST_RF models.MNIST_LR
```
Evaluating with Ensemble Benchmarking
```
python evaluator.py --data-loader datasets.MNIST --sample outputs/MNIST_DET.npy --task classification --measure f1_micro --envs models.MNIST_CNN models.MNIST_RF models.MNIST_LR
python evaluator.py --data-loader datasets.MNIST --sample outputs/MNIST_STO.npy --task classification --measure f1_micro --envs models.MNIST_CNN models.MNIST_RF models.MNIST_LR
```
Citing this work
Please consider citing us if this work is useful in your research:
```
@misc{nguyen2020reinforced,
      title={Reinforced Data Sampling for Model Diversification},
      author={Hoang D. Nguyen and Xuan-Son Vu and Quoc-Tuan Truong and Duc-Trong Le},
      year={2020},
      eprint={2006.07100},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```
File details
Details for the file torchRDS-0.2.tar.gz.
File metadata
- Download URL: torchRDS-0.2.tar.gz
- Upload date:
- Size: 12.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/47.3.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 68c01c227be20cbdd77d03675b0c45e3215d6a3cdb017f47d223314217fe1d59
MD5 | 410d6b7f425c235d5d1272dd5ad42a86
BLAKE2b-256 | 6a95acd1f217cc144ac1cce58d99d6b1cde69923fcee87236f931d792c67955a
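To check a downloaded archive against the digests above, a short, generic verification script (not part of the package) can be used; the filename is assumed to sit in the current directory:

```python
# Verify a downloaded distribution against the SHA256 digest listed above.
import hashlib

expected_sha256 = "68c01c227be20cbdd77d03675b0c45e3215d6a3cdb017f47d223314217fe1d59"
with open("torchRDS-0.2.tar.gz", "rb") as f:  # assumes the archive is in the current directory
    actual = hashlib.sha256(f.read()).hexdigest()
print("OK" if actual == expected_sha256 else "hash mismatch")
```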
File details
Details for the file torchRDS-0.2-py3-none-any.whl.
File metadata
- Download URL: torchRDS-0.2-py3-none-any.whl
- Upload date:
- Size: 9.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/47.3.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 32c275639fa1ffc5634fc2392f1ccc7c388960fde944401ef29339c7f8aaa94c
MD5 | 199498424658566bccc3bbe586a36304
BLAKE2b-256 | 79cc9ec119ff9877f9d83a4aa3e4668f130f5f3d6d976ffb01a10447fee0b443