Reinforced Data Sampling for Model Diversification

RDS

Implementation of Reinforced Data Sampling for Model Diversification.

Requirements

  • numpy
  • torch
  • scikit-learn
  • pandas
  • tqdm

Machine Learning Tasks

This repository supports multiple machine learning tasks on multivariate, textual and visual data:

  • Binary Classification
  • Multi-Class Classification
  • Regression

Installation

pip install torchRDS

Usage

from torchRDS.RDS import RDS

# Train the sampler on MADELON with three model classes, evaluated by AUC.
trainer = RDS(data_file="datasets/madelon.csv", target=[0], task="classification", measure="auc",
              model_classes=["models.MDL_RF", "models.MDL_MLP", "models.MDL_LR"],
              learn="deterministic", ratio=0.7695, iters=100)
sample = trainer.train()

# Summing the returned sample counts the observations placed in the training set.
print("Number of observations in training set:", sum(sample))
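The snippet above sums `sample` to count training observations, which suggests it holds one 0/1 indicator per row. Under that assumption (not confirmed against the torchRDS source), a minimal sketch of turning such a mask into train and test partitions could look like this. The helper `split_by_mask` and the stand-in data are hypothetical and not part of the package.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_by_mask(rows: Sequence[T], mask: Sequence[int]) -> Tuple[List[T], List[T]]:
    """Split rows into (train, test) using a 0/1 mask, where 1 means training set.

    Hypothetical helper: assumes the mask has one entry per row.
    """
    if len(rows) != len(mask):
        raise ValueError("rows and mask must have the same length")
    train = [row for row, keep in zip(rows, mask) if keep]
    test = [row for row, keep in zip(rows, mask) if not keep]
    return train, test


# Stand-in values; in practice `mask` would be the `sample` returned by trainer.train().
rows = ["r0", "r1", "r2", "r3", "r4"]
mask = [1, 0, 1, 1, 0]

train_rows, test_rows = split_by_mask(rows, mask)
print(len(train_rows), len(test_rows))  # 3 2
```

With a pandas DataFrame, the same idea would typically be expressed with a boolean index built from the mask.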

Real-World Use Cases

Please contact us if you want to be listed here for real-world competitions or use cases.

Experiment Results

Experiments were conducted on the following four datasets.

| Dataset | Task | Challenge | Size of Data | Evaluation | Year |
|---|---|---|---|---|---|
| MADELON | Binary Classification | NIPS 2003 Feature Selection Challenge | 2,600 x 500 (multivariate) | AUC | 2003 |
| DR | Regression | Drug Reviews (Kaggle Hackathon) | 215,063 x 6 (multivariate, text) | R^2 | 2018 |
| MNIST | Multiclass Classification | Hand Written Digit Recognition | 70,000 x 28 x 28 (image) | Micro-F1 | 1998 |
| KLP | Binary Classification | Kalapa Credit Scoring Challenge | 50,000 x 64 (multivariate, text) | AUC | 2020 |
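For readers unfamiliar with the three evaluation measures named in the table (AUC, R^2, Micro-F1), the following is a minimal pure-Python sketch of their standard definitions. It is illustrative only and is not taken from the torchRDS source; the function names are hypothetical, and in practice a library such as scikit-learn would normally be used.

```python
from typing import Sequence


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs ranked correctly, counting ties as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def micro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Micro-averaged F1; for single-label multiclass data this equals accuracy."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


auc_value = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
r2_value = r_squared([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
f1_value = micro_f1([0, 1, 2, 2], [0, 2, 2, 2])
print(auc_value, round(r2_value, 4), f1_value)
```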

MADELON - Results

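| Sampling | #Sample Train | #Sample Test | Class Ratio Train | Class Ratio Test | LR | RF | MLP | Ensemble | Public |
|---|---|---|---|---|---|---|---|---|---|
| Preset | 2000 | 600 | 1.0000 | 1.0000 | .6019 | .8106 | .5590 | .6783 | .9063 |
| Random | 2000 | 600 | .9920 | 1.0270 | .5742 | .7729 | .5774 | .6453 | .9002 |
| Stratified | 2000 | 600 | 1.0000 | 1.0000 | .5673 | .7470 | .6153 | .6360 | .8828 |
| RDS^{DET} | 2001 | 599 | 1.0375 | .9137 | .6192 | .8050 | .6228 | .6973 | .8915 |
| RDS^{STO} | 2021 | 579 | 1.0010 | .9966 | .6192 | .8050 | .6050 | .6947 | .9106 |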
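The RDS^{DET} split sizes above can be related to the `ratio=0.7695` argument in the Usage example. The sketch below assumes, without confirmation from the torchRDS source, that `ratio` is the target fraction of observations assigned to the training set; the variable names are illustrative.

```python
# Assumption: `ratio` is the fraction of rows assigned to the training set.
total_rows = 2600   # MADELON size from the dataset table
ratio = 0.7695      # value passed to RDS(...) in the Usage example

expected_train = round(total_rows * ratio)
expected_test = total_rows - expected_train
print(expected_train, expected_test)  # 2001 599
```

Under this assumption the result matches the 2001 / 599 split reported for RDS^{DET}. The RDS^{STO} row (2021 / 579) deviates from it, which would be consistent with the stochastic variant treating the ratio as a soft target.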

DR - Results

| Sampling | Train | Test | Ridge | MLP | CNN | Ensemble | Public |
|---|---|---|---|---|---|---|---|
| Preset | 161,297 | 53,766 | .4580 | .5787 | .7282 | .6660 | .7637 |
| Random | 161,297 | 53,766 | .4597 | .4179 | .7353 | .6485 | .7503 |
| RDS^{DET} | 162,070 | 52,993 | .4646 | .5776 | .7355 | .6692 | .7649 |
| RDS^{STO} | 161,944 | 53,119 | .4647 | .5370 | .7509 | .6562 | .7600 |

MNIST - Results

| Sampling | #Sample Train | #Sample Test | Class Ratio Train | Class Ratio Test | LR | RF | CNN | Ensemble | Public |
|---|---|---|---|---|---|---|---|---|---|
| Preset | 60000 | 10000 | .8571 | .1429 | .9647 | .9524 | .9824 | .9819 | .9917 |
| Random | 59500 | 10500 | .8500 | .1500 | .9603 | .9465 | .9779 | .9768 | .9914 |
| Stratified | 59500 | 10500 | .8500 | .1500 | .9625 | .9510 | .9795 | .9792 | .9901 |
| RDS^{DET} | 59938 | 10062 | .8562 | .1438 | .9495 | .9382 | .9757 | .9769 | .9927 |
| RDS^{STO} | 59496 | 10504 | .8499 | .1501 | .9583 | .9486 | .9851 | .9830 | .9931 |

KLP - Results

| Sampling | #Sample Train | #Sample Test | Class Ratio Train | Class Ratio Test | LR | RF | MLP | Ensemble | Public |
|---|---|---|---|---|---|---|---|---|---|
| Preset | 30000 | 20000 | .0165 | .0186 | .5799 | .5517 | .5635 | .5723 | .5953 |
| Simple | 30000 | 20000 | .0169 | .0179 | .5886 | .5374 | .5914 | .5856 | .6042 |
| Stratified | 30000 | 20000 | .0173 | .0173 | .5952 | .5608 | .5780 | .5983 | .6014 |
| RDS^{DET} | 29999 | 20001 | .0180 | .0163 | .6045 | .5350 | .5802 | .6057 | .5362 |
| RDS^{STO} | 30031 | 19969 | .0172 | .0174 | .5997 | .5491 | .6354 | .6072 | .6096 |

Citing this work

Please consider citing us if this work is useful in your research:

@misc{nguyen2020reinforced,
    title={Reinforced Data Sampling for Model Diversification},
    author={Hoang D. Nguyen and Xuan-Son Vu and Quoc-Tuan Truong and Duc-Trong Le},
    year={2020},
    eprint={2006.07100},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
