Reinforced Data Sampling for Model Diversification

Development Status
- 4 - Beta
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Programming Language
- Python :: 3
Topic
- Software Development :: Build Tools

Project description

RDS

Implementation of Reinforced Data Sampling for Model Diversification.

Requirements

numpy
torch
scikit-learn
pandas
tqdm

Machine Learning Tasks

This repository supports multiple machine learning tasks on multivariate, textual and visual data:

Binary Classification
Multi-Class Classification
Regression

Installation

pip install torchRDS

Usage

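from torchRDS.RDS import RDS

# Learn a train/test split of MADELON, scored by AUC across three model classes.
trainer = RDS(data_file="datasets/madelon.csv", target=[0], task="classification", measure="auc",
              model_classes=["models.MDL_RF", "models.MDL_MLP", "models.MDL_LR"],
              learn="deterministic", ratio=0.7695, iters=100)
sample = trainer.train()

# Summing the returned sample counts the observations assigned to the training set.
print("No of observations in training set: ", sum(sample))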
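Since `sum(sample)` counts the training observations, `sample` presumably holds one 0/1 flag per row. A minimal sketch, under that assumption, of turning such a mask into train and test frames with pandas. The toy frame and the hard-coded mask are stand-ins, not output from torchRDS:

```python
import numpy as np
import pandas as pd

# Toy stand-ins (hypothetical): a 10-row dataset and a 0/1 mask of the same
# length, assumed to have the same shape as the value returned by trainer.train().
data = pd.DataFrame({"target": [0, 1] * 5, "feature": np.arange(10)})
sample = [1, 1, 0, 1, 1, 1, 0, 1, 0, 1]

mask = np.asarray(sample, dtype=bool)  # 1 -> training row, 0 -> test row
train_df = data[mask]
test_df = data[~mask]

print("Train rows:", len(train_df), "Test rows:", len(test_df))
```

If the real return value turns out to be a list of row indices rather than a mask, `data.iloc[sample]` would be the analogous selection.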

Real-World Use Cases

Please contact us if you want to be listed here for real-world competitions or use cases.

Experiment Results

Experiments were conducted on the following four datasets.

Dataset	Task	Challenge	Size of Data	Evaluation	Year
MADELON	Binary Classification	NIPS 2003 Feature Selection Challenge	2,600 x 500 (multivariate)	AUC	2003
DR	Regression	Drug Reviews (Kaggle Hackathon)	215,063 x 6 (multivariate, text)	R^2	2018
MNIST	Multiclass Classification	Handwritten Digit Recognition	70,000 x 28 x 28 (image)	Micro-F1	1998
KLP	Binary Classification	Kalapa Credit Scoring Challenge	50,000 x 64 (multivariate, text)	AUC	2020
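
The three evaluation measures in the table above are all available in scikit-learn, which is already listed under Requirements. A small sketch on made-up labels and predictions (not data from the experiments) of how each could be computed. Whether torchRDS computes them this way internally is not stated here:

```python
import numpy as np
from sklearn.metrics import f1_score, r2_score, roc_auc_score

# Binary classification (MADELON, KLP): AUC is computed from predicted scores.
y_true_bin = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])
auc = roc_auc_score(y_true_bin, y_score)

# Regression (DR): coefficient of determination, R^2.
y_true_reg = np.array([3.0, -0.5, 2.0, 7.0])
y_pred_reg = np.array([2.5, 0.0, 2.0, 8.0])
r2 = r2_score(y_true_reg, y_pred_reg)

# Multi-class classification (MNIST): micro-averaged F1 over predicted labels.
y_true_mc = np.array([0, 1, 2, 2, 1])
y_pred_mc = np.array([0, 2, 2, 2, 1])
micro_f1 = f1_score(y_true_mc, y_pred_mc, average="micro")

print("AUC:", auc, "R^2:", r2, "Micro-F1:", micro_f1)
```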

MADELON - Results

Sampling	#Sample (Train)	#Sample (Test)	Class Ratio (Train)	Class Ratio (Test)	LR	RF	MLP	Ensemble	Public
Preset	2000	600	1.0000	1.0000	.6019	.8106	.5590	.6783	.9063
Random	2000	600	.9920	1.0270	.5742	.7729	.5774	.6453	.9002
Stratified	2000	600	1.0000	1.0000	.5673	.7470	.6153	.6360	.8828
RDS^{DET}	2001	599	1.0375	.9137	.6192	.8050	.6228	.6973	.8915
RDS^{STO}	2021	579	1.0010	.9966	.6192	.8050	.6050	.6947	.9106

DR - Results

Sampling	Train	Test	Ridge	MLP	CNN	Ensemble	Public
Preset	161,297	53,766	.4580	.5787	.7282	.6660	.7637
Random	161,297	53,766	.4597	.4179	.7353	.6485	.7503
RDS^{DET}	162,070	52,993	.4646	.5776	.7355	.6692	.7649
RDS^{STO}	161,944	53,119	.4647	.5370	.7509	.6562	.7600

MNIST - Results

Sampling	#Sample (Train)	#Sample (Test)	Class Ratio (Train)	Class Ratio (Test)	LR	RF	CNN	Ensemble	Public
Preset	60000	10000	.8571	.1429	.9647	.9524	.9824	.9819	.9917
Random	59500	10500	.8500	.1500	.9603	.9465	.9779	.9768	.9914
Stratified	59500	10500	.8500	.1500	.9625	.9510	.9795	.9792	.9901
RDS^{DET}	59938	10062	.8562	.1438	.9495	.9382	.9757	.9769	.9927
RDS^{STO}	59496	10504	.8499	.1501	.9583	.9486	.9851	.9830	.9931

KLP - Results

Sampling	#Sample (Train)	#Sample (Test)	Class Ratio (Train)	Class Ratio (Test)	LR	RF	MLP	Ensemble	Public
Preset	30000	20000	.0165	.0186	.5799	.5517	.5635	.5723	.5953
Simple	30000	20000	.0169	.0179	.5886	.5374	.5914	.5856	.6042
Stratified	30000	20000	.0173	.0173	.5952	.5608	.5780	.5983	.6014
RDS^{DET}	29999	20001	.0180	.0163	.6045	.5350	.5802	.6057	.5362
RDS^{STO}	30031	19969	.0172	.0174	.5997	.5491	.6354	.6072	.6096
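
The Random and Stratified rows in the tables above are standard splitting baselines. This page does not show how they were produced. One plausible sketch uses scikit-learn's `train_test_split` on toy data, where the array names, sizes, seed and the 60/40 split are illustrative assumptions:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))       # toy features (hypothetical)
y = np.array([1] * 100 + [0] * 900)  # imbalanced toy labels, 10% positive

# Random baseline: rows are drawn without looking at the labels.
X_tr_r, X_te_r, y_tr_r, y_te_r = train_test_split(
    X, y, train_size=0.6, random_state=0)

# Stratified baseline: class proportions are preserved in both parts.
X_tr_s, X_te_s, y_tr_s, y_te_s = train_test_split(
    X, y, train_size=0.6, random_state=0, stratify=y)

print("Random positive rate (train/test):", y_tr_r.mean(), y_te_r.mean())
print("Stratified positive rate (train/test):", y_tr_s.mean(), y_te_s.mean())
```

With stratification the positive rate is the same in both parts, which matches the identical train and test class ratios reported in the Stratified rows.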

Citing this work

Please consider citing us if this work is useful in your research:

@misc{nguyen2020reinforced,
    title={Reinforced Data Sampling for Model Diversification},
    author={Hoang D. Nguyen and Xuan-Son Vu and Quoc-Tuan Truong and Duc-Trong Le},
    year={2020},
    eprint={2006.07100},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}


Release history

0.3 (this version): Jun 22, 2020
0.2: Jun 22, 2020
0.1: Jun 22, 2020

Download files

Source Distribution

torchRDS-0.3.tar.gz (10.2 kB)

Uploaded Jun 22, 2020 Source

Built Distribution

torchRDS-0.3-py3-none-any.whl (9.5 kB)

Uploaded Jun 22, 2020 Python 3

File details

Details for the file torchRDS-0.3.tar.gz.

File metadata

Download URL: torchRDS-0.3.tar.gz
Upload date: Jun 22, 2020
Size: 10.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/47.3.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.3

File hashes

Hashes for torchRDS-0.3.tar.gz
Algorithm	Hash digest
SHA256	`eb0815d947498195c993f5587b4f8fdc9a0348ca3c45b055879597ce461aa715`
MD5	`fd760a21a2d117f509350773aa200af6`
BLAKE2b-256	`f283e0fe4e220238c9284ebffbb3f8e4d6d04974258372847bddc212ddcad04d`

File details

Details for the file torchRDS-0.3-py3-none-any.whl.

File metadata

Download URL: torchRDS-0.3-py3-none-any.whl
Upload date: Jun 22, 2020
Size: 9.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/47.3.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.3

File hashes

Hashes for torchRDS-0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f60071e533936c9bbc1fd192968e6e518d673c8a0fe6c34f1c36a952d1cc3a1a`
MD5	`84c118302a1921b52f3083af64af4908`
BLAKE2b-256	`caa3a6695462f5b2590ceab1365dd709c8f5def3708cd22c0c5671dbd27fc9d4`
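
The published digests can be used to check a downloaded file before installing it. A minimal sketch using only Python's standard `hashlib`. The helper names are hypothetical, and the file is assumed to sit in the current directory:

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare the file's digest with a published one, ignoring letter case."""
    return sha256_of_file(path) == expected_hex.lower()


# SHA256 digest listed above for the built distribution.
WHEEL_NAME = "torchRDS-0.3-py3-none-any.whl"
WHEEL_SHA256 = "f60071e533936c9bbc1fd192968e6e518d673c8a0fe6c34f1c36a952d1cc3a1a"

# Only run the check if the wheel has actually been downloaded here.
if os.path.exists(WHEEL_NAME):
    print("Hash matches:", matches_published_hash(WHEEL_NAME, WHEEL_SHA256))
```

pip can also enforce this at install time when a requirements file pins the package with `--hash=sha256:...`.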