A Python Toolbox for Combination Tasks in Machine Learning
combo is a comprehensive Python model combination toolbox for fusing, aggregating, and selecting multiple base ML estimators under supervised, unsupervised, and semi-supervised scenarios. It consists of methods for various tasks, including classification, clustering, anomaly detection, and raw score combination.
Model combination is an important task in ensemble learning, but it often reaches beyond the scope of ensemble learning. For instance, simply averaging the results of the same classifier over multiple runs is considered a good way to reduce the randomness in the classifier and improve stability. Model combination has been widely used in data science competitions, such as those hosted on Kaggle, and in real-world tasks. See the figure below for some popular combination approaches.
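The stabilizing effect of averaging repeated runs can be illustrated with a small simulation. This is a minimal sketch in plain Python and does not use combo: noisy_score is a hypothetical stand-in for one run of a randomized classifier, and the true score, noise level, and number of runs are illustrative assumptions.

```python
import random
import statistics


def noisy_score(true_score, seed, noise=0.1):
    """Hypothetical stand-in for one run of a randomized classifier:
    the true score plus seed-dependent Gaussian noise."""
    rng = random.Random(seed)
    return true_score + rng.gauss(0.0, noise)


def averaged_score(true_score, seeds, noise=0.1):
    """Combine several runs by simple averaging."""
    return statistics.mean(noisy_score(true_score, s, noise) for s in seeds)


TRUE_SCORE = 0.7   # assumed "real" score of the classifier
N_TRIALS = 200     # how many times each experiment is repeated
RUNS = 10          # runs averaged per combined prediction

# Spread of a single run across many trials.
single = [noisy_score(TRUE_SCORE, seed=t) for t in range(N_TRIALS)]

# Spread of a 10-run average across the same number of trials.
averaged = [
    averaged_score(TRUE_SCORE, seeds=range(t * RUNS, (t + 1) * RUNS))
    for t in range(N_TRIALS)
]

single_std = statistics.stdev(single)
averaged_std = statistics.stdev(averaged)
print("std of a single run:     %.4f" % single_std)
print("std of a 10-run average: %.4f" % averaged_std)
```

With independent noise, the spread of the averaged score is expected to shrink roughly with the square root of the number of runs, which is why averaging is a cheap way to gain stability.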
combo features:
Unified APIs, detailed documentation, and interactive examples across various algorithms.
Advanced models, including dynamic classifier/ensemble selection.
Comprehensive coverage for supervised, unsupervised, and semi-supervised scenarios.
Rich applications for classification, clustering, anomaly detection, and raw score combination.
Optimized performance with JIT and parallelization when possible, using numba and joblib.
Installation
It is recommended to use pip for installation. Please make sure the latest version is installed, as combo is updated frequently:
pip install combo # normal install
pip install --upgrade combo # or update if needed
pip install --pre combo # or include pre-release version for new features
Alternatively, you can clone the repository and install it from source:
git clone https://github.com/yzhao062/combo.git
cd combo
pip install .
Required Dependencies:
Python 3.5, 3.6, or 3.7
joblib
matplotlib
numpy>=1.13
numba>=0.35
scipy>=0.19.1
scikit_learn>=0.19.1
Proposed Algorithms
combo will include model combination frameworks for the following tasks:
Classifier combination: combine multiple supervised classifiers together for training and prediction
Raw score & probability combination: combine scores without invoking classifiers
Averaging & Weighted Averaging
Maximization
Average of Maximum (AOM)
Maximum of Average (MOA)
Cluster combination: combine unsupervised clustering results
Clusterer Ensemble [3]
Anomaly detection: combine unsupervised outlier detectors
Averaging & Weighted Averaging
Maximization
Average of Maximum (AOM)
Maximum of Average (MOA)
Thresholding
Locally Selective Combination (LSCP) [2]
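The score combination rules listed above (averaging, weighted averaging, maximization, AOM, and MOA) can be sketched in a few lines of plain Python. This is an illustration of the general techniques, not combo's implementation: the function names, the fixed contiguous bucket split, and the sample scores are assumptions made for this example, and the scores are assumed to be on a comparable scale already.

```python
from statistics import mean


def average(scores, weights=None):
    """Average (optionally weighted) of each sample's scores.

    scores: one row per sample, one column per base estimator.
    """
    if weights is None:
        return [mean(row) for row in scores]
    total = sum(weights)
    return [sum(s * w for s, w in zip(row, weights)) / total for row in scores]


def maximization(scores):
    """Keep the largest score any estimator gave to each sample."""
    return [max(row) for row in scores]


def _buckets(row, n_buckets):
    """Split one sample's scores into contiguous subgroups of estimators.

    Assumption of this sketch: a fixed contiguous split, chosen for
    reproducibility, with the estimator count divisible by n_buckets.
    """
    if len(row) % n_buckets != 0:
        raise ValueError("number of estimators must be divisible by n_buckets")
    size = len(row) // n_buckets
    return [row[i * size:(i + 1) * size] for i in range(n_buckets)]


def aom(scores, n_buckets):
    """Average of Maximum: max within each bucket, then average the maxima."""
    return [mean(max(b) for b in _buckets(row, n_buckets)) for row in scores]


def moa(scores, n_buckets):
    """Maximum of Average: average within each bucket, then take the max."""
    return [max(mean(b) for b in _buckets(row, n_buckets)) for row in scores]


# Two samples scored by four hypothetical base estimators.
scores = [
    [0.25, 0.75, 0.5, 1.0],
    [1.0, 0.0, 0.25, 0.25],
]

print("average     :", average(scores))
print("weighted avg:", average(scores, weights=[1, 1, 2, 4]))
print("maximization:", maximization(scores))
print("AOM (2)     :", aom(scores, n_buckets=2))
print("MOA (2)     :", moa(scores, n_buckets=2))
```

AOM and MOA sit between plain averaging and plain maximization: the bucket step limits how much a single extreme estimator can dominate the combined score.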
Quick Start for Classifier Combination
“examples/classifier_comb_example.py” demonstrates the basic API for predicting with multiple classifiers. The API is consistent across all other algorithms.
Initialize a group of classifiers as base estimators
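from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

from combo.models.classifier_comb import SimpleClassifierAggregator

# initialize a group of classifiers
# (random_state is assumed to be defined beforehand, e.g. as an integer seed)
classifiers = [
    DecisionTreeClassifier(random_state=random_state),
    LogisticRegression(random_state=random_state),
    KNeighborsClassifier(),
    RandomForestClassifier(random_state=random_state),
    GradientBoostingClassifier(random_state=random_state),
]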
Initialize an aggregator class and pass in combination methods
# combine by averaging
clf = SimpleClassifierAggregator(classifiers, method='average')
clf.fit(X_train, y_train)
Predict by SimpleClassifierAggregator and then evaluate
y_test_predicted = clf.predict(X_test)
evaluate_print('Combination by avg |', y_test, y_test_predicted)
See a sample output of classifier_comb_example.py:
Decision Tree | Accuracy:0.9386, ROC:0.9383, F1:0.9521
Logistic Regression | Accuracy:0.9649, ROC:0.9615, F1:0.973
K Neighbors | Accuracy:0.9561, ROC:0.9519, F1:0.9662
Gradient Boosting | Accuracy:0.9605, ROC:0.9524, F1:0.9699
Random Forest | Accuracy:0.9605, ROC:0.961, F1:0.9693
Combination by avg | Accuracy:0.9693, ROC:0.9677, F1:0.9763
Combination by w_avg | Accuracy:0.9781, ROC:0.9716, F1:0.9833
Combination by max | Accuracy:0.9518, ROC:0.9312, F1:0.9642
Combination by w_vote| Accuracy:0.9649, ROC:0.9644, F1:0.9728
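The sample output lists four combination rules: averaging, weighted averaging, maximization, and weighted voting. The sketch below shows what such rules typically compute on the class probabilities of a single sample. It is plain Python written for illustration and is not combo's implementation: combine_proba, its method names, and the probabilities and weights are assumptions of this example.

```python
def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def combine_proba(probas, method="average", weights=None):
    """Fuse the predictions of several classifiers for one sample.

    probas: one list of class probabilities per classifier.
    weights: optional per-classifier weights (defaults to equal weights).
    Returns the index of the predicted class.
    """
    n_classes = len(probas[0])
    if weights is None:
        weights = [1.0] * len(probas)

    if method == "average":
        # (Weighted) mean probability per class.
        total = sum(weights)
        fused = [
            sum(w * p[c] for w, p in zip(weights, probas)) / total
            for c in range(n_classes)
        ]
    elif method == "maximization":
        # Highest probability any classifier assigns to each class.
        fused = [max(p[c] for p in probas) for c in range(n_classes)]
    elif method == "vote":
        # Each classifier votes for its top class with its weight.
        fused = [0.0] * n_classes
        for w, p in zip(weights, probas):
            fused[argmax(p)] += w
    else:
        raise ValueError("unknown method: %r" % method)
    return argmax(fused)


# Three hypothetical classifiers on a binary problem: two lean weakly
# toward class 0, one is confident about class 1.
probas = [[0.6, 0.4], [0.55, 0.45], [0.1, 0.9]]

print("average          :", combine_proba(probas, "average"))
print("weighted average :", combine_proba(probas, "average", weights=[3, 3, 1]))
print("maximization     :", combine_proba(probas, "maximization"))
print("vote             :", combine_proba(probas, "vote"))
print("weighted vote    :", combine_proba(probas, "vote", weights=[1, 1, 3]))
```

The rules can disagree on the same inputs: voting ignores how confident each classifier is, while averaging and maximization let a single confident classifier shift the result. This is one reason the combined scores in the sample output differ by method.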
Development Status
combo is currently under development as of July 15, 2019. A concrete plan has been laid out and will be implemented in the next few months.
Like other libraries built by us, e.g., the Python Outlier Detection Toolbox (pyod), combo is intended for publication in the open-source software track of the Journal of Machine Learning Research (JMLR).
Watch & Star to get the latest update! Also feel free to send me an email (zhaoy@cmu.edu) for suggestions and ideas.
Reference