A Python Toolbox for Combination Tasks in Machine Learning
combo is a comprehensive Python model combination toolbox for fusing/aggregating/selecting multiple base ML estimators, under supervised, unsupervised, and semi-supervised scenarios. It consists of methods for various tasks, including classification, clustering, anomaly detection, and raw score combination.
Model combination is an important task in ensemble learning, but its applications often extend beyond the scope of ensemble learning. For instance, simply averaging the results of the same classifier over multiple runs is considered a good way to reduce the classifier's randomness and improve stability. Model combination has been widely used in real-world tasks and in data science competitions such as Kaggle. See the figure below for some popular combination approaches.
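To make the averaging idea concrete, here is a minimal sketch that trains the same randomized classifier with several seeds and averages the predicted class probabilities. It assumes scikit-learn and NumPy; the dataset, the seeds, and the helper averaged_proba are illustrative choices and not part of combo.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def averaged_proba(X_train, y_train, X_test, seeds):
    """Average predict_proba of one classifier type trained with different seeds.

    Hypothetical helper for illustration; not part of combo's API.
    """
    probas = []
    for seed in seeds:
        # Same model and data each time; only the random seed changes.
        clf = RandomForestClassifier(n_estimators=10, random_state=seed)
        clf.fit(X_train, y_train)
        probas.append(clf.predict_proba(X_test))
    # Shape (n_runs, n_samples, n_classes), averaged over the runs axis.
    return np.mean(np.stack(probas), axis=0)


X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

avg_proba = averaged_proba(X_train, y_train, X_test, seeds=[0, 1, 2, 3, 4])
# Labels in this dataset are 0 and 1, so the column index is the label.
y_pred = np.argmax(avg_proba, axis=1)
print("accuracy of averaged runs:", np.mean(y_pred == y_test))
```

Averaging probabilities rather than hard labels keeps each run's confidence information; with hard labels, a majority vote would be the analogous choice.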
combo is featured for:
- Unified APIs, detailed documentation, and interactive examples across various algorithms.
- Advanced models, including dynamic classifier/ensemble selection.
- Comprehensive coverage for supervised, unsupervised, and semi-supervised scenarios.
- Rich applications for classification, clustering, anomaly detection, and raw score combination.
- Optimized performance with JIT and parallelization when possible, using numba and joblib.
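The joblib part of that optimization can be illustrated with a minimal sketch: because base estimators are independent, they can be fitted concurrently. This assumes scikit-learn and joblib (a scikit-learn dependency); fit_one and fit_parallel are hypothetical names, this is not combo's internal code, and numba JIT compilation is not shown.

```python
from joblib import Parallel, delayed
from sklearn.base import clone
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def fit_one(estimator, X, y):
    # Fit a fresh clone so the caller's unfitted estimator stays untouched.
    return clone(estimator).fit(X, y)


def fit_parallel(estimators, X, y, n_jobs=2):
    """Fit independent base estimators concurrently; results keep input order."""
    # prefer="threads" keeps the sketch portable; joblib's default backend
    # would run the jobs in separate worker processes instead.
    return Parallel(n_jobs=n_jobs, prefer="threads")(
        delayed(fit_one)(est, X, y) for est in estimators)


X, y = load_breast_cancer(return_X_y=True)
base = [DecisionTreeClassifier(random_state=0),
        LogisticRegression(max_iter=5000),
        KNeighborsClassifier()]
fitted = fit_parallel(base, X, y)
print([type(est).__name__ for est in fitted])
```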
It is recommended to use pip for installation. Please make sure the latest version is installed, as combo is updated frequently:
pip install combo            # normal install
pip install --upgrade combo  # or update if needed
pip install --pre combo      # or include pre-release version for new features
Alternatively, you can clone the repository and install it from source with pip, which runs the setup.py file:
git clone https://github.com/yzhao062/combo.git
cd combo
pip install .
Requirements:
- Python 3.5, 3.6, or 3.7
combo will include model combination frameworks for the following tasks:
- Classifier combination: combine multiple supervised classifiers together for training and prediction
- Raw score & probability combination: combine scores without invoking classifiers
- Cluster combination: combine unsupervised clustering results
  * Clusterer Ensemble
- Anomaly detection: combine unsupervised outlier detectors
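Combining clusterings is harder than combining classifiers because cluster ids are arbitrary in each run. A simplified sketch of the alignment-then-vote idea: relabel each clustering to agree with a reference via the Hungarian algorithm, then take a per-sample majority vote. This assumes SciPy and NumPy, that every clustering uses the same number of clusters with ids 0..k-1, and that the first clustering serves as the reference; align_labels and cluster_vote are hypothetical names, not combo's API.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def align_labels(reference, labels, n_clusters):
    """Relabel `labels` so its cluster ids agree as much as possible with `reference`."""
    # overlap[l, r] counts samples labeled l here and r in the reference.
    overlap = np.zeros((n_clusters, n_clusters), dtype=int)
    for r, l in zip(reference, labels):
        overlap[l, r] += 1
    # linear_sum_assignment minimizes cost, so negate counts to maximize agreement.
    rows, cols = linear_sum_assignment(-overlap)
    mapping = dict(zip(rows, cols))
    return np.array([mapping[l] for l in labels])


def cluster_vote(label_sets, n_clusters):
    """Align every clustering to the first one, then take a per-sample majority vote."""
    label_sets = [np.asarray(ls) for ls in label_sets]
    aligned = np.stack([align_labels(label_sets[0], ls, n_clusters)
                        for ls in label_sets])
    # votes[k, i] = number of clusterings that put sample i in cluster k.
    votes = np.stack([(aligned == k).sum(axis=0) for k in range(n_clusters)])
    # Ties go to the smallest cluster id (np.argmax returns the first maximum).
    return np.argmax(votes, axis=0)


# Three clusterings of six samples; ids are arbitrary within each run.
runs = [[0, 0, 1, 1, 2, 2],
        [1, 1, 2, 2, 0, 0],  # same partition as the first, with permuted ids
        [2, 2, 0, 0, 0, 1]]  # disagrees with the first run on one sample
result = cluster_vote(runs, n_clusters=3)
print(result)
```

Here the third run's stray label on the fifth sample is outvoted, so the combined result matches the partition shared by the first two runs.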
For each of the tasks, various methods may be introduced:
- Simple methods: averaging, maximization, weighted averaging, thresholding
- Bucket methods: average of maximization, maximization of average
- Learning methods: stacking (build an additional classifier to learn base estimator weights)
- Selection methods: dynamic classifier/ensemble selection 
- Other models
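The simple and bucket methods listed above can be sketched on a score matrix with one row per sample and one column per base estimator. This is a NumPy illustration under stated assumptions, not combo's API: the function names are hypothetical, scores are assumed to already be on a comparable scale, and buckets are formed from contiguous columns to keep the example deterministic (an implementation may instead shuffle estimators into random buckets). Thresholding, stacking, and dynamic selection are not covered.

```python
import numpy as np


def average(scores, weights=None):
    """Average (optionally weighted) over estimators for each sample."""
    return np.average(scores, axis=1, weights=weights)


def maximization(scores):
    """Largest score any estimator gives each sample."""
    return scores.max(axis=1)


def _buckets(n_estimators, n_buckets):
    # Split estimator (column) indices into contiguous, near-equal groups.
    return np.array_split(np.arange(n_estimators), n_buckets)


def aom(scores, n_buckets):
    """Average of maximization: max within each bucket, then mean of bucket maxima."""
    groups = _buckets(scores.shape[1], n_buckets)
    bucket_max = np.column_stack([scores[:, g].max(axis=1) for g in groups])
    return bucket_max.mean(axis=1)


def moa(scores, n_buckets):
    """Maximization of average: mean within each bucket, then max of bucket means."""
    groups = _buckets(scores.shape[1], n_buckets)
    bucket_mean = np.column_stack([scores[:, g].mean(axis=1) for g in groups])
    return bucket_mean.max(axis=1)


# Rows are samples, columns are four base estimators (scores on a shared scale).
scores = np.array([[0.1, 0.4, 0.2, 0.9],
                   [0.5, 0.3, 0.8, 0.6]])
print(average(scores))
print(average(scores, weights=[1, 1, 2, 0]))
print(maximization(scores))
print(aom(scores, n_buckets=2))
print(moa(scores, n_buckets=2))
```

AOM and MOA sit between the two simple extremes: bucketing tempers the optimism of a plain maximum while still letting strong estimators stand out more than a plain average would.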
Quick Start for Classifier Combination
“examples/classifier_comb_example.py” demonstrates the basic API for predicting with multiple classifiers. Note that the API is consistent (or very similar) across all other algorithms.
Initialize a group of classifiers as base estimators
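from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from combo.models.classifier_comb import BaseClassiferAggregator

random_state = 42  # arbitrary fixed seed for reproducibility

# initialize a group of classifiers
classifiers = [DecisionTreeClassifier(random_state=random_state),
               LogisticRegression(random_state=random_state),
               KNeighborsClassifier(),
               RandomForestClassifier(random_state=random_state),
               GradientBoostingClassifier(random_state=random_state)]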
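The snippets in this walkthrough assume that X_train, X_test, y_train, and y_test already exist. One way to create them is sketched below, assuming scikit-learn's bundled breast cancer dataset, an 80/20 split, and feature standardization; these are illustrative choices, and the example script may prepare its data differently.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
# Hold out 20% of the samples for testing; the seed is arbitrary.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training split only, then apply it to both splits,
# so no statistics from the test split leak into training.
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
print(X_train.shape, X_test.shape)
```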
Initialize an aggregator class and pass in initialized classifiers for training
# initialize the aggregator with the base classifiers and train them
clf = BaseClassiferAggregator(classifiers)
clf.fit(X_train, y_train)
Predict by averaging base classifier results and then evaluate
# combine by averaging
y_test_predicted = clf.predict(X_test, method='average')
evaluate_print('Combination by avg |', y_test, y_test_predicted)
Predict by maximizing base classifier results and then evaluate
# combine by maximization
y_test_predicted = clf.predict(X_test, method='maximization')
evaluate_print('Combination by max |', y_test, y_test_predicted)
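To show what the fit/predict flow and the two methods might do internally, here is a minimal sketch of a probability-based aggregator. It is one plausible reading of "averaging" and "maximization" (combine per-class probabilities across classifiers, then pick the top class), written with scikit-learn and NumPy; ProbaAggregator is a hypothetical name, and combo's actual implementation may differ.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


class ProbaAggregator:
    """Hypothetical minimal aggregator (not combo's BaseClassiferAggregator)."""

    def __init__(self, classifiers):
        self.classifiers = classifiers

    def fit(self, X, y):
        # Train an independent copy of every base classifier on the same data.
        self.fitted_ = [clone(clf).fit(X, y) for clf in self.classifiers]
        self.classes_ = self.fitted_[0].classes_
        return self

    def predict_proba(self, X, method="average"):
        # Stack per-classifier probabilities: (n_classifiers, n_samples, n_classes).
        probas = np.stack([clf.predict_proba(X) for clf in self.fitted_])
        if method == "average":
            return probas.mean(axis=0)
        if method == "maximization":
            # Per-class maximum over classifiers; rows no longer sum to 1.
            return probas.max(axis=0)
        raise ValueError("unknown method: {}".format(method))

    def predict(self, X, method="average"):
        # Pick the class with the highest combined score for each sample.
        scores = self.predict_proba(X, method=method)
        return self.classes_[np.argmax(scores, axis=1)]


X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

agg = ProbaAggregator([DecisionTreeClassifier(random_state=42),
                       LogisticRegression(max_iter=5000),
                       KNeighborsClassifier()]).fit(X_train, y_train)
for method in ("average", "maximization"):
    accuracy = np.mean(agg.predict(X_test, method=method) == y_test)
    print(method, round(accuracy, 4))
```

Under this reading, maximization lets whichever classifier is most confident for a class dominate, so a single overconfident model (such as a fully grown decision tree, whose probabilities are often exactly 0 or 1) can sway the result; averaging dilutes that effect.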
See a sample output of classifier_comb_example.py
Decision Tree | Accuracy:0.9386, ROC:0.9383, F1:0.9521
Logistic Regression | Accuracy:0.9649, ROC:0.9615, F1:0.973
K Neighbors | Accuracy:0.9561, ROC:0.9519, F1:0.9662
Gradient Boosting | Accuracy:0.9605, ROC:0.9524, F1:0.9699
Random Forest | Accuracy:0.9605, ROC:0.961, F1:0.9693
Combination by avg | Accuracy:0.9693, ROC:0.9677, F1:0.9763
Combination by max | Accuracy:0.9518, ROC:0.9312, F1:0.9642
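Each line of the output reports accuracy, ROC AUC, and F1 for one model. A hypothetical helper producing lines in that format could look like the sketch below; it assumes scikit-learn's metrics and binary labels, and it is not combo's evaluate_print. Because the walkthrough passes predicted labels rather than scores, ROC AUC here is computed from hard labels, which is coarser than scoring with predicted probabilities.

```python
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score


def format_scores(name, y_true, y_pred):
    """Return one line in the style of the sample output (hypothetical helper)."""
    acc = accuracy_score(y_true, y_pred)
    # With hard 0/1 predictions, ROC AUC reduces to a single operating point.
    roc = roc_auc_score(y_true, y_pred)
    f1 = f1_score(y_true, y_pred)
    return "{} | Accuracy:{}, ROC:{}, F1:{}".format(
        name, round(acc, 4), round(roc, 4), round(f1, 4))


print(format_scores("Toy example", [0, 0, 1, 1], [0, 1, 1, 1]))
```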
combo is currently under development as of July 15, 2019. A concrete plan has been laid out and will be implemented in the next few months.
Watch & Star to get the latest updates! Also feel free to send me an email (firstname.lastname@example.org) with suggestions and ideas.