Skip to main content

Package for ensembling models together

Project description

Ensemblizer

ensemblizer is a small package for various ensemble methods

ModelCollection is a simple way to aggregate models so that they can be trained together.

CatEnsemble takes a ModelCollection object (or the same input as one) with an additional ensemble model. The ensemble model will train on the predicted probabilities (or just predictions) of the ModelCollection and make a separate prediction. This ensemble method allows for different weighting schemes of the outputs of each model as well as stacking the original data with the predictions when training the ensemble model. Additionally, this ensemble class allows you to pretrain the ModelCollection object prior to training the ensemble model or training the ModelCollection object with the ensemble model. This allows for more efficient hyperparameter tuning (albeit much longer training and tuning times).

Hyperparameters can be trained like any other scikit-learn-esque model. When setting parameters for the ensemble model, parameters that begin with name__ will set the hyperparameter of the name model in the collection or ensemble model (default name for ensemble model is "ensemble"). Parameters that start with __name will update the weight of the name model in the collection. This allows the weighting scheme of the ensemble to be tuned with all other parameters using any scikit-learn tuning package.

Current Version is v0.04

This package is currently in the beginning stages but future work is planned.

Installation

pip install ensemblizer

Usage

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from ensemblizer import ModelCollection, CatEnsemble

np.random.seed(0)
x_train = np.random.randint(0, 10, (80, 3))
x_test = np.random.randint(0, 10, (20, 3))
y_train = np.random.randint(0, 2, 80)
y_test = np.random.randint(0, 2, 20)

models = ModelCollection([('log', LogisticRegression(random_state=0)),('nb', MultinomialNB())])
test.fit(x_train, y_train)

ensemble = CatEnsemble(test, KNeighborsClassifier())
ensemble.fit(x_train, y_train)
test_preds = ensemble.predict(x_test)
print(f"Accuracy on test set is {accuracy_score(y_test, test_preds)}"

#change the C param of the 'log' model to 15, the alpha param of the 'nb' model to 1,
#the n_neighbors param of the ensemble model to 10, and the weight of the 'log' model to 3  
ens.set_params('log__C': 15, 'nb__alpha': 1, 'ensemble__n_neighbors': 10, '__log': 3})
ens.fit(x_train, y_train)
test_preds = ensemble.predict(x_test)
print(f"Accuracy on test set is {accuracy_score(y_test, test_preds)}"

Future Plans

The next step is to create an ensemble regressor model.

License

Lol

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ensemblizer-0.6.tar.gz (4.4 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page