Open source machine learning library for performing a weighted average over stacked predictions

Project description

waveml

Open source machine learning library for performing a weighted average and linear transformations over stacked predictions

Pip

pip install waveml

Usage Example:

import numpy as np
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; this example needs an older release
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, ExtraTreesRegressor
from vecstack import StackingTransformer
from sklearn.metrics import mean_squared_error
from waveml import WaveRegressor, WaveTransformer
from waveml.metrics import SAE

Loss function

def rmse(predictions, targets):
    return np.sqrt(((predictions - targets) ** 2).mean())
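
As a quick check of the formula, the sketch below evaluates the same function on three made-up values. The numbers are illustrative only and are not taken from the Boston data.

```python
import numpy as np

def rmse(predictions, targets):
    return np.sqrt(((predictions - targets) ** 2).mean())

# Made-up values for illustration only
predictions = np.array([2.0, 4.0, 6.0])
targets = np.array([1.0, 4.0, 8.0])

# errors: 1, 0, -2 -> squared: 1, 0, 4 -> mean: 5/3 -> sqrt: about 1.291
score = rmse(predictions, targets)
print(round(score, 3))  # 1.291

# The squared difference makes the function symmetric in its arguments
assert np.isclose(score, rmse(targets, predictions))
```

Because of that symmetry, calling rmse(y_test, predictions), as the later snippets do, gives the same value as rmse(predictions, y_test).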

Stacking ensemble

stack = StackingTransformer(
    estimators=[
        ["GBR", GradientBoostingRegressor()],
        ["RFR", RandomForestRegressor()],
        ["ETR", ExtraTreesRegressor()]
    ],
    n_folds=5,
    shuffle=True,
    random_state=42,
    metric=rmse,
    variant="A",
    verbose=0
)

Data

X, y = load_boston(return_X_y=True)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=42)

Training a stacking ensemble

stack.fit(X_train, y_train)
print("Individual scores:", np.mean(stack.scores_, axis=1))

Output:

Individual scores: [3.54600289 3.7031519  3.31942812]

Stacked predictions

SX_train = stack.transform(X_train)
SX_test = stack.transform(X_test)

LinearRegression

from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(SX_train, y_train)
print("LinearRegression:", rmse(y_test, lr.predict(SX_test)))

Output:

LinearRegression: 3.064970532826568

What is WaveRegressor?

WaveRegressor is a model that performs a weighted average over stacked predictions

wr = WaveRegressor(verbose=0, n_opt_rounds=1000, loss_function=SAE)
wr.fit(SX_train, y_train)
print("WaveRegressor:", rmse(y_test, wr.predict(SX_test)))

Output:

WaveRegressor: 3.026784272554217
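
To make "weighted average over stacked predictions" concrete, here is a minimal NumPy sketch of the operation itself. The prediction matrix and the weights are hypothetical values chosen for illustration; they are not weights learned by waveml, and whether waveml constrains its weights (for example to sum to 1) is not stated here.

```python
import numpy as np

# Hypothetical stacked predictions: 4 samples x 3 base models
stacked = np.array([
    [10.0, 12.0, 11.0],
    [20.0, 19.0, 21.0],
    [30.0, 33.0, 27.0],
    [40.0, 38.0, 42.0],
])

# Hypothetical weights, one per base model (chosen by hand, not learned)
weights = np.array([0.5, 0.2, 0.3])

# Each final prediction is a weighted combination of the base predictions
blended = stacked @ weights
print(blended)
```

Each row of the stacked matrix collapses to a single prediction, so the output has one value per sample.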

Why is it better than Linear Regression?

The main differences between WaveRegressor and linear regression:

WaveRegressor does not fit an intercept, only coefficients

It can optimize any of the metrics defined in metrics.py

To achieve higher performance, experiment with the loss_function parameter
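
The points above (no intercept, a selectable loss) can be sketched with plain gradient descent in NumPy. This is only an illustration under stated assumptions: fit_weights is a hypothetical helper, the data is synthetic, and the optimizer is not claimed to match what waveml does internally.

```python
import numpy as np

def fit_weights(S, y, loss="mse", lr=0.05, n_rounds=1000):
    """Fit intercept-free weights w so that S @ w approximates y.

    Hypothetical helper for illustration; not part of waveml.
    """
    n_samples, n_models = S.shape
    w = np.full(n_models, 1.0 / n_models)  # start from a plain average
    for _ in range(n_rounds):
        residual = S @ w - y
        if loss == "mse":
            grad = 2.0 * S.T @ residual / n_samples
        elif loss == "mae":
            grad = S.T @ np.sign(residual) / n_samples  # subgradient of |r|
        else:
            raise ValueError(f"unknown loss: {loss}")
        w -= lr * grad
    return w

# Synthetic stacked predictions: the target plus noise of increasing size
rng = np.random.default_rng(0)
y = rng.normal(size=200)
S = np.column_stack([y + rng.normal(scale=s, size=200) for s in (0.1, 0.5, 1.0)])

w_mse = fit_weights(S, y, loss="mse")
w_mae = fit_weights(S, y, loss="mae")
print("MSE weights:", w_mse)
print("MAE weights:", w_mae)
```

Swapping the loss changes only the gradient, which is why trying several loss functions is cheap. There is no intercept term anywhere: the prediction is always S @ w.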

What is WaveTransformer?

WaveTransformer is a model that applies a linear transformation to each feature so as to minimize the error between that feature and the target value
WaveTransformer is fitted with cross-validation, which limits overfitting, so it can also be used to transform the training data
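
The idea of a per-feature linear transformation fitted with cross-validation can be sketched as follows. This is an assumption-laden illustration, not waveml's code: it swaps in closed-form least squares (np.polyfit) for the iterative optimization, and both helper functions are hypothetical names.

```python
import numpy as np

def fit_feature_transforms(X, y):
    """Return one (slope, intercept) pair per column so that a*x + b approximates y."""
    slopes, intercepts = [], []
    for j in range(X.shape[1]):
        a, b = np.polyfit(X[:, j], y, deg=1)  # 1-D least squares per feature
        slopes.append(a)
        intercepts.append(b)
    return np.array(slopes), np.array(intercepts)

def out_of_fold_transform(X, y, n_folds=5, seed=0):
    """Transform each row with coefficients fitted on the other folds only."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    X_new = np.empty_like(X, dtype=float)
    for k, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for i, f in enumerate(folds) if i != k])
        a, b = fit_feature_transforms(X[train_idx], y[train_idx])
        X_new[val_idx] = X[val_idx] * a + b
    return X_new

# Synthetic features: biased, rescaled copies of the target
rng = np.random.default_rng(1)
y = rng.normal(loc=20.0, scale=5.0, size=100)
X = np.column_stack([
    2.0 * y + 1.0 + rng.normal(scale=0.5, size=100),
    0.5 * y - 3.0 + rng.normal(scale=0.5, size=100),
])

X_new = out_of_fold_transform(X, y)
before = ((X - y[:, None]) ** 2).mean(axis=0)
after = ((X_new - y[:, None]) ** 2).mean(axis=0)
print("per-feature MSE before:", before)
print("per-feature MSE after: ", after)
```

Because every row is transformed with coefficients that never saw that row's target, the transformed training data can be passed to a downstream model with less risk of leakage.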

Why combine the two?

Combining the two models can improve prediction quality: in the example below, the test RMSE is slightly lower than with WaveRegressor alone

Combining example

Tune stacked predictions

wt = WaveTransformer(verbose=0, n_opt_rounds=1000, learning_rate=0.0001, loss_function=SAE)
wt.fit(SX_train, y_train, n_folds=5)
TSX_train = wt.transform(SX_train)
TSX_test = wt.transform(SX_test)

Perform a weighted average over transformed stacked predictions

wr.fit(TSX_train, y_train)
# Predict on the transformed test set, since wr was fitted on transformed data
print("WaveTransformer + WaveRegressor:", rmse(y_test, wr.predict(TSX_test)))

Output:

WaveTransformer + WaveRegressor: 3.0190282172825995

TODO: categorical transformer

Project details

Release history

0.2.1

Apr 12, 2021

0.2.0

Apr 12, 2021

This version

0.1.11

Mar 4, 2021

0.1.10

Feb 27, 2021

0.1.7

Jan 27, 2021

0.1.5

Jan 19, 2021

0.1.3

Jan 19, 2021

0.1.2

Jan 19, 2021

0.1.1

Jan 19, 2021

0.1

Jan 19, 2021

Download files

Source Distribution

waveml-0.1.11.tar.gz (8.2 kB view hashes)

Uploaded Mar 4, 2021 Source

Hashes for waveml-0.1.11.tar.gz
Algorithm	Hash digest
SHA256	`4304a83206984c0c086e4c9e9039d98758e0d01e0eafbbfdd1e458327ce1137f`
MD5	`c116259afcd029770eeca8c0cb4c71aa`
BLAKE2b-256	`459eee3c645437889c45b2dc0d49457ec2872277ba16a6c45b871de6f49ae473`