
Handy machine learning tools in the spirit of scikit-learn.


extrakit-learn


Machine learning components built to extend scikit-learn. All components use scikit-learn's object API, so they work interchangeably with scikit-learn components. It is mostly a collection of tools that have proven useful in Kaggle competitions. extrakit-learn is in no way affiliated with scikit-learn, just inspired by it.

Installation

pip install xklearn

Components

  • TargetEncoder - Categorical feature engineering based on target means.
  • CountEncoder - Categorical feature engineering based on value counts.
  • FoldEstimator - K-fold cross validation meta estimator.
  • FoldLGBM - K-fold cross validation meta LGBM.
  • StackingClassifier - Stack an ensemble of classifiers with a meta classifier.
  • StackingRegressor - Stack an ensemble of regressors with a meta regressor.

Hierarchy

xklearn
├── preprocessing
│   ├── CountEncoder      
│   └── TargetEncoder    
└── models
    ├── FoldEstimator
    ├── FoldLGBM
    ├── StackingClassifier
    └── StackingRegressor

Example

from xklearn.preprocessing import CountEncoder, TargetEncoder
from xklearn.models import FoldEstimator

TargetEncoder

Performs target mean encoding of categorical features with optional smoothing.

Arguments

smoothing - Smoothing weight.

unseen - Strategy for handling unseen values. See the replacement strategies below for options.

missing - Strategy for handling missing values. See the replacement strategies below for options.

Replacement strategies

'one' - Replace value with 1.

'nan' - Replace value with np.nan.

'error' - Raise ValueError.

Example:

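from xklearn.preprocessing import TargetEncoder

# Encode categorical column 0 of X using the target y
te = TargetEncoder(smoothing=10)
X[0] = te.fit_transform(X[0], y)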
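As a rough illustration of what smoothed target mean encoding and the replacement strategies involve, here is a minimal pure-Python sketch for a single column. It is not xklearn's implementation: the class name SimpleTargetEncoder, the use of None to mark missing values, the default strategies, and the smoothing formula (n_c * mean_c + smoothing * prior) / (n_c + smoothing), which blends each category's target mean with the global target mean, are all assumptions.

```python
import math
from collections import defaultdict


def _replacement(strategy, kind, value):
    """Apply one of the documented replacement strategies: 'one', 'nan' or 'error'."""
    if strategy == "one":
        return 1.0
    if strategy == "nan":
        return math.nan
    if strategy == "error":
        raise ValueError(f"{kind} value: {value!r}")
    raise ValueError(f"unknown strategy: {strategy!r}")


class SimpleTargetEncoder:
    """Hypothetical sketch of smoothed target mean encoding (not xklearn's code).

    Category c is encoded as (n_c * mean_c + smoothing * prior) / (n_c + smoothing),
    where prior is the global target mean. This additive smoothing is an assumed formula.
    """

    def __init__(self, smoothing=0.0, unseen="nan", missing="nan"):
        # Default strategies here are hypothetical, chosen only for the sketch.
        self.smoothing, self.unseen, self.missing = smoothing, unseen, missing

    def fit(self, x, y):
        sums, counts = defaultdict(float), defaultdict(int)
        for value, target in zip(x, y):
            if value is None:
                continue  # missing values do not contribute to category statistics
            sums[value] += target
            counts[value] += 1
        self.prior_ = sum(y) / len(y)
        # sums[c] equals n_c * mean_c, so this is the smoothed category mean.
        self.mapping_ = {
            c: (sums[c] + self.smoothing * self.prior_) / (counts[c] + self.smoothing)
            for c in counts
        }
        return self

    def transform(self, x):
        out = []
        for value in x:
            if value is None:
                out.append(_replacement(self.missing, "missing", value))
            elif value not in self.mapping_:
                out.append(_replacement(self.unseen, "unseen", value))
            else:
                out.append(self.mapping_[value])
        return out

    def fit_transform(self, x, y):
        return self.fit(x, y).transform(x)
```

Under this formula, smoothing=0 gives plain category means, while larger values pull rarely seen categories toward the global mean.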

CountEncoder

Replaces categorical values with their respective value counts from the training data. Classes with a count of one, and classes that were not seen during training but appear at prediction time, are encoded as either one or nan.

Arguments

unseen - Strategy for handling unseen values. See the replacement strategies below for options.

missing - Strategy for handling missing values. See the replacement strategies below for options.

Replacement strategies

'one' - Replace value with 1.

'nan' - Replace value with np.nan.

'error' - Raise ValueError.

Example:

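from xklearn.preprocessing import CountEncoder

# Encode categorical column 0 of X by its training value counts
ce = CountEncoder(one_to_nan=True)
X[0] = ce.fit_transform(X[0], y)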
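A minimal pure-Python sketch of count encoding, following the description above: categories are mapped to their training counts, while categories seen only once and categories never seen share a single code, 1 or nan. SimpleCountEncoder is a hypothetical name, the one_to_nan flag mirrors the example rather than a confirmed signature, and handling of missing values is left out.

```python
import math
from collections import Counter


class SimpleCountEncoder:
    """Hypothetical sketch of count encoding (not xklearn's code)."""

    def __init__(self, one_to_nan=False):
        self.one_to_nan = one_to_nan

    def fit(self, x, y=None):
        # y is accepted and ignored, following scikit-learn's fit(X, y=None) convention.
        self.counts_ = Counter(x)
        return self

    def transform(self, x):
        rare = math.nan if self.one_to_nan else 1.0
        # Counter returns 0 for unseen keys, so unseen and count-one
        # categories both fall through to the shared "rare" code.
        return [float(self.counts_[v]) if self.counts_[v] > 1 else rare for v in x]

    def fit_transform(self, x, y=None):
        return self.fit(x, y).transform(x)
```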

FoldEstimator

Meta estimator that performs cross validation over k folds. Can optionally be used as a stacked ensemble of k estimators.

Arguments

est - Base estimator.

fold - Folding cross validation object, e.g. KFold or StratifedKFold.

metric - Evaluation metric.

ensemble - Flag indicating post-fit behaviour. True makes it a stacked ensemble of the k fold estimators; False performs a full refit on the complete data.

verbose - Flag for printing intermediate scores during fit.

Example:

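from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold
from xklearn.models import FoldEstimator

base = RandomForestRegressor(n_estimators=10)
fold = KFold(n_splits=5)

est = FoldEstimator(base, fold=fold, metric=mean_squared_error, verbose=1)

est.fit(X_train, y_train)
est.predict(X_test)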
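The fold-then-ensemble-or-refit logic described above can be sketched in plain Python as follows. Everything here is hypothetical: kfold_indices stands in for an unshuffled KFold, MeanRegressor is a toy base estimator, and averaging the k fold models' predictions is an assumed way of combining them when ensemble=True; xklearn may combine them differently.

```python
import copy
import statistics


def kfold_indices(n, k):
    """Yield (train, valid) index lists for k contiguous, unshuffled folds."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, valid
        start += size


def mean_squared_error(y_true, y_pred):
    return statistics.fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


class MeanRegressor:
    """Hypothetical toy estimator: always predicts the training target mean."""

    def fit(self, X, y):
        self.mean_ = statistics.fmean(y)
        return self

    def predict(self, X):
        return [self.mean_] * len(X)


class SimpleFoldEstimator:
    """Hypothetical k-fold meta estimator sketch (not xklearn's implementation)."""

    def __init__(self, est, k=5, metric=mean_squared_error, ensemble=False, verbose=False):
        self.est, self.k, self.metric = est, k, metric
        self.ensemble, self.verbose = ensemble, verbose

    def fit(self, X, y):
        self.fold_ests_, self.scores_ = [], []
        for train, valid in kfold_indices(len(X), self.k):
            # Fresh copy of the unfitted base estimator per fold
            # (sklearn.base.clone would be the usual choice).
            est = copy.deepcopy(self.est)
            est.fit([X[i] for i in train], [y[i] for i in train])
            score = self.metric([y[i] for i in valid], est.predict([X[i] for i in valid]))
            self.scores_.append(score)
            self.fold_ests_.append(est)
            if self.verbose:
                print(f"fold {len(self.scores_)}: {score:.4f}")
        if not self.ensemble:
            # Full refit on all the data; fold scores remain as a CV estimate.
            self.final_est_ = copy.deepcopy(self.est).fit(X, y)
        return self

    def predict(self, X):
        if self.ensemble:
            # Assumed combination rule: average the k fold models' predictions.
            preds = [est.predict(X) for est in self.fold_ests_]
            return [statistics.fmean(p) for p in zip(*preds)]
        return self.final_est_.predict(X)
```

With ensemble=False the fold scores serve only as a cross-validation estimate, and predictions come from the single model refitted on all the data.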

FoldLGBM

Meta estimator that performs cross validation over k folds on a LightGBM estimator. Can optionally be used as an ensemble of k estimators.

Arguments

lgbm - Base estimator.

fold - Folding cross validation object, e.g. KFold or StratifedKFold.

metric - Evaluation metric.

fit_params - Dictionary of parameters that should be passed to the fit method.

ensemble - Flag indicating post-fit behaviour. True makes it a stacked ensemble of the k fold estimators; False performs a full refit on the complete data.

refit_params - Dictionary of parameters that should be passed to the refit if ensemble=False.

verbose - Flag for printing intermediate scores during fit.

Example:

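from lightgbm import LGBMClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import KFold
from xklearn.models import FoldLGBM

base = LGBMClassifier(n_estimators=1000)
fold = KFold(n_splits=5)
# Passing early_stopping_rounds and verbose to fit() requires LightGBM < 4.0;
# newer versions configure early stopping through callbacks instead.
fit_params = {'eval_metric': 'auc',
              'early_stopping_rounds': 50,
              'verbose': 0}

fold_lgbm = FoldLGBM(base,
                     fold=fold,
                     metric=roc_auc_score,
                     fit_params=fit_params,
                     verbose=1)

fold_lgbm.fit(X_train, y_train)
fold_lgbm.predict(X_test)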
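Because the fit_params in the example request early stopping, FoldLGBM presumably hands each held-out fold to LightGBM as a validation set. The hypothetical helpers below sketch that step under this assumption: fold_fit_kwargs copies fit_params and adds the fold as eval_set, the argument LightGBM's scikit-learn estimators use for validation data in fit(), while refit_fit_kwargs builds the arguments for the full-data refit, where no held-out fold exists.

```python
def fold_fit_kwargs(fit_params, X_valid, y_valid):
    """Hypothetical: keyword arguments for fitting the base model on one fold.

    Copies the user's fit_params and adds the held-out fold as eval_set,
    which early stopping and eval_metric need in LightGBM's fit().
    """
    kwargs = dict(fit_params or {})  # shallow copy: the caller's dict is left untouched
    kwargs["eval_set"] = [(X_valid, y_valid)]
    return kwargs


def refit_fit_kwargs(refit_params):
    """Hypothetical: keyword arguments for the full-data refit (ensemble=False).

    There is no held-out fold during the refit, so no eval_set is added;
    this is one reason refit_params is kept separate from fit_params.
    """
    return dict(refit_params or {})
```

Copying the dictionary keeps the caller's fit_params identical across all k folds.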

StackingClassifier

Ensemble classifier that stacks an ensemble of classifiers by using their outputs as input features.

Arguments

clfs - List of classifiers that make up the ensemble.

meta_clf - Meta classifier that stacks the predictions of the ensemble.

keep_features - Flag to train the meta classifier on the original features too.

refit - Flag to retrain the ensemble of classifiers.

Example:

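from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import RidgeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from xklearn.models import StackingClassifier

meta_clf = RidgeClassifier()
ensemble = [RandomForestClassifier(), KNeighborsClassifier(), SVC()]

stack_clf = StackingClassifier(clfs=ensemble, meta_clf=meta_clf, refit=True)

stack_clf.fit(X_train, y_train)
y_ = stack_clf.predict(X_test)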
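The core of stacking is turning base-model outputs into input features for the meta classifier. The sketch below, with a hypothetical toy ThresholdClassifier and a hypothetical stack_features helper, shows how those features could be assembled, including the effect of keep_features. It uses class predictions as features; whether xklearn uses predictions or probabilities, and in-sample or out-of-fold outputs, is not stated here.

```python
class ThresholdClassifier:
    """Hypothetical toy base classifier: predicts 1 when feature 0 exceeds a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def fit(self, X, y):
        return self  # nothing to learn in this toy model

    def predict(self, X):
        return [1 if x[0] > self.threshold else 0 for x in X]


def stack_features(clfs, X, keep_features=False):
    """Hypothetical helper: build the meta classifier's input rows.

    Each row holds one prediction per base classifier, followed by the
    original feature values when keep_features is True.
    """
    preds = [clf.predict(X) for clf in clfs]
    rows = [list(row) for row in zip(*preds)]  # transpose: one column per model
    if keep_features:
        rows = [row + list(x) for row, x in zip(rows, X)]
    return rows
```

The meta classifier is then fitted on these rows instead of (or, with keep_features, in addition to) the raw features.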

StackingRegressor

Ensemble regressor that stacks an ensemble of regressors by using their outputs as input features.

Arguments

regs - List of regressors that make up the ensemble.

meta_reg - Meta regressor that stacks the predictions of the ensemble.

keep_features - Flag to train the meta regressor on the original features too.

refit - Flag to retrain the ensemble of regressors.

Example:

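from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from xklearn.models import StackingRegressor

# scikit-learn has no RidgeRegressor; its ridge regression estimator is Ridge.
meta_reg = Ridge()
ensemble = [RandomForestRegressor(), KNeighborsRegressor(), SVR()]

stack_reg = StackingRegressor(regs=ensemble, meta_reg=meta_reg, refit=True)

stack_reg.fit(X_train, y_train)
y_ = stack_reg.predict(X_test)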
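A minimal end-to-end sketch of a stacking regressor, showing where the refit and keep_features flags would come into play. SimpleStackingRegressor and the toy Line1D least-squares regressor are hypothetical, and fitting the meta regressor on in-sample base predictions is a simplification; out-of-fold predictions are the usual safeguard against leakage.

```python
class Line1D:
    """Hypothetical toy regressor: ordinary least squares on feature 0 only."""

    def fit(self, X, y):
        xs = [row[0] for row in X]
        mx, my = sum(xs) / len(xs), sum(y) / len(y)
        sxx = sum((a - mx) ** 2 for a in xs)
        sxy = sum((a - mx) * (b - my) for a, b in zip(xs, y))
        self.slope_ = sxy / sxx
        self.intercept_ = my - self.slope_ * mx
        return self

    def predict(self, X):
        return [self.slope_ * row[0] + self.intercept_ for row in X]


class SimpleStackingRegressor:
    """Hypothetical stacking sketch (not xklearn's implementation)."""

    def __init__(self, regs, meta_reg, keep_features=False, refit=True):
        self.regs, self.meta_reg = regs, meta_reg
        self.keep_features, self.refit = keep_features, refit

    def _meta_X(self, X):
        # One column per base regressor, plus the original features if requested.
        cols = zip(*(reg.predict(X) for reg in self.regs))
        extra = (lambda x: list(x)) if self.keep_features else (lambda x: [])
        return [list(c) + extra(x) for c, x in zip(cols, X)]

    def fit(self, X, y):
        if self.refit:
            # Retrain the base regressors; with refit=False they must already be fitted.
            for reg in self.regs:
                reg.fit(X, y)
        self.meta_reg.fit(self._meta_X(X), y)
        return self

    def predict(self, X):
        return self.meta_reg.predict(self._meta_X(X))
```

Setting refit=False lets previously trained base regressors be reused, so only the meta regressor is fitted.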



Download files

Source distribution: xklearn-0.0.1.tar.gz (9.9 kB)

Built distribution: xklearn-0.0.1-py3-none-any.whl (14.4 kB, Python 3)
