
skboost

Relational features, adaptive search, and confidence evaluation tools for scikit-learn

skboost is a lightweight Python library designed to boost your models, whether trees, linear models, or neural networks. It provides relational feature engineering, adaptive hyperparameter search, and confidence-aware evaluation tools that improve both model performance and interpretability.

Installation

pip install skboost

Features

Preprocessing

RelationalFeaturesTransformer - Boost model performance with compact relational features

from skboost.preprocessing import RelationalFeaturesTransformer

transformer = RelationalFeaturesTransformer(direction='larger')
X_transformed = transformer.fit_transform(X)
# For each feature, finds indices and distances to next larger/smaller values
# Adds O(N) features instead of O(N²) pairwise combinations
# Helps models learn inter-feature relationships
# Useful for: computer vision, ranking tasks, any data with meaningful ordering
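The "index and distance to the next larger value" idea can be sketched in plain NumPy. This is an illustration of the concept only, not skboost's implementation; the `next_larger` helper is hypothetical:

```python
import numpy as np

def next_larger(values):
    """For each element, return the index of the next strictly larger
    value to its right (-1 if none) and the positional distance."""
    values = np.asarray(values)
    n = len(values)
    idx = np.full(n, -1)
    stack = []  # indices still waiting for a larger value (monotonic stack)
    for i, v in enumerate(values):
        while stack and values[stack[-1]] < v:
            idx[stack.pop()] = i
        stack.append(i)
    dist = np.where(idx >= 0, idx - np.arange(n), -1)
    return idx, dist

idx, dist = next_larger([3, 1, 4, 2, 5])
# idx  -> [2, 2, 4, 4, -1]: e.g. the next value larger than 3 sits at index 2
# dist -> [2, 1, 2, 1, -1]
```

The monotonic stack makes this a single O(N) pass, which is what keeps the added feature count linear rather than quadratic.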

DualScalerTransformer - Boost classification with class-specific scaling

from skboost.preprocessing import DualScalerTransformer

transformer = DualScalerTransformer()
X_scaled = transformer.fit_transform(X)
# Scales features separately for different target classes
# Useful for: imbalanced datasets, multi-class problems with different distributions
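The per-class scaling idea can be sketched with scikit-learn's StandardScaler. How DualScalerTransformer combines the per-class scalings at inference time is not specified above, so the stacking step below is an assumption, not the library's behavior:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, size=100)

# Fit one scaler per class, using only that class's rows.
scalers = {c: StandardScaler().fit(X[y == c]) for c in np.unique(y)}

# At inference the labels are unknown; one option (assumed here) is to
# apply every per-class scaler and stack the results side by side.
X_dual = np.hstack([scalers[c].transform(X) for c in sorted(scalers)])
# X_dual has n_features * n_classes = 6 columns
```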

Hyperparameter Tuning

zoom_search_cv - Boost efficiency with adaptive hyperparameter search

from skboost.tuning import zoom_search_cv

best_params, best_score = zoom_search_cv(
    estimator, X, y,
    param_grid={'n_estimators': [50, 100, 150], 'max_depth': [3, 6, 9]},
    n_iter=3, cv=5
)
# Starts with 3 values per parameter, iteratively zooms around best region
# Works for both numeric and categorical hyperparameters
# Useful for: faster optimization than exhaustive grid search
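The zooming strategy can be approximated with repeated GridSearchCV runs over successively narrower numeric grids. This is only a sketch of the idea, not zoom_search_cv itself:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, random_state=0)
grid = [50, 100, 150]  # coarse initial grid for n_estimators

for _ in range(2):  # two zoom iterations
    search = GridSearchCV(
        RandomForestClassifier(random_state=0),
        {"n_estimators": grid},
        cv=3,
    ).fit(X, y)
    best = search.best_params_["n_estimators"]
    # Zoom in: re-centre a narrower grid on the current best value.
    step = max((grid[-1] - grid[0]) // 4, 1)
    grid = [max(best - step, 1), best, best + step]

print(search.best_params_, round(search.best_score_, 3))
```

Each iteration fits only three candidates instead of enumerating the full cross-product, which is where the speed-up over exhaustive grid search comes from.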

Model Evaluation

confidence_report - Boost reliability with confidence-aware evaluation

from skboost.evaluation import confidence_report, plot_confidence_report

reports = confidence_report(y_true, y_proba, thresholds=[0.5, 0.7, 0.9])
plot_confidence_report(reports)
# Shows precision/recall/f1 at different confidence thresholds per class
# Visualize which predictions your model is reliable on
# Useful for: production deployment decisions, finding usable subsets, model monitoring
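The underlying idea of confidence-stratified scoring can be shown in a few lines of NumPy; this illustrates the concept, not confidence_report's actual output format:

```python
import numpy as np

y_true = np.array([0, 0, 1, 1, 1, 0])
y_proba = np.array([
    [0.9, 0.1], [0.6, 0.4], [0.2, 0.8],
    [0.45, 0.55], [0.1, 0.9], [0.35, 0.65],
])

y_pred = y_proba.argmax(axis=1)  # predicted class
conf = y_proba.max(axis=1)       # confidence = top class probability

for t in [0.5, 0.7, 0.9]:
    keep = conf >= t             # score only predictions above the threshold
    acc = (y_pred[keep] == y_true[keep]).mean()
    print(f"threshold={t}: coverage={keep.mean():.2f}, accuracy={acc:.2f}")
# Accuracy rises from 0.83 to 1.00 as the threshold tightens,
# while coverage drops from 1.00 to 0.33.
```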

Additional Tools

GroupDiffTransformer - Sequential feature engineering within groups

from skboost.preprocessing import GroupDiffTransformer

transformer = GroupDiffTransformer(key_col='user_id')
X_transformed = transformer.fit_transform(X)
# Adds: difference from previous row, difference from first row per group
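The same two features can be built directly with pandas groupby; a sketch of the idea, not the transformer's actual code:

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": ["a", "a", "a", "b", "b"],
    "value":   [10, 13, 20, 5, 9],
})

g = df.groupby("user_id")["value"]
df["diff_prev"] = g.diff()                             # vs previous row in group
df["diff_first"] = df["value"] - g.transform("first")  # vs first row in group
# diff_prev:  [NaN, 3, 7, NaN, 4]
# diff_first: [0, 3, 10, 0, 4]
```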

GroupValueCountsTransformer - Value frequency features within groups

from skboost.preprocessing import GroupValueCountsTransformer

transformer = GroupValueCountsTransformer(group_col='session_id', value_col='action')
X_transformed = transformer.fit_transform(X)
# Adds: raw counts and normalized counts per group
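The raw and normalized counts can likewise be sketched with pandas value_counts; an illustration of the feature shape, not the transformer's implementation:

```python
import pandas as pd

df = pd.DataFrame({
    "session_id": [1, 1, 1, 2, 2],
    "action":     ["click", "click", "buy", "view", "view"],
})

counts = df.groupby("session_id")["action"].value_counts().rename("count")
freqs = df.groupby("session_id")["action"].value_counts(normalize=True).rename("freq")
features = pd.concat([counts, freqs], axis=1).reset_index()
#    session_id action  count      freq
# 0           1  click      2  0.666667
# 1           1    buy      1  0.333333
# 2           2   view      2  1.000000
```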

Quick Example

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from skboost.preprocessing import RelationalFeaturesTransformer
from skboost.tuning import zoom_search_cv
from skboost.evaluation import confidence_report, plot_confidence_report

# Generate data
X, y = make_classification(n_samples=500, n_classes=3, n_informative=5, random_state=42)

# Add relational features
transformer = RelationalFeaturesTransformer(direction='larger')
X_boosted = transformer.fit_transform(X)

# Adaptive hyperparameter search
param_grid = {'n_estimators': [50, 100, 150], 'max_depth': [3, 6, 9]}
clf = RandomForestClassifier(random_state=42)
best_params, best_score = zoom_search_cv(clf, X_boosted, y, param_grid, n_iter=3)

# Train with best params and evaluate confidence
clf.set_params(**best_params)
clf.fit(X_boosted, y)
y_proba = clf.predict_proba(X_boosted)  # on training data for brevity; use a held-out split in practice

# Confidence-stratified evaluation
reports = confidence_report(y, y_proba, thresholds=[0.5, 0.7, 0.9])
plot_confidence_report(reports)

Testing

pytest tests/

See tests/ directory for usage examples in test form.

License

MIT
