Toolkit to forge scikit-learn compatible estimators.

Scikit-learn Smithy

Scikit-learn Smithy is a tool that helps you forge scikit-learn compatible estimators with ease.


WebUI | Documentation | Repository | Issue Tracker


How can you use it?

✅ Directly from the browser via a Web UI.
✅ As a CLI (command line interface) in the terminal.
  • Available via the smith forge command.
  • It requires installation: python -m pip install sklearn-smithy
  • Powered by typer.
✅ As a TUI (terminal user interface) in the terminal.
  • Available via the smith forge-tui command.
  • It requires installing extra dependencies: python -m pip install "sklearn-smithy[textual]"
  • Powered by textual.

Each of these tools asks a series of questions about the estimator you want to create, and then generates the boilerplate code for you.

Why ❓

Writing scikit-learn compatible estimators might be harder than expected.

While everyone knows about the fit and predict methods, there are other behaviours, methods and attributes that scikit-learn might expect from your estimator, depending on:

  • The type of estimator you're writing.
  • The signature of the estimator.
  • The signature of the .fit(...) method.
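To give a feel for those extra expectations, here is a minimal hand-written sketch. It is not the output of Smithy. MeanRegressor and its shrinkage parameter are hypothetical names chosen for illustration, and the sketch covers only a few common conventions: __init__ only stores hyperparameters, fit validates its input, sets attributes ending in an underscore and returns self, and predict checks that the estimator has been fitted.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y


class MeanRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical regressor that predicts a scaled mean of the training target."""

    def __init__(self, shrinkage=1.0):
        # __init__ only stores hyperparameters: no validation, no logic.
        self.shrinkage = shrinkage

    def fit(self, X, y):
        # Validate and convert the inputs to numpy arrays.
        X, y = check_X_y(X, y)
        # Attributes learned during fit end with a trailing underscore.
        self.n_features_in_ = X.shape[1]
        self.mean_ = self.shrinkage * float(np.mean(y))
        # fit returns the estimator itself so that calls can be chained.
        return self

    def predict(self, X):
        # Raises NotFittedError if fit has not been called yet.
        check_is_fitted(self)
        X = check_array(X)
        return np.full(X.shape[0], self.mean_)


X = np.array([[0.0], [1.0], [2.0]])
y = np.array([1.0, 2.0, 3.0])
model = MeanRegressor().fit(X, y)
predictions = model.predict(X)
```

Even this small example relies on BaseEstimator for get_params and set_params, and a complete estimator usually needs more than what is shown here (for example sample_weight handling or estimator tags).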

Scikit-learn Smithy to the rescue: this tool aims to help you craft your own estimator by asking a few questions about it, and then generating the boilerplate code.

In this way you will be able to fully focus on the core implementation logic, and not on nitty-gritty details of the scikit-learn API.

Sanity check

Once the core logic is implemented, the estimator should be ready to be tested with the somewhat official, pytest-compatible parametrize_with_checks decorator:

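from sklearn.utils.estimator_checks import parametrize_with_checks

# parametrize_with_checks expects estimator instances, not classes.
@parametrize_with_checks([
    YourAwesomeRegressor(),
    MoreAwesomeClassifier(),
    EvenMoreAwesomeTransformer(),
])
def test_sklearn_compatible_estimator(estimator, check):
    # Each (estimator, check) pair is collected by pytest as its own test case.
    check(estimator)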

and it should be compatible with scikit-learn Pipeline, GridSearchCV, etc.
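As a rough illustration of that compatibility, the sketch below plugs a hand-written transformer into a Pipeline and tunes it with GridSearchCV. ClipTransformer, its quantile parameter and the synthetic data are all hypothetical and only serve the example. The code is not generated by Smithy.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.utils.validation import check_array, check_is_fitted


class ClipTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical transformer clipping each feature to quantiles learned in fit."""

    def __init__(self, quantile=0.05):
        self.quantile = quantile

    def fit(self, X, y=None):
        X = check_array(X)
        self.n_features_in_ = X.shape[1]
        # Per-feature clipping bounds, learned from the training data.
        self.lower_ = np.quantile(X, self.quantile, axis=0)
        self.upper_ = np.quantile(X, 1 - self.quantile, axis=0)
        return self

    def transform(self, X):
        check_is_fitted(self)
        X = check_array(X)
        return np.clip(X, self.lower_, self.upper_)


# Synthetic regression data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=60)

pipeline = Pipeline([("clip", ClipTransformer()), ("ridge", Ridge())])

# The "clip__quantile" key reaches the transformer through get_params/set_params,
# both inherited from BaseEstimator.
search = GridSearchCV(
    pipeline,
    param_grid={"clip__quantile": [0.0, 0.05, 0.1]},
    cv=3,
)
search.fit(X, y)
predictions = search.predict(X)
```

The grid search can clone and reconfigure the transformer only because __init__ stores its arguments unchanged under the same attribute names. This is one of the conventions that parametrize_with_checks verifies.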

Official guide

Scikit-learn documentation on how to develop estimators.

Supported estimators

The following types of scikit-learn estimator are supported:

  • ✅ Classifier
  • ✅ Regressor
  • ✅ Outlier Detector
  • ✅ Clusterer
  • ✅ Transformer
    • ✅ Feature Selector
  • 🚧 Meta Estimator

Installation

sklearn-smithy is available on PyPI, so you can install it directly from there:

python -m pip install sklearn-smithy

Remark: The minimum Python version required is 3.10.

This will make the smith command available in your terminal, and you should be able to run the following:

smith version

sklearn-smithy=...

Extra dependencies

To run the TUI, you need to install the textual dependency as well:

python -m pip install "sklearn-smithy[textual]"

User guide 📚

Please refer to the dedicated user guide documentation section.

Origin story

The idea for this tool originated from scikit-lego #660, which I cannot explain better than by quoting the PR description itself:

So the story goes as the following:

  • The CI/CD fails for scikit-learn==1.5rc1 because of a change in the check_estimator internals
  • In the scikit-learn issue I got a better picture of how to run test for compatible components
  • In particular, rolling your own estimator suggests to use parametrize_with_checks, and of course I thought "that is a great idea to avoid dealing manually with each test"
  • Say no more, I enter a rabbit hole to refactor all our tests - which would be fine
  • Except that these tests failures helped me figure out a few missing parts in the codebase

Download files

Source Distribution

sklearn_smithy-0.2.0.tar.gz (18.9 kB)

Uploaded Source

Built Distribution

sklearn_smithy-0.2.0-py3-none-any.whl (24.5 kB)

Uploaded Python 3
