Scikit-learn Smithy

Scikit-learn Smithy is a tool that helps you forge scikit-learn compatible estimator templates with ease.

How can you use it?

  • ✅ From a web UI powered by streamlit.
  • ✅ As a CLI (command line interface): smith forge command (see installation and commands).
  • 🚧 As a TUI (terminal user interface): We are not there yet!

Why โ“

Writing scikit-learn compatible estimators can be harder than expected.

While everyone knows about the fit and predict methods, there are other behaviours, methods and attributes that scikit-learn might expect from your estimator, depending on:

  • The type of estimator you're writing.
  • The signature of the estimator.
  • The signature of the .fit(...) method.
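To make these expectations concrete, here is a minimal hand-written sketch of a regressor. It is not output generated by the tool, and it is not claimed to pass every scikit-learn check. The class name MeanRegressor and its shift parameter are hypothetical and chosen only for illustration. It shows a few of the conventions beyond fit and predict: __init__ only stores parameters, fit returns self and sets attributes ending in an underscore, and predict checks that the estimator has been fitted.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y


class MeanRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical toy regressor: predicts the training mean plus a shift."""

    def __init__(self, shift=0.0):
        # Only store parameters here (no validation, no computation),
        # so that get_params, set_params and clone behave as expected.
        self.shift = shift

    def fit(self, X, y, sample_weight=None):
        # Validate the inputs and convert them to numpy arrays.
        X, y = check_X_y(X, y)
        # Attributes learned during fit end with an underscore.
        self.n_features_in_ = X.shape[1]
        self.mean_ = np.average(y, weights=sample_weight)
        # fit returns the estimator itself.
        return self

    def predict(self, X):
        # Raises NotFittedError if fit has not been called yet.
        check_is_fitted(self)
        X = check_array(X)
        return np.full(X.shape[0], self.mean_ + self.shift)


est = MeanRegressor(shift=1.0).fit([[0.0], [1.0], [2.0]], [1.0, 2.0, 3.0])
print(est.predict([[5.0]]))
```

Even for a toy like this, the details depend on the estimator type and on whether fit accepts sample_weight, which is exactly the information the tool asks for.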

Scikit-learn Smithy to the rescue: this tool aims to help you craft your own estimator by asking a few questions about it, and then generating the boilerplate code.

This way you can focus fully on the core implementation logic rather than on the nitty-gritty details of the scikit-learn API.

Once the core logic is implemented, the estimator should be ready to be tested with the somewhat official, pytest-compatible parametrize_with_checks decorator:

from sklearn.utils.estimator_checks import parametrize_with_checks

# parametrize_with_checks expects estimator instances, not classes
@parametrize_with_checks([YourAwesomeRegressor(), MoreAwesomeClassifier(), EvenMoreAwesomeTransformer()])
def test_sklearn_compatible_estimator(estimator, check):
    check(estimator)

Installation

To use the tool from the terminal, we suggest installing it directly from PyPI:

python -m pip install sklearn-smithy

This will make the smith command available in your terminal.

Available CLI commands

The smith entrypoint offers two commands:

smith --help
Usage: smith [OPTIONS] COMMAND [ARGS]...

CLI to generate scikit-learn estimator boilerplate code

...

╭─ Commands ─────────────────────────────────────────────────────────────────────────────╮
│ forge     Generate a new shiny scikit-learn compatible estimator ✨                    │
│ version   Display library version.                                                     │
╰────────────────────────────────────────────────────────────────────────────────────────╯

As you can already guess, the forge command is the one that generates the boilerplate code for you.

smith forge --help
Generate a new shiny scikit-learn compatible estimator ✨

Depending on the estimator type the following additional information could be required:

* if the estimator is linear (classifier or regression)
* if the estimator implements `.predict_proba()` method (classifier or outlier detector)
* if the estimator implements `.decision_function()` method (classifier only)

Finally, the following two questions will be prompt:

* if the estimator should have tags (To know more about tags, check the dedicated scikit-learn documentation
    at https://scikit-learn.org/dev/developers/develop.html#estimator-tags)
* in which file the class should be saved (default is `f'{name.lower()}.py'`)


╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *  --name                                           TEXT                                                Name of the estimator [default: None] [required]                                              │
│ *  --estimator-type                                 [classifier|outlier|regressor|transformer|cluster]  Estimator type [default: None] [required]                                                     │
│    --required-params                                TEXT                                                List of (comma-separated) required parameters                                                 │
│    --optional-params                                TEXT                                                List of  (comma-separated) optional parameters                                                │
│    --sample-weight        --no-sample-weight                                                            Whether or not `.fit()` supports `sample_weight` [default: no-sample-weight]                  │
│    --linear               --no-linear                                                                   Whether or not the estimator is linear [default: no-linear]                                   │
│    --predict-proba        --no-predict-proba                                                            Whether or not the estimator implements `predict_proba` method [default: no-predict-proba]    │
│    --decision-function    --no-decision-function                                                        Whether or not the estimator implements `decision_function` method                            │
│                                                                                                         [default: no-decision-function]                                                               │
│    --tags                                           TEXT                                                List of optional extra scikit-learn tags                                                      │
│    --output-file                                    TEXT                                                Destination file where to save the boilerplate code                                           │
│    --help                                                                                               Show this message and exit.                                                                   │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
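If you want to script the CLI, for example to generate several templates in a row, the options above can be assembled programmatically. The following is a minimal sketch under stated assumptions: build_forge_command is a hypothetical helper (it is not part of sklearn-smithy), it uses only flags listed in the help output above, and it only builds the argument list without running anything. Whether smith forge still prompts for options that are left out is not covered here.

```python
import shlex

# Estimator types accepted by --estimator-type, as listed in the help output.
ESTIMATOR_TYPES = ("classifier", "outlier", "regressor", "transformer", "cluster")


def build_forge_command(
    name,
    estimator_type,
    required_params=(),
    optional_params=(),
    sample_weight=False,
    output_file=None,
):
    """Hypothetical helper: build the argument list for a `smith forge` call."""
    if estimator_type not in ESTIMATOR_TYPES:
        raise ValueError(f"estimator_type must be one of {ESTIMATOR_TYPES}")

    args = ["smith", "forge", "--name", name, "--estimator-type", estimator_type]
    # Parameter lists are passed as comma-separated strings.
    if required_params:
        args += ["--required-params", ",".join(required_params)]
    if optional_params:
        args += ["--optional-params", ",".join(optional_params)]
    args.append("--sample-weight" if sample_weight else "--no-sample-weight")
    # When omitted, the tool's documented default f'{name.lower()}.py' applies.
    if output_file is not None:
        args += ["--output-file", output_file]
    return args


command = build_forge_command(
    "MeanRegressor",
    "regressor",
    optional_params=["shift"],
    sample_weight=True,
    output_file="meanregressor.py",
)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) once the smith command is installed.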

Origin story

The idea for this tool originated from scikit-lego #660, which I cannot explain better than by quoting the PR description:

So the story goes as the following:

  • The CI/CD fails for scikit-learn==1.5rc1 because of a change in the check_estimator internals
  • In the scikit-learn issue I got a better picture of how to run test for compatible components
  • In particular, in rolling your own estimator suggests to use parametrize_with_checks, and of course I thought "that is a great idea to avoid dealing manually with each test"
  • Say no more, I enter a rabbit hole to refactor all our tests - which would be fine
  • Except that these tests failures helped me figure out a few missing parts in the codebase
