Scikit-learn Smithy

A CLI to forge scikit-learn compatible estimator templates with ease.

Why

Writing a scikit-learn compatible estimator might be harder than expected.

While everyone knows about fit and predict, there are other behaviours, methods and attributes that scikit-learn might expect from your estimator. These depend on:

The type of estimator you're writing.
The signature of the estimator.
The signature of the .fit(...) method.
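
To make these expectations concrete, here is a minimal, dependency-free sketch of a few conventions scikit-learn relies on: `__init__` only stores its arguments, parameters are discoverable from the `__init__` signature, `fit` returns `self`, and learned attributes end with an underscore. The `MeanRegressor` class and its `shrinkage` parameter are hypothetical and exist only for illustration. In real code you would inherit from `sklearn.base.BaseEstimator` (which provides `get_params` and `set_params`) rather than write them by hand, and this sketch is not the code that the tool generates.

```python
import inspect


class MeanRegressor:
    """Toy regressor that always predicts the (weighted) mean of y.

    Hypothetical example: a real estimator would inherit from
    sklearn.base.BaseEstimator instead of re-implementing
    get_params/set_params by hand.
    """

    def __init__(self, shrinkage=0.0):
        # Convention: __init__ only stores parameters, without validation.
        self.shrinkage = shrinkage

    def get_params(self, deep=True):
        # Parameter names are read from the __init__ signature.
        names = list(inspect.signature(self.__init__).parameters)
        return {name: getattr(self, name) for name in names}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        return self

    def fit(self, X, y, sample_weight=None):
        # The signature of fit decides whether sample_weight is supported.
        if sample_weight is None:
            sample_weight = [1.0] * len(y)
        total = sum(sample_weight)
        mean = sum(w * t for w, t in zip(sample_weight, y)) / total
        # Convention: learned attributes end with an underscore.
        self.mean_ = (1.0 - self.shrinkage) * mean
        return self  # Convention: fit returns the estimator itself.

    def predict(self, X):
        return [self.mean_ for _ in X]


reg = MeanRegressor(shrinkage=0.5).fit([[0], [1], [2]], [1.0, 2.0, 3.0])
print(reg.get_params())
print(reg.predict([[10], [20]]))
```

Each of these conventions is something the scikit-learn checks can exercise, which is why getting the boilerplate right matters before writing the core logic.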

This tool aims to help you with that by asking a few questions about your estimator and then generating the boilerplate code for you. That way you can focus on the core implementation of the estimator rather than on the nitty-gritty details of the scikit-learn API.

Once the core logic is implemented, the estimator should be ready to be tested with the somewhat official, pytest-compatible parametrize_with_checks decorator:

from sklearn.utils.estimator_checks import parametrize_with_checks

# parametrize_with_checks expects estimator instances, not classes
@parametrize_with_checks([YourAwesomeRegressor(), MoreAwesomeClassifier(), EvenMoreAwesomeTransformer()])
def test_sklearn_compatible_estimator(estimator, check):
    check(estimator)

Web UI

The tool is also available as a web UI powered by streamlit, so there is no need to install anything locally to try it out.

Installation

We suggest installing it directly from PyPI:

python -m pip install sklearn-smithy

This will make the smith command available in your terminal.

Commands

The smith entrypoint offers two commands:

smith --help

Usage: smith [OPTIONS] COMMAND [ARGS]...                                                                                                                          
                
Awesome CLI to generate scikit-learn estimator boilerplate code
...
╭─ Commands ──────────────────────────────────────────────────────────────────────────────╮
│ forge     Asks a list of questions to generate a shiny new estimator ✨                │
│ version   Display library version.                                                      │
╰─────────────────────────────────────────────────────────────────────────────────────────╯

As you can probably guess, the forge command is the one that generates the boilerplate code for you:

smith forge --help

Asks a list of questions to generate a shiny new estimator ✨

Depending on the **estimator type** the additional information could be required:

* if the estimator is linear (classifier or regression)
* if the estimator has a `predict_proba` method (classifier or outlier detector)
* is the estimator has a `decision_function` method (classifier only)

Finally, the following two questions will be prompt:

* if the estimator should have tags (To know more about tags, check the dedicated
    [scikit-learn documentation](https://scikit-learn.org/dev/developers/develop.html#estimator-tags))
* in which file the class should be saved (default is `f'{name.lower()}.py'`)
                                                  
╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *  --name                                                   TEXT                                        name the estimator. [default: None] [required]                                                │
│ *  --estimator-type                                         [classifier|outlier|regressor|transformer]  Estimator type. [default: None] [required]                                                    │
│    --required-params                                        TEXT                                        List of required parameters (comma-separated).                                                │
│    --other-params                                           TEXT                                        List of optional parameters (comma-separated).                                                │
│    --support-sample-weight    --no-support-sample-weight                                                Whether or not `.fit()` does support `sample_weight`. [default: no-support-sample-weight]     │
│    --help                                                                                               Show this message and exit.                                                                   │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
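
As a rough illustration of how the options above could be interpreted, the sketch below splits a comma-separated parameter string, derives the default output file name from the estimator name (following the `f'{name.lower()}.py'` default quoted in the help text), and assembles an `__init__` signature. The helper names `parse_params`, `default_output_file` and `render_init_signature` are hypothetical, and defaulting optional parameters to `None` is an assumption made here. This is not the tool's actual implementation or template.

```python
def parse_params(raw):
    """Split a comma-separated string into a list of parameter names."""
    if not raw:
        return []
    return [name.strip() for name in raw.split(",") if name.strip()]


def default_output_file(name):
    """Mirror the default file name documented in the help text."""
    return f"{name.lower()}.py"


def render_init_signature(required, optional):
    """Build an __init__ signature: required parameters first, then optional.

    Assumption: optional parameters default to None in this sketch.
    """
    args = ["self", *required, *(f"{param}=None" for param in optional)]
    return f"def __init__({', '.join(args)}):"


required = parse_params("alpha, beta")
optional = parse_params("gamma")
print(default_output_file("MyEstimator"))
print(render_init_signature(required, optional))
```

Putting required parameters before optional ones keeps the generated signature valid Python, since arguments without defaults cannot follow arguments with defaults.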

Origin story

The idea for this tool originated from scikit-lego #660. The story goes as follows:

The CI/CD failed for scikit-learn==1.5rc1 because of a change in the check_estimator internals.

In the scikit-learn issue I got a better picture of how to run tests for compatible components.

In particular, the "Rolling your own estimator" guide suggests using parametrize_with_checks, and of course I thought "that is a great idea to avoid dealing manually with each test".

Say no more, I entered a rabbit hole to refactor all our tests - which would be fine.

Except that these test failures helped me figure out a few missing parts in the codebase.
