Scikit-learn Smithy
Scikit-learn Smithy is a tool that helps you forge scikit-learn compatible estimator templates with ease.
How can you use it?
- ✅ From a web UI powered by streamlit.
- ✅ As a CLI (command line interface): via the `smith forge` command (see Installation and Available CLI commands).
- 🚧 As a TUI (terminal user interface): We are not there yet!
Why?
Writing scikit-learn compatible estimators might be harder than expected.
While everyone knows about the `fit` and `predict` methods, there are other behaviours, methods and attributes that scikit-learn might expect from your estimator depending on:
- The type of estimator you're writing.
- The signature of the estimator.
- The signature of the `.fit(...)` method.
Scikit-learn Smithy to the rescue: this tool aims to help you craft your own estimator by asking a few questions about it, and then generating the boilerplate code.
In this way you will be able to fully focus on the core implementation logic, and not on nitty-gritty details of the scikit-learn API.
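To give a sense of the boilerplate involved, here is a minimal hand-written sketch of a regressor that follows a few common scikit-learn conventions: `__init__` only stores parameters, `fit` validates its input, sets attributes ending in an underscore and returns `self`, and `predict` checks that the estimator was fitted. The class name `MeanRegressor` and its `shift` parameter are hypothetical, and this is not the code that Smithy generates; it is only an illustration of the kind of details the tool is meant to take care of.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y


class MeanRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical estimator: predicts the (weighted) training mean plus a shift."""

    def __init__(self, shift=0.0):
        # Only store parameters here: no validation and no computation.
        self.shift = shift

    def fit(self, X, y, sample_weight=None):
        # Validate inputs and convert them to numpy arrays.
        X, y = check_X_y(X, y)
        # Attributes learned during fit end with an underscore.
        self.n_features_in_ = X.shape[1]
        self.mean_ = float(np.average(y, weights=sample_weight))
        # fit returns the estimator itself so that calls can be chained.
        return self

    def predict(self, X):
        # Raises NotFittedError if fit has not been called yet.
        check_is_fitted(self)
        X = check_array(X)
        return np.full(X.shape[0], self.mean_ + self.shift)


model = MeanRegressor(shift=1.0).fit([[0.0], [1.0], [2.0]], [1.0, 2.0, 3.0])
predictions = model.predict([[5.0], [6.0]])
```

Even a sketch this small may not pass every scikit-learn check, which is exactly why automated checks are worth running.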
Once the core logic is implemented, the estimator should be ready to test against the somewhat official `parametrize_with_checks` pytest compatible decorator:

    from sklearn.utils.estimator_checks import parametrize_with_checks

    # Pass estimator instances (not classes), as recent scikit-learn versions require.
    @parametrize_with_checks([YourAwesomeRegressor(), MoreAwesomeClassifier(), EvenMoreAwesomeTransformer()])
    def test_sklearn_compatible_estimator(estimator, check):
        check(estimator)
Installation
To use the tool from the terminal, we suggest installing it directly from PyPI:

    python -m pip install sklearn-smithy

This will make the `smith` command available in your terminal.
Available CLI commands
The `smith` entrypoint offers two commands:

    smith --help

    Usage: smith [OPTIONS] COMMAND [ARGS]...

    Awesome CLI to generate scikit-learn estimator boilerplate code
    ...
    ╭─ Commands ─────────────────────────────────────────────────────────────────────╮
    │ forge     Generate a new shiny scikit-learn compatible estimator ✨            │
    │ version   Display library version.                                             │
    ╰────────────────────────────────────────────────────────────────────────────────╯

and as you can already guess, the `forge` command is the one that will generate the boilerplate code for you.

    smith forge --help

    Generate a new shiny scikit-learn compatible estimator ✨

    Depending on the estimator type the following additional information could be required:

    * if the estimator is linear (classifier or regression)
    * if the estimator has a `predict_proba` method (classifier or outlier detector)
    * is the estimator has a `decision_function` method (classifier only)

    Finally, the following two questions will be prompt:

    * if the estimator should have tags (To know more about tags, check the dedicated scikit-learn documentation
      at https://scikit-learn.org/dev/developers/develop.html#estimator-tags
    * in which file the class should be saved (default is `f'{name.lower()}.py'`)

    ╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
    │ *  --name                TEXT                                                Name of the estimator [default: None] [required]                                           │
    │ *  --estimator-type      [classifier|outlier|regressor|transformer|cluster]  Estimator type [default: None] [required]                                                  │
    │    --required-params     TEXT                                                List of (comma-separated) required parameters                                              │
    │    --optional-params     TEXT                                                List of (comma-separated) optional parameters                                              │
    │    --sample-weight       --no-sample-weight                                  Whether or not `.fit()` supports `sample_weight` [default: no-sample-weight]               │
    │    --linear              --no-linear                                         Whether or not the estimator is linear [default: no-linear]                                │
    │    --predict-proba       --no-predict-proba                                  Whether or not the estimator implements `predict_proba` method [default: no-predict-proba] │
    │    --decision-function   --no-decision-function                              Whether or not the estimator implements `decision_function` method                         │
    │                                                                              [default: no-decision-function]                                                            │
    │    --tags                TEXT                                                List of optional extra scikit-learn tags                                                   │
    │    --output-file         TEXT                                                Destination file where to save the boilerplate code                                        │
    │    --help                                                                    Show this message and exit.                                                                │
    ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
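
As a rough illustration of how the options listed above fit together, the sketch below assembles a `smith forge` invocation from Python. The helper `build_forge_command` is hypothetical and not part of the package; it only uses flags shown in the help text. Whether supplying every option up front suppresses all interactive prompts is an assumption that has not been verified here.

```python
import shlex

# Choices copied from the `--estimator-type` option in the help text.
ESTIMATOR_TYPES = {"classifier", "outlier", "regressor", "transformer", "cluster"}


def build_forge_command(name, estimator_type, required_params=(),
                        optional_params=(), sample_weight=False,
                        linear=False, output_file=None):
    """Hypothetical helper: assemble the argument list for `smith forge`."""
    if estimator_type not in ESTIMATOR_TYPES:
        raise ValueError(f"Unknown estimator type: {estimator_type!r}")

    cmd = ["smith", "forge", "--name", name, "--estimator-type", estimator_type]
    # Parameters are passed as comma-separated lists.
    if required_params:
        cmd += ["--required-params", ",".join(required_params)]
    if optional_params:
        cmd += ["--optional-params", ",".join(optional_params)]
    # Boolean flags default to their `--no-...` form, so only add them when set.
    if sample_weight:
        cmd.append("--sample-weight")
    if linear:
        cmd.append("--linear")
    # Mirror the documented default file name, f'{name.lower()}.py'.
    cmd += ["--output-file", output_file or f"{name.lower()}.py"]
    return cmd


cmd = build_forge_command(
    "MyRegressor",
    "regressor",
    required_params=["alpha"],
    optional_params=["beta", "gamma"],
    sample_weight=True,
)
command_line = shlex.join(cmd)
print(command_line)
```

The resulting list can be handed to `subprocess.run(cmd)` once `smith` is installed, or the printed string can be pasted into a terminal.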
Origin story
The idea for this tool originated from scikit-lego #660, which I cannot explain better than by quoting the PR description:
So the story goes as the following:
- The CI/CD fails for scikit-learn==1.5rc1 because of a change in the `check_estimator` internals
- In the scikit-learn issue I got a better picture of how to run test for compatible components
- In particular, in rolling your own estimator suggests to use `parametrize_with_checks`, and of course I thought "that is a great idea to avoid dealing manually with each test"
- Say no more, I enter a rabbit hole to refactor all our tests - which would be fine
- Except that these tests failures helped me figure out a few missing parts in the codebase
Hashes for sklearn_smithy-0.0.7-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 186fd60f7e7e6b9342c617e4668a8db04b028d8f5681e2f1c36d600cb2a9d525
MD5 | 7fcf257489e49f44ca0875f960a35427
BLAKE2b-256 | 366e3dcb5f4a6234c961a01c4452e89d9398db62a912c38161dc36a6dba0bbb1