Toolkit to forge scikit-learn compatible estimators.
Scikit-learn Smithy
Scikit-learn Smithy is a tool that helps you forge scikit-learn compatible estimators with ease.
WebUI | Documentation | Repository | Issue Tracker
How can you use it?
✅ Directly from the browser via a Web UI.
- Available at sklearn-smithy.streamlit.app
- It requires no installation.
- Powered by streamlit
✅ As a CLI (command line interface) in the terminal.
- Available via the smith forge command.
- It requires installation:
python -m pip install sklearn-smithy
- Powered by typer.
✅ As a TUI (terminal user interface) in the terminal.
- Available via the smith-tui command.
- It requires installing extra dependencies:
python -m pip install "sklearn-smithy[textual]"
- Powered by textual.
All these tools prompt you with a series of questions about the estimator you want to create, and then generate the boilerplate code for you.
Why ❓
Writing scikit-learn compatible estimators might be harder than expected.
While everyone knows about the fit and predict methods, there are other behaviours, methods and attributes that scikit-learn might expect from your estimator, depending on:
- The type of estimator you're writing.
- The signature of the estimator.
- The signature of the .fit(...) method.
Scikit-learn Smithy to the rescue: this tool aims to help you craft your own estimator by asking a few questions about it, and then generating the boilerplate code.
In this way you will be able to fully focus on the core implementation logic, and not on nitty-gritty details of the scikit-learn API.
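To give a rough idea of the details mentioned above, here is a minimal hand-written sketch of a regressor that follows a few common scikit-learn conventions. It is not the output of Scikit-learn Smithy, whose generated code may look different; the class name MeanRegressor and its offset parameter are hypothetical, and the sketch is not guaranteed to pass every scikit-learn check.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y


class MeanRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical toy regressor: predicts the training mean plus an offset."""

    def __init__(self, offset=0.0):
        # Convention: __init__ only stores parameters, with no validation or logic.
        self.offset = offset

    def fit(self, X, y):
        # Convention: validate inputs and record the number of features seen.
        X, y = check_X_y(X, y)
        self.n_features_in_ = X.shape[1]
        # Convention: learned attributes end with a trailing underscore.
        self.mean_ = float(np.mean(y)) + self.offset
        # Convention: fit returns self so that calls can be chained.
        return self

    def predict(self, X):
        # Convention: refuse to predict before fit has been called.
        check_is_fitted(self, "mean_")
        X = check_array(X)
        if X.shape[1] != self.n_features_in_:
            raise ValueError(
                f"X has {X.shape[1]} features, expected {self.n_features_in_}."
            )
        return np.full(X.shape[0], self.mean_)


X = np.array([[0.0], [1.0], [2.0]])
y = np.array([1.0, 2.0, 3.0])

model = MeanRegressor(offset=0.5).fit(X, y)
predictions = model.predict(X)
print(model.get_params(), predictions)
```

Inheriting from BaseEstimator is what provides get_params and set_params, which is why the constructor must store each argument under an attribute of the same name.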
Sanity check
Once the core logic is implemented, the estimator should be ready to test against the somewhat official parametrize_with_checks pytest-compatible decorator:
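    from sklearn.utils.estimator_checks import parametrize_with_checks

    # Placeholder names: replace them with instances of your own estimators.
    # Recent scikit-learn versions expect instances here, not classes.
    @parametrize_with_checks([
        YourAwesomeRegressor(),
        MoreAwesomeClassifier(),
        EvenMoreAwesomeTransformer(),
    ])
    def test_sklearn_compatible_estimator(estimator, check):
        check(estimator)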
and it should be compatible with scikit-learn Pipeline, GridSearchCV, etc.
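As an illustration of that compatibility, the sketch below drops a hand-written transformer into a Pipeline and tunes its parameter with GridSearchCV. The ColumnScaler class, its factor parameter, and the synthetic data are hypothetical and only serve the example; Ridge is used as an arbitrary final step.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.utils.validation import check_array, check_is_fitted


class ColumnScaler(TransformerMixin, BaseEstimator):
    """Hypothetical toy transformer: multiplies every feature by `factor`."""

    def __init__(self, factor=1.0):
        self.factor = factor

    def fit(self, X, y=None):
        X = check_array(X)
        self.n_features_in_ = X.shape[1]
        return self

    def transform(self, X):
        check_is_fitted(self, "n_features_in_")
        return check_array(X) * self.factor


# Synthetic regression data, generated only for this example.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -2.0, 0.5])

pipeline = Pipeline([("scale", ColumnScaler()), ("model", Ridge())])

# The "<step>__<parameter>" syntax works because BaseEstimator exposes
# get_params/set_params for the constructor arguments.
search = GridSearchCV(pipeline, {"scale__factor": [0.5, 1.0, 2.0]}, cv=3)
search.fit(X, y)
print(search.best_params_)
```

The grid search clones the pipeline for every candidate and fold, so this only works when the custom estimator can be rebuilt from its constructor parameters alone.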
Official guide
Scikit-learn documentation on how to develop estimators.
Supported estimators
The following types of scikit-learn estimator are supported:
- ✅ Classifier
- ✅ Regressor
- ✅ Outlier Detector
- ✅ Clusterer
- ✅ Transformer
- ✅ Feature Selector
- 🚧 Meta Estimator
Installation
sklearn-smithy is available on PyPI, so you can install it directly from there:
python -m pip install sklearn-smithy
Remark: The minimum Python version required is 3.10.
This will make the smith command available in your terminal, and you should be able to run the following:
smith version
sklearn-smithy=...
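If you prefer to verify the installation from Python, a small sketch along these lines should work. It relies only on the standard library; the function name smithy_environment_report is hypothetical, and the distribution name sklearn-smithy is the one used in the pip command above.

```python
import sys
from importlib import metadata


def smithy_environment_report(min_python=(3, 10)):
    """Return (python_ok, installed_version), with None if not installed."""
    # The minimum Python version required by sklearn-smithy is 3.10.
    python_ok = sys.version_info >= min_python
    try:
        installed = metadata.version("sklearn-smithy")
    except metadata.PackageNotFoundError:
        installed = None
    return python_ok, installed


python_ok, installed = smithy_environment_report()
print(f"Python >= 3.10: {python_ok}, sklearn-smithy version: {installed}")
```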
Extra dependencies
To run the TUI, you need to install the textual dependency as well:
python -m pip install "sklearn-smithy[textual]"
User guide 📚
Please refer to the dedicated user guide documentation section.
Origin story
The idea for this tool originated from scikit-lego #660, which I cannot explain better than by quoting the PR description itself:
So the story goes as the following:
- The CI/CD fails for scikit-learn==1.5rc1 because of a change in the check_estimator internals
- In the scikit-learn issue I got a better picture of how to run test for compatible components
- In particular, rolling your own estimator suggests to use parametrize_with_checks, and of course I thought "that is a great idea to avoid dealing manually with each test"
- Say no more, I enter a rabbit hole to refactor all our tests - which would be fine
- Except that these tests failures helped me figure out a few missing parts in the codebase