
Toolkit to forge scikit-learn compatible estimators.


Scikit-learn Smithy

Scikit-learn Smithy is a tool that helps you forge scikit-learn compatible estimators with ease.


WebUI | Documentation | Repository | Issue Tracker


How can you use it?

✅ Directly from the browser via a Web UI.
✅ As a CLI (command line interface) in the terminal.
  • Available via the smith forge command.
  • It requires installation: python -m pip install sklearn-smithy
  • Powered by typer.
✅ As a TUI (terminal user interface) in the terminal.
  • Available via the smith forge-tui command.
  • It requires installing extra dependencies: python -m pip install "sklearn-smithy[textual]"
  • Powered by textual.

Each of these tools asks a series of questions about the estimator you want to create, and then generates the boilerplate code for you.

Why ❓

Writing scikit-learn compatible estimators might be harder than expected.

While everyone knows about the fit and predict methods, there are other behaviours, methods and attributes that scikit-learn might expect from your estimator, depending on:

  • The type of estimator you're writing.
  • The signature of the estimator.
  • The signature of the .fit(...) method.

Scikit-learn Smithy to the rescue: this tool aims to help you craft your own estimator by asking a few questions about it, and then generating the boilerplate code.

This way you can focus fully on the core implementation logic, and not on the nitty-gritty details of the scikit-learn API.
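To give a sense of the details mentioned above, here is a small hand-written sketch of a regressor that follows the usual scikit-learn conventions. It is not the code that smithy generates; `MeanRegressor` and its `shrinkage` parameter are hypothetical names chosen for illustration, and the exact set of expected attributes may vary between scikit-learn versions.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y


class MeanRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical regressor that always predicts a scaled training mean."""

    def __init__(self, shrinkage=1.0):
        # __init__ only stores hyperparameters as given (no validation, no logic),
        # so that get_params, set_params and clone keep working.
        self.shrinkage = shrinkage

    def fit(self, X, y):
        # Validate the inputs and convert them to numpy arrays.
        X, y = check_X_y(X, y)
        # Learned attributes end with an underscore and are set in fit only.
        self.n_features_in_ = X.shape[1]
        self.mean_ = self.shrinkage * float(np.mean(y))
        # fit returns self so that calls can be chained.
        return self

    def predict(self, X):
        # Raises NotFittedError if fit has not been called yet.
        check_is_fitted(self, "mean_")
        X = check_array(X)
        if X.shape[1] != self.n_features_in_:
            raise ValueError(
                f"X has {X.shape[1]} features, but this estimator "
                f"was fitted with {self.n_features_in_} features."
            )
        return np.full(X.shape[0], self.mean_)


X = np.array([[0.0], [1.0], [2.0]])
y = np.array([1.0, 2.0, 3.0])
model = MeanRegressor(shrinkage=0.5).fit(X, y)
predictions = model.predict(X)
```

Even in this tiny example, most of the lines are API plumbing rather than modelling logic, which is the part the generated boilerplate is meant to take off your hands.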

Sanity check

Once the core logic is implemented, the estimator should be ready to test with the pytest-compatible parametrize_with_checks decorator provided by scikit-learn:

from sklearn.utils.estimator_checks import parametrize_with_checks

# Pass estimator instances: recent scikit-learn versions no longer accept classes here.
@parametrize_with_checks([
    YourAwesomeRegressor(),
    MoreAwesomeClassifier(),
    EvenMoreAwesomeTransformer(),
])
def test_sklearn_compatible_estimator(estimator, check):
    check(estimator)

and it should be compatible with scikit-learn Pipeline, GridSearchCV, etc.
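As an illustration of that compatibility, the sketch below plugs a hand-written transformer into a Pipeline and tunes its parameter with GridSearchCV. `UpperClipper` and its `upper` parameter are hypothetical and only serve the example; the data is synthetic and the grid values are arbitrary.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.utils.validation import check_array, check_is_fitted


class UpperClipper(TransformerMixin, BaseEstimator):
    """Hypothetical transformer that caps every feature value at `upper`."""

    def __init__(self, upper=1.0):
        self.upper = upper

    def fit(self, X, y=None):
        X = check_array(X)
        self.n_features_in_ = X.shape[1]
        return self

    def transform(self, X):
        check_is_fitted(self, "n_features_in_")
        X = check_array(X)
        return np.minimum(X, self.upper)


# Synthetic regression data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=60)

pipeline = Pipeline([("clip", UpperClipper()), ("model", LinearRegression())])

# Nested parameters are addressed as "<step name>__<parameter name>".
search = GridSearchCV(pipeline, param_grid={"clip__upper": [0.5, 1.0, 10.0]}, cv=3)
search.fit(X, y)
predictions = search.predict(X)
```

The grid search can clone the pipeline and set `clip__upper` only because `__init__` stores the parameter under the same name, which is one of the conventions the checks above verify.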

Official guide

Scikit-learn documentation on how to develop estimators.

Supported estimators

The following types of scikit-learn estimator are supported:

  • ✅ Classifier
  • ✅ Regressor
  • ✅ Outlier Detector
  • ✅ Clusterer
  • ✅ Transformer
    • ✅ Feature Selector
  • 🚧 Meta Estimator

Installation

sklearn-smithy is available on PyPI, so you can install it directly from there:

python -m pip install sklearn-smithy

Remark: The minimum Python version required is 3.10.

This will make the smith command available in your terminal, and you should be able to run the following:

smith version

sklearn-smithy=...
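If you would rather check the installation from Python than from the terminal, the standard library can report the installed version. This is a minimal sketch, assuming the distribution name is sklearn-smithy as in the pip command above; `installed_version` is a hypothetical helper, not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


smithy_version = installed_version("sklearn-smithy")
if smithy_version is None:
    print("sklearn-smithy is not installed in this environment")
else:
    print(f"sklearn-smithy={smithy_version}")
```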

Extra dependencies

To run the TUI, you need to install the textual dependency as well:

python -m pip install "sklearn-smithy[textual]"

User guide 📚

Please refer to the dedicated user guide documentation section.

Origin story

The idea for this tool originated from scikit-lego #660, which I cannot explain better than by quoting the PR description itself:

So the story goes as the following:

  • The CI/CD fails for scikit-learn==1.5rc1 because of a change in the check_estimator internals
  • In the scikit-learn issue I got a better picture of how to run test for compatible components
  • In particular, rolling your own estimator suggests to use parametrize_with_checks, and of course I thought "that is a great idea to avoid dealing manually with each test"
  • Say no more, I enter a rabbit hole to refactor all our tests - which would be fine
  • Except that these tests failures helped me figure out a few missing parts in the codebase
