
Lightweight framework for structured and repeatable model validation


kotsu: lightweight framework for structuring model validation


What is it?

kotsu is a Python package that provides a lightweight and flexible framework for structuring the validation and comparison of machine learning models. It aims to provide a skeleton on which to develop models and validate them in a robust and repeatable way, with minimal bloat or overhead. Its flexibility allows usage with any model interface and any validation technique, no matter how complex. The structure it provides avoids common pitfalls that occur when attempting to make fair comparisons between models.

Main Features

  • Register a model with hyperparameters to a unique ID
  • Register validations to a unique ID
  • Run all registered models through all registered validations, and have the results compiled and stored as a CSV
  • Optionally have an artefacts_store_dir passed to your validations, for storing outputs for further analysis, e.g. trained models or model predictions on test datasets
  • Doesn't enforce any constraints or requirements on your models' interfaces
  • Pure Python package; no setup or configuration of other systems required
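
As a rough illustration of the artefacts_store_dir feature listed above, the sketch below shows a validation that accepts an optional directory and writes its outputs there. This document does not show how kotsu supplies the directory, so the keyword-argument signature is an assumption. The toy `MeanModel` and the file names `model_params.json` and `predictions.json` are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


class MeanModel:
    """Hypothetical toy model: predicts the mean of the training targets."""

    def fit(self, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, n_samples):
        return [self.mean_] * n_samples


def validation_with_artefacts(model, artefacts_store_dir: Optional[str] = None) -> dict:
    """Validate a model and, if a directory is given, store outputs in it.

    Assumption: the directory arrives as a keyword argument named
    artefacts_store_dir; check kotsu's own documentation for the exact contract.
    """
    y_train, y_test = [1.0, 2.0, 3.0], [2.0, 4.0]
    model.fit(y_train)
    predictions = model.predict(len(y_test))
    mae = sum(abs(p - t) for p, t in zip(predictions, y_test)) / len(y_test)

    if artefacts_store_dir is not None:
        store = Path(artefacts_store_dir)
        store.mkdir(parents=True, exist_ok=True)
        # Hypothetical file names, chosen only for this sketch.
        (store / "model_params.json").write_text(json.dumps({"mean": model.mean_}))
        (store / "predictions.json").write_text(json.dumps(predictions))

    return {"mean_absolute_error": mae}


with tempfile.TemporaryDirectory() as tmp_dir:
    results = validation_with_artefacts(MeanModel(), artefacts_store_dir=tmp_dir)
    stored_files = sorted(p.name for p in Path(tmp_dir).iterdir())
```

Keeping the directory optional means the same validation can also run when no artefacts are wanted.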

Where to get it

The source code is currently hosted on GitHub at: https://github.com/datavaluepeople/kotsu

The latest released version of the package can be installed from PyPI with:

pip install kotsu

Usage

The following example uses kotsu to register multiple models and run them through multiple validations.

Import kotsu and your packages for modelling:

import kotsu
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

Register your competing models:

Here we register two Support Vector Classifiers with different hyperparameters.

model_registry = kotsu.registration.ModelRegistry()

model_registry.register(
    id="SVC-v1",
    entry_point=svm.SVC,
    kwargs={"kernel": "linear", "C": 1, "random_state": 1},
)

model_registry.register(
    id="SVC-v2",
    entry_point=svm.SVC,
    kwargs={"kernel": "linear", "C": 0.5, "random_state": 1},
)
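
To make the idea of registering an entry point and kwargs under a unique ID more concrete, here is a minimal stand-in registry in plain Python. It is a conceptual sketch only and not kotsu's implementation. `ToyRegistry`, its `make` and `ids` methods, and `ToyClassifier` are hypothetical names introduced for illustration.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


class ToyRegistry:
    """Illustrative stand-in for a registry; NOT kotsu's ModelRegistry."""

    def __init__(self) -> None:
        self._specs: Dict[str, Tuple[Callable[..., Any], Dict[str, Any]]] = {}

    def register(
        self,
        id: str,
        entry_point: Callable[..., Any],
        kwargs: Optional[Dict[str, Any]] = None,
    ) -> None:
        """Store the entry point and its kwargs under a unique ID."""
        if id in self._specs:
            raise ValueError(f"ID already registered: {id}")
        self._specs[id] = (entry_point, dict(kwargs or {}))

    def make(self, id: str) -> Any:
        """Build a fresh instance by calling the entry point with its kwargs."""
        entry_point, kwargs = self._specs[id]
        return entry_point(**kwargs)

    def ids(self) -> List[str]:
        return list(self._specs)


class ToyClassifier:
    """Hypothetical model class standing in for something like svm.SVC."""

    def __init__(self, kernel: str = "linear", C: float = 1.0) -> None:
        self.kernel = kernel
        self.C = C


registry = ToyRegistry()
registry.register(id="Toy-v1", entry_point=ToyClassifier, kwargs={"kernel": "linear", "C": 1})
registry.register(id="Toy-v2", entry_point=ToyClassifier, kwargs={"kernel": "linear", "C": 0.5})

# Each call builds a new, untrained instance from the stored specification.
model_a = registry.make("Toy-v1")
model_b = registry.make("Toy-v1")
```

Storing a specification rather than an instance means each validation can start from a fresh model, so state does not leak between runs.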

Register your validations:

You can register multiple validations if you want to compare models in different scenarios, e.g. on different datasets. Each validation should take a model instance as an argument and return a dictionary containing the results from validating that model. Here we register two cross-validations with different numbers of folds.

validation_registry = kotsu.registration.ValidationRegistry()


def factory_iris_cross_validation(folds: int):
    """Factory for iris cross validation."""

    def iris_cross_validation(model) -> dict:
        """Iris classification cross validation."""
        X, y = datasets.load_iris(return_X_y=True)
        scores = cross_val_score(model, X, y, cv=folds)
        results = {f"fold_{i}_score": score for i, score in enumerate(scores)}
        results["mean_score"] = scores.mean()
        results["std_score"] = scores.std()
        return results

    return iris_cross_validation


validation_registry.register(
    id="iris_cross_validation-v1",
    entry_point=factory_iris_cross_validation,
    kwargs={"folds": 5},
)

validation_registry.register(
    id="iris_cross_validation-v2",
    entry_point=factory_iris_cross_validation,
    kwargs={"folds": 10},
)
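
Because a validation only needs to accept a model and return a dictionary, the model's interface can be whatever the validation knows how to call. The sketch below shows the same factory pattern with a model that does not follow the scikit-learn interface. `MajorityClassModel`, its `train` and `classify` methods, and `factory_holdout_validation` are hypothetical, and the data is a made-up toy label list.

```python
from collections import Counter


class MajorityClassModel:
    """Hypothetical model with a non-scikit-learn interface."""

    def train(self, labels):
        # Remember the most frequent label seen during training.
        self.majority_ = Counter(labels).most_common(1)[0][0]

    def classify(self, n_samples):
        return [self.majority_] * n_samples


def factory_holdout_validation(n_test: int):
    """Factory for a toy hold-out validation with a configurable test size."""
    labels = ["a", "a", "b", "a", "b", "a", "a", "b", "a", "a"]
    if not 0 < n_test < len(labels):
        raise ValueError("n_test must leave at least one training and one test label")

    def holdout_validation(model) -> dict:
        """Train on the first labels, score on the last n_test labels."""
        train, test = labels[:-n_test], labels[-n_test:]
        model.train(train)
        predictions = model.classify(len(test))
        accuracy = sum(p == t for p, t in zip(predictions, test)) / len(test)
        return {"accuracy": accuracy, "n_test": n_test}

    return holdout_validation


validation = factory_holdout_validation(n_test=3)
results = validation(MajorityClassModel())
```

A factory like this could be registered in the same way as `factory_iris_cross_validation` above, with `kwargs={"n_test": 3}`.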

Run the models through the validations:

With no results location passed, the results are written to the current directory.

kotsu.run(model_registry, validation_registry)

Then find the results from each model-validation combination in a CSV written to the current directory.
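
Conceptually, running all models through all validations and compiling the results into a CSV can be pictured as a nested loop. The following is a simplified sketch under stated assumptions, not kotsu's actual implementation. The function `run_all` and the column names `validation_id` and `model_id` are hypothetical, and the real CSV layout may differ.

```python
import csv
import io


def run_all(models: dict, validations: dict, out) -> None:
    """Run every model through every validation and write one CSV row per pair.

    models maps a model ID to a zero-argument factory;
    validations maps a validation ID to a callable returning a results dict.
    """
    rows = []
    for validation_id, validation in validations.items():
        for model_id, make_model in models.items():
            results = validation(make_model())  # fresh model for each run
            rows.append({"validation_id": validation_id, "model_id": model_id, **results})

    # Collect the union of result keys, preserving first-seen order.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)


# Toy "models" are plain constants; toy validations measure distance to a target.
models = {"const-0": lambda: 0, "const-1": lambda: 1}
validations = {
    "abs_error_to_1": lambda model: {"error": abs(1 - model)},
    "abs_error_to_3": lambda model: {"error": abs(3 - model)},
}

buffer = io.StringIO()
run_all(models, validations, buffer)
csv_lines = buffer.getvalue().splitlines()
```

Writing one row per model-validation pair keeps the results easy to filter and compare in a spreadsheet or data frame.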

Documentation on interfaces

See kotsu.typing for documentation on the main entities (Models, Validations, and Results) and their interfaces.

Comprehensive example

See the end-to-end test for a more comprehensive example of using kotsu, which includes storing the trained models from each model-validation run.

License

MIT
