MLArena

An algorithm-agnostic machine learning toolkit for model training, diagnostics and optimization.

Publications

Read about the concepts and methodologies behind MLArena through these articles:

  1. Algorithm-Agnostic Model Building with MLflow - Published in Towards Data Science

    A foundational guide demonstrating how to build algorithm-agnostic ML pipelines using mlflow.pyfunc. The article explores creating generic model wrappers, encapsulating preprocessing logic, and leveraging MLflow's unified model representation for seamless algorithm transitions.

  2. Explainable Generic ML Pipeline with MLflow - Published in Towards Data Science

    An advanced implementation guide that extends the generic ML pipeline with more sophisticated preprocessing and SHAP-based model explanations. The article demonstrates how to build a production-ready pipeline that supports both classification and regression tasks, handles feature preprocessing, and provides interpretable model insights while maintaining algorithm agnosticism.
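The wrapper pattern these articles describe can be sketched without MLflow itself. The classes below are a hypothetical, dependency-free illustration: `GenericPipeline`, `MeanImputer`, and `ThresholdClassifier` are names invented for this sketch and are not MLArena or MLflow APIs. With MLflow, a custom model of this kind is typically written as a subclass of `mlflow.pyfunc.PythonModel`.

```python
from statistics import mean


class MeanImputer:
    """Toy preprocessor: replaces None with the column mean seen in fit()."""

    def fit(self, rows):
        columns = list(zip(*rows))
        self.means_ = [mean(v for v in col if v is not None) for col in columns]
        return self

    def transform(self, rows):
        return [
            [self.means_[i] if v is None else v for i, v in enumerate(row)]
            for row in rows
        ]


class ThresholdClassifier:
    """Toy estimator exposing the fit/predict interface the wrapper relies on."""

    def fit(self, rows, labels):
        # Cut at the midpoint between the class means of the first feature
        # (assumes class 1 has the larger mean; good enough for a toy model).
        pos = [r[0] for r, y in zip(rows, labels) if y == 1]
        neg = [r[0] for r, y in zip(rows, labels) if y == 0]
        self.cut_ = (mean(pos) + mean(neg)) / 2
        return self

    def predict(self, rows):
        return [1 if r[0] > self.cut_ else 0 for r in rows]


class GenericPipeline:
    """Bundles preprocessing with any estimator that has fit() and predict()."""

    def __init__(self, model, preprocessor):
        self.model = model
        self.preprocessor = preprocessor

    def fit(self, rows, labels):
        self.preprocessor.fit(rows)
        self.model.fit(self.preprocessor.transform(rows), labels)
        return self

    def predict(self, rows):
        # Callers pass raw rows; preprocessing is applied inside the wrapper.
        return self.model.predict(self.preprocessor.transform(rows))


X = [[1.0, 5.0], [2.0, None], [8.0, 7.0], [9.0, 6.0]]
y = [0, 0, 1, 1]
pipeline = GenericPipeline(ThresholdClassifier(), MeanImputer()).fit(X, y)
predictions = pipeline.predict([[1.5, None], [8.5, None]])
```

Because the wrapper only calls `fit()` and `predict()`, swapping `ThresholdClassifier` for another estimator with the same two methods leaves the calling code unchanged.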

Installation

The package is under rapid development (see the CHANGELOG for details), so pinning a specific version at install time is highly recommended. For example:

pip install mlarena==0.1.9

If you are using the package on a Databricks ML cluster with Databricks Runtime (DBR) >= 15.2, you can try installing without dependencies (experimental feature):

pip install mlarena --no-deps
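Since pinning is recommended, a notebook or script can confirm that the environment actually holds the pinned release before running anything else. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `matches_pin` are hypothetical, and `0.1.9` simply mirrors the pin in the install command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_pin(package: str, pinned: str) -> bool:
    """True only when the package is installed at exactly the pinned version."""
    return installed_version(package) == pinned


# "0.1.9" mirrors the pin used in the install command above.
if not matches_pin("mlarena", "0.1.9"):
    print("mlarena is missing or not at the pinned version 0.1.9")
```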

Usage Example

Visual Examples:

Model Performance Analysis

Classification Model Performance

Regression Model Performance

Explainable ML

A one-liner creates global and local explanations based on SHAP that work across various classification and regression algorithms.

Global Explanation

Local Explanation
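To make the difference between local and global explanations concrete, the sketch below computes them by hand for a linear model. It relies on a known special case: when features are treated as independent, the SHAP value of feature i for a row is w_i * (x_i - mean(x_i)). This is an illustration only and does not use the shap library that MLArena builds on; the model, feature names, and data are made up.

```python
from statistics import mean

# Hypothetical linear model: prediction = bias + sum(w_i * x_i).
weights = {"age": 0.5, "income": 2.0}
bias = 1.0
rows = [
    {"age": 20.0, "income": 1.0},
    {"age": 40.0, "income": 3.0},
    {"age": 60.0, "income": 2.0},
]


def predict(row):
    return bias + sum(weights[f] * row[f] for f in weights)


# Background expectation of each feature, and the base value E[f(x)].
feature_means = {f: mean(r[f] for r in rows) for f in weights}
base_value = mean(predict(r) for r in rows)


def local_explanation(row):
    """Per-feature contributions for one row (linear model, independent features)."""
    return {f: weights[f] * (row[f] - feature_means[f]) for f in weights}


def global_explanation(data):
    """Mean absolute contribution of each feature across all rows."""
    local_values = [local_explanation(r) for r in data]
    return {f: mean(abs(lv[f]) for lv in local_values) for f in weights}


local = local_explanation(rows[0])
global_importance = global_explanation(rows)
```

The local contributions for a row add up to that row's prediction minus the base value, which is the additivity property that makes SHAP values readable as a breakdown of a single prediction.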

Hyperparameter Optimization

Parallel coordinates plot for hyperparameter search space diagnostics.
Hyperparameter Search Space
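The Features section lists cross-validation based tuning with a variance penalty. One common way to express that idea, shown here as a sketch rather than MLArena's actual objective, is to subtract a multiple of the fold-to-fold standard deviation from the mean score, so that a configuration with slightly lower but steadier performance can win. The function name, the `penalty` parameter, and the fold scores are assumptions made for illustration.

```python
from statistics import mean, stdev


def penalized_cv_score(fold_scores, penalty=1.0):
    """Mean CV score minus `penalty` times the sample std across folds.

    Higher is better; a larger penalty favors more stable configurations.
    """
    return mean(fold_scores) - penalty * stdev(fold_scores)


# Hypothetical per-fold AUCs for two hyperparameter configurations.
candidates = {
    "deep_trees": [0.95, 0.80, 0.92, 0.78, 0.95],
    "shallow_trees": [0.86, 0.87, 0.85, 0.86, 0.86],
}

scores = {name: penalized_cv_score(s) for name, s in candidates.items()}
best = max(scores, key=scores.get)
```

Here `deep_trees` has the higher mean score, but its spread across folds is large enough that the penalized objective selects `shallow_trees`.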

Features

  • Algorithm Agnostic ML Pipeline:

    • End-to-end workflow from preprocessing to deployment
    • Model-agnostic design (works with any scikit-learn compatible model), making it easy to experiment with and swap between algorithms
    • Support for both classification and regression tasks
    • Early stopping and validation set support
    • MLflow integration for experiment tracking and deployment
  • Intelligent Preprocessing:

    • Automated feature type detection and handling
    • Smart encoding recommendations based on feature cardinality and rare categories
    • Target encoding with visualization to support smoothing parameter selection
    • Tunable drop options to optimize one-hot encoding based on model (tree vs linear) and feature type (binary vs multi-category)
    • Missing value handling with configurable strategies
    • Feature selection recommendations with mutual information analysis
  • Advanced Model Evaluation:

    • Comprehensive metrics for both classification and regression
    • Diagnostic visualization of model performance
    • Threshold analysis for classification tasks
    • SHAP-based model explanations (global and local)
    • Cross-validation with variance penalty
  • Hyperparameter Optimization:

    • Bayesian optimization with Hyperopt
    • Cross-validation based tuning
    • Parallel coordinates visualization for search space analysis
    • Early stopping to prevent overfitting
    • Variance penalty to ensure stable solutions
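As an illustration of the smoothing parameter mentioned under target encoding, the sketch below uses the common m-estimate form, in which each category's mean is blended with the global mean and `smoothing` sets the weight of the global mean. This describes the general technique and is an assumption, not MLArena's own formula; the function and argument names are hypothetical.

```python
from collections import defaultdict


def target_encode(categories, targets, smoothing=10.0):
    """Map each category to a smoothed mean of the target.

    encoded = (n * category_mean + smoothing * global_mean) / (n + smoothing)
    where n is the number of rows in the category.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    # n * category_mean is just the category's target sum.
    return {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }


# Made-up data: category "B" is rare (a single row).
cities = ["A", "A", "A", "A", "B"]
bought = [1, 1, 0, 0, 1]

raw = target_encode(cities, bought, smoothing=0.0)
smoothed = target_encode(cities, bought, smoothing=5.0)
```

With no smoothing, the single-row category `B` is encoded as its lone target value; with smoothing it is pulled toward the global mean, which is why the choice of smoothing parameter matters most for rare categories.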

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
