MLArena

An algorithm-agnostic machine learning toolkit for model training, diagnostics and optimization.

Publications

Read about the concepts and methodologies behind MLArena through these articles:

  1. Algorithm-Agnostic Model Building with MLflow - Published in Towards Data Science

    A foundational guide demonstrating how to build algorithm-agnostic ML pipelines using mlflow.pyfunc. The article explores creating generic model wrappers, encapsulating preprocessing logic, and leveraging MLflow's unified model representation for seamless algorithm transitions.

  2. Explainable Generic ML Pipeline with MLflow - Published in Towards Data Science

    An advanced implementation guide that extends the generic ML pipeline with more sophisticated preprocessing and SHAP-based model explanations. The article demonstrates how to build a production-ready pipeline that supports both classification and regression tasks, handles feature preprocessing, and provides interpretable model insights while maintaining algorithm agnosticism.

Installation

The package is under rapid development (see the CHANGELOG for details), so installing a pinned version is highly recommended. For example:

%pip install mlarena==0.2.0

If you are using the package on a Databricks ML cluster with Databricks Runtime (DBR) 16.0 or later, you can install it without dependencies:

%pip install mlarena==0.2.0 --no-deps

If you are using an earlier DBR runtime, install optuna as well, as in the commands that follow. Note: as of 2025-04-26, Databricks recommends optuna, and hyperopt will be removed from the Databricks ML Runtime.

%pip install mlarena==0.2.0 --no-deps
%pip install optuna==3.6.1

Usage Example

Visual Examples:

Model Performance Analysis

Classification Model Performance

Regression Model Performance

Explainable ML

A one-liner creates global and local explanations based on SHAP that work across various classification and regression algorithms.

Global Explanation

Local Explanation

Hyperparameter Optimization

Parallel coordinates plot for hyperparameter search space diagnostics.

Hyperparameter Search Space

Features

Algorithm Agnostic ML Pipeline

  • Unified interface for any scikit-learn compatible model
  • Consistent workflow across classification and regression tasks
  • Automated report generation with comprehensive metrics and visuals
  • Production-ready with MLflow integration for deployment
  • Simplified handoff between experimentation and production
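
The idea of a unified interface over any scikit-learn compatible model can be illustrated with a minimal sketch. This is not MLArena's actual API: `GenericPipeline`, `MeanRegressor`, and the `preprocess` argument are hypothetical names used only for illustration, and the sketch assumes nothing beyond an estimator exposing `fit` and `predict`.

```python
from typing import Any, Callable, Protocol, Sequence


class Estimator(Protocol):
    """Anything that exposes scikit-learn style fit/predict methods."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> Any: ...

    def predict(self, X: Sequence[Sequence[float]]) -> Sequence[float]: ...


class GenericPipeline:
    """Hypothetical wrapper: one preprocessing step plus any estimator."""

    def __init__(self, model: Estimator, preprocess: Callable = lambda X: X):
        self.model = model
        self.preprocess = preprocess

    def fit(self, X, y):
        # The same preprocessing is applied at fit and predict time.
        self.model.fit(self.preprocess(X), y)
        return self

    def predict(self, X):
        return list(self.model.predict(self.preprocess(X)))


class MeanRegressor:
    """Toy estimator standing in for any scikit-learn compatible model."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


# Swapping the algorithm only changes the object passed to the wrapper.
pipeline = GenericPipeline(MeanRegressor()).fit([[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0])
predictions = pipeline.predict([[10.0], [20.0]])
```

Because the wrapper depends only on the `fit`/`predict` contract, the calling code stays the same when the underlying algorithm is replaced.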

Intelligent Preprocessing

  • Streamlined feature preprocessing with smart defaults and minimal code
  • Automatic feature analysis with data-driven encoding recommendations
  • Integrated target encoding with visualization for optimal smoothing selection
  • Feature filtering based on information theory metrics (mutual information)
  • Handles the full preprocessing pipeline from missing values to feature encoding
  • Seamless integration with scikit-learn and MLflow for production deployment
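
To make the role of the smoothing parameter in target encoding concrete, here is a minimal sketch using the common m-estimate formulation, which shrinks each category mean toward the global mean. The exact formula MLArena uses is not stated here, so treat this as an assumption; `smoothed_target_encoding` and its `smoothing` argument are hypothetical names.

```python
from collections import defaultdict


def smoothed_target_encoding(categories, targets, smoothing=10.0):
    """Return {category: encoded value} using m-estimate smoothing.

    encoded = (sum of targets in category + smoothing * global mean)
              / (count in category + smoothing)
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    return {
        category: (sums[category] + smoothing * global_mean)
        / (counts[category] + smoothing)
        for category in counts
    }


categories = ["a", "a", "a", "a", "b"]
targets = [1, 1, 1, 1, 0]

# smoothing=0 gives raw category means; larger values pull rare
# categories (here "b", seen once) toward the global mean of 0.8.
raw = smoothed_target_encoding(categories, targets, smoothing=0.0)
smoothed = smoothed_target_encoding(categories, targets, smoothing=1.0)
```

In practice the encoding should be fitted on training data only, so that target information does not leak into validation or test sets.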

Model Optimization

  • Efficient hyperparameter tuning with Optuna's TPE sampler
  • Smart early stopping with patient pruning to save compute resources
  • Cross-validation with variance penalty to prevent overfitting
  • Parallel coordinates visualization for search history tracking and parameter space diagnostics
  • Automated threshold optimization with business-focused F-beta scoring
  • Flexible metric selection for optimization
    • Classification: AUC (default), F1, accuracy
    • Regression: RMSE (default), NRMSE, MAPE
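
Two of the ideas in this list, a variance penalty on cross-validation scores and threshold selection by F-beta, can be sketched in plain Python. The penalty form (mean minus a multiple of the standard deviation) and the exhaustive threshold scan are assumptions made for illustration, not MLArena's confirmed implementation; `penalized_cv_score`, `fbeta`, and `best_threshold` are hypothetical names.

```python
from statistics import mean, pstdev


def penalized_cv_score(fold_scores, penalty=0.1):
    """Mean CV score minus a penalty proportional to the spread across folds.

    Assumes a higher-is-better metric such as AUC.
    """
    return mean(fold_scores) - penalty * pstdev(fold_scores)


def fbeta(y_true, y_pred, beta=1.0):
    """F-beta from binary labels; beta < 1 favors precision, beta > 1 recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denominator = (1 + beta**2) * tp + beta**2 * fn + fp
    return (1 + beta**2) * tp / denominator if denominator else 0.0


def best_threshold(y_true, y_prob, beta=1.0):
    """Scan every observed probability as a cutoff; return (threshold, score)."""
    scored = [
        (fbeta(y_true, [int(p >= t) for p in y_prob], beta), t)
        for t in sorted(set(y_prob))
    ]
    score, threshold = max(scored)  # ties resolve to the higher threshold
    return threshold, score


y_true = [0, 0, 1, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.9]

balanced_threshold, balanced_score = best_threshold(y_true, y_prob, beta=1.0)
precision_threshold, precision_score = best_threshold(y_true, y_prob, beta=0.5)
stable = penalized_cv_score([0.8, 0.8, 0.8], penalty=0.5)
unstable = penalized_cv_score([0.7, 0.9], penalty=0.5)
```

With the same mean score, the configuration whose folds disagree more receives the lower penalized score, and lowering `beta` moves the chosen cutoff toward higher precision.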

Performance Analysis

  • Comprehensive metric tracking
    • Classification: AUC, F1, Fbeta, precision, recall
    • Regression: RMSE, MAE, R2, adjusted R2, MAPE
  • Performance visualization
    • Classification: ROC curve (with AUC), precision-recall curve
    • Regression: Residual analysis, Prediction error plot
  • Model interpretability
    • Global feature importance
    • Local prediction explanations
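
For reference, the regression metrics named in this list can be written out in a few lines. This is a sketch of the standard definitions, not MLArena's code. NRMSE in particular has several conventions (normalizing by the mean, the range, or the standard deviation of the target); dividing by the mean is an assumption made here.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def nrmse(y_true, y_pred):
    """RMSE divided by the mean of y_true (one of several normalizations)."""
    return rmse(y_true, y_pred) / (sum(y_true) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction; requires nonzero y_true."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def adjusted_r2(y_true, y_pred, n_features):
    """R2 adjusted for the number of predictors; needs n > n_features + 1."""
    n = len(y_true)
    return 1 - (1 - r2(y_true, y_pred)) * (n - 1) / (n - n_features - 1)


y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.0, 4.0, 6.0, 10.0]

metrics = {
    "rmse": rmse(y_true, y_pred),
    "nrmse": nrmse(y_true, y_pred),
    "mape": mape(y_true, y_pred),
    "r2": r2(y_true, y_pred),
    "adjusted_r2": adjusted_r2(y_true, y_pred, n_features=1),
}
```

Adjusted R2 is never higher than R2 for the same predictions, since it discounts the fit by the number of features used.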

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
