An algorithm-agnostic machine learning toolkit for model training, diagnostics and optimization

MLArena

Features

  • Comprehensive ML Pipeline:

    • End-to-end workflow from preprocessing to deployment
    • Model-agnostic design (works with any scikit-learn compatible model)
    • Support for both classification and regression tasks
    • Early stopping and validation set support
    • MLflow integration for experiment tracking and deployment
  • Intelligent Preprocessing:

    • Automated feature type detection and handling
    • Smart encoding recommendations based on feature cardinality and rare categories
    • Target encoding with visualization to support smoothing parameter selection
    • Missing value handling with configurable strategies
    • Feature selection recommendations with mutual information analysis
  • Advanced Model Evaluation:

    • Comprehensive metrics for both classification and regression
    • Diagnostic visualization of model performance
    • Threshold analysis for classification tasks
    • SHAP-based model explanations (global and local)
    • Cross-validation with variance penalty
  • Hyperparameter Optimization:

    • Bayesian optimization with Hyperopt
    • Cross-validation based tuning
    • Parallel coordinates visualization for search space analysis
    • Early stopping to prevent overfitting
    • Variance penalty to ensure stable solutions
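
The variance penalty mentioned above can be illustrated with a small sketch. The formula below (mean fold score minus a multiple of the fold-to-fold standard deviation) is an assumption, since this page does not state the exact formula MLArena uses, and the names `penalized_cv_score` and `variance_penalty` are hypothetical.

```python
from statistics import mean, stdev


def penalized_cv_score(fold_scores, variance_penalty=0.5):
    """Mean CV score minus a penalty proportional to the spread across folds.

    Assumed formula for illustration only; higher is better.
    """
    if len(fold_scores) < 2:
        raise ValueError("need at least two fold scores to measure spread")
    return mean(fold_scores) - variance_penalty * stdev(fold_scores)


# Two hypothetical hyperparameter candidates, scored on five folds each.
stable = [0.80, 0.81, 0.79, 0.80, 0.80]
unstable = [0.95, 0.70, 0.90, 0.65, 0.85]

# The unstable candidate has the higher plain mean...
plain_winner = "unstable" if mean(unstable) > mean(stable) else "stable"

# ...but the penalized score prefers the candidate that is consistent across folds.
penalized_winner = (
    "unstable"
    if penalized_cv_score(unstable) > penalized_cv_score(stable)
    else "stable"
)
print(plain_winner, penalized_winner)
```

With a penalty of this kind, a tuner that maximizes the penalized score tends to settle on configurations whose performance does not swing widely between folds.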

Publications

Learn more about the concepts and methodologies behind MLArena through these articles:

  1. Algorithm-Agnostic Model Building with MLflow - Published in Towards Data Science

    A foundational guide demonstrating how to build algorithm-agnostic ML pipelines using mlflow.pyfunc. The article explores creating generic model wrappers, encapsulating preprocessing logic, and leveraging MLflow's unified model representation for seamless algorithm transitions.

  2. Explainable Generic ML Pipeline with MLflow - Published in Towards Data Science

    An advanced implementation guide that extends the generic ML pipeline with more sophisticated preprocessing and SHAP-based model explanations. The article demonstrates how to build a production-ready pipeline that supports both classification and regression tasks, handles feature preprocessing, and provides interpretable model insights while maintaining algorithm agnosticism.
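
The idea of an algorithm-agnostic wrapper described in these articles can be sketched in plain Python. This is a simplified illustration, not the articles' or MLArena's code: it leaves out MLflow entirely, and `GenericModel`, `MeanRegressor`, `NearestNeighborRegressor`, and `scale` are hypothetical names. The only assumption about an algorithm is that it exposes `fit` and `predict`.

```python
class GenericModel:
    """Hypothetical wrapper bundling a preprocessing step with any fit/predict estimator."""

    def __init__(self, preprocess, algorithm):
        self.preprocess = preprocess
        self.algorithm = algorithm

    def fit(self, X, y):
        self.algorithm.fit([self.preprocess(row) for row in X], y)
        return self

    def predict(self, X):
        # The same preprocessing is applied at training and prediction time.
        return self.algorithm.predict([self.preprocess(row) for row in X])


class MeanRegressor:
    """Toy estimator: always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class NearestNeighborRegressor:
    """Toy estimator: predicts the target of the closest training row."""

    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        preds = []
        for row in X:
            distances = [
                sum((a - b) ** 2 for a, b in zip(row, seen)) for seen in self.X_
            ]
            preds.append(self.y_[distances.index(min(distances))])
        return preds


def scale(row):
    """Toy preprocessing step: divide every feature by 10."""
    return [value / 10.0 for value in row]


X_train = [[0.0], [10.0], [20.0]]
y_train = [1.0, 2.0, 3.0]

# Swapping the algorithm does not change the surrounding code.
predictions = {}
for algorithm in (MeanRegressor(), NearestNeighborRegressor()):
    model = GenericModel(scale, algorithm).fit(X_train, y_train)
    predictions[type(algorithm).__name__] = model.predict([[19.0]])
print(predictions)
```

Because the wrapper owns the preprocessing step, callers pass raw rows to `fit` and `predict` regardless of which estimator sits inside.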

Installation

pip install mlarena

Usage Example

Visual Examples:

Model Performance Analysis

Classification Model Performance

Regression Model Performance

Explainable ML

One-liners create global and local SHAP-based explanations that work across a variety of classification and regression algorithms.

Global Explanation

Local Explanation

Hyperparameter Optimization

Parallel coordinates plot for hyperparameter search space diagnostics.
Hyperparameter Search Space

Documentation

PreProcessor

The PreProcessor class handles all data preprocessing tasks:

  • Filter-based feature selection
  • Categorical encoding (OneHot, Target)
  • Recommendation of encoding strategy
  • Plot to compare target encoding smoothing parameters
  • Numeric scaling
  • Missing value imputation
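
To show why the smoothing parameter for target encoding is worth comparing, here is a minimal sketch of smoothed target encoding. It uses a common additive form, `(category_sum + smoothing * global_mean) / (category_count + smoothing)`, which is an assumption; this page does not specify the formula PreProcessor applies, and `smoothed_target_encoding` is a hypothetical helper, not part of MLArena.

```python
from collections import defaultdict


def smoothed_target_encoding(categories, targets, smoothing=10.0):
    """Map each category to its target mean, shrunk toward the global mean.

    Assumed additive-smoothing formula, for illustration only.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    return {
        category: (sums[category] + smoothing * global_mean)
        / (counts[category] + smoothing)
        for category in counts
    }


# "a" is common (8 rows), "b" is rare (2 rows, both positive).
categories = ["a"] * 8 + ["b"] * 2
targets = [1, 1, 1, 1, 1, 1, 0, 0] + [1, 1]

# Compare several smoothing values, as one would when choosing the parameter.
encodings = {
    smoothing: smoothed_target_encoding(categories, targets, smoothing)
    for smoothing in (0.0, 2.0, 20.0)
}
for smoothing, encoding in encodings.items():
    print(smoothing, {k: round(v, 3) for k, v in sorted(encoding.items())})
```

With no smoothing the rare category is encoded as its raw mean; as smoothing grows, its encoding moves toward the global mean of 0.8, while the common category changes much less.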

ML_PIPELINE

The ML_PIPELINE class provides a complete machine learning workflow:

  • Algorithm agnostic model wrapper
  • Supports both binary classification and regression algorithms
  • Model training and scoring
  • Global and local model explanations
  • Model evaluation with comprehensive reporting and plots
  • Iterative hyperparameter tuning with diagnostic plot
  • Threshold analysis and optimization for classification models
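
Threshold analysis for a binary classifier can be sketched as a sweep over candidate cutoffs. The example below picks the cutoff with the best F1 score; that criterion is an assumption chosen for illustration, and `precision_recall_f1` and `best_f1_threshold` are hypothetical helpers rather than ML_PIPELINE methods.

```python
def precision_recall_f1(y_true, y_prob, threshold):
    """Precision, recall, and F1 when predicting positive for prob >= threshold."""
    tp = fp = fn = 0
    for truth, prob in zip(y_true, y_prob):
        predicted_positive = prob >= threshold
        if predicted_positive and truth:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif truth:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


def best_f1_threshold(y_true, y_prob, thresholds):
    """Return the candidate threshold with the highest F1 score."""
    return max(thresholds, key=lambda t: precision_recall_f1(y_true, y_prob, t)[2])


# Toy labels and predicted probabilities.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
y_prob = [0.1, 0.2, 0.35, 0.4, 0.55, 0.6, 0.8, 0.9]
thresholds = [0.3, 0.5, 0.7]

for threshold in thresholds:
    precision, recall, f1 = precision_recall_f1(y_true, y_prob, threshold)
    print(threshold, round(precision, 3), round(recall, 3), round(f1, 3))

best = best_f1_threshold(y_true, y_prob, thresholds)
```

Raising the threshold trades recall for precision, so the preferred cutoff depends on which kind of error is more costly for the task at hand.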

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
