MLArena
An algorithm-agnostic machine learning toolkit for model training, diagnostics and optimization.
Features
- Comprehensive ML Pipeline:
  - End-to-end workflow from preprocessing to deployment
  - Model-agnostic design (works with any scikit-learn compatible model)
  - Support for both classification and regression tasks
  - Early stopping and validation set support
  - MLflow integration for experiment tracking and deployment
- Intelligent Preprocessing:
  - Automated feature type detection and handling
  - Smart encoding recommendations based on feature cardinality and rare categories
  - Target encoding with visualization to support smoothing parameter selection
  - Missing value handling with configurable strategies
  - Feature selection recommendations with mutual information analysis
- Advanced Model Evaluation:
  - Comprehensive metrics for both classification and regression
  - Diagnostic visualization of model performance
  - Threshold analysis for classification tasks
  - SHAP-based model explanations (global and local)
  - Cross-validation with variance penalty
- Hyperparameter Optimization:
  - Bayesian optimization with Hyperopt
  - Cross-validation based tuning
  - Parallel coordinates visualization for search space analysis
  - Early stopping to prevent overfitting
  - Variance penalty to ensure stable solutions
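The variance penalty mentioned above can be illustrated with a small sketch. This is not MLArena's implementation: the function name `penalized_cv_score`, the `variance_penalty` parameter, and the choice of subtracting a multiple of the standard deviation are all assumptions made for illustration. The idea is that a candidate whose fold scores swing widely should rank below an equally accurate but steadier one.

```python
from statistics import mean, pstdev


def penalized_cv_score(fold_scores, variance_penalty=0.1):
    """Mean fold score minus a penalty proportional to the spread across folds.

    Hypothetical helper: the exact penalty form used by MLArena may differ.
    """
    if not fold_scores:
        raise ValueError("fold_scores must not be empty")
    return mean(fold_scores) - variance_penalty * pstdev(fold_scores)


# Two made-up candidates with the same mean score but different stability.
stable = [0.80, 0.81, 0.79, 0.80, 0.80]
unstable = [0.90, 0.70, 0.85, 0.75, 0.80]

stable_score = penalized_cv_score(stable)
unstable_score = penalized_cv_score(unstable)

# The steadier candidate wins once the spread is penalized.
print(f"stable:   {stable_score:.4f}")
print(f"unstable: {unstable_score:.4f}")
```

A tuner would maximize this penalized score instead of the plain mean, so a larger `variance_penalty` favors configurations that behave consistently across folds.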
Publications
Learn more about the concepts and methodologies behind MLArena through these articles:
- Algorithm-Agnostic Model Building with MLflow - Published in Towards Data Science
  A foundational guide demonstrating how to build algorithm-agnostic ML pipelines using mlflow.pyfunc. The article explores creating generic model wrappers, encapsulating preprocessing logic, and leveraging MLflow's unified model representation for seamless algorithm transitions.
- Explainable Generic ML Pipeline with MLflow - Published in Towards Data Science
  An advanced implementation guide that extends the generic ML pipeline with more sophisticated preprocessing and SHAP-based model explanations. The article demonstrates how to build a production-ready pipeline that supports both classification and regression tasks, handles feature preprocessing, and provides interpretable model insights while maintaining algorithm agnosticism.
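The generic wrapper idea described in these articles can be sketched without any dependencies. The class and method names below (`GenericModel`, `MeanRegressor`, `MajorityClassifier`) are hypothetical and are not taken from MLArena or the articles; the sketch only assumes that every algorithm exposes `fit(X, y)` and `predict(X)`. In an MLflow-based version, the wrapper would typically subclass `mlflow.pyfunc.PythonModel`, whose `predict` method receives a `context` argument before the model input.

```python
from collections import Counter


class GenericModel:
    """Wraps any estimator with fit/predict plus an optional preprocessor."""

    def __init__(self, algorithm, preprocessor=None):
        self.algorithm = algorithm
        self.preprocessor = preprocessor

    def _prepare(self, X, fit=False):
        # Preprocessing lives inside the wrapper so callers pass raw data.
        if self.preprocessor is None:
            return X
        if fit:
            return self.preprocessor.fit_transform(X)
        return self.preprocessor.transform(X)

    def fit(self, X, y):
        self.algorithm.fit(self._prepare(X, fit=True), y)
        return self

    def predict(self, X):
        return self.algorithm.predict(self._prepare(X))


class MeanRegressor:
    """Toy regressor: always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class MajorityClassifier:
    """Toy classifier: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


X = [[1.0], [2.0], [3.0]]

# Swapping the algorithm requires no change to the calling code.
reg_preds = GenericModel(MeanRegressor()).fit(X, [2.0, 4.0, 6.0]).predict([[10.0]])
clf_preds = GenericModel(MajorityClassifier()).fit(X, [1, 0, 1]).predict([[10.0]])
print(reg_preds, clf_preds)
```

Because the wrapper depends only on the `fit`/`predict` contract, any scikit-learn compatible estimator could be passed in place of the toy classes.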
Installation
pip install mlarena
Usage Example
- For a quick start with a basic example, see examples/basic_usage.ipynb.
- For more advanced examples, see examples/advanced_usage.ipynb.
Visual Examples:
Model Performance Analysis
Explainable ML
A one-liner creates global and local explanations based on SHAP that work across various classification and regression algorithms.
Hyperparameter Optimization
Parallel coordinates plot for hyperparameter search space diagnostics.
Documentation
PreProcessor
The PreProcessor class handles all data preprocessing tasks:
- Filter-based feature selection
- Categorical encoding (OneHot, Target)
- Recommendation of encoding strategy
- Plot to compare target encoding smoothing parameters
- Numeric scaling
- Missing value imputation
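To show what a target encoding smoothing parameter does, here is a minimal sketch using m-estimate shrinkage, where each category mean is pulled toward the global mean in proportion to `smoothing`. This is an assumption about the general technique, not the PreProcessor's actual code; the function name `target_encode` and its signature are hypothetical.

```python
from collections import defaultdict


def target_encode(categories, targets, smoothing=10.0):
    """Return {category: smoothed target mean}.

    encoded = (sum of targets in category + smoothing * global mean)
              / (category count + smoothing)
    """
    if len(categories) != len(targets):
        raise ValueError("categories and targets must have the same length")
    if not targets:
        raise ValueError("targets must not be empty")

    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1

    return {
        category: (sums[category] + smoothing * global_mean)
        / (counts[category] + smoothing)
        for category in counts
    }


# Made-up data: category "b" appears only once, so its raw mean is unreliable.
cities = ["a", "a", "a", "a", "b"]
y = [1, 1, 1, 0, 0]

# Compare how different smoothing values treat the rare category.
for m in (0.0, 1.0, 10.0):
    encoded = target_encode(cities, y, smoothing=m)
    print(m, {k: round(v, 3) for k, v in encoded.items()})
```

With no smoothing, the rare category takes its raw mean; as the smoothing value grows, its encoding moves toward the global mean, which is the trade-off a comparison plot across smoothing parameters helps to judge.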
ML_PIPELINE
The ML_PIPELINE class provides a complete machine learning workflow:
- Algorithm agnostic model wrapper
- Supports both binary classification and regression algorithms
- Model training and scoring
- Model global and local explanation
- Model evaluation with comprehensive reporting and plots
- Iterative hyperparameter tuning with diagnostic plot
- Threshold analysis and optimization for classification models
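Threshold analysis for a binary classifier can be sketched as follows. The helper names `threshold_report` and `best_threshold`, the `beta` parameter, and the sample data are illustrative assumptions, not the ML_PIPELINE API. The sketch scans candidate thresholds, computes precision, recall, and F-beta at each, and picks the threshold with the highest F-beta.

```python
def threshold_report(y_true, y_prob, thresholds, beta=1.0):
    """Compute precision, recall and F-beta for each candidate threshold."""
    rows = []
    for t in thresholds:
        pairs = list(zip(y_true, y_prob))
        tp = sum(1 for y, p in pairs if p >= t and y == 1)
        fp = sum(1 for y, p in pairs if p >= t and y == 0)
        fn = sum(1 for y, p in pairs if p < t and y == 1)

        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = beta**2 * precision + recall
        fbeta = (1 + beta**2) * precision * recall / denom if denom else 0.0

        rows.append(
            {"threshold": t, "precision": precision, "recall": recall, "fbeta": fbeta}
        )
    return rows


def best_threshold(rows):
    """Return the threshold with the highest F-beta score."""
    return max(rows, key=lambda row: row["fbeta"])["threshold"]


# Made-up labels and predicted probabilities for the positive class.
y_true = [0, 0, 1, 1, 1, 0]
y_prob = [0.10, 0.40, 0.35, 0.80, 0.65, 0.55]

rows = threshold_report(y_true, y_prob, thresholds=[0.3, 0.5, 0.7], beta=1.0)
for row in rows:
    print(row)
print("best threshold:", best_threshold(rows))
```

Choosing `beta` greater than 1 weights recall more heavily and `beta` less than 1 favors precision, so the selected threshold shifts with the cost of each error type.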
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.