An algorithm-agnostic machine learning toolkit for model training, diagnostics and optimization
Project description
MLArena
Publications
Read about the concepts and methodologies behind MLArena through these articles:
- Algorithm-Agnostic Model Building with MLflow - Published in Towards Data Science
  A foundational guide demonstrating how to build algorithm-agnostic ML pipelines using mlflow.pyfunc. The article explores creating generic model wrappers, encapsulating preprocessing logic, and leveraging MLflow's unified model representation for seamless algorithm transitions.
- Explainable Generic ML Pipeline with MLflow - Published in Towards Data Science
  An advanced implementation guide that extends the generic ML pipeline with more sophisticated preprocessing and SHAP-based model explanations. The article demonstrates how to build a production-ready pipeline that supports both classification and regression tasks, handles feature preprocessing, and provides interpretable model insights while maintaining algorithm agnosticism.
Installation
The package is under rapid development (please see the CHANGELOG for details), so it is highly recommended to pin a specific version when installing. For example:
%pip install mlarena==0.2.0
If you are using the package on a Databricks ML cluster with DBR >= 16.0, you can install it without dependencies:
%pip install mlarena==0.2.0 --no-deps
If you are using an earlier DBR version, install optuna as well, as shown below. Note: As of 2025-04-26, Databricks recommends optuna, and hyperopt will be removed from the Databricks ML Runtime.
%pip install mlarena==0.2.0 --no-deps
%pip install optuna==3.6.1
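Because the advice above is to pin a version, it can help to confirm which version actually ended up in the environment. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `matches_pin` are hypothetical and not part of MLArena.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_pin(package: str, pinned: str) -> bool:
    """True only when `package` is installed at exactly the pinned version."""
    return installed_version(package) == pinned


# "0.2.0" mirrors the pin used in the install commands above.
print("mlarena version:", installed_version("mlarena"))
print("pin satisfied:", matches_pin("mlarena", "0.2.0"))
```

In a notebook, running this in the cell after the `%pip install` line shows whether the pin took effect.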
Usage Example
- For quick start with a basic example, see 1.basic_usage.ipynb.
- For more advanced examples on model optimization, see 2.advanced_usage.ipynb.
- For visualization utilities, see 3.utils_plot.ipynb.
- For handling common challenges in machine learning, see 4.ml_discussions.ipynb.
Visual Examples:
Model Performance Analysis
Explainable ML
A one-liner creates global and local SHAP-based explanations that work across various classification and regression algorithms.
Hyperparameter Optimization
Parallel coordinates plot for hyperparameter search space diagnostics.
Features
Algorithm Agnostic ML Pipeline
- Unified interface for any scikit-learn compatible model
- Consistent workflow across classification and regression tasks
- Automated report generation with comprehensive metrics and visuals
- Production-ready with MLflow integration for deployment
- Simplified handoff between experimentation and production
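The idea of a unified interface can be illustrated with a small sketch. It is not MLArena's actual API: `GenericPipeline`, `MeanRegressor`, and `MajorityClassifier` are hypothetical names, and the toy models stand in for any estimator that exposes scikit-learn style `fit` and `predict` methods.

```python
from collections import Counter
from statistics import mean
from typing import Any, Protocol, Sequence


class Estimator(Protocol):
    """Anything with scikit-learn style fit/predict methods."""

    def fit(self, X: Sequence[Any], y: Sequence[Any]) -> Any: ...

    def predict(self, X: Sequence[Any]) -> Sequence[Any]: ...


class MeanRegressor:
    """Toy regressor: always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = mean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class MajorityClassifier:
    """Toy classifier: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class GenericPipeline:
    """Hypothetical wrapper: the workflow is the same whatever model is inside."""

    def __init__(self, model: Estimator):
        self.model = model

    def fit(self, X, y):
        self.model.fit(X, y)
        return self

    def predict(self, X):
        return list(self.model.predict(X))


X = [[0.0], [1.0], [2.0], [3.0]]
reg = GenericPipeline(MeanRegressor()).fit(X, [1.0, 2.0, 3.0, 6.0])
clf = GenericPipeline(MajorityClassifier()).fit(X, ["a", "b", "b", "b"])
print(reg.predict([[10.0]]))  # [3.0]
print(clf.predict([[10.0]]))  # ['b']
```

Swapping the algorithm means changing only the object passed to the wrapper; the calling code for regression and classification stays identical.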
Intelligent Preprocessing
- Streamlined feature preprocessing with smart defaults and minimal code
- Automatic feature analysis with data-driven encoding recommendations
- Integrated target encoding with visualization for optimal smoothing selection
- Feature filtering based on information theory metrics (mutual information)
- Handles the full preprocessing pipeline from missing values to feature encoding
- Seamless integration with scikit-learn and MLflow for production deployment
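To make the role of the smoothing parameter in target encoding concrete, here is a minimal sketch of one common formulation (additive smoothing toward the global mean). Whether MLArena uses this exact formula is an assumption; the function name `smoothed_target_encoding` is hypothetical.

```python
from collections import defaultdict


def smoothed_target_encoding(categories, targets, smoothing=10.0):
    """Map each category to a blend of its own target mean and the global mean.

    encoded = (n * category_mean + smoothing * global_mean) / (n + smoothing)
    where n is the number of rows in the category.
    """
    if not categories or len(categories) != len(targets):
        raise ValueError("categories and targets must be non-empty and equal in length")

    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y  # running sum equals n * category_mean
        counts[cat] += 1

    return {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }


cities = ["A", "A", "A", "A", "B"]
labels = [1, 1, 1, 0, 0]
for m in (0.0, 1.0, 10.0):
    print(f"smoothing={m}:", smoothed_target_encoding(cities, labels, smoothing=m))
```

With no smoothing, the single-row category "B" is encoded purely from its one observation; larger smoothing values pull rare categories toward the global mean, which is the trade-off a smoothing visualization helps to choose.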
Model Optimization
- Efficient hyperparameter tuning with Optuna's TPE sampler
- Smart early stopping with patient pruning to save computation resources
- Cross-validation with variance penalty to prevent overfitting
- Parallel coordinates visualization for search history tracking and parameter space diagnostics
- Automated threshold optimization with business-focused F-beta scoring
- Flexible metric selection for optimization
  - Classification: AUC (default), F1, accuracy
  - Regression: RMSE (default), NRMSE, MAPE
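The "cross-validation with variance penalty" idea listed above can be sketched as follows. This is an illustration under assumptions, not MLArena's implementation: the function name `penalized_cv_score`, the `penalty` weight, and the use of the population standard deviation are all choices made for the example.

```python
from statistics import mean, pstdev


def penalized_cv_score(fold_scores, penalty=0.1, higher_is_better=True):
    """Combine per-fold scores into one number that rewards stability.

    For metrics to maximize (e.g. AUC) the spread is subtracted from the mean;
    for metrics to minimize (e.g. RMSE) it is added.
    """
    if not fold_scores:
        raise ValueError("fold_scores must not be empty")
    avg = mean(fold_scores)
    spread = pstdev(fold_scores)
    return avg - penalty * spread if higher_is_better else avg + penalty * spread


# Two hypothetical hyperparameter settings evaluated on five folds (AUC).
stable = [0.80, 0.81, 0.79, 0.80, 0.80]
erratic = [0.95, 0.70, 0.90, 0.65, 0.85]

print("raw means:", mean(stable), mean(erratic))
print("penalized:", penalized_cv_score(stable, 0.5), penalized_cv_score(erratic, 0.5))
```

The erratic setting has the higher raw mean, but once its fold-to-fold spread is penalized the stable setting scores better, which is the behavior that discourages selecting hyperparameters that only look good on some folds.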
Performance Analysis
- Comprehensive metric tracking
  - Classification: AUC, F1, Fbeta, precision, recall
  - Regression: RMSE, MAE, R2, adjusted R2, MAPE
- Performance visualization
  - Classification: ROC_AUC curve, Precision-recall curve
  - Regression: Residual analysis, Prediction error plot
- Model interpretability
  - Global feature importance
  - Local prediction explanations
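The F-beta metric and the threshold optimization mentioned in the feature lists can be illustrated with a small pure-Python sketch. The helper names `fbeta` and `best_threshold` are hypothetical, and the exhaustive search over observed probabilities is an assumption made for clarity rather than a description of how MLArena searches.

```python
def fbeta(precision, recall, beta=1.0):
    """F-beta score: beta > 1 weights recall more, beta < 1 weights precision more."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def best_threshold(y_true, y_prob, beta=1.0):
    """Return (threshold, score) maximizing F-beta over the observed probabilities."""
    best_t, best_score = 0.5, -1.0
    for t in sorted(set(y_prob)):
        # A row is predicted positive when its probability is at least t.
        tp = sum(1 for y, p in zip(y_true, y_prob) if p >= t and y == 1)
        fp = sum(1 for y, p in zip(y_true, y_prob) if p >= t and y == 0)
        fn = sum(1 for y, p in zip(y_true, y_prob) if p < t and y == 1)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        score = fbeta(precision, recall, beta)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


y_true = [0, 0, 1, 0, 1, 1]
y_prob = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]

print("recall-focused (beta=2):     ", best_threshold(y_true, y_prob, beta=2.0))
print("precision-focused (beta=0.5):", best_threshold(y_true, y_prob, beta=0.5))
```

On this toy data the recall-focused setting selects a lower threshold than the precision-focused one, which is how the choice of beta encodes the relative business cost of missed positives versus false alarms.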
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
File details
Details for the file mlarena-0.2.1.tar.gz.
File metadata
- Download URL: mlarena-0.2.1.tar.gz
- Upload date:
- Size: 26.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.2 CPython/3.11.12 Linux/6.11.0-1012-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d5502cd5d5dfa78de5a17183f64bd7e7d2b9ee5db40aabc7ff25a98e03ef42d6 |
| MD5 | 5986846b2dca056c8ac6a4645de9754a |
| BLAKE2b-256 | cae1842f2fff786a4b421a1eb9800bc2575bb1007d4fbe0d58637ffeb5bdabdf |
File details
Details for the file mlarena-0.2.1-py3-none-any.whl.
File metadata
- Download URL: mlarena-0.2.1-py3-none-any.whl
- Upload date:
- Size: 26.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.2 CPython/3.11.12 Linux/6.11.0-1012-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 32232cb69586f949b95bc0d6ecd9b19c256e959b107c05c07976d96080e66842 |
| MD5 | f4b6c141636eb565e7dec61f2109b0b6 |
| BLAKE2b-256 | a3ad20c37d1b08dfed73bebdcf39662edd71790d0a54266393701a0b4b62219a |