Quickly compare machine learning models across libraries and datasets.
Project description
MLCompare is a Python package for running model comparison pipelines, with the aim of being both simple and flexible. It supports multiple popular ML libraries, retrieval from multiple online dataset repositories, common data processing steps, and results visualization. Additionally, it allows for using your own models and datasets within the pipelines.
Installing
It is recommended to create a new virtual environment. Example with Conda:
conda create -n compare_env python==3.11.9
conda activate compare_env
Install this library with pip:
pip install mlcompare
Note that on macOS, both XGBoost and LightGBM require libomp. It can be installed with Homebrew:
brew install libomp
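After installing, a quick import check can confirm that the native libraries load. The snippet below is a minimal sketch, not part of MLCompare: `check_imports` is a hypothetical helper, and it assumes that a missing libomp shows up as an exception when the package is imported, which is typically how XGBoost and LightGBM fail on macOS. Packages that are not installed are reported as failed as well.

```python
import importlib


def check_imports(names):
    """Try importing each package and record the outcome per name."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = "ok"
        except Exception as exc:
            # Covers a missing package as well as a native library that fails to load.
            results[name] = f"failed: {type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    for name, status in check_imports(["mlcompare", "xgboost", "lightgbm"]).items():
        print(f"{name}: {status}")
```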
A Simple Example
To run a pipeline with multiple datasets and models, create two lists of dictionaries, one describing the datasets and one describing the models, and pass both to a pipeline function.
The example below downloads one dataset from OpenML and one from Kaggle, one-hot encodes three columns of the Kaggle dataset, and trains and evaluates a Random Forest model and an XGBoost model on each dataset.
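import mlcompare

# Each dictionary describes one dataset to download and prepare.
datasets = [
    {
        "type": "openml",
        "id": 8,
        "target": "drinks",
    },
    {
        "type": "kaggle",
        "user": "gorororororo23",
        "dataset": "plant-growth-data-classification",
        "file": "plant_growth_data.csv",
        "target": "Growth_Milestone",
        "oneHotEncode": ["Soil_Type", "Water_Frequency", "Fertilizer_Type"],
    },
]

# Each dictionary names a model class from a supported library.
models = [
    {
        "library": "sklearn",
        "name": "RandomForestRegressor",
    },
    {
        "library": "xgboost",
        "name": "XGBRegressor",
        # max_leaves is XGBoost's leaf-count parameter (num_leaves is the LightGBM name).
        "params": {"max_leaves": 40, "n_estimators": 200},
    },
]

# Run the full pipeline on the datasets and models as a regression task.
mlcompare.full_pipeline(datasets, models, "regression")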
The XGBoost model is given non-default parameter values through its "params" entry.
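Because models are plain dictionaries, several parameter settings of the same model can be generated in code rather than written out by hand. The following is a minimal sketch: `model_variants` is a hypothetical helper that is not part of MLCompare, and it only reuses the "library", "name", and "params" keys shown in the example. Whether the pipeline distinguishes several entries that share a model name in its results is not confirmed here.

```python
from itertools import product


def model_variants(library, name, grid):
    """Build one model dictionary per combination of the values in `grid`."""
    keys = list(grid)
    return [
        {"library": library, "name": name, "params": dict(zip(keys, values))}
        for values in product(*(grid[key] for key in keys))
    ]


# Two values for each of two XGBoost parameters give four model dictionaries.
models = model_variants(
    "xgboost",
    "XGBRegressor",
    {"n_estimators": [100, 200], "max_depth": [3, 6]},
)
```

The resulting list has the same shape as the hand-written `models` list, so it could be passed to `mlcompare.full_pipeline(datasets, models, "regression")` in the same way.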
Planned Additions
Version 1.3
- LightGBM support
- CatBoost support
- Model results graphing and visualization
- Improved documentation
- Support for presplit data
Version 1.4
- PyTorch support
- TensorFlow support
- Additional dataset sources
- Built-in model and dataset collections for quick testing of similar model types/datasets
- Optional pipeline caching
- Optional trained model saving
Version 1.5
- S3 support