Quickly compare machine learning models across libraries and datasets.

Project description

MLCompare is a Python package for running model comparison pipelines, with the aim of being both simple and flexible. It supports multiple popular ML libraries, retrieval from multiple online dataset repositories, common data processing steps, and results visualization. Additionally, it allows for using your own models and datasets within the pipelines.

Libraries
  • Scikit-learn
  • XGBoost

Datasets
  • Kaggle
  • OpenML
  • Hugging Face
  • locally saved

Data Processing
  • train-test split
  • drop columns
  • handle NaNs: drop | forward-fill | backward-fill
  • encoders: OneHot | Ordinal | Target | Label
  • scalers: Standard | MinMax | MaxAbs | Robust
  • transformers: Quantile | Power | Normalizer
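
To make the three NaN-handling options concrete, here is a minimal pure-Python sketch of what drop, forward-fill, and backward-fill conventionally mean for a single column. It is not MLCompare's implementation; the function names are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Optional


def _is_missing(value: Optional[float]) -> bool:
    """Treat both None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_missing(values: List[Optional[float]]) -> List[float]:
    """drop: remove every missing entry."""
    return [v for v in values if not _is_missing(v)]


def forward_fill(values: List[Optional[float]]) -> List[Optional[float]]:
    """forward-fill: replace a missing entry with the last observed value."""
    filled = []
    last = None  # stays None until the first observed value is seen
    for v in values:
        if _is_missing(v):
            filled.append(last)
        else:
            last = v
            filled.append(v)
    return filled


def backward_fill(values: List[Optional[float]]) -> List[Optional[float]]:
    """backward-fill: replace a missing entry with the next observed value."""
    return forward_fill(values[::-1])[::-1]


column = [None, 1.0, float("nan"), 3.0, None]
print(drop_missing(column))   # [1.0, 3.0]
print(forward_fill(column))   # [None, 1.0, 1.0, 3.0, 3.0]
print(backward_fill(column))  # [1.0, 1.0, 3.0, 3.0, None]
```

Note that forward-fill cannot fill a leading gap and backward-fill cannot fill a trailing one, so a fill step may still leave some missing values behind.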

Installing

It is recommended to create a new virtual environment. Example with Conda:

conda create -n compare_env python==3.11.9
conda activate compare_env

Install this library with pip:

pip install mlcompare

Note that on macOS, both XGBoost and LightGBM require libomp, which can be installed with Homebrew:

brew install libomp
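
After installing, a quick import check can confirm that the packages load. The sketch below uses only the standard library; the helper names are hypothetical, and it assumes that a missing libomp shows up as an exception when xgboost is imported. Since the exact exception type may vary between versions, it catches broadly.

```python
import importlib
import platform


def can_import(module_name: str) -> bool:
    """Return True if the module imports cleanly, printing the error otherwise."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # broad on purpose: the failure type is library-specific
        print(f"{module_name}: import failed ({type(exc).__name__}: {exc})")
        return False
    return True


def libomp_hint(system_name: str) -> str:
    """Return an install hint on macOS (reported as 'Darwin'), else an empty string."""
    if system_name == "Darwin":
        return "On macOS, try: brew install libomp"
    return ""


for name in ("mlcompare", "xgboost"):
    if not can_import(name):
        hint = libomp_hint(platform.system())
        if hint:
            print(hint)
```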

A Simple Example

To run a pipeline with multiple datasets and models, create one list of dictionaries describing the datasets and another describing the models, then pass both to a pipeline function.

The example below downloads one dataset from OpenML and one from Kaggle, one-hot encodes some of the columns in the Kaggle dataset, and trains and evaluates a Random Forest model and an XGBoost model on them.

import mlcompare

datasets = [
    {
        "type": "openml",
        "id": 8,
        "target": "drinks",
    },
    {
        "type": "kaggle",
        "user": "gorororororo23",
        "dataset": "plant-growth-data-classification",
        "file": "plant_growth_data.csv",
        "target": "Growth_Milestone",
        "oneHotEncode": ["Soil_Type", "Water_Frequency", "Fertilizer_Type"],
    }
]

models = [
    {
        "library": "sklearn",
        "name": "RandomForestRegressor",
    },
    {
        "library": "xgboost",
        "name": "XGBRegressor",
        "params": {"max_leaves": 40, "n_estimators": 200}
    }
]

mlcompare.full_pipeline(datasets, models, "regression")

The XGBoost model is given non-default parameter values through its "params" entry.
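
Because the datasets and models are plain lists of dictionaries, they can also be generated in code. The sketch below builds several XGBoost variants using only the keys shown in the example above and lists the dataset/model combinations. It assumes the pipeline evaluates every model on every dataset; xgb_variants is a hypothetical helper, not part of MLCompare.

```python
from itertools import product

datasets = [
    {"type": "openml", "id": 8, "target": "drinks"},
]


def xgb_variants(n_estimators_options):
    """Build one XGBRegressor entry per n_estimators value (hypothetical helper)."""
    return [
        {"library": "xgboost", "name": "XGBRegressor", "params": {"n_estimators": n}}
        for n in n_estimators_options
    ]


models = [{"library": "sklearn", "name": "RandomForestRegressor"}]
models += xgb_variants([100, 200, 400])

# Assumption: each model is trained and evaluated on each dataset.
runs = list(product(datasets, models))
for dataset, model in runs:
    print(dataset["type"], model["name"], model.get("params", {}))

# The lists can then be passed to the pipeline as in the example above:
# mlcompare.full_pipeline(datasets, models, "regression")
```

Keeping the configuration as data also makes it easy to store in a file or share between experiments.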

Planned Additions

Version 1.3

  • LightGBM support
  • CatBoost support
  • Model results graphing and visualization
  • Improved documentation
  • Support for presplit data

Version 1.4

  • PyTorch support
  • TensorFlow support
  • Additional dataset sources
  • Built-in model and dataset collections for quick testing of similar model types/datasets
  • Optional pipeline caching
  • Optional trained model saving

Version 1.5

  • S3 Support

Download files

Download the file for your platform.

Source Distribution

mlcompare-1.2.2.tar.gz (40.0 kB)

Uploaded Source

Built Distribution

mlcompare-1.2.2-py3-none-any.whl (27.0 kB)

Uploaded Python 3

File details

Details for the file mlcompare-1.2.2.tar.gz.

File metadata

  • Download URL: mlcompare-1.2.2.tar.gz
  • Upload date:
  • Size: 40.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for mlcompare-1.2.2.tar.gz
Algorithm Hash digest
SHA256 69c06d431a906a6e452b6360fafa0e7d3a844beb45d3da3722f0e9f7ad10960c
MD5 3aca051fc0691082af70d8fca9c45bed
BLAKE2b-256 cbc1538774264211f05b427f2341b4014e4f8fff3d93fe7e290622b81e57c597

File details

Details for the file mlcompare-1.2.2-py3-none-any.whl.

File metadata

  • Download URL: mlcompare-1.2.2-py3-none-any.whl
  • Upload date:
  • Size: 27.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for mlcompare-1.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 1a2dd307e96fdfa7b03d127f1816f81bbca7481161dbdf51659516bf7fcef342
MD5 3516057d13b2680a12d1809ba859b98b
BLAKE2b-256 2b759d4ac72dc0af6f18df1e2f457a2d56c4f09970f6880e15decb78939a7098
