
Quickly compare machine learning models across libraries and datasets.

Project description


This library is still in early development. Expect many more features to come :D

MLCompare is a Python package for running model comparison pipelines, with the aim of being both simple and flexible. It supports multiple popular ML libraries, retrieval from multiple online dataset repositories, common data processing steps, and results visualization. Additionally, it allows for using your own models and datasets within the pipelines.

Libraries
  • Scikit-learn
  • XGBoost

Datasets
  • Kaggle
  • Hugging Face
  • OpenML
  • Locally saved

Data Processing
  • Encode: One-hot | Label
  • Drop Columns
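For readers unfamiliar with the two encodings listed under Data Processing, the following is a minimal pure-Python sketch of the general idea. It is not MLCompare's implementation; the function names and the sample values are made up for illustration.

```python
def label_encode(values):
    """Map each distinct value to an integer, using sorted category order."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    return [index[value] for value in values]


def one_hot_encode(values):
    """Turn each value into a 0/1 row with one column per sorted category."""
    categories = sorted(set(values))
    return [[1 if value == category else 0 for category in categories]
            for value in values]


# Made-up sample column; sorted categories are clay, loam, sandy.
soil = ["loam", "sandy", "clay", "loam"]

print(label_encode(soil))    # [1, 2, 0, 1]
print(one_hot_encode(soil))  # [[0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
```

Label encoding keeps a single column but implies an ordering between categories, while one-hot encoding avoids that ordering at the cost of one extra column per category.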

Installing

pip install mlcompare

Note that on macOS, both XGBoost and LightGBM require libomp, which can be installed with Homebrew:

brew install libomp
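After installing, it can be useful to confirm which packages Python can actually find. The sketch below uses only the standard library; `check_installed` is a hypothetical helper name, and the list of packages to check is an assumption based on the libraries mentioned above.

```python
from importlib.util import find_spec


def check_installed(packages):
    """Return a mapping of top-level package name -> whether it is importable."""
    return {name: find_spec(name) is not None for name in packages}


# Top-level import names assumed here: scikit-learn is imported as "sklearn".
for name, found in check_installed(["mlcompare", "sklearn", "xgboost"]).items():
    print(f"{name}: {'found' if found else 'missing'}")
```

This only checks that a package can be located, not that it loads correctly, so a missing libomp on macOS would still surface later when XGBoost is imported.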

A Simple Example

Running a pipeline with multiple models and datasets is done by making a list of dictionaries for each and passing both lists to a pipeline function.

The example below downloads one dataset from OpenML and one from Kaggle, one-hot encodes some of the columns in the Kaggle dataset, and trains and evaluates a Random Forest and an XGBoost model on both.

import mlcompare

datasets = [
    {
        "type": "openml",
        "id": 8,
        "target": "drinks",
    },
    {
        "type": "kaggle",
        "user": "gorororororo23",
        "dataset": "plant-growth-data-classification",
        "file": "plant_growth_data.csv",
        "target": "Growth_Milestone",
        "onehotEncode": ["Soil_Type", "Water_Frequency", "Fertilizer_Type"],
    }
]

models = [
    {
        "library": "sklearn",
        "name": "RandomForestRegressor",
    },
    {
        "library": "xgboost",
        "name": "XGBRegressor",
        "params": {"max_leaves": 40, "n_estimators": 200},
    }
]

mlcompare.full_pipeline(datasets, models)

In the case of the XGBoost model we passed in our own parameter values rather than using the defaults.
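Because the pipeline is driven entirely by plain dictionaries, a typo or a forgotten key is easy to miss until a download or training run fails. The sketch below checks the lists before they are handed to the pipeline. The expected keys are inferred only from the example above, so the library may accept or require others, and `find_missing_keys` is a hypothetical helper rather than part of MLCompare.

```python
# Keys inferred from the example above only; this is an assumption,
# not MLCompare's actual schema.
DATASET_KEYS = {
    "openml": {"id", "target"},
    "kaggle": {"user", "dataset", "file", "target"},
}
MODEL_KEYS = {"library", "name"}


def find_missing_keys(datasets, models):
    """Return readable messages for entries that lack the expected keys."""
    problems = []
    for i, entry in enumerate(datasets):
        if "type" not in entry:
            problems.append(f"datasets[{i}] is missing 'type'")
            continue
        # Types not shown in the example are skipped rather than guessed at.
        expected = DATASET_KEYS.get(entry["type"], set())
        for key in sorted(expected - set(entry)):
            problems.append(f"datasets[{i}] is missing '{key}'")
    for i, entry in enumerate(models):
        for key in sorted(MODEL_KEYS - set(entry)):
            problems.append(f"models[{i}] is missing '{key}'")
    return problems


datasets = [
    {"type": "openml", "id": 8, "target": "drinks"},
    # Deliberately incomplete: "dataset" and "file" are left out.
    {"type": "kaggle", "user": "gorororororo23", "target": "Growth_Milestone"},
]
models = [{"library": "sklearn", "name": "RandomForestRegressor"}]

for message in find_missing_keys(datasets, models):
    print(message)
```

An empty result means every entry has the keys seen in the example; it says nothing about whether the values themselves are valid.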
