
A full-pipeline AutoML tool integrating various GBM models

Project description

HyperGBM


Doc | 中文 (Chinese)

We Are Hiring!

Dear folks, we are opening several positions in Beijing, for both professionals and interns passionate about AutoML/NAS. Please send your resume/CV to yangjian@zetyun.com. (Application deadline: TBD.)

What is HyperGBM

HyperGBM is a library that supports full-pipeline AutoML, covering the end-to-end stages of data cleaning, preprocessing, feature generation and selection, model selection, and hyperparameter optimization. It is a true AutoML tool for tabular data.

Overview

Unlike most AutoML approaches, which focus only on the hyperparameter optimization problem of machine learning algorithms, HyperGBM puts the entire process, from data cleaning to algorithm selection, into one search space for optimization. End-to-end pipeline optimization is more like a sequential decision process, so HyperGBM uses reinforcement learning, Monte Carlo tree search, and evolutionary algorithms combined with a meta-learner to solve such problems efficiently.
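
To make the idea concrete, here is a toy, stdlib-only sketch of treating the whole pipeline as a single search space, where each stage is one decision in a sequence. All stage and option names below are illustrative, not HyperGBM's actual search-space API, and the scoring function is a deterministic stand-in for a cross-validated metric. The sketch enumerates the space exhaustively; HyperGBM instead samples it with the smarter searchers named above:

```python
import itertools

# Toy end-to-end search space: every pipeline stage is one decision,
# so a full configuration is a sequence of choices (names are made up).
SEARCH_SPACE = {
    "imputer": ["mean", "median", "most_frequent"],
    "scaler": ["standard", "minmax", "none"],
    "model": ["xgboost", "lightgbm", "catboost"],
    "max_depth": [3, 5, 7],
}

def toy_score(cfg):
    # Stand-in for a cross-validated metric; deterministic for the demo.
    score = 0.70
    score += 0.05 if cfg["imputer"] == "median" else 0.0
    score += 0.03 if cfg["scaler"] == "standard" else 0.0
    score += 0.04 if cfg["model"] == "lightgbm" else 0.0
    score += 0.02 if cfg["max_depth"] == 5 else 0.0
    return score

def best_config():
    # Brute force over all 81 configurations; real searchers sample smartly.
    configs = (dict(zip(SEARCH_SPACE, combo))
               for combo in itertools.product(*SEARCH_SPACE.values()))
    best = max(configs, key=toy_score)
    return best, toy_score(best)

cfg, score = best_config()
print(cfg, round(score, 2))
```

The point of the sketch is that preprocessing choices and model choices live in the same space, so one searcher can optimize them jointly rather than tuning the model after a fixed preprocessing pipeline.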

As the name implies, the ML algorithms used in HyperGBM are all GBM models, more precisely gradient boosted tree models: currently XGBoost, LightGBM and CatBoost.

The underlying search space representation and search algorithms in HyperGBM are powered by the Hypernets project, a general AutoML framework.

Tutorial

Installation

Install HyperGBM with pip:

pip install hypergbm

Optionally, to run HyperGBM in JupyterLab notebooks, install HyperGBM together with JupyterLab:

pip install hypergbm[notebook]

Optionally, to support datasets containing simplified Chinese text in feature generation, install the jieba package before running HyperGBM, or install HyperGBM with:

pip install hypergbm[zhcn]

Optionally, install all HyperGBM components and dependencies with one command:

pip install hypergbm[all]

Examples

Users can create an experiment instance with make_experiment and run it quickly. train_data is the only required parameter; all others are optional. target is also required if your target feature isn't named y.

Code:

from hypergbm import make_experiment
from hypernets.tabular.datasets import dsutils

train_data = dsutils.load_blood()
experiment = make_experiment(train_data, target='Class')
estimator = experiment.run()
print(estimator)

Outputs:

Pipeline(steps=[('data_clean', DataCleanStep(...)),
                ('estimator', GreedyEnsemble(...))])
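
The final GreedyEnsemble step blends several trained candidate models. Greedy ensemble selection (in the style of Caruana et al.) starts from an empty ensemble and repeatedly adds whichever candidate most improves the validation score, allowing repeats so that good models receive more weight. A toy, stdlib-only sketch of the idea, with made-up validation predictions, not HyperGBM's implementation:

```python
y_true = [0, 1, 1, 0, 1, 0, 1, 1]

# Hypothetical per-model predicted probabilities on a validation set.
candidates = {
    "xgboost":  [0.2, 0.9, 0.6, 0.3, 0.8, 0.4, 0.7, 0.6],
    "lightgbm": [0.1, 0.7, 0.8, 0.2, 0.6, 0.1, 0.9, 0.8],
    "catboost": [0.4, 0.6, 0.7, 0.5, 0.9, 0.3, 0.6, 0.7],
}

def mse(pred):
    return sum((p - t) ** 2 for p, t in zip(pred, y_true)) / len(y_true)

def greedy_ensemble(candidates, rounds=10):
    chosen = []  # models may be picked repeatedly, which acts as weighting
    for _ in range(rounds):
        def ensemble_mse(name):
            members = chosen + [name]
            avg = [sum(candidates[m][i] for m in members) / len(members)
                   for i in range(len(y_true))]
            return mse(avg)
        # Add the candidate whose inclusion gives the best ensemble score.
        chosen.append(min(candidates, key=ensemble_mse))
    return chosen

selection = greedy_ensemble(candidates)
print(selection)
```

The first pick is always the single best model on the validation set; later picks trade off individual accuracy against diversity of errors, which is why the blend can beat any one member.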

HyperGBM also provides a command-line tool to train models and predict data:

hypergbm -h

usage: hypergbm [-h] [--log-level LOG_LEVEL] [-error] [-warn] [-info] [-debug]
                [--verbose VERBOSE] [-v] [--enable-dask ENABLE_DASK] [-dask]
                [--overload OVERLOAD]
                {train,evaluate,predict} ...

For example, train model for dataset blood.csv:

hypergbm train --train-file=blood.csv --target=Class --model-file=model.pkl

Hypernets related projects

  • HyperGBM: A full-pipeline AutoML tool integrating various GBM models.
  • HyperDT/DeepTables: An AutoDL tool for tabular data.
  • HyperKeras: An AutoDL tool for Neural Architecture Search and Hyperparameter Optimization on Tensorflow and Keras.
  • Cooka: Lightweight interactive AutoML system.
  • Hypernets: A general automated machine learning framework.

DataCanvas AutoML Toolkit

Documents

DataCanvas

HyperGBM is an open source project created by DataCanvas.

Project details


Download files

Download the file for your platform.

Source Distribution

hypergbm-0.2.3.1.tar.gz (47.4 kB)

Uploaded Source

Built Distribution


hypergbm-0.2.3.1-py3-none-any.whl (2.9 MB)

Uploaded Python 3

File details

Details for the file hypergbm-0.2.3.1.tar.gz.

File metadata

  • Download URL: hypergbm-0.2.3.1.tar.gz
  • Upload date:
  • Size: 47.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.11

File hashes

Hashes for hypergbm-0.2.3.1.tar.gz

  • SHA256: 919e96757d2c29d8760f291516f2c1eb1f8eb401ecb15da1580f76629e5eb162
  • MD5: a32d2551d77c389246155dfc87e2392b
  • BLAKE2b-256: b2f368acf62e332645c8f8f1a4c51413faa74f630ed90dcfc3e67ec771dccb13
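
To check a downloaded file against the digests above, the standard-library hashlib module is enough. A minimal sketch (the filename is the sdist listed above; run it in the directory containing the download):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream the file in chunks so large downloads need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

expected = "919e96757d2c29d8760f291516f2c1eb1f8eb401ecb15da1580f76629e5eb162"
# Compare against the published digest before installing:
# assert sha256_of("hypergbm-0.2.3.1.tar.gz") == expected
```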


File details

Details for the file hypergbm-0.2.3.1-py3-none-any.whl.

File metadata

  • Download URL: hypergbm-0.2.3.1-py3-none-any.whl
  • Upload date:
  • Size: 2.9 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.11

File hashes

Hashes for hypergbm-0.2.3.1-py3-none-any.whl

  • SHA256: c32c12300e4c86ec5caf6d555383b855d37a6dbc57a6134db68c31ed18c6ef76
  • MD5: f53c64e08efbe22f4a4b2c1dbb51f02d
  • BLAKE2b-256: 2b68d82d8a6c7d16de50e068dea0fc1b83cbb30e0b708e077f4a087f2e8756c9

