hypergbm

A full pipeline AutoML tool integrated with various GBM models


Project description

We Are Hiring!

Dear folks, we are offering challenging opportunities located in Beijing for both professionals and students who are keen on AutoML/NAS. Come be a part of DataCanvas! Please send your CV to yangjian@zetyun.com. (Application deadline: TBD.)

What is HyperGBM

HyperGBM is a full pipeline automated machine learning (AutoML) toolkit designed for tabular data. It covers the complete end-to-end ML processing stages, consisting of data cleaning, preprocessing, feature generation and selection, model selection and hyperparameter optimization.

Overview

HyperGBM optimizes all of the end-to-end ML processing stages within one search space. This differs from most existing AutoML approaches, which tackle only some of the stages, for instance hyperparameter optimization. Because this full pipeline optimization process closely resembles a sequential decision process (SDP), HyperGBM uses reinforcement learning, Monte Carlo Tree Search and evolutionary algorithms, combined with a meta-learner, to solve the pipeline optimization problem efficiently.
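To make the sequential-decision framing concrete, the sketch below represents a pipeline as one decision per stage, made in order, and searches over those decisions. Everything here is hypothetical: the stage names, options and `TOY_SCORES` stand in for real pipeline steps and a real validation score, and a simple (1+1) evolutionary search stands in for the reinforcement learning, MCTS and evolutionary searchers that HyperGBM actually takes from Hypernets.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical search space: each pipeline stage is one decision, made in order.
SEARCH_SPACE: Dict[str, List[str]] = {
    "imputer": ["mean", "median", "most_frequent"],
    "encoder": ["onehot", "ordinal"],
    "feature_selection": ["none", "variance", "importance"],
    "estimator": ["xgboost", "lightgbm", "catboost"],
}

# Hypothetical per-choice contributions standing in for a validation score.
TOY_SCORES: Dict[str, float] = {
    "mean": 0.10, "median": 0.12, "most_frequent": 0.08,
    "onehot": 0.05, "ordinal": 0.07,
    "none": 0.00, "variance": 0.02, "importance": 0.04,
    "xgboost": 0.60, "lightgbm": 0.62, "catboost": 0.61,
}

Pipeline = Dict[str, str]


def evaluate(pipeline: Pipeline) -> float:
    """Toy objective; a real AutoML tool would fit and validate the pipeline here."""
    return sum(TOY_SCORES[choice] for choice in pipeline.values())


def sample(rng: random.Random) -> Pipeline:
    """Walk the stages in order, picking one option per stage."""
    return {stage: rng.choice(options) for stage, options in SEARCH_SPACE.items()}


def mutate(pipeline: Pipeline, rng: random.Random) -> Pipeline:
    """Re-decide a single, randomly chosen stage."""
    child = dict(pipeline)
    stage = rng.choice(list(SEARCH_SPACE))
    child[stage] = rng.choice(SEARCH_SPACE[stage])
    return child


def evolve(iterations: int = 200, seed: int = 0) -> Tuple[Pipeline, float]:
    """(1+1) evolutionary search: keep the child if it scores at least as well."""
    rng = random.Random(seed)
    best = sample(rng)
    best_score = evaluate(best)
    for _ in range(iterations):
        child = mutate(best, rng)
        score = evaluate(child)
        if score >= best_score:
            best, best_score = child, score
    return best, best_score


if __name__ == "__main__":
    print(evolve())
```

Because each decision is revisited independently, the same loop works whatever searcher proposes the next pipeline; swapping `mutate` for a smarter proposal policy is where RL or MCTS would come in.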

As its name indicates, HyperGBM involves several gradient boosting tree models (GBM), namely XGBoost, LightGBM and CatBoost. It also builds on Hypernets, a general automated machine learning framework, and brings in Hypernets' advanced capabilities in data cleaning, feature engineering and model ensembling. The search space representation and search algorithms inside HyperGBM are likewise provided by Hypernets.

Installation

Conda

Install HyperGBM with conda from the channel conda-forge:

conda install -c conda-forge hypergbm

Pip

Install HyperGBM with different pip options:

Typical installation:

pip install hypergbm

To run HyperGBM in JupyterLab/Jupyter notebook, install with command:

pip install hypergbm[notebook]

To support web-based experiment visualization, install with command:

pip install hypergbm[board] # Temporarily unavailable in version 0.3.x

To run HyperGBM in a distributed Dask cluster, install with command:

pip install hypergbm[dask]

To support datasets containing Simplified Chinese text in feature generation, either:
- install the jieba package before running HyperGBM, or
- install with command:

pip install hypergbm[zhcn]

Install all of the above with one command:

pip install hypergbm[all]
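After installing, a quick way to see what is available is to ask the interpreter which distributions and modules it can find. This is a sketch: the `EXTRA_MODULES` mapping is an assumption (only jieba, for the zhcn option, is named above; the dask entry is a guess at what that extra provides), and the helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from importlib.util import find_spec
from typing import Dict, List, Optional

# Assumed mapping from extra name to top-level modules it should provide.
# Only jieba (for zhcn) is named in the text; the dask entry is an assumption.
EXTRA_MODULES: Dict[str, List[str]] = {
    "dask": ["dask", "distributed"],
    "zhcn": ["jieba"],
}


def hypergbm_version() -> Optional[str]:
    """Installed hypergbm distribution version, or None if it is not installed."""
    try:
        return version("hypergbm")
    except PackageNotFoundError:
        return None


def missing_modules(extra: str) -> List[str]:
    """Modules expected for `extra` that cannot be found (they are not imported)."""
    return [name for name in EXTRA_MODULES[extra] if find_spec(name) is None]


if __name__ == "__main__":
    print("hypergbm", hypergbm_version() or "not installed")
    for extra in EXTRA_MODULES:
        missing = missing_modules(extra)
        print(f"hypergbm[{extra}]:", "ok" if not missing else "missing " + ", ".join(missing))
```

`find_spec` only locates a module without importing it, so the check is cheap and does not trigger heavy imports such as Dask's.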

Examples

Use HyperGBM with Python

Users can quickly create and run an experiment with make_experiment, which has only one required parameter, train_data. The example below uses the blood dataset from hypernets.tabular as train_data. If the target column of the dataset is not named y, set it explicitly with the target argument.

Example code:

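from hypergbm import make_experiment
from hypernets.tabular.datasets import dsutils

train_data = dsutils.load_blood()  # pandas DataFrame whose target column is 'Class'
experiment = make_experiment(train_data, target='Class')  # target is not 'y', so set it
estimator = experiment.run()  # runs the search and returns the fitted pipeline
print(estimator)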

The experiment returns a pipeline with two default steps, data_clean and estimator. The estimator step holds the final model, an ensemble of several trained models. The output:

Pipeline(steps=[('data_clean',
                 DataCleanStep(...)),
                ('estimator',
                 GreedyEnsemble(...))])
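The final step is a GreedyEnsemble. As a rough illustration of what greedy ensembling usually means, the sketch below implements greedy forward selection with replacement, the classic ensemble-selection scheme: each round adds whichever model most improves the averaged ensemble on validation data. This is an assumption about the general technique, not HyperGBM's implementation; the squared-error loss, the helper names and the example predictions are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def mse(pred: Sequence[float], y: Sequence[float]) -> float:
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)


def greedy_ensemble(preds: List[Sequence[float]], y: Sequence[float], rounds: int = 10) -> Counter:
    """Greedy forward selection with replacement.

    preds[i] holds model i's predictions on a validation set. Each round adds
    the model whose inclusion gives the lowest loss for the averaged ensemble.
    Returns how many times each model index was picked; weight_i = count_i / rounds.
    """
    picks: Counter = Counter()
    total = [0.0] * len(y)  # running sum of the selected models' predictions
    for step in range(1, rounds + 1):
        best_idx, best_loss = 0, float("inf")
        for i, p in enumerate(preds):
            # Average of the models picked so far plus candidate i
            candidate = [(s + pi) / step for s, pi in zip(total, p)]
            loss = mse(candidate, y)
            if loss < best_loss:
                best_idx, best_loss = i, loss
        picks[best_idx] += 1
        total = [s + pi for s, pi in zip(total, preds[best_idx])]
    return picks


if __name__ == "__main__":
    y = [1.0, 0.0, 1.0, 0.0]
    preds = [
        [0.9, 0.1, 0.8, 0.2],
        [0.6, 0.4, 0.6, 0.4],
        [0.1, 0.9, 0.2, 0.8],
    ]
    counts = greedy_ensemble(preds, y, rounds=10)
    print({i: c / 10 for i, c in counts.items()})
```

Selecting with replacement lets a strong model be picked repeatedly, which is how the procedure turns a plain average into a weighted one.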

To see more examples, please read Quick Start and Examples.

Use HyperGBM with command-line tools

HyperGBM also provides a command-line tool for model training, evaluation and prediction. To view the command-line help, run:

hypergbm -h

usage: hypergbm [-h] [--log-level LOG_LEVEL] [-error] [-warn] [-info] [-debug]
                [--verbose VERBOSE] [-v] [--enable-gpu ENABLE_GPU] [-gpu] 
                [--enable-dask ENABLE_DASK] [-dask] [--overload OVERLOAD]
                {train,evaluate,predict} ...

For example, to train a model on the dataset blood.csv:

hypergbm train --train-file=blood.csv --target=Class --model-file=model.pkl
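To drive the same training run from a Python script, the command can be assembled as an argument list and passed to subprocess. This sketch uses only the flags shown above (`--train-file`, `--target`, `--model-file`, and the global `-gpu` switch from the usage text); the helper names `train_command` and `run_train` are hypothetical.

```python
import shutil
import subprocess
from typing import List


def train_command(train_file: str, target: str, model_file: str, gpu: bool = False) -> List[str]:
    """Build the argv for `hypergbm train` (hypothetical helper)."""
    cmd = ["hypergbm"]
    if gpu:
        # Global options such as -gpu go before the sub-command, per the usage line.
        cmd.append("-gpu")
    cmd += [
        "train",
        f"--train-file={train_file}",
        f"--target={target}",
        f"--model-file={model_file}",
    ]
    return cmd


def run_train(train_file: str, target: str, model_file: str, gpu: bool = False) -> None:
    """Run the CLI; raises CalledProcessError if training exits with an error."""
    if shutil.which("hypergbm") is None:
        raise FileNotFoundError("the hypergbm command is not on PATH")
    subprocess.run(train_command(train_file, target, model_file, gpu), check=True)
```

For instance, `run_train("blood.csv", "Class", "model.pkl")` reproduces the command line above. Passing a list rather than a shell string avoids quoting problems with file names.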

For more details, please read Quick Start.

GPU Acceleration

HyperGBM supports full-pipeline GPU acceleration, covering every step from data processing to model training. In our experiments, we observed a 50x performance improvement. Most importantly, a model trained on GPU can be deployed to environments without GPU hardware or software (e.g., CUDA and cuML), which greatly reduces the cost of model deployment.
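Since the same workflow may run on machines with and without a GPU, a script can decide whether to pass `-gpu` by checking for the GPU libraries first. This sketch assumes the GPU mode relies on the RAPIDS packages cudf and cuml (the text names CUDA and cuML; cudf is an assumption). It only checks that the packages can be found, without importing them, and does not verify that a CUDA device is usable; the helper names are hypothetical.

```python
from importlib.util import find_spec
from typing import Iterable, List

# Assumed GPU dependencies (RAPIDS); the text above names CUDA and cuML.
GPU_MODULES = ("cudf", "cuml")


def gpu_stack_available(modules: Iterable[str] = GPU_MODULES) -> bool:
    """True if every module can be found; does not import them or probe the device."""
    return all(find_spec(name) is not None for name in modules)


def cli_global_flags(prefer_gpu: bool = True) -> List[str]:
    """Return ['-gpu'] only when GPU use is wanted and the libraries are present."""
    return ["-gpu"] if prefer_gpu and gpu_stack_available() else []
```

Falling back to an empty flag list keeps the same script usable on CPU-only machines, which matches the deployment story described above.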


Documents

HyperGBM related projects

Hypernets: A general automated machine learning (AutoML) framework.
HyperGBM: A full pipeline AutoML tool integrated with various GBM models.
HyperDT/DeepTables: An AutoDL tool for tabular data.
HyperTS: A full pipeline AutoML and AutoDL tool for time series datasets.
HyperKeras: An AutoDL tool for Neural Architecture Search and Hyperparameter Optimization on TensorFlow and Keras.
HyperBoard: A visualization tool for Hypernets.
Cooka: Lightweight interactive AutoML system.


Citation

If you use HyperGBM in your research, please cite us as follows:

Jian Yang, Xuefeng Li, Haifeng Wu. HyperGBM: A full pipeline AutoML tool integrated with various GBM models. https://github.com/DataCanvasIO/HyperGBM. 2020. Version 0.2.x.

BibTeX:

@misc{hypergbm,
  author={Jian Yang and Xuefeng Li and Haifeng Wu},
  title={{HyperGBM}: {A Full Pipeline AutoML Tool Integrated With Various GBM Models}},
  howpublished={https://github.com/DataCanvasIO/HyperGBM},
  note={Version 0.2.x},
  year={2020}
}

DataCanvas

HyperGBM is an open source project created by DataCanvas.

Project details


Release history

- 0.3.2 (this version): Feb 23, 2024
- 0.3.1: Dec 13, 2023
- 0.3.0: Jul 17, 2023
- 0.2.5.7: Nov 26, 2022
- 0.2.5.6: Oct 10, 2022
- 0.2.5.5: Sep 7, 2022
- 0.2.5.4: May 31, 2022
- 0.2.5.3: Mar 27, 2022
- 0.2.5.2: Mar 8, 2022
- 0.2.5.1: Mar 2, 2022
- 0.2.5: Mar 2, 2022
- 0.2.3.2: Dec 16, 2021
- 0.2.3.1: Oct 18, 2021
- 0.2.3: Aug 18, 2021
- 0.2.2: Mar 4, 2021
- 0.2.1: Feb 4, 2021
- 0.2.0: Feb 4, 2021
- 0.1.2: Nov 30, 2020
- 0.1.1: Oct 23, 2020

Download files


Source Distribution

hypergbm-0.3.2.tar.gz (3.3 MB)

Uploaded Feb 23, 2024 Source

Built Distribution


hypergbm-0.3.2-py3-none-any.whl (3.4 MB)

Uploaded Feb 23, 2024 Python 3

File details

Details for the file hypergbm-0.3.2.tar.gz.

File metadata

Download URL: hypergbm-0.3.2.tar.gz
Upload date: Feb 23, 2024
Size: 3.3 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.9.17

File hashes

Hashes for hypergbm-0.3.2.tar.gz
Algorithm	Hash digest
SHA256	`e27107836faccc29e4464b2977c10241049f6a5652facd2f0c3c1d66197ac009`
MD5	`4f2741a8f58878b27648fa5adda7c679`
BLAKE2b-256	`304dc657950d6f534beba6d22a7cb264830754e2c0f7372f00cc44bfb68ad37e`


File details

Details for the file hypergbm-0.3.2-py3-none-any.whl.

File metadata

Download URL: hypergbm-0.3.2-py3-none-any.whl
Upload date: Feb 23, 2024
Size: 3.4 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.9.17

File hashes

Hashes for hypergbm-0.3.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`471a9a4bf5a55e1d33812fac661c0c57351e4c1c63a0715d0770fdf60e41ce69`
MD5	`a3d38624d1df6e006551f2ace9eea005`
BLAKE2b-256	`180d0141cdc3dbc154d027ac142993fef019bd5a1c37f0e35d6ed0ae40a4e552`

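A downloaded release file can be checked against the SHA256 digests listed above before installing it. This is a minimal sketch using hashlib; the digests are copied from the tables above, while the helper names `sha256_of` and `verify` are hypothetical.

```python
import hashlib
from pathlib import Path

# SHA256 digests published above for the 0.3.2 release files.
EXPECTED_SHA256 = {
    "hypergbm-0.3.2.tar.gz": "e27107836faccc29e4464b2977c10241049f6a5652facd2f0c3c1d66197ac009",
    "hypergbm-0.3.2-py3-none-any.whl": "471a9a4bf5a55e1d33812fac661c0c57351e4c1c63a0715d0770fdf60e41ce69",
}


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 of a file, read in chunks so large files need little memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path) -> bool:
    """True if the file's digest matches the published one for its file name."""
    name = Path(path).name
    expected = EXPECTED_SHA256.get(name)
    if expected is None:
        raise KeyError(f"no published digest for {name}")
    return sha256_of(path) == expected
```

pip can perform the same check at install time in hash-checking mode (`--hash=sha256:...` entries in a requirements file, enforced with `--require-hashes`), but that mode then requires hashes for every dependency as well.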
