
Python package for automated hyperparameter optimization of common machine-learning algorithms

Project description


automl: Automated Machine Learning

Intro

automl is a Python package focused on automating much of the machine-learning effort encountered in tabular (zero-dimensional) regression and classification, as opposed to multidimensional data such as the images fed to a CNN. It builds on the existing Python packages scikit-learn and Optuna, plus the model-specific packages LightGBM, CatBoost and XGBoost.

automl works by assessing the performance of various machine-learning models for a set number of trials over a pre-defined range of hyperparameters. During successive trials the hyperparameters are optimized following a user-defined methodology (by default, Bayesian search). Unpromising trials are stopped (pruned) early by assessing performance on an incrementally increasing fraction of the training data, saving computational resources. Hyperparameter-optimization trials are stored locally on disk, allowing training to be resumed after interruption. The best trials of the defined models are reloaded and combined, or stacked, to form a final model. This final model is assessed and, owing to the nature of stacking, tends to outperform any of its constituent models.
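
The snippet below is a minimal conceptual sketch of this trial/prune/stack loop, written directly against Optuna and scikit-learn. It is not automl's internal implementation; the model choice, hyperparameter range and study name are illustrative assumptions.

import optuna
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import BayesianRidge, Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=10, random_state=42)

def objective(trial):
    alpha = trial.suggest_float("alpha", 1e-4, 10.0, log=True)
    # Score on an increasing fraction of the training data, reporting each
    # intermediate result so the pruner can stop unpromising trials early.
    for step, frac in enumerate([0.25, 0.5, 1.0]):
        n = int(len(X) * frac)
        score = cross_val_score(Ridge(alpha=alpha), X[:n], y[:n], scoring="r2", cv=3).mean()
        trial.report(score, step)
        if trial.should_prune():
            raise optuna.TrialPruned()
    return score

# Persisting the study to SQLite allows optimization to resume after interruption.
study = optuna.create_study(
    direction="maximize",
    sampler=optuna.samplers.TPESampler(seed=42),  # Bayesian (TPE) search
    storage="sqlite:///demo_study.db",
    study_name="ridge_demo",
    load_if_exists=True,
)
study.optimize(objective, n_trials=10)

# Stack the tuned model with a second estimator; a meta-learner combines
# their predictions, which typically beats either constituent alone.
stacked = StackingRegressor(estimators=[
    ("ridge", Ridge(**study.best_params)),
    ("bayesianridge", BayesianRidge()),
])
stacked.fit(X, y)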

automl contains several additional functionalities beyond the hyperparameter optimization and stacking of models (a scikit-learn sketch of several of these follows the list):

  • scaling of the input X-matrix (tested by default during optimization)
  • normal transformation of the y-matrix (tested by default during optimization)
  • PCA compression
  • spline transformation
  • polynomial expansion
  • categorical feature support (nominal and ordinal)
  • bagging of weak models in addition to optimized models
  • multithreading
  • feature-importance analyses with shap
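
As a rough illustration, several of these preprocessing options map onto standard scikit-learn transformers. The sketch below uses a fixed scikit-learn pipeline with hypothetical column names; automl tunes these choices automatically rather than hard-coding them.

from sklearn.compose import ColumnTransformer, TransformedTargetRegressor
from sklearn.decomposition import PCA
from sklearn.linear_model import BayesianRidge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import (
    OneHotEncoder,
    OrdinalEncoder,
    QuantileTransformer,
    SplineTransformer,
    StandardScaler,
)

# Hypothetical column names, for illustration only.
numeric_columns = ["x0", "x1", "x2"]
nominal_columns = ["colour"]  # unordered categories -> one-hot encoded
ordinal_columns = ["size"]    # ordered categories -> integer codes

preprocess = ColumnTransformer([
    ("numeric", Pipeline([
        ("scale", StandardScaler()),                         # X-matrix scaling
        ("spline", SplineTransformer(n_knots=5, degree=3)),  # spline transformation
        ("pca", PCA(n_components=0.95)),                     # PCA compression
    ]), numeric_columns),
    ("nominal", OneHotEncoder(handle_unknown="ignore"), nominal_columns),
    ("ordinal", OrdinalEncoder(), ordinal_columns),
])

# Normal transformation of the y-matrix: fit on a quantile-transformed target
# and invert the transform at prediction time.
model = TransformedTargetRegressor(
    regressor=Pipeline([("preprocess", preprocess), ("estimator", BayesianRidge())]),
    transformer=QuantileTransformer(output_distribution="normal"),
)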

Setup

Method 1: pip install

Create a new environment with Python 3.11 to prevent pip install from breaking existing packages

conda create -n ENVNAME -c conda-forge python=3.11

Activate new environment

conda activate ENVNAME

Pip install

python3 -m pip install py-automl-lib

Optionally include the shap package for feature-importance analyses (see chapter 7 of example_notebook.ipynb). The quotes prevent shells such as zsh from expanding the brackets

python3 -m pip install "py-automl-lib[shap]"

Method 2: cloning

Clone the repository

git clone https://github.com/owenodriscoll/AutoML

Navigate to the cloned local repository and create the conda environment with all required packages

conda env create --name ENVNAME --file environment.yml

Activate new environment

conda activate ENVNAME

Having created an environment with all dependencies, install AutoML:

pip install git+https://github.com/owenodriscoll/AutoML.git

Use

For a more detailed example, check out examples/example_notebook.ipynb

Minimal regression use case:

from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from automl import AutomatedRegression

X, y = make_regression(n_samples=1000, n_features=10, n_informative=2, random_state=42)

regression = AutomatedRegression(
    y=y,
    X=X,
    n_trial=10,
    timeout=100,
    metric_optimise=r2_score,
    optimisation_direction='maximize',
    models_to_optimize=['bayesianridge', 'lightgbm'],
    )
    
regression.apply()
regression.summary

Regression use case with expanded options:

import pandas as pd
from optuna.samplers import TPESampler
from optuna.pruners import HyperbandPruner
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold
from automl import AutomatedRegression

random_state = 42
X, y = make_regression(n_samples=1000, n_features=10, n_informative=2, random_state=random_state)

# -- adding categorical features
df_X = pd.DataFrame(X)
df_X['nine'] = pd.cut(df_X[9], bins=[-float('Inf'), -3, -1, 1, 3, float('Inf')], labels=['a', 'b', 'c', 'd', 'e'])
df_X['ten'] = pd.cut(df_X[9], bins=[-float('Inf'), -1, 1, float('Inf')], labels=['A', 'B', 'C'])
df_y = pd.Series(y)

regression = AutomatedRegression(
    y=df_y,
    X=df_X,
    test_frac=0.2,
    fit_frac=[0.2, 0.4, 0.6, 1],
    n_trial=50,
    timeout=600,
    metric_optimise=r2_score,
    optimisation_direction='maximize',
    cross_validation=KFold(n_splits=5, shuffle=True, random_state=42),
    sampler=TPESampler(seed=random_state),
    pruner=HyperbandPruner(min_resource=1, max_resource='auto', reduction_factor=3),
    reload_study=False,
    reload_trial_cap=False,
    write_folder='/auto_regression_test',
    models_to_optimize=['bayesianridge', 'lightgbm'],
    nominal_columns=['nine'],
    ordinal_columns=['ten'],
    pca_value=0.95,
    spline_value={'n_knots': 5, 'degree':3},
    poly_value={'degree': 2, 'interaction_only': True},
    boosted_early_stopping_rounds=100,
    n_weak_models=5,
    random_state=42,
    )

regression.apply()
regression.summary
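
Feature-importance analyses with shap

With the optional shap extra installed (see Setup), feature importances can be analysed; chapter 7 of example_notebook.ipynb demonstrates automl's own workflow. The sketch below applies shap directly to a stand-alone LightGBM model as a generic illustration, not automl's API:

import lightgbm as lgb
import shap
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=1000, n_features=10, n_informative=2, random_state=42)

# Fit any tree-based model; a plain LightGBM regressor stands in here for
# whichever estimator the optimization selected.
model = lgb.LGBMRegressor(random_state=42).fit(X, y)

# TreeExplainer computes SHAP values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Rank features by mean absolute SHAP value.
shap.summary_plot(shap_values, X)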
    
