Holistic and No Code Auto Machine Learning

HoNCAML

Introduction

HoNCAML (Holistic No Code Automated Machine Learning) is a tool for running automated machine learning pipelines, with a specific focus on finding the best model and hyperparameters for the problem at hand.

Following the no code paradigm, no Python knowledge is needed. There are two ways to define pipelines:

  • Through the Graphical User Interface
  • Through YAML configuration files

Pipelines

HoNCAML provides three types of pipelines.

Train

Train a specific model with the hyperparameters specified.

  • Input: A dataset for the training.
  • Output: The model object stored to disk.

Predict

Use a model to generate predictions for a specific dataset.

  • Input: A dataset for the test, together with a model object.
  • Output: A tabular file with the predictions.
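The relationship between the Train and Predict pipelines can be pictured with a small sketch. This is not HoNCAML code: the toy majority-label "model", the helper names `train` and `predict`, the use of `pickle`, and the file names are all assumptions made for illustration only.

```python
import csv
import pickle
import tempfile
from collections import Counter
from pathlib import Path


def train(labels, model_path):
    """Fit a toy model (the most common label) and store it on disk."""
    model = {"majority_label": Counter(labels).most_common(1)[0][0]}
    with open(model_path, "wb") as handle:
        pickle.dump(model, handle)


def predict(rows, model_path, output_path):
    """Load the stored model and write one prediction per row to a CSV file."""
    with open(model_path, "rb") as handle:
        model = pickle.load(handle)
    predictions = [model["majority_label"] for _ in rows]
    with open(output_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["prediction"])
        writer.writerows([p] for p in predictions)
    return predictions


# Hypothetical file names inside a temporary working directory.
workdir = Path(tempfile.mkdtemp())
train(["a", "b", "b"], workdir / "model.pkl")
result = predict([[4.0], [5.0]], workdir / "model.pkl", workdir / "predictions.csv")
print(result)
```

The point of the sketch is the hand-off: training produces a file, and prediction needs both that file and a new dataset.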

Benchmark

Search for the best model and hyperparameters for the dataset at hand.

  • Input: A dataset for the benchmark.
  • Output: A configuration file with the best model and hyperparameters, plus a tabular file with the results for all configurations tested.
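As a rough picture of what a benchmark involves, the sketch below scores every combination of model and hyperparameters and keeps both the best one and the full results table. It is an exhaustive grid search written for illustration; the search strategy HoNCAML actually uses is not described here, and the model names, search spaces and `toy_score` function are hypothetical.

```python
from itertools import product


def grid(search_space):
    """Yield every combination of hyperparameter values as a dict."""
    names = list(search_space)
    for values in product(*(search_space[name] for name in names)):
        yield dict(zip(names, values))


def benchmark(candidates, score):
    """Score each (model, hyperparameters) pair; return the best and all results."""
    results = []
    for model_name, search_space in candidates.items():
        for params in grid(search_space):
            results.append(
                {"model": model_name, "params": params, "score": score(model_name, params)}
            )
    best = max(results, key=lambda row: row["score"])
    return best, results


# Hypothetical models and search spaces, for illustration only.
candidates = {
    "model_a": {"depth": [2, 4], "rate": [0.1, 0.5]},
    "model_b": {"neighbors": [3, 5]},
}


def toy_score(model_name, params):
    """Placeholder metric; a real benchmark would evaluate on held-out data."""
    if model_name == "model_a":
        return params["depth"] * params["rate"]
    return params["neighbors"] / 10


best, results = benchmark(candidates, toy_score)
print(best["model"], best["params"], len(results))
```

Here `best` plays the role of the configuration file with the winning setup, and `results` plays the role of the tabular file listing every configuration tested.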

Focus

HoNCAML was designed with the following aspects in mind:

  • Ease of use
  • Modularity
  • Extensibility
  • Simpler is better

Users

HoNCAML assumes no technical knowledge, yet it is designed to be extended by experts. Its user base may therefore range from:

  • Basic users: Those with little programming experience and/or machine learning knowledge, who can still get results easily.

  • Advanced users: Experts who can customize experiments to fit a specific use case.

Support

HoNCAML supports a specific set of options for each of the following concepts. Nevertheless, given its modular design, extending the library further should be not only feasible but intuitive.

Data structure

For now, only tabular data is supported. For such data, HoNCAML provides the following preprocessing methods when needed:

  • Normalization
  • One hot encoding of categorical features
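To make the two preprocessing methods concrete, here is a minimal plain-Python sketch of what each one does to a column. It assumes that "normalization" means min-max scaling to the range [0, 1], which is only one common interpretation; the function names are hypothetical and this is not how HoNCAML implements the steps.

```python
def min_max_normalize(values):
    """Rescale numeric values linearly so the minimum maps to 0 and the maximum to 1."""
    low, high = min(values), max(values)
    if high == low:
        # A constant column carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def one_hot_encode(values):
    """Replace each category with a 0/1 vector, one position per distinct category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


scaled = min_max_normalize([10, 20, 40])
categories, encoded = one_hot_encode(["red", "blue", "red"])
print(scaled)
print(categories, encoded)
```

Normalization keeps a single numeric column but changes its scale, while one hot encoding turns one categorical column into as many 0/1 columns as there are distinct categories.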

Problem type

At this moment, the following types of problems are supported:

  • Regression
  • Classification

Model type

Regarding available models, the following are supported:

  • scikit-learn models (ML)
  • PyTorch models (DL)

Requirements

HoNCAML requires Python >= 3.10.
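If it is unclear which interpreter is active, a check along these lines can confirm the requirement before installing. The helper name `meets_requirement` is hypothetical and not part of HoNCAML.

```python
import sys

# Minimum version stated in the requirements above.
MINIMUM_PYTHON = (3, 10)


def meets_requirement(version_info=None):
    """Return True if the given (or current) Python version is at least 3.10."""
    current = tuple(version_info or sys.version_info)[:2]
    return current >= MINIMUM_PYTHON


print(meets_requirement())
```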

Install

To install HoNCAML, run: pip install honcaml

Command line execution

Quick execution with example data

For a quick start with example data and configuration, run:

honcaml -e {example_directory}

This creates a directory containing sample data and configuration files that show how HoNCAML works. Enter that directory with cd {example_directory} and run one of the pipelines located in the files directory. For example, to run a benchmark for a classification task:

honcaml -c files/classification_benchmark.yaml

Standard execution

To run a particular pipeline, first generate a configuration file for it. The easiest way to start is from a template, which the CLI itself provides.

If a basic configuration file with only the minimum required options is enough, run:

honcaml -b {config_file} -t {pipeline_type}

Alternatively, an advanced configuration file with all the supported options can be generated:

honcaml -a {config_file} -t {pipeline_type}

In both cases, {config_file} should be the path to the YAML file that will contain the configuration, and {pipeline_type} should be one of the supported types: train, predict or benchmark.

Once the configuration file is filled in, run the pipeline with:

honcaml -c {config_file}
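For those who prefer to script these steps, the commands above can be assembled programmatically. The sketch below only builds the argument lists, using the flags shown in this section (-b, -a, -t, -c); the helper names and the train.yaml file name are hypothetical, and nothing is executed.

```python
import shlex

# Pipeline types listed in this section.
PIPELINE_TYPES = ("train", "predict", "benchmark")


def template_command(config_file, pipeline_type, advanced=False):
    """Build the command that generates a basic (-b) or advanced (-a) template."""
    if pipeline_type not in PIPELINE_TYPES:
        raise ValueError(f"pipeline_type must be one of {PIPELINE_TYPES}")
    flag = "-a" if advanced else "-b"
    return ["honcaml", flag, config_file, "-t", pipeline_type]


def run_command(config_file):
    """Build the command that runs the pipeline described by a configuration file."""
    return ["honcaml", "-c", config_file]


for command in (template_command("train.yaml", "train"), run_command("train.yaml")):
    print(shlex.join(command))
```

With HoNCAML installed, each list could be passed to `subprocess.run(command, check=True)`, with the configuration file edited between the two calls.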

For example, the following basic configuration would train a default model for classification and store it.

```yaml
global:
  problem_type: classification

steps:
  data:
    extract:
      filepath: data/dataset.csv
      target: class
    transform:

  model:
    transform:
      fit:
    load:
      filepath: default_model.sav
```

GUI execution

To run the HoNCAML GUI locally in a web browser tab, run the following command:

honcaml -g

The GUI lets you run HoNCAML by interactively selecting pipeline options. It is also possible to run a pipeline by uploading its configuration file.

Contribute

All contributions are more than welcome! For further information, please refer to the contribution documentation.

Bugs

If you find a bug, please check whether an issue already exists for it and, if not, open a new one with a clear description.

Contact

Should you have any inquiry regarding the library or its development, please contact the Applied Machine Learning team.
