Skip to main content

A Democratized lightweight and transparent AutoML framework

Project description

License Stars Forks Last Commit Commit Activity Docs

eZAutoML

eZAutoML Logo

Overview

eZAutoML is a framework designed to make Automated Machine Learning (AutoML) accessible to everyone. It provides an incredible easy to use interface based on Scikit-Learn API to build modelling pipelines with minimal effort.

The framework is built around a few core concepts:

  1. Optimizers: Black-box optimization methods for hyperparameters.
  2. Easy Tabular Pipelines: Simple domain-specific language to describe pipelines for preprocessing and model training.
  3. Scheduling: Work in progress; this feature enables horizontal scalability from a single computer to datacenters by using airflow executors.

Installation

Package Distribution

The latest version of eZAutoML can be installed via PyPI or from source.

pip install ezautoml
ezautoml --help

Install from source

To install from source, you can clone this repo and install with pip:

pip install -e .

Usage

Command Line Interface

Usage:

ezautoml --dataset <path_to_data> --target <target_name> --task <classification|regression> --models <model1,model2,...> --cv <folds> --output <path_to_output>

Options:

  • dataset: Path to the dataset file (CSV, parquet...)
  • target: The target column name for prediction
  • task: Task type: classification or regression
  • search: Black-box optimization algorithm to perform
  • models: Comma-separated list of models to use (e.g., lr,rf,xgb). Use initials!
  • cv: Number of cross-validation folds (if needed)
  • output: Directory to save the output models/results
  • trials: Maximum number of trials inside an optimiation algorithm
  • preprocess: Whether to perform minimal preprocessing (Scaling, Encoding...) or not
  • verbose: Increase logging verbosity
  • version: Show the current version

For more detailed help, use:

ezautoml --help

There are future features that are still a work-in-progress and will be enabled in the future such as scheduling, metalearning, pipelines...

Python Script

You can also use eZAutoML within Python scripts (though this feature is still being developed). This will allow you to work through Python code or via custom pipelines in the future.

???

WIP

WIP TODO List for eZAutoML

1. Core System Setup

  • Implement Dataset Loading (datasets.py)

    • Build a utility to load datasets from various formats (CSV, Parquet, etc.).
    • Implement functionality to split datasets into train and test sets.
  • Preprocessing (preprocess.py)

    • Implement basic preprocessing such as:
      • Feature scaling (StandardScaler)
      • Label encoding for classification tasks
      • Handling missing values (if necessary)
    • Optional: Extend to more advanced preprocessing in the future.

2. Model Implementation

  • Model Definitions (models.py)

    • Implement a list of models:
      • SVM, RandomForest, XGBoost, etc.
      • Ensure models can be easily swapped based on the user's request in CLI (--models flag).
  • Search Strategy (search.py)

    • Implement the abstract optimizer class, and separate search strategies such as:
      • Random Search: Use for hyperparameter tuning.
      • Grid Search: For exhaustive search of hyperparameters.
    • Provide flexibility to add new strategies later.

3. Model Evaluation

  • Evaluator (evaluation.py)

    • Implement cross-validation to assess model performance.
    • Support various metrics (accuracy, F1 score, etc.) based on the task (classification/regression).
  • Leaderboard (reporting.py)

    • Track and store model performance (accuracy, metrics).
    • Build a leaderboard that ranks models based on their cross-validation score.

4. Optimization System

  • Abstract Optimizer (search.py)

    • Implement a base class for optimizers, handling setup and execution of hyperparameter search.
    • Design the optimizer to integrate with different search strategies (Random Search, Grid Search).
  • Random Search Optimizer

    • Implement random hyperparameter search strategy.
    • Randomly sample hyperparameters from predefined search spaces.
    • Use the evaluator to assess performance during each trial.

5. History Tracking

  • Build History Logging System (history.py)
    • Implement a system to store trial results (model parameters, validation scores, etc.).
    • Provide an easy way to retrieve and analyze previous experiment results.

6. Reporting and Output

  • Reporting (reporting.py)
    • Create functionality to log experiment results.
    • Optionally generate visualization such as bar plots for leaderboard.
    • Save reports and models to the specified output directory.

7. Configuration System

  • Config Management (config.py)
    • Define default search spaces for hyperparameters.
    • Allow easy configuration of model hyperparameters and search spaces.
    • Ensure flexibility for future extension.

Contributing

We welcome contributions to eZAutoML! If you'd like to contribute, please fork the repository and submit a pull request with your changes. For detailed information on how to contribute, please refer to our contributing guide.

License

eZAutoML is licensed under the BSD 3-Clause License. See the LICENSE file for more information.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ezautoml-0.1.3.tar.gz (17.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ezautoml-0.1.3-py3-none-any.whl (23.5 kB view details)

Uploaded Python 3

File details

Details for the file ezautoml-0.1.3.tar.gz.

File metadata

  • Download URL: ezautoml-0.1.3.tar.gz
  • Upload date:
  • Size: 17.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for ezautoml-0.1.3.tar.gz
Algorithm Hash digest
SHA256 62ee994443071081649b39314e5c0478975178a472eeb5387f93fb93415644a3
MD5 8f8027856897cd67eb4c21c3140c5ef5
BLAKE2b-256 91fba7bfb0c521f3da83b6ff1cd5183f38c3bfab26bf2c9a0896013fbadbfe01

See more details on using hashes here.

File details

Details for the file ezautoml-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: ezautoml-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 23.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for ezautoml-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 77988e5d208748c79caadb383b1b785a84c3a6e6c63451b9cbaa69034782a208
MD5 548328c576b5b8bf49b1701dc3cce160
BLAKE2b-256 764ae66381260caa5b9cca0ae5fd4f8fbd224321a955e1600de430962ef07ab7

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page