
A democratized, lightweight, and transparent AutoML framework


eZAutoML

Overview

eZAutoML is a framework designed to make Automated Machine Learning (AutoML) accessible to everyone. It provides an easy-to-use interface, based on the scikit-learn API, for building modeling pipelines with minimal effort.

The framework is built around a few core concepts:

  1. Optimizers: Black-box optimization methods for hyperparameters.
  2. Easy Tabular Pipelines: Simple domain-specific language to describe pipelines for preprocessing and model training.
  3. Scheduling (work in progress): Intended to enable horizontal scaling from a single machine to a datacenter by using Airflow executors.

Installation

Package Distribution

The latest version of eZAutoML can be installed via PyPI or from source.

pip install ezautoml
ezautoml --help

Install from source

To install from source, you can clone this repo and install with pip:

pip install -e .

Usage

Command Line Interface

Usage:

ezautoml --dataset <path_to_data> --target <target_name> --task <classification|regression> --models <model1,model2,...> --cv <folds> --output <path_to_output>

Options:

  • dataset: Path to the dataset file (CSV, Parquet, ...)
  • target: The target column name for prediction
  • task: Task type: classification or regression
  • search: Black-box optimization algorithm to use
  • models: Comma-separated list of models to use, given by their short names (e.g., lr,rf,xgb)
  • cv: Number of cross-validation folds (if needed)
  • output: Directory to save the output models/results
  • trials: Maximum number of trials run by the optimization algorithm
  • preprocess: Whether to perform minimal preprocessing (scaling, encoding, ...)
  • verbose: Increase logging verbosity
  • version: Show the current version

For more detailed help, use:

ezautoml --help
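
When experiments are launched from another script, it can help to assemble the command line programmatically. The following is a minimal sketch that uses only the flags shown in the usage line above; the helper name `build_ezautoml_command`, the dataset path, and the target column are hypothetical and chosen purely for illustration.

```python
import shlex


def build_ezautoml_command(dataset, target, task, models, cv, output):
    """Assemble an ezautoml invocation as an argument list (hypothetical helper)."""
    if task not in ("classification", "regression"):
        raise ValueError(f"unsupported task: {task!r}")
    return [
        "ezautoml",
        "--dataset", str(dataset),
        "--target", target,
        "--task", task,
        "--models", ",".join(models),  # comma-separated short names, e.g. lr,rf
        "--cv", str(cv),
        "--output", str(output),
    ]


# Hypothetical dataset path and target column.
cmd = build_ezautoml_command(
    "data/iris.csv", "species", "classification", ["lr", "rf"], 5, "results"
)
print(shlex.join(cmd))
```

With the package installed, the resulting list can be passed to `subprocess.run(cmd, check=True)`.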

Some features, such as scheduling, metalearning, and pipelines, are still work in progress and will be enabled in future releases.

Python Script

You can also use eZAutoML from Python scripts, although this feature is still under development. In the future, it will let you drive experiments from Python code or through custom pipelines.

???

WIP

WIP TODO List for eZAutoML

1. Core System Setup

  • Implement Dataset Loading (datasets.py)

    • Build a utility to load datasets from various formats (CSV, Parquet, etc.).
    • Implement functionality to split datasets into train and test sets.
  • Preprocessing (preprocess.py)

    • Implement basic preprocessing such as:
      • Feature scaling (StandardScaler)
      • Label encoding for classification tasks
      • Handling missing values (if necessary)
    • Optional: Extend to more advanced preprocessing in the future.

2. Model Implementation

  • Model Definitions (models.py)

    • Implement a list of models:
      • SVM, RandomForest, XGBoost, etc.
      • Ensure models can be easily swapped based on the user's request in CLI (--models flag).
  • Search Strategy (search.py)

    • Implement the abstract optimizer class, and separate search strategies such as:
      • Random Search: Use for hyperparameter tuning.
      • Grid Search: For exhaustive search of hyperparameters.
    • Provide flexibility to add new strategies later.
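
As an illustration of the grid search strategy listed above, exhaustive search amounts to enumerating the Cartesian product of the candidate values. This is a sketch only, not the eZAutoML implementation; the function name `grid_candidates` and the search space are assumptions.

```python
from itertools import product


def grid_candidates(search_space):
    """Yield every combination from a {hyperparameter: [values]} mapping."""
    names = list(search_space)
    for values in product(*(search_space[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical search space for a random forest.
space = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}
candidates = list(grid_candidates(space))
print(len(candidates))  # 2 * 3 combinations
```

Because the generator is lazy, a caller can stop early once a trial budget is exhausted.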

3. Model Evaluation

  • Evaluator (evaluation.py)

    • Implement cross-validation to assess model performance.
    • Support various metrics (accuracy, F1 score, etc.) based on the task (classification/regression).
  • Leaderboard (reporting.py)

    • Track and store model performance (accuracy, metrics).
    • Build a leaderboard that ranks models based on their cross-validation score.
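
The evaluation and leaderboard items above could look roughly like the following sketch, which uses scikit-learn directly rather than any eZAutoML API. The synthetic dataset, the two candidate models, and the `leaderboard` structure are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for a user-supplied dataset.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Candidate models keyed by short name; scaling is part of the pipeline
# so it is fitted inside each cross-validation fold.
models = {
    "lr": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "rf": RandomForestClassifier(n_estimators=50, random_state=0),
}

leaderboard = []
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    leaderboard.append({"model": name, "mean_accuracy": float(scores.mean())})

# Rank models by their mean cross-validation score, best first.
leaderboard.sort(key=lambda row: row["mean_accuracy"], reverse=True)
for rank, row in enumerate(leaderboard, start=1):
    print(rank, row["model"], round(row["mean_accuracy"], 3))
```

For regression tasks, the same loop would apply with a regression scorer such as `scoring="r2"`.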

4. Optimization System

  • Abstract Optimizer (search.py)

    • Implement a base class for optimizers, handling setup and execution of hyperparameter search.
    • Design the optimizer to integrate with different search strategies (Random Search, Grid Search).
  • Random Search Optimizer

    • Implement random hyperparameter search strategy.
    • Randomly sample hyperparameters from predefined search spaces.
    • Use the evaluator to assess performance during each trial.
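
One possible shape for the abstract optimizer and the random search strategy described above is sketched below. The class names, the `suggest`/`run` methods, and the toy scoring function are assumptions; a real evaluator would return a cross-validated score instead.

```python
import random
from abc import ABC, abstractmethod


class Optimizer(ABC):
    """Base class: runs trials and keeps the best-scoring configuration."""

    def __init__(self, search_space, evaluate, max_trials=10):
        self.search_space = search_space  # {hyperparameter: [values]}
        self.evaluate = evaluate          # callable: params -> score (higher is better)
        self.max_trials = max_trials

    @abstractmethod
    def suggest(self):
        """Return the next hyperparameter configuration to try."""

    def run(self):
        best_params, best_score = None, float("-inf")
        for _ in range(self.max_trials):
            params = self.suggest()
            score = self.evaluate(params)
            if score > best_score:
                best_params, best_score = params, score
        return best_params, best_score


class RandomSearch(Optimizer):
    """Samples each hyperparameter independently and uniformly."""

    def __init__(self, search_space, evaluate, max_trials=10, seed=None):
        super().__init__(search_space, evaluate, max_trials)
        self.rng = random.Random(seed)

    def suggest(self):
        return {name: self.rng.choice(values)
                for name, values in self.search_space.items()}


def toy_score(params):
    # Stand-in for a cross-validated score; highest at C=1.0, max_depth=5.
    return -abs(params["C"] - 1.0) - abs(params["max_depth"] - 5)


space = {"C": [0.01, 0.1, 1.0, 10.0], "max_depth": [3, 5, 7]}
best_params, best_score = RandomSearch(space, toy_score, max_trials=50, seed=0).run()
print(best_params, best_score)
```

Keeping the trial loop in the base class means a new strategy only has to implement `suggest`.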

5. History Tracking

  • Build History Logging System (history.py)
    • Implement a system to store trial results (model parameters, validation scores, etc.).
    • Provide an easy way to retrieve and analyze previous experiment results.
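
A history log of this kind can be as simple as a list of trial records that serializes to JSON. The sketch below is one possible design, not the contents of `history.py`; the `Trial` and `History` names and the scores are made up for illustration.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Trial:
    model: str
    params: dict
    score: float


@dataclass
class History:
    trials: list = field(default_factory=list)

    def add(self, model, params, score):
        self.trials.append(Trial(model, params, score))

    def best(self):
        # Highest validation score wins.
        return max(self.trials, key=lambda trial: trial.score)

    def to_json(self):
        return json.dumps([asdict(trial) for trial in self.trials], indent=2)


history = History()
history.add("rf", {"n_estimators": 100}, 0.91)  # illustrative score
history.add("lr", {"C": 1.0}, 0.88)             # illustrative score
print(history.best().model)
print(history.to_json())
```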

6. Reporting and Output

  • Reporting (reporting.py)
    • Create functionality to log experiment results.
    • Optionally generate visualizations, such as bar plots for the leaderboard.
    • Save reports and models to the specified output directory.

7. CLI Interface (eZAutoML/cli.py)

  • Refine CLI (cli.py)

    • Add user-friendly descriptions, argument validation, and proper help messages.
    • Implement user input handling for tasks, models, and search strategies.
    • Provide version information and CLI help as requested by users.
  • CLI Workflow:

    • Allow users to define dataset, task, models, and optimization settings directly from the command line.
    • Provide options for verbosity, output directory, and saving models.
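
The argument validation and help messages mentioned above map naturally onto the standard library's `argparse`. The following is a sketch under that assumption; the actual `cli.py` may use a different library, and the default values shown here are hypothetical.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        prog="ezautoml",
        description="Run an AutoML experiment on a tabular dataset.",
    )
    parser.add_argument("--dataset", required=True, help="Path to the dataset file")
    parser.add_argument("--target", required=True, help="Target column name")
    parser.add_argument(
        "--task",
        required=True,
        choices=["classification", "regression"],  # rejects any other value
        help="Task type",
    )
    parser.add_argument(
        "--models",
        type=lambda text: text.split(","),  # "lr,rf" -> ["lr", "rf"]
        default=["lr"],                     # hypothetical default
        help="Comma-separated list of model short names",
    )
    parser.add_argument("--cv", type=int, default=5, help="Number of CV folds")
    parser.add_argument("--output", default="results", help="Output directory")
    parser.add_argument("--verbose", action="store_true", help="Verbose logging")
    return parser


args = build_parser().parse_args(
    ["--dataset", "data.csv", "--target", "label",
     "--task", "regression", "--models", "lr,rf"]
)
print(args.task, args.models, args.cv)
```

Using `choices` for the task makes `argparse` print a clear error and the usage line when an unsupported task is given.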

8. Configuration System

  • Config Management (config.py)
    • Define default search spaces for hyperparameters.
    • Allow easy configuration of model hyperparameters and search spaces.
    • Ensure flexibility for future extension.
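
One way to combine default search spaces with user overrides is a dictionary of defaults plus a small merge helper. The model names reuse the short names from the CLI example, while the hyperparameter values and the `resolve_search_space` helper are assumptions for illustration.

```python
from copy import deepcopy

# Hypothetical defaults keyed by model short name.
DEFAULT_SEARCH_SPACES = {
    "lr": {"C": [0.01, 0.1, 1.0, 10.0]},
    "rf": {"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]},
}


def resolve_search_space(model, overrides=None):
    """Return the default space for `model`, with user overrides applied."""
    if model not in DEFAULT_SEARCH_SPACES:
        raise KeyError(f"no default search space for model {model!r}")
    space = deepcopy(DEFAULT_SEARCH_SPACES[model])  # never mutate the defaults
    space.update(overrides or {})
    return space


space = resolve_search_space("rf", {"max_depth": [10]})
print(space)
```

Adding a new model then only requires a new entry in the defaults dictionary.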

9. Testing and Validation

  • Unit Testing

    • Write basic unit tests to validate the core functionalities:
      • Dataset loading
      • Preprocessing steps
      • Model training and evaluation
      • Optimizer logic
      • Leaderboard reporting
  • Integration Testing

    • Ensure the complete pipeline (from dataset loading to final reporting) works seamlessly together.

10. Finalization and Documentation

  • Documentation
    • Update the README.md file to include details on installation, usage, and examples.
    • Add docstrings for all functions and classes to ensure code readability.
    • Document search strategies, hyperparameter configurations, and any custom optimizers implemented.

11. Future Enhancements

  • Optional Preprocessing Steps
    • More advanced preprocessing (feature engineering, imputation, etc.).
  • Model Extensions
    • Add more models like Neural Networks, LightGBM, etc.
  • Hyperparameter Optimization with BayesOpt or Optuna
    • Extend Random Search with more advanced optimization methods.

12. Release Plan

  • Release Alpha Version

    • Ensure basic functionality works for both classification and regression tasks.
    • Allow users to run experiments via the CLI.
  • Prepare for Beta Testing

    • Test the MVP with real datasets and gather feedback.
    • Refine based on issues and feedback.

Contributing

We welcome contributions to eZAutoML! If you'd like to contribute, please fork the repository and submit a pull request with your changes. For detailed information on how to contribute, please refer to our contributing guide.

License

eZAutoML is licensed under the BSD 3-Clause License. See the LICENSE file for more information.
