A democratized, lightweight, and transparent AutoML framework
eZAutoML
Overview
eZAutoML is a framework designed to make Automated Machine Learning (AutoML) accessible to everyone. It provides an easy-to-use interface based on the scikit-learn API for building modeling pipelines with minimal effort.
The framework is built around a few core concepts:
- Optimizers: Black-box optimization methods for hyperparameters.
- Easy Tabular Pipelines: Simple domain-specific language to describe pipelines for preprocessing and model training.
- Scheduling (work in progress): Intended to enable horizontal scaling from a single computer to data centers by using Airflow executors.
Installation
Package Distribution
The latest version of eZAutoML can be installed via PyPI or from source.
pip install ezautoml
ezautoml --help
Install from source
To install from source, you can clone this repo and install with pip:
pip install -e .
Usage
Command Line Interface
Usage:
ezautoml --dataset <path_to_data> --target <target_name> --task <classification|regression> --models <model1,model2,...> --cv <folds> --output <path_to_output>
Options:
- dataset: Path to the dataset file (CSV, Parquet, ...)
- target: The target column name for prediction
- task: Task type: classification or regression
- search: Black-box optimization algorithm to use
- models: Comma-separated list of models to use, given by their initials (e.g., lr,rf,xgb)
- cv: Number of cross-validation folds (if needed)
- output: Directory to save the output models/results
- trials: Maximum number of trials for the optimization algorithm
- preprocess: Whether to perform minimal preprocessing (scaling, encoding, ...)
- verbose: Increase logging verbosity
- version: Show the current version
For more detailed help, use:
ezautoml --help
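To make the option list above concrete, here is a minimal sketch of how such options could be declared with Python's argparse. It is not the project's actual cli.py. The `--search`, `--trials`, `--preprocess`, and `--verbose` spellings and all default values are assumptions, since the usage line only shows `--dataset`, `--target`, `--task`, `--models`, `--cv`, and `--output`.

```python
import argparse


def parse_models(spec: str) -> list:
    """Split a comma-separated string such as "lr,rf,xgb" into a clean list."""
    return [name.strip() for name in spec.split(",") if name.strip()]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (not the real cli.py)."""
    parser = argparse.ArgumentParser(prog="ezautoml")
    parser.add_argument("--dataset", required=True, help="Path to the dataset file")
    parser.add_argument("--target", required=True, help="Target column name")
    parser.add_argument("--task", required=True,
                        choices=["classification", "regression"])
    parser.add_argument("--models", type=parse_models, default=[],
                        help="Comma-separated model initials, e.g. lr,rf,xgb")
    # Everything below uses assumed flag spellings and assumed defaults.
    parser.add_argument("--search", default="random", help="Optimization algorithm")
    parser.add_argument("--cv", type=int, default=5, help="Cross-validation folds")
    parser.add_argument("--trials", type=int, default=10, help="Maximum trials")
    parser.add_argument("--output", default="results", help="Output directory")
    parser.add_argument("--preprocess", action="store_true")
    parser.add_argument("--verbose", action="store_true")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args([
    "--dataset", "data.csv", "--target", "label",
    "--task", "classification", "--models", "lr, rf,xgb", "--cv", "3",
])
print(args.models, args.cv, args.task)
```

Restricting `--task` with `choices` means an invalid task name is rejected by the parser itself, before any data is loaded.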
Several features, such as scheduling, meta-learning, and pipelines, are still a work in progress and will be enabled in a future release.
Python Script
eZAutoML is also intended to be usable from Python scripts, but this interface is still under development. Once available, it will let you run experiments from Python code or through custom pipelines.
???
WIP
WIP TODO List for eZAutoML
1. Core System Setup
- Implement Dataset Loading (datasets.py)
  - Build a utility to load datasets from various formats (CSV, Parquet, etc.).
  - Implement functionality to split datasets into train and test sets.
- Preprocessing (preprocess.py)
  - Implement basic preprocessing such as:
    - Feature scaling (StandardScaler)
    - Label encoding for classification tasks
    - Handling missing values (if necessary)
  - Optional: Extend to more advanced preprocessing in the future.
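The loading, splitting, and preprocessing steps listed above could look roughly like the following sketch, which uses plain scikit-learn. The `load_dataset` helper, the mean-imputation strategy, and the toy data are assumptions made for illustration and are not taken from datasets.py or preprocess.py.

```python
from pathlib import Path

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler


def load_dataset(path):
    """Hypothetical loader: pick a pandas reader from the file extension."""
    import pandas as pd  # imported lazily so the rest of the sketch runs without pandas

    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix == ".parquet":
        return pd.read_parquet(path)
    raise ValueError(f"Unsupported dataset format: {suffix!r}")


# Toy data standing in for a loaded dataset: six rows, one missing value.
X = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0],
              [4.0, 40.0], [5.0, 50.0], [6.0, 60.0]])
y = np.array(["cat", "dog", "cat", "dog", "cat", "dog"])

# Hold out two rows for testing, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=2, random_state=0, stratify=y
)

# Fill missing values, then scale; both steps are fitted on the training split only.
features = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler())
X_train_t = features.fit_transform(X_train)
X_test_t = features.transform(X_test)

# Label encoding for a classification target.
encoder = LabelEncoder()
y_train_enc = encoder.fit_transform(y_train)
y_test_enc = encoder.transform(y_test)
```

Fitting the imputer and scaler on the training split and only applying them to the test split avoids leaking test statistics into preprocessing.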
2. Model Implementation
- Model Definitions (models.py)
  - Implement a list of models:
    - SVM, RandomForest, XGBoost, etc.
  - Ensure models can be easily swapped based on the user's request in the CLI (--models flag).
- Search Strategy (search.py)
  - Implement the abstract optimizer class, and separate search strategies such as:
    - Random Search: Use for hyperparameter tuning.
    - Grid Search: For exhaustive search of hyperparameters.
  - Provide flexibility to add new strategies later.
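One way to make models swappable from a comma-separated `--models` value is a small registry keyed by short name. The sketch below is an assumption about how this could be done, not the contents of models.py. The short names `lr`, `rf`, and `svm` are hypothetical, and XGBoost is left out to avoid an extra dependency.

```python
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.svm import SVC, SVR

# Hypothetical short names; the initials accepted by the real CLI may differ.
MODEL_REGISTRY = {
    "classification": {"lr": LogisticRegression, "rf": RandomForestClassifier, "svm": SVC},
    "regression": {"lr": LinearRegression, "rf": RandomForestRegressor, "svm": SVR},
}


def resolve_models(spec: str, task: str) -> dict:
    """Turn a string such as "lr,rf" into fresh estimator instances for a task."""
    available = MODEL_REGISTRY[task]
    names = [name.strip() for name in spec.split(",") if name.strip()]
    unknown = [name for name in names if name not in available]
    if unknown:
        raise ValueError(
            f"Unknown model(s) for {task}: {unknown}; choose from {sorted(available)}"
        )
    return {name: available[name]() for name in names}


models = resolve_models("lr, rf", task="classification")
print({name: type(model).__name__ for name, model in models.items()})
```

Storing classes rather than instances in the registry means every call returns unfitted estimators, so one run cannot leak fitted state into the next.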
3. Model Evaluation
- Evaluator (evaluation.py)
  - Implement cross-validation to assess model performance.
  - Support various metrics (accuracy, F1 score, etc.) based on the task (classification/regression).
- Leaderboard (reporting.py)
  - Track and store model performance (accuracy, metrics).
  - Build a leaderboard that ranks models based on their cross-validation score.
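The evaluator and leaderboard ideas above can be sketched with scikit-learn's `cross_val_score`. The candidate models, the `evaluate` helper, and the use of the bundled Iris dataset are illustrative assumptions, not the project's evaluation.py or reporting.py.

```python
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Illustrative candidates, including a trivial baseline for comparison.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "majority_baseline": DummyClassifier(strategy="most_frequent"),
}


def evaluate(model, X, y, cv=5, scoring="accuracy"):
    """Return the mean cross-validated score of a model."""
    scores = cross_val_score(model, X, y, cv=cv, scoring=scoring)
    return float(scores.mean())


# Rank candidates by mean cross-validation score, best first.
leaderboard = sorted(
    ((name, evaluate(model, X, y)) for name, model in candidates.items()),
    key=lambda row: row[1],
    reverse=True,
)

for rank, (name, score) in enumerate(leaderboard, start=1):
    print(f"{rank}. {name}: {score:.3f}")
```

For a regression task, the same helper works with a regression scorer such as `scoring="r2"`.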
4. Optimization System
- Abstract Optimizer (search.py)
  - Implement a base class for optimizers, handling setup and execution of hyperparameter search.
  - Design the optimizer to integrate with different search strategies (Random Search, Grid Search).
- Random Search Optimizer
  - Implement random hyperparameter search strategy.
  - Randomly sample hyperparameters from predefined search spaces.
  - Use the evaluator to assess performance during each trial.
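A base optimizer with pluggable strategies could be structured as in the sketch below. The class names, the `propose`/`run` split, and the toy scoring function are assumptions for illustration. The toy function stands in for the cross-validated score that a real evaluator would return.

```python
import itertools
import random
from abc import ABC, abstractmethod


class Optimizer(ABC):
    """Hypothetical base class: runs trials and keeps the best-scoring configuration."""

    def __init__(self, space, evaluate, max_trials):
        self.space = space            # {hyperparameter name: list of candidate values}
        self.evaluate = evaluate      # callable: config dict -> score (higher is better)
        self.max_trials = max_trials
        self.trials = []

    @abstractmethod
    def propose(self):
        """Yield candidate configurations, one per trial."""

    def run(self):
        for config in itertools.islice(self.propose(), self.max_trials):
            self.trials.append((config, self.evaluate(config)))
        return max(self.trials, key=lambda trial: trial[1])


class RandomSearch(Optimizer):
    def __init__(self, space, evaluate, max_trials, seed=0):
        super().__init__(space, evaluate, max_trials)
        self.rng = random.Random(seed)

    def propose(self):
        # Sample each hyperparameter independently from its list of values.
        while True:
            yield {name: self.rng.choice(values) for name, values in self.space.items()}


class GridSearch(Optimizer):
    def propose(self):
        # Enumerate every combination of values exactly once.
        names = list(self.space)
        for combo in itertools.product(*(self.space[name] for name in names)):
            yield dict(zip(names, combo))


space = {"max_depth": [2, 4, 8], "n_estimators": [10, 50, 100]}


def toy_score(config):
    # Stand-in for a cross-validated score; peaks at max_depth=4, n_estimators=50.
    return -abs(config["max_depth"] - 4) - abs(config["n_estimators"] - 50) / 100


grid = GridSearch(space, toy_score, max_trials=9)
best_grid = grid.run()
rand = RandomSearch(space, toy_score, max_trials=5)
best_random = rand.run()
print(best_grid, best_random)
```

Because both strategies only implement `propose`, adding a new one later does not require touching the trial loop.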
5. History Tracking
- Build History Logging System (history.py)
  - Implement a system to store trial results (model parameters, validation scores, etc.).
  - Provide an easy way to retrieve and analyze previous experiment results.
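A trial history can be as simple as a list of records that serializes to JSON. The `Trial` and `History` classes below are hypothetical and only illustrate the idea; they are not the contents of history.py, and the scores are placeholder values.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Trial:
    model: str
    params: dict
    score: float


@dataclass
class History:
    trials: list = field(default_factory=list)

    def add(self, model, params, score):
        self.trials.append(Trial(model, params, score))

    def best(self):
        """Return the trial with the highest score."""
        return max(self.trials, key=lambda trial: trial.score)

    def by_model(self, model):
        """Return all trials recorded for one model."""
        return [trial for trial in self.trials if trial.model == model]

    def to_json(self):
        return json.dumps([asdict(trial) for trial in self.trials], indent=2)

    @classmethod
    def from_json(cls, text):
        return cls([Trial(**row) for row in json.loads(text)])


history = History()
# Placeholder scores, for illustration only.
history.add("rf", {"n_estimators": 100, "max_depth": 5}, 0.91)
history.add("rf", {"n_estimators": 50, "max_depth": 10}, 0.89)
history.add("lr", {"C": 1.0}, 0.88)

restored = History.from_json(history.to_json())
print(restored.best())
```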
6. Reporting and Output
- Reporting (reporting.py)
  - Create functionality to log experiment results.
  - Optionally generate visualizations such as bar plots for the leaderboard.
  - Save reports and models to the specified output directory.
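Saving a report to the output directory might look like the following sketch, which writes a ranked leaderboard as CSV. The `save_leaderboard` function, the `leaderboard.csv` file name, and the scores are assumptions for illustration only.

```python
import csv
import tempfile
from pathlib import Path


def save_leaderboard(rows, output_dir):
    """Write (model, score) rows, best first, to <output_dir>/leaderboard.csv."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the output directory if needed
    path = out / "leaderboard.csv"
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["rank", "model", "score"])
        for rank, (model, score) in enumerate(ranked, start=1):
            writer.writerow([rank, model, f"{score:.4f}"])
    return path


# Placeholder scores written to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    report = save_leaderboard(
        [("rf", 0.91), ("lr", 0.88), ("svm", 0.93)], Path(tmp) / "run1"
    )
    lines = report.read_text().splitlines()

print(lines)
```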
7. CLI Interface (eZAutoML/cli.py)
- Refine CLI (cli.py)
  - Add user-friendly descriptions, argument validation, and proper help messages.
  - Implement user input handling for tasks, models, and search strategies.
  - Provide version information and CLI help as requested by users.
- CLI Workflow:
  - Allow users to define dataset, task, models, and optimization settings directly from the command line.
  - Provide options for verbosity, output directory, and saving models.
8. Configuration System
- Config Management (config.py)
  - Define default search spaces for hyperparameters.
  - Allow easy configuration of model hyperparameters and search spaces.
  - Ensure flexibility for future extension.
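Default search spaces with user overrides can be expressed as plain dictionaries. In the sketch below, the model keys, the candidate values, and the `build_search_spaces` helper are illustrative assumptions rather than the project's config.py. The hyperparameter names themselves (`n_estimators`, `max_depth`, `C`, `kernel`) are real scikit-learn parameters.

```python
import copy

# Illustrative defaults; the values are placeholders, not tuned recommendations.
DEFAULT_SEARCH_SPACES = {
    "rf": {"n_estimators": [50, 100, 200], "max_depth": [None, 5, 10]},
    "svm": {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]},
}


def build_search_spaces(overrides=None):
    """Return the defaults with user overrides applied per model and hyperparameter."""
    spaces = copy.deepcopy(DEFAULT_SEARCH_SPACES)  # never mutate the defaults
    for model, params in (overrides or {}).items():
        spaces.setdefault(model, {}).update(params)
    return spaces


spaces = build_search_spaces({
    "rf": {"max_depth": [3]},                # narrow one existing hyperparameter
    "xgb": {"learning_rate": [0.1, 0.3]},    # register a model with no defaults
})
print(spaces)
```

Merging per hyperparameter, instead of replacing a model's whole entry, lets a user override one setting while keeping the remaining defaults.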
9. Testing and Validation
- Unit Testing
  - Write basic unit tests to validate the core functionalities:
    - Dataset loading
    - Preprocessing steps
    - Model training and evaluation
    - Optimizer logic
    - Leaderboard reporting
- Integration Testing
  - Ensure the complete pipeline (from dataset loading to final reporting) works seamlessly together.
10. Finalization and Documentation
- Documentation
  - Update the README.md file to include details on installation, usage, and examples.
  - Add docstrings for all functions and classes to ensure code readability.
  - Document search strategies, hyperparameter configurations, and any custom optimizers implemented.
11. Future Enhancements
- Optional Preprocessing Steps
  - More advanced preprocessing (feature engineering, imputation, etc.).
- Model Extensions
  - Add more models like Neural Networks, LightGBM, etc.
- Hyperparameter Optimization with BayesOpt or Optuna
  - Extend Random Search with more advanced optimization methods.
12. Release Plan
- Release Alpha Version
  - Ensure basic functionality works for both classification and regression tasks.
  - Allow users to run experiments via the CLI.
- Prepare for Beta Testing
  - Test the MVP with real datasets and gather feedback.
  - Refine based on issues and feedback.
Contributing
We welcome contributions to eZAutoML! If you'd like to contribute, please fork the repository and submit a pull request with your changes. For detailed information on how to contribute, please refer to our contributing guide.
License
eZAutoML is licensed under the BSD 3-Clause License. See the LICENSE file for more information.
File details
Details for the file ezautoml-0.1.2.tar.gz.
File metadata
- Download URL: ezautoml-0.1.2.tar.gz
- Upload date:
- Size: 10.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8abd92ea75b2e3986df3aa768ec2feefceb4cc000fc235b480cb0c078d572771 |
| MD5 | 1c9641197cb055480f91cf3b6f08ee7e |
| BLAKE2b-256 | 2bd4dcccf0735f9fc9eea0d63d31ebb9b86da68d27ae16317b752454748d4e3e |
File details
Details for the file ezautoml-0.1.2-py3-none-any.whl.
File metadata
- Download URL: ezautoml-0.1.2-py3-none-any.whl
- Upload date:
- Size: 7.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cdbbffada291288c034165de15fcebabb4ee79f25e172c43f7589414e2fa6247 |
| MD5 | 335b719d89fe48f1a4724c1c04981438 |
| BLAKE2b-256 | b5c116d8fc880f377cfb19cbb0b72b61a6dd24f0adf089fed5f1346526681f25 |