
OptimalFlow is an Omni-ensemble Automated Machine Learning toolkit that helps data scientists build optimal models easily and automate the Machine Learning workflow with simple code.


OptimalFlow


Author: Tony Dong

OptimalFlow is an Omni-ensemble Automated Machine Learning toolkit based on the Pipeline Cluster Traversal Experiments (PCTE) approach. It helps data scientists build optimal models easily and automate the Machine Learning workflow with simple code. In the most recent version, I created a Web App, based on the Flask framework, as OptimalFlow's GUI. Users can build an Automated Machine Learning workflow entirely by clicking, without any coding at all!

Compared with other popular "AutoML or Automated Machine Learning" APIs, OptimalFlow is designed as an omni-ensemble ML workflow optimizer with a higher-level API, which aims to avoid the manual, repetitive train-and-evaluate experiments of general pipeline building.

To achieve that, OptimalFlow applies the Pipeline Cluster Traversal Experiments algorithm to assemble all cross-matching pipelines covering the major tasks of a Machine Learning workflow, and then runs traversal experiments over them to search for the optimal baseline model.
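
The traversal idea can be illustrated with a minimal sketch. This is not OptimalFlow's implementation or API: it uses only the Python standard library, and the toy data and every function name (`identity`, `min_max`, `keep_all`, `top1_by_class_gap`, `majority`, `nearest_neighbor`, `traverse`) are hypothetical stand-ins for the preprocessing, feature selection, and model selection stages.

```python
from itertools import product
from statistics import mean

# Toy binary-classification data, made up for illustration only.
TRAIN = [([1.0, 200.0], 0), ([2.0, 180.0], 0), ([8.0, 220.0], 1), ([9.0, 150.0], 1)]
VALID = [([1.5, 210.0], 0), ([8.5, 185.0], 1)]

# --- Preprocessing options: fit on training rows, return a row transform ---
def identity(train_X):
    return lambda row: list(row)

def min_max(train_X):
    cols = list(zip(*train_X))
    lo = [min(c) for c in cols]
    span = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid division by zero
    return lambda row: [(v - l) / s for v, l, s in zip(row, lo, span)]

# --- Feature selection options: return the column indices to keep ---
def keep_all(train_X, train_y):
    return list(range(len(train_X[0])))

def top1_by_class_gap(train_X, train_y):
    # Keep the single feature whose class means differ most (labels 0 and 1).
    def gap(j):
        zeros = [x[j] for x, y in zip(train_X, train_y) if y == 0]
        ones = [x[j] for x, y in zip(train_X, train_y) if y == 1]
        return abs(mean(zeros) - mean(ones))
    return [max(range(len(train_X[0])), key=gap)]

# --- Model options: fit on training data, return a predict function ---
def majority(train_X, train_y):
    label = max(set(train_y), key=train_y.count)
    return lambda row: label

def nearest_neighbor(train_X, train_y):
    def predict(row):
        dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for x in train_X]
        return train_y[dists.index(min(dists))]
    return predict

def traverse(train, valid, preprocessors, selectors, models):
    """Evaluate every preprocessor x selector x model combination."""
    X_tr = [x for x, _ in train]
    y_tr = [y for _, y in train]
    results = []
    for pp, fs, mdl in product(preprocessors, selectors, models):
        transform = pp(X_tr)
        X_pp = [transform(x) for x in X_tr]
        cols = fs(X_pp, y_tr)
        pick = lambda row: [row[j] for j in cols]
        predict = mdl([pick(r) for r in X_pp], y_tr)
        hits = [predict(pick(transform(x))) == y for x, y in valid]
        name = (pp.__name__, fs.__name__, mdl.__name__)
        results.append((name, sum(hits) / len(hits)))
    # Highest validation accuracy first.
    return sorted(results, key=lambda r: r[1], reverse=True)

ranking = traverse(TRAIN, VALID,
                   [identity, min_max],
                   [keep_all, top1_by_class_gap],
                   [majority, nearest_neighbor])
for name, accuracy in ranking:
    print(name, accuracy)
```

On this toy data, the nearest-neighbor model with all unscaled features misclassifies one of the two validation rows, while the same model with min-max scaling classifies both correctly. Differences like this only surface when whole pipelines, rather than single stages, are compared.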

In addition, because all key pipeline components are modularized into reusable packages, every component can be custom tuned and the toolkit scales well.

The core concept in OptimalFlow is Pipeline Cluster Traversal Experiments, a theory first proposed by Tony Dong at the Genpact 2020 GVector Conference, to optimize and automate the Machine Learning workflow using an ensemble-pipelines algorithm.

Compared with the repetitive single-pipeline experiments of other automated or classic machine learning workflows, Pipeline Cluster Traversal Experiments is more powerful: it covers a larger scope when searching for the best model without manual intervention, and it is more flexible in coping with unseen data thanks to the ensemble design of each component.

In summary, OptimalFlow offers a few useful properties for data scientists:

  • Easy & less coding - High-level APIs implement Pipeline Cluster Traversal Experiments, and each ML component is highly automated and modularized;

  • Well ensembled - Each key component is an ensemble of popular algorithms, with optimal hyperparameter tuning included;

  • Omni-Coverage - Pipeline Cluster Traversal Experiments cross-experiments over combined permutations of input datasets, feature selection, and model selection;

  • Scalable - New algorithms can easily be added to each module thanks to its ensemble and reusable coding design;

  • Adaptable - Pipeline Cluster Traversal Experiments makes it easier to adapt to unseen datasets with the right pipeline;

  • Custom Modify Welcomed - Custom settings are supported to add/remove algorithms or modify hyperparameters for flexible requirements.

Documentation: https://Optimal-Flow.readthedocs.io/

Installation

pip install OptimalFlow
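
To confirm that the install succeeded, the installed version can be read from the package metadata. This is a minimal sketch using only the standard library's `importlib.metadata` (Python 3.8+). The helper name `installed_version` is hypothetical, and the distribution name is taken from the `pip install` command above.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(dist_name="OptimalFlow"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

found = installed_version()
print(found if found is not None else "OptimalFlow is not installed")
```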

Core Modules:

  • autoPP for feature preprocessing
  • autoFS for classification/regression feature selection
  • autoCV for classification/regression model selection and evaluation
  • autoPipe for Pipeline Cluster Traversal Experiments
  • autoViz for pipeline cluster visualization. Currently available: model retrieval diagram
  • autoFlow for logging & tracking.

Notebook Demo:


An End-to-End OptimalFlow Automated Machine Learning Tutorial with Real Projects

Updates on 9/16/2020


  • Created a Web App based on the Flask framework as OptimalFlow's GUI, to build PCTE Automated Machine Learning workflows with simple clicks, without any coding at all!
  • The Web App includes PCTE workflow builder, LogsViewer, Visualization, and Documentation sections.
  • Fix the filename issues in the autoViz module, and remove the auto_open function when generating new HTML-format plots.

Updates on 8/31/2020

  • Modify autoPP's default_parameters: Remove "None" in "scaler", modify "sparsity" : [0.50], modify "cols" : [100]
  • Modify autoViz clf_table_report()'s coloring settings
  • Fix bugs in autoViz reg_table_report()'s gradient coloring function

Updates on 8/28/2020

  • Fix round() bugs in the evaluate_model() function when handling classification problems
  • Remove SVM-based algorithms from the default estimator settings of fastClassifier & fastRegressor
  • Remove SVM-based algorithms from the default selector settings of the autoFS classes

Updates on 8/26/2020

  • Fix bugs in the evaluate_model() function when handling regression problems
  • Add reg_table_report() function to create dynamic table report for regression problem in autoViz

Updates on 8/24/2020

  • Fix the precision_score issue in the evaluate_model() function when running multi-class classification problems
  • Add custom_selectors args for customized algorithm settings with autoFS's 2 classes (dynaFS_reg, dynaFS_clf)

Updates on 8/20/2020

  • Add Dynamic Table for Pipeline Cluster Model Evaluation Report in autoViz module
  • Add custom_estimators args for customized algorithm settings with autoCV's 4 classes (dynaClassifier, dynaRegressor, fastClassifier, and fastRegressor)

Updates on 8/14/2020

  • Add the fastClassifier and fastRegressor classes, both based on random parameter search
  • Modify the display settings when using dynaClassifier in non in_pipeline mode

Updates on 8/10/2020

  • Stable 0.1.0 version released on PyPI

Updates on 8/7/2020

  • Add estimators: HuberRegressor, RidgeCV, LassoCV, SGDRegressor, and HistGradientBoostingRegressor
  • Modify parameters.json, and reset_parameters.json for the added estimators
  • Add autoViz for classification problem model retrieval diagram

License:

MIT
