Python package that analyzes, visualizes, and suggests changes to a machine learning pipeline

Project description

ml-pipeline-analyzer

License: MIT

Machine Learning Pipeline Analyzer (MLPA)

Machine Learning Pipeline Analyzer (MLPA) is a Python package that, at its core, analyzes and visualizes machine learning pipelines and suggests changes to them.

One of the primary goals of this package is to give the user an intuitive visual diagram of the pipeline model that explains the model's components and their attributes, while also suggesting changes and the best pipeline model for the user's needs.

Motivation

As machine learning engineers or data science engineers, we often create ML pipelines that perform multiple tasks, such as:
Data extraction -> Data Cleaning -> Data Manipulation -> Feature Selection/Reduction -> Model train and predict -> Cross Validation -> Model load/save

However, as the number of components in a pipeline grows, drawing a flowchart by hand becomes infeasible, and the result is hard to understand and track. Although existing Python packages use DAGs to visualize ML pipelines, their output can be hard to explore and understand. Our goal was therefore to create a package that automates the daunting process of visualizing ML pipelines, while also suggesting changes or the best pipeline models for the user's input dataframes.

Acknowledgements

MLPA is a simple wrapper built on the capabilities of the following existing Python libraries:

Installing the package

pip install mlpipeline_analyzer
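
To confirm that the install succeeded, the installed version can be queried from Python. The following is a minimal sketch using only the standard library (Python 3.8+); the helper name installed_version is hypothetical and not part of MLPA.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# 'mlpipeline_analyzer' is the distribution name used in the pip command above.
found = installed_version("mlpipeline_analyzer")
if found is None:
    print("mlpipeline_analyzer is not installed in this environment")
else:
    print(f"mlpipeline_analyzer {found} is installed")
```

If the helper returns None, the package was probably installed into a different interpreter or virtual environment than the one running the check.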

Dependencies

Install the dependencies from the requirements.txt file using:

python -m pip install -r requirements.txt

Code Examples:

Code example_1: The user loads a saved model .pkl file and passes it to the PipelineDiagram class. Two ML pipeline diagrams are then created, one with .show and one with .show_params:

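import joblib  # PipelineDiagram must also be imported from the mlpipeline_analyzer package

# Load a saved pipeline from disk and build its diagrams
evalml_pipeline = joblib.load('models/automl_pipeline.pkl')
a = PipelineDiagram(evalml_pipeline)
a.show(title='Evalml ML Pipeline Diagram')  # pipeline summary diagram
a.show_params(title='Evalml Machine Learning Parameters Pipeline')  # hyperparameter diagram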
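
To illustrate the difference between a summary view and a parameter view of a pipeline, here is a toy, text-only sketch. It is not MLPA's implementation and does not use MLPA at all; the functions summary_view and params_view, the step names, and the parameter values are all hypothetical.

```python
def summary_view(steps):
    """Render only the component names, in pipeline order."""
    return " -> ".join(name for name, _ in steps)


def params_view(steps):
    """Render each component followed by its parameters, sorted by name."""
    lines = []
    for name, params in steps:
        lines.append(name)
        for key in sorted(params):
            lines.append(f"  {key} = {params[key]!r}")
    return "\n".join(lines)


# Hypothetical pipeline: a list of (component name, parameter dict) pairs.
steps = [
    ("Imputer", {"strategy": "mean"}),
    ("Random Forest Classifier", {"n_estimators": 100, "max_depth": 6}),
]

print(summary_view(steps))
print(params_view(steps))
```

The summary view answers "what are the stages and in what order", while the parameter view answers "how is each stage configured", which is the same split the two diagram calls above make.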

Code example_2: The suggest function generates suggestions for the various components of the model, depending on the suggest_type the user specifies:

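# df is the user's dataframe; it must contain the response and predictor columns named below
b = PipelineSuggest()
b.fit(data=df, response='survived', predictor_list=['pclass', 'age', 'gender'], problem_type='binary', objective='auto', test_size=0.2)
b.suggest(suggest_type='fe')     # 'fe' presumably stands for feature engineering
b.suggest(suggest_type='model')  # model suggestions
b.suggest(suggest_type='all')    # all available suggestions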
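
Before calling fit, it can help to check that the column names and split size are consistent with the dataframe. The sketch below is a hypothetical pre-flight helper, not part of MLPA; it assumes that test_size is a fraction strictly between 0 and 1 (as the value 0.2 above suggests) and that the response column should not also appear in predictor_list.

```python
def check_fit_inputs(columns, response, predictor_list, test_size):
    """Return a list of human-readable problems; an empty list means the inputs look consistent."""
    problems = []
    available = set(columns)
    if response not in available:
        problems.append(f"response column {response!r} not found")
    missing = [col for col in predictor_list if col not in available]
    if missing:
        problems.append(f"predictor columns not found: {missing}")
    if response in predictor_list:
        problems.append("response must not also be a predictor")
    # Assumption: test_size is the held-out fraction of rows.
    if not 0.0 < test_size < 1.0:
        problems.append("test_size must be strictly between 0 and 1")
    return problems


# Hypothetical column names standing in for df.columns.
columns = ["survived", "pclass", "age", "gender", "fare"]
print(check_fit_inputs(columns, "survived", ["pclass", "age", "gender"], 0.2))
print(check_fit_inputs(columns, "survived", ["pclass", "sex"], 1.5))
```

With a real dataframe, df.columns can be passed as the first argument, and fit is called only when the returned list is empty.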

Screenshots

Examples of outputs generated by the functions

  • Screenshot of an ML pipeline summary diagram:

[Image: ML pipeline summary diagram]

  • Screenshot of an ML pipeline hyperparameter diagram:

[Image: ML pipeline hyperparameter diagram]

Build Status

MLPA already supports feature engineering to some extent. With expandability as one of the project goals, we plan to add more specific capabilities in the following areas:

  • Feature Engineering
  • Feature Extraction/Selection
  • Feature Reduction

As a project extension, one possible feature is letting the user specify the engine they want to use for their model (for example, TPOT or EvalML) and run the MLPA package on top of that engine.

Code Style

Languages used: Python

Coding style:

  • PEP 8
  • Docstrings

The following software design principles were considered while packaging MLPA:

  • Modular design
    • 'Somewhat General Purpose' module
    • Deep Modules
    • Separation of Concerns
  • Reusability/Extensibility
  • Intuitive
  • Version control using GitHub
  • Exception Handling
  • Support for automated CI/CD using Travis
  • Unit testing and coverage for quality assurance

Authors

[Image: GitHub contributors]

Contribute

This is an open-source project, open to contributions from the Python user community.

Change Log

0.0.1 (2022-01-30)

  • Initial release

0.0.2 (2022-03-17)

  • Added show, show_params and suggest functionalities and updated the description
