Python package that analyzes and visualizes a machine learning pipeline and suggests changes to it

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- Microsoft :: Windows :: Windows 10
Programming Language
- Python :: 3.8

Project description

ml-pipeline-analyzer

Machine Learning Pipeline Analyzer (MLPA)

Machine Learning Pipeline Analyzer (MLPA) is a Python package that analyzes and visualizes machine learning pipelines and suggests improvements to them.

One of the primary goals of this package is to give the user an intuitive visual diagram of the pipeline model that explains the model's components and their attributes, while also suggesting changes and the best pipeline model for the user's needs.

Motivation

Machine learning engineers and data scientists often build ML pipelines that chain multiple tasks, such as:
Data extraction -> Data Cleaning -> Data Manipulation -> Feature Selection/Reduction -> Model train and predict -> Cross Validation -> Model load/save

However, as the number of pipeline components grows, drawing a flowchart by hand becomes impractical, and the result is hard to follow and keep up to date. Some existing Python packages use DAGs to visualize ML pipelines, but their output can be hard to explore and understand. Our goal was therefore to create a package that automates the visualization of ML pipelines and can also suggest changes, or the best pipeline model, for a user-supplied dataframe.

Acknowledgements

MLPA is a simple wrapper around capabilities provided by the following existing Python libraries:

Installing the package

pip install mlpipeline_analyzer
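
After installing, it can be useful to confirm that the distribution is visible to the current interpreter. The snippet below is a minimal sketch using only the standard library; the helper name installed_version is hypothetical, and the distribution name is taken from the pip command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution name as used in the pip install command
    version = installed_version("mlpipeline_analyzer")
    print(version if version else "mlpipeline_analyzer is not installed")
```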

Dependencies

Install the dependencies listed in the requirements.txt file with:

python -m pip install -r requirements.txt

Code Examples:

Code example_1: The user loads a saved model from a .pkl file and passes it to the PipelineDiagram class. Two ML pipeline diagrams are then created with .show and .show_params:

import joblib
from mlpipeline_analyzer import PipelineDiagram  # import path is an assumption based on the package name

# Load a previously saved pipeline and draw its summary and parameter diagrams
evalml_pipeline = joblib.load('models/automl_pipeline.pkl')
a = PipelineDiagram(evalml_pipeline)
a.show(title='Evalml ML Pipeline Diagram')
a.show_params(title='Evalml Machine Learning Parameters Pipeline')
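
To make the difference between the two diagrams concrete, here is a rough sketch of the kind of information each one conveys: component names in order, and each component's attributes. It is not MLPA's implementation. The classes Imputer and Classifier and the functions summarize and summarize_params are hypothetical stand-ins written with the standard library only.

```python
from typing import Any, Dict, List, Tuple


# Hypothetical stand-ins for pipeline components; a real pipeline would hold
# fitted transformers and estimators instead.
class Imputer:
    def __init__(self, strategy: str = "mean") -> None:
        self.strategy = strategy


class Classifier:
    def __init__(self, n_estimators: int = 100, max_depth: int = 6) -> None:
        self.n_estimators = n_estimators
        self.max_depth = max_depth


Step = Tuple[str, Any]


def summarize(steps: List[Step]) -> str:
    """One-line flow of component names, comparable to a summary diagram."""
    return " -> ".join(name for name, _ in steps)


def summarize_params(steps: List[Step]) -> Dict[str, Dict[str, Any]]:
    """Map each component name to its attributes, comparable to a parameter diagram."""
    return {name: dict(vars(component)) for name, component in steps}


steps = [("imputer", Imputer()), ("classifier", Classifier(max_depth=4))]
print(summarize(steps))         # imputer -> classifier
print(summarize_params(steps))
```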

Code example_2: The suggest function generates output for the various components of the model, depending on the suggest_type the user specifies:

# df is the dataframe supplied by the user
b = PipelineSuggest()
b.fit(data=df, response='survived', predictor_list=['pclass', 'age', 'gender'],
      problem_type='binary', objective='auto', test_size=0.2)
b.suggest(suggest_type='fe')
b.suggest(suggest_type='model')
b.suggest(suggest_type='all')
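
The test_size=0.2 argument conventionally means that 20% of the rows are held out for evaluation. How MLPA performs the split internally is not shown here, so the following is only a sketch of that convention using the standard library; holdout_split and its seed parameter are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(
    rows: Sequence[T], test_size: float = 0.2, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle rows and hold out a `test_size` fraction for evaluation."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = max(1, round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))
train, test = holdout_split(rows, test_size=0.2)
print(len(train), len(test))  # 8 2
```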

Screenshots

Examples of outputs generated by the functions

[Image: screenshot of an ML pipeline summary diagram]

[Image: screenshot of an ML pipeline hyperparameter diagram]

Build Status

MLPA already supports feature engineering to some extent. With expandability as one of the project goals, we plan to add more specific capabilities in the following areas:

Feature Engineering
Feature Extraction/Selection
Feature Reduction

As a project extension, one possible feature is letting the user specify the engine they want to use for their model (for example, TPOT or EvalML) and running MLPA on top of that engine.

Code Style

Languages used: Python

Coding Style:
- PEP 8
- Docstrings

The following software design principles were considered while packaging MLPA:

Modular design
- 'Somewhat General Purpose' module
- Deep Modules
- Separation of Concerns
Reusability/Extensibility
Intuitiveness
Version control using GitHub
Exception Handling
Support for automated CI/CD using Travis
Unit testing and coverage for quality assurance

Authors

[Image: GitHub contributors]

Contribute

This is an open-source project, and contributions from the Python user community are welcome.

Change Log

0.0.1 (2022-01-30)

Initial release

0.0.2 (2022-03-17)

Added show, show_params and suggest functionalities and updated the description

Release history

- 0.0.2 (this version): Mar 17, 2022
- 0.0.1: Mar 17, 2022

Download files

Source Distribution

mlpipeline_analyzer-0.0.2.tar.gz (2.9 MB)

Uploaded Mar 17, 2022 Source

File details

Details for the file mlpipeline_analyzer-0.0.2.tar.gz.

File metadata

Download URL: mlpipeline_analyzer-0.0.2.tar.gz
Upload date: Mar 17, 2022
Size: 2.9 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.7.1 importlib_metadata/3.10.0 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.8

File hashes

Hashes for mlpipeline_analyzer-0.0.2.tar.gz
Algorithm	Hash digest
SHA256	`ff402860380d35bc1a3d95b7ac4067b4c81eb07e27320d3499a0a579f5ba0776`
MD5	`13c15fdfd16138ccb3bd8d4bef593fb4`
BLAKE2b-256	`d875697d615c8af5984981a35798825001093ea16c025176d87e77d15f638f79`
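
A downloaded archive can be checked against the SHA256 digest listed above. The snippet below is a minimal sketch using the standard library's hashlib; the function names and the local file location are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for mlpipeline_analyzer-0.0.2.tar.gz
EXPECTED_SHA256 = "ff402860380d35bc1a3d95b7ac4067b4c81eb07e27320d3499a0a579f5ba0776"


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected_hex: str = EXPECTED_SHA256) -> bool:
    """Compare the file's digest with the expected hex string."""
    return sha256_of_file(path) == expected_hex.lower()


if __name__ == "__main__":
    # Assumed location of the downloaded archive (current directory)
    archive = Path("mlpipeline_analyzer-0.0.2.tar.gz")
    if archive.exists():
        print("OK" if matches_expected(archive) else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```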