Python package that analyzes, visualizes, and suggests changes to a machine learning pipeline
ml-pipeline-analyzer
Machine Learning Pipeline Analyzer (MLPA)
Machine Learning Pipeline Analyzer (MLPA) is a Python package that, at its core, analyzes, suggests, and visualizes machine learning pipelines.
One of the primary goals of this package is to give the user an intuitive visual diagram of the pipeline model that explains the model's components and their attributes, while also suggesting changes and the best pipeline model for the user's needs.
Motivation
As machine learning engineers or data science engineers, we often create ML pipelines that perform multiple tasks, such as:
Data extraction -> Data Cleaning -> Data Manipulation -> Feature Selection/Reduction -> Model train and predict -> Cross Validation -> Model load/save
However, as the number of components in a pipeline grows, drawing a flowchart by hand stops being feasible, and the result is hard to understand and track. Some existing Python packages already use DAGs to visualize ML pipelines, but they can be hard to explore and understand. Our goal was therefore to create a package that automates the daunting process of visualizing ML pipelines, and that can also suggest changes or the best pipeline models for the dataframes the user supplies.
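To make the idea of a DAG-based pipeline diagram concrete, here is a minimal sketch that turns the stage list above into Graphviz DOT text. It is not how MLPA renders its diagrams; the function name `pipeline_to_dot` and the node naming are assumptions made for illustration only.

```python
def pipeline_to_dot(stages, name="ml_pipeline"):
    """Return Graphviz DOT text for a linear pipeline of named stages.

    Hypothetical helper for illustration; not part of MLPA.
    """
    lines = [f"digraph {name} {{", "  rankdir=LR;"]
    # One box-shaped node per stage
    for i, stage in enumerate(stages):
        label = stage.replace('"', '\\"')
        lines.append(f'  s{i} [shape=box, label="{label}"];')
    # One edge between each pair of consecutive stages
    for i in range(len(stages) - 1):
        lines.append(f"  s{i} -> s{i + 1};")
    lines.append("}")
    return "\n".join(lines)


stages = [
    "Data extraction", "Data cleaning", "Data manipulation",
    "Feature selection/reduction", "Model train and predict",
    "Cross validation", "Model load/save",
]
dot = pipeline_to_dot(stages)
print(dot)
```

The printed text can be pasted into any Graphviz viewer. Even for seven stages, the hand-written version is tedious to keep in sync with the code, which is the problem the package sets out to automate.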
Acknowledgements
MLPA is an easier and simpler wrapper using the capabilities from the following existing Python libraries:
Installing the package
pip install mlpipeline_analyzer
Dependencies
Install the dependencies from the requirements.txt file using
python -m pip install -r requirements.txt
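After installing, a quick check like the following sketch can confirm that the package is visible to the current interpreter. It uses only the standard library; the helper name `installed_version` is hypothetical, and the distribution name is assumed to match the one used in the `pip install` command above.

```python
from importlib import metadata


def installed_version(dist_name="mlpipeline_analyzer"):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print(version if version else "mlpipeline_analyzer is not installed")
```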
Code Examples:
Code example 1: The user loads a saved model .pkl file and passes it to the PipelineDiagram class. Two ML pipeline diagrams are then created with .show and .show_params:
import joblib
# PipelineDiagram is provided by mlpipeline_analyzer; import it from the installed package.

# Load a previously saved (pickled) pipeline
evalml_pipeline = joblib.load('models/automl_pipeline.pkl')
a = PipelineDiagram(evalml_pipeline)
a.show(title='Evalml ML Pipeline Diagram')
a.show_params(title='Evalml Machine Learning Parameters Pipeline')
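Because the example above depends on a .pkl file being present at a given path, a small guard can give a clearer error than a failure deep inside the loader. The sketch below is an assumption-laden illustration: `load_pipeline` is a hypothetical helper, not part of MLPA, and it assumes the file was saved with joblib as in the example.

```python
from pathlib import Path


def load_pipeline(path):
    """Load a pickled pipeline, failing early if the file is missing.

    Hypothetical helper for illustration; not part of MLPA.
    """
    pkl = Path(path)
    if not pkl.is_file():
        raise FileNotFoundError(f"No pipeline file at {pkl}")
    # Imported lazily so the path check works even without joblib installed
    import joblib
    return joblib.load(pkl)


# Usage (assumes the file exists):
# evalml_pipeline = load_pipeline('models/automl_pipeline.pkl')
```

Only load .pkl files from sources you trust, since unpickling can execute arbitrary code.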
Code example 2: The suggest function generates output for the various components of the model, depending on the suggest_type the user specifies:
# df is assumed to be a pandas DataFrame that contains the response and predictor columns
b = PipelineSuggest()
b.fit(data=df, response='survived', predictor_list=['pclass', 'age', 'gender'], problem_type='binary', objective='auto', test_size=0.2)
# Request suggestions for one component type at a time, or for all of them
b.suggest(suggest_type='fe')
b.suggest(suggest_type='model')
b.suggest(suggest_type='all')
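Before calling fit as in the example above, it can help to confirm that the response and predictor names actually exist in the dataframe. The sketch below is one way to do that with plain Python. `check_fit_args` is a hypothetical helper, not part of MLPA, and the rule that test_size must lie strictly between 0 and 1 is an assumption based on the value 0.2 used in the example.

```python
def check_fit_args(columns, response, predictor_list, test_size):
    """Raise ValueError if the arguments are inconsistent with the column names.

    Hypothetical helper for illustration; not part of MLPA.
    """
    available = set(columns)
    if response not in available:
        raise ValueError(f"response column {response!r} not found")
    missing = [p for p in predictor_list if p not in available]
    if missing:
        raise ValueError(f"predictor columns not found: {missing}")
    if response in predictor_list:
        raise ValueError("response must not also be a predictor")
    # Assumption: test_size is a fraction of the rows, as in the example
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")


# Column names here are sample values; with pandas, pass list(df.columns)
columns = ["survived", "pclass", "age", "gender", "fare"]
check_fit_args(columns, "survived", ["pclass", "age", "gender"], 0.2)
```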
Screenshots
Examples of outputs generated by the functions
- Screenshot of an ML pipeline summary diagram:
- Screenshot of an ML pipeline hyperparameter diagram:
Build Status
MLPA already supports feature engineering to some extent. With expandability as one of the project goals, we plan to add more specific capabilities in the following areas:
- Feature Engineering
- Feature Extraction/Selection
- Feature Reduction
As a project extension, one possible feature is to let users specify the engine they want to use for their model (for example, TPOT or EvalML) and run MLPA on top of that engine.
Code Style
Languages used: Python
Coding style:
- PEP 8
- Docstrings
The following software design principles were considered while packaging MLPA:
- Modular design
- 'Somewhat General Purpose' module
- Deep Modules
- Separation of Concerns
- Reusability/Extensibility
- Intuitive
- Version Control using GitHub
- Exception Handling
- Support for automated CI/CD using Travis
- Unit testing and coverage for quality assurance
Authors
Contribute
This is an open-source project, open to contributions from the Python user community.
Change Log
0.0.1 (2022-01-30)
- Initial release
0.0.2 (2022-03-17)
- Added show, show_params and suggest functionalities and updated the description
Source Distribution
File details
Details for the file mlpipeline_analyzer-0.0.2.tar.gz.
File metadata
- Download URL: mlpipeline_analyzer-0.0.2.tar.gz
- Upload date:
- Size: 2.9 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/3.10.0 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | ff402860380d35bc1a3d95b7ac4067b4c81eb07e27320d3499a0a579f5ba0776
MD5 | 13c15fdfd16138ccb3bd8d4bef593fb4
BLAKE2b-256 | d875697d615c8af5984981a35798825001093ea16c025176d87e77d15f638f79