A framework to diagnose ML models
MLDiag
This Python library helps you diagnose machine learning models before deployment.
Visit this introduction to learn more about MLDiag.
Features
- Generate synthetic data with adversarial attacks to evaluate model robustness
- Compute statistics on model behaviour
- Simple, easy-to-use and lightweight library: diagnose a model in 3 lines of code
- Plug and play with any neural network framework (e.g. PyTorch, TensorFlow) or standard machine learning framework (e.g. scikit-learn)
- Supports textual, image, audio and structured data
- Can be added to a CI workflow
- Can be used from the command line or in Python scripts
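The robustness evaluation and behaviour statistics listed above can be pictured with a small, framework-agnostic sketch. It is not MLDiag's API: the function name robustness_report, the metrics it reports, and the assumption that a model is any callable mapping a batch of inputs to labels are all illustrative.

```python
"""Sketch: compare a model's behaviour on clean and perturbed inputs.

Hypothetical helper, not the MLDiag API. The model is any callable mapping a
list of inputs to a list of predicted labels, which keeps it framework-agnostic.
"""
from typing import Callable, List, Sequence


def robustness_report(
    predict: Callable[[List[str]], List[int]],
    texts: Sequence[str],
    labels: Sequence[int],
    perturb: Callable[[str], str],
) -> dict:
    """Return clean accuracy, perturbed accuracy, accuracy drop and flip rate."""
    clean_preds = predict(list(texts))
    noisy_preds = predict([perturb(t) for t in texts])
    n = len(labels)
    clean_acc = sum(p == y for p, y in zip(clean_preds, labels)) / n
    noisy_acc = sum(p == y for p, y in zip(noisy_preds, labels)) / n
    # Fraction of examples whose prediction changed after the perturbation.
    flip_rate = sum(a != b for a, b in zip(clean_preds, noisy_preds)) / n
    return {
        "clean_accuracy": clean_acc,
        "perturbed_accuracy": noisy_acc,
        "accuracy_drop": clean_acc - noisy_acc,
        "flip_rate": flip_rate,
    }


if __name__ == "__main__":
    # Toy "model": predicts 1 when the text contains the word "good".
    def toy_predict(batch):
        return [int("good" in t.split()) for t in batch]

    # Toy perturbation: OCR-like swap of the letter "o" for the digit "0".
    print(robustness_report(
        toy_predict,
        ["good movie", "bad movie", "good plot"],
        [1, 0, 1],
        lambda t: t.replace("o", "0"),
    ))
```

A large accuracy drop or flip rate under a mild perturbation is the kind of signal a pre-deployment diagnosis aims to surface; in a CI workflow such a number could be compared against a threshold.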
Quick Demo
Quick start
Installation
The library supports Python 3.7+ on Linux and Windows.
To install the library:
pip install mldiag
or install the latest version (including BETA features) directly from GitHub:
pip install git+https://github.com/AI-MEN/mldiag.git
Run a diagnostic
Method 1
This method uses the command line only and requires the model to be running as a web service. A complete example is provided as a demo:
- Create a text classification model:
python examples/text_classification/tf_text_classification.py train --save_model_path=./mldiag
A TensorFlow model, model.h5, is created in the mldiag directory.
- Run a text classification web service:
python examples/text_classification/flask_text_classification_service.py --model_path ./mldiag/model.h5
A local web service then runs at http://localhost:8080/query.
- Create the test set used to diagnose the model:
python examples/text_classification/tf_text_classification.py save_test_set --out_path=./mldiag
A test set, test.npy, is saved in the mldiag directory. It contains a NumPy array of text examples and their class labels.
- Run the diagnostic application, which calls the web service:
python mldiag/cli.py diagnose --eval_set "./mldiag/test.npy" \
    --config_file "examples/text_classification/config_text_classification.yaml" \
    --service_url http://localhost:8080/query \
    --report_path "./mldiag" \
    --json_field "results"
where results is the key under which the web service serializes its output to JSON (see the web service script).
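For readers writing their own service or client, here is a hedged sketch of the Method 1 data flow. Everything beyond the file name test.npy, the URL and the results key is an assumption: that test.npy is an object array of shape (n, 2) holding [text, label] rows, that the service accepts a JSON POST body {"texts": [...]}, and the helper names load_eval_set, extract_predictions and query_service, which are hypothetical. The actual request and response formats are defined in flask_text_classification_service.py.

```python
"""Sketch of a client for the diagnostic web service (Method 1).

Assumptions (not taken from MLDiag): test.npy holds an object array of shape
(n, 2) with columns [text, label]; the service accepts a POST with a JSON body
{"texts": [...]} and answers with {"results": [...]}.
"""
import json
import urllib.request
from typing import List, Tuple

import numpy as np


def load_eval_set(path: str) -> Tuple[List[str], List]:
    # np.save pickles object arrays, so allow_pickle=True is needed to load them.
    data = np.load(path, allow_pickle=True)
    return [str(t) for t in data[:, 0]], list(data[:, 1])


def extract_predictions(body: bytes, json_field: str = "results") -> list:
    # json_field mirrors the --json_field option: the key holding the predictions.
    return json.loads(body)[json_field]


def query_service(url: str, texts: List[str], json_field: str = "results") -> list:
    request = urllib.request.Request(
        url,
        data=json.dumps({"texts": texts}).encode("utf-8"),  # hypothetical payload key
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return extract_predictions(response.read(), json_field)
```

With the demo service running, query_service("http://localhost:8080/query", texts) would return the list stored under results, provided the assumed payload format matches the service script.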
![MLDiag](https://github.com/AI-MEN/MLDiag/tree/master/blog/capture.jpg)
Method 2
This method uses Python scripts. It supports a number of machine learning models and data formats through wrappers; ready-to-use wrappers can be found in mldiag/wrappers.py. A complete example follows as a demo:
- Create a text classification model:
python examples/text_classification/tf_text_classification.py train --save_model_path=./mldiag
A TensorFlow model, model.h5, is created in the mldiag directory.
- Call the Python script (the diagnostic config file is available at examples/text_classification/config_text_classification.yaml):
python examples/text_classification/tf_text_classification_diag.py run --model_path=./mldiag/model.h5 --report_path=./mldiag
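The wrapper idea behind Method 2 amounts to adapting each model to a common prediction interface, so that the diagnostics only ever call one method. The sketch below illustrates that pattern only; the classes ModelWrapper, SklearnWrapper and ProbabilityWrapper are hypothetical and do not reproduce mldiag/wrappers.py.

```python
"""Sketch of the wrapper pattern: one predict interface over different models.

Hypothetical classes, not the contents of mldiag/wrappers.py. A diagnostic
routine only needs predict(list_of_inputs) -> list_of_labels.
"""
from abc import ABC, abstractmethod
from typing import Any, Callable, List, Sequence


class ModelWrapper(ABC):
    @abstractmethod
    def predict(self, inputs: Sequence[Any]) -> List[int]:
        """Return one class label per input."""


class SklearnWrapper(ModelWrapper):
    """Wraps any object exposing a scikit-learn style predict() method."""

    def __init__(self, estimator: Any):
        self.estimator = estimator

    def predict(self, inputs):
        return [int(p) for p in self.estimator.predict(list(inputs))]


class ProbabilityWrapper(ModelWrapper):
    """Wraps a function returning per-class scores and takes the argmax."""

    def __init__(self, score_fn: Callable[[Sequence[Any]], Sequence[Sequence[float]]]):
        self.score_fn = score_fn

    def predict(self, inputs):
        scores = self.score_fn(inputs)
        # Index of the highest score in each row is the predicted class.
        return [max(range(len(row)), key=row.__getitem__) for row in scores]
```

Keeping the interface this narrow is what lets the same diagnostic code run against scikit-learn estimators and neural networks alike.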
Diagnostics
| Diagnostic | Target | Action | Description |
|---|---|---|---|
| Textual | Character | OCRError | Simulates OCR errors |
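As an illustration of what a character-level OCRError action can do, the sketch below replaces characters with visually similar glyphs at a given rate. It is not the library's implementation; the confusion table OCR_CONFUSIONS, the prob parameter and the function simulate_ocr_errors are assumptions chosen for the example.

```python
"""Sketch of a character-level OCR-error perturbation.

Not MLDiag's OCRError implementation: the confusion table and the
per-character probability are illustrative assumptions.
"""
import random
from typing import Dict, List, Optional

# Glyph pairs that OCR engines commonly confuse (illustrative, not exhaustive).
OCR_CONFUSIONS: Dict[str, List[str]] = {
    "o": ["0"], "O": ["0"], "0": ["O"],
    "l": ["1", "I"], "I": ["1", "l"], "1": ["l"],
    "e": ["c"], "a": ["o"], "s": ["5"], "S": ["5"],
    "B": ["8"], "g": ["9"], "Z": ["2"],
}


def simulate_ocr_errors(text: str, prob: float = 0.1, seed: Optional[int] = None) -> str:
    """Replace each confusable character with a look-alike with probability `prob`."""
    rng = random.Random(seed)  # seeded generator makes perturbations reproducible
    out = []
    for ch in text:
        if ch in OCR_CONFUSIONS and rng.random() < prob:
            out.append(rng.choice(OCR_CONFUSIONS[ch]))
        else:
            out.append(ch)
    return "".join(out)
```

For example, simulate_ocr_errors("Sob", prob=1.0) returns "50b". Because every substitution is one character for one character, the perturbed text keeps its length, which makes clean and perturbed examples easy to align.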
Recent Changes
See changelog for more details.
Further Reading
- Data Augmentation library for Text
- Does your NLP model able to prevent adversarial attack?
- How does Data Noising Help to Improve your NLP Model?
- Data Augmentation library for Speech Recognition
- Data Augmentation library for Audio
- Unsupervised Data Augmentation
- A Visual Survey of Data Augmentation in NLP
Reference
This library uses:
- data (e.g. collected from the internet),
- research (e.g. ideas from existing augmenters),
- models (e.g. pre-trained models).
TODO: update sources
See data source for more details.
Citing
@misc{shabou2020mldiag,
title={Machine learning diagnosis},
author={Aymen SHABOU},
howpublished={https://github.com/AI-MEN/MLDiag},
year={2020}
}
Contributions