
Robustify:

Welcome to Robustify, a GitHub repository for evaluating the effects of adding structure-conserving noise to data. It aims to give researchers and practitioners a comprehensive set of tools for exploring how noise affects the score and robustness of their machine learning models. The repository includes noise generation and augmentation techniques, methods for evaluating the effect of noise on model performance, robustness metrics, and visualizations.

Install

Robustify can be installed from either PyPI or conda-forge:

pip install RobustifyToolkit
or
conda install -c conda-forge RobustifyToolkit 

Use cases

Simulating uncertainty:

Adding noise to data is a widely used technique in machine learning with various benefits. One important use case is simulating uncertainty in data. By introducing random noise into the training data, machine learning models can learn to be more robust and effective in real-world scenarios where the input data may be noisy or the values uncertain.
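As a rough illustration of simulating uncertainty in tabular features, the sketch below adds zero-mean Gaussian noise scaled to each column's own standard deviation. It uses only NumPy; the function name `add_gaussian_noise` and its `level` parameter are hypothetical and are not taken from Robustify's API.

```python
import numpy as np

def add_gaussian_noise(X, level=0.1, seed=None):
    """Return a copy of X with zero-mean Gaussian noise added to every column.

    Hypothetical helper, not part of Robustify. The noise standard deviation
    of each column is `level` times that column's own standard deviation, so
    features on different scales are perturbed proportionally.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    scale = level * X.std(axis=0)  # one noise scale per feature
    return X + rng.normal(size=X.shape) * scale

X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0], [4.0, 400.0]])
X_noisy = add_gaussian_noise(X, level=0.1, seed=0)
print(X_noisy)
```

Scaling by the column standard deviation is one design choice among several; a fixed absolute noise level or a non-Gaussian distribution may suit other data better.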

Robustness:

Another key benefit of adding noise to data is improved robustness. Machine learning models trained on noisy data can become more resilient to adversarial attacks and to naturally occurring perturbations. Adding noise can also help dampen the influence of outliers in the training data, which in turn can improve performance on unseen data.
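One simple way to probe robustness is to fit a model on clean data and then score it on test inputs perturbed at increasing noise levels. The sketch below does this with an ordinary least-squares model in plain NumPy; the synthetic data, the noise levels, and the `r2` helper are all illustrative assumptions rather than Robustify functionality.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])  # assumed ground-truth weights

# Synthetic train and test sets drawn from the same distribution
X_train = rng.normal(size=(200, 3))
y_train = X_train @ true_w + rng.normal(scale=0.1, size=200)
X_test = rng.normal(size=(200, 3))
y_test = X_test @ true_w + rng.normal(scale=0.1, size=200)

# Fit ordinary least squares on the clean training data
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Score the fixed model on increasingly noisy copies of the test inputs
scores = {}
for level in [0.0, 0.5, 1.0, 2.0]:
    noise = rng.normal(size=X_test.shape) * level
    scores[level] = r2(y_test, (X_test + noise) @ w)

for level, score in scores.items():
    print(f"noise level {level:.1f}: R^2 = {score:.3f}")
```

Plotting the score against the noise level gives a degradation curve; a flatter curve suggests a model that is less sensitive to input perturbations.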

Generalization:

Adding noise to data can also help improve the generalization performance of machine learning models. Overfitting is a common problem in machine learning, where the model becomes too specialized to the training data and performs poorly on new, unseen data. By introducing noise to the training data, the model is forced to learn more robust and generalizable features, resulting in better performance on test data.

Data augmentation:

Data augmentation is a technique used in machine learning to increase the size and diversity of a dataset by creating new examples from existing ones. Adding noise is one effective form of augmentation: each noisy copy exposes the model to a slightly different variation of the input, which can improve its ability to recognize patterns and make accurate predictions on new, unseen data. Augmentation can also reduce the risk of overfitting.
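A minimal sketch of noise-based augmentation for tabular data follows: the original rows are kept and several noisy copies are stacked underneath, with labels repeated to match. The helper `augment_with_noise` and its parameters are hypothetical names chosen for illustration, not Robustify's API.

```python
import numpy as np

def augment_with_noise(X, y, n_copies=2, level=0.05, seed=None):
    """Stack the original data with `n_copies` noisy copies of it.

    Hypothetical helper. Labels are left unchanged, which assumes the noise
    is small enough not to move a sample across the decision boundary.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    scale = level * X.std(axis=0)  # per-feature noise scale
    noisy = [X + rng.normal(size=X.shape) * scale for _ in range(n_copies)]
    X_aug = np.vstack([X, *noisy])
    y_aug = np.concatenate([y] * (n_copies + 1))
    return X_aug, y_aug

X = np.array([[0.0, 10.0], [1.0, 20.0], [2.0, 30.0]])
y = np.array([0, 1, 1])
X_aug, y_aug = augment_with_noise(X, y, n_copies=2, level=0.05, seed=0)
print(X_aug.shape, y_aug.shape)
```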

Compatibility

Robustify is compatible with most machine learning models trained on single-output tabular data from Scikit-learn, PyTorch, TensorFlow/Keras, and FastAI.

Feature importance measures

Feature importance describes which features are most influential in a model's predictions. Adding deliberate noise to a particular feature can change how much that feature contributes to the model's predictive ability, so comparing importances before and after adding noise shows how sensitive the model is to each feature. Feature importance can be computed with several external libraries and methods, described below.

Scikit-learn feature importance, coefficients, and permutation importance:

Scikit-learn is a popular Python library for machine learning that provides several methods for calculating feature importance. One of the simplest is the feature_importances_ attribute of decision tree-based models, such as Random Forest and Gradient Boosting. This attribute reports the importance of each feature as the reduction in impurity that results from splitting on that feature. For linear models, the absolute value of the coefficients, available through the coef_ attribute, can serve as a measure of feature importance, provided the features are on comparable scales. Another approach is permutation-based feature importance, which randomly permutes the values of each feature and measures the resulting decrease in model performance. Scikit-learn provides the permutation_importance function in sklearn.inspection to implement this method.
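The three scikit-learn approaches described above can be sketched side by side as follows. The synthetic dataset and the model hyperparameters are arbitrary choices for illustration; only the scikit-learn calls themselves (`feature_importances_`, `coef_`, `permutation_importance`) are the library's own.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification problem (illustrative only)
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1. Impurity-based importance from a tree ensemble
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X_train, y_train)
impurity_importance = forest.feature_importances_

# 2. Absolute coefficients of a linear model
linear = LogisticRegression(max_iter=1000).fit(X_train, y_train)
coef_importance = np.abs(linear.coef_).ravel()

# 3. Permutation importance measured on held-out data
perm = permutation_importance(forest, X_test, y_test, n_repeats=10, random_state=0)
perm_importance = perm.importances_mean

print("impurity:   ", np.round(impurity_importance, 3))
print("|coef|:     ", np.round(coef_importance, 3))
print("permutation:", np.round(perm_importance, 3))
```

The three measures are on different scales and need not agree on the ranking, so they are best compared feature by feature rather than by absolute value.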

ELI5

ELI5 (Explain Like I'm Five) is a Python library that provides a simple and intuitive way to explain machine learning models. In this application, ELI5's permutation importance is used as an alternative to scikit-learn's permutation importance, which is not compatible with all types of models.
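To show why permutation importance can be applied to any kind of model, here is a hand-written, model-agnostic version that only needs a prediction function and a scoring function. It is a sketch of the general technique in plain NumPy, not ELI5's implementation or API; `permutation_importance_generic`, `predict`, and `neg_mse` are hypothetical names.

```python
import numpy as np

def permutation_importance_generic(predict, X, y, score, n_repeats=5, seed=None):
    """Mean drop in `score` when each column of X is shuffled in turn.

    `predict` maps an (n, d) array to predictions and `score(y_true, y_pred)`
    returns a value where higher is better. Hypothetical helper.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    baseline = score(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling one column breaks its link to the target
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - score(y, predict(X_perm)))
        importances[j] = np.mean(drops)
    return importances

def predict(X):
    """Toy black-box model that only uses the first feature."""
    return 3.0 * X[:, 0]

def neg_mse(y_true, y_pred):
    """Negative mean squared error, so that higher is better."""
    return -np.mean((y_true - y_pred) ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = predict(X)
importances = permutation_importance_generic(predict, X, y, neg_mse, seed=0)
print(importances)
```

Because the toy model ignores the second and third features, shuffling them leaves the score unchanged and only the first feature receives a non-zero importance.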

LIME

LIME (Local Interpretable Model-Agnostic Explanations) is a library that helps explain the predictions of machine learning models. It works by fitting locally faithful linear models that approximate the predictions of a black-box model around a single sample, and then providing explanations based on the coefficients of those models. LIME is useful for understanding how specific features contribute to an individual prediction, and for identifying potential biases or limitations in the model.
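The idea of a locally faithful linear model can be sketched without the library itself: sample points around the instance, weight them by proximity, and fit a weighted least-squares line to the black-box predictions. The code below is a simplified illustration in NumPy under assumed settings (Gaussian sampling, an exponential kernel, the names `local_linear_explanation` and `black_box`); it is not the lime package's algorithm or API.

```python
import numpy as np

def local_linear_explanation(predict, x, n_samples=500, sigma=0.1,
                             kernel_width=0.25, seed=None):
    """Fit a proximity-weighted linear surrogate to `predict` around `x`.

    Returns one coefficient per feature. Hypothetical helper.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Perturb the instance to sample its neighbourhood
    Z = x + rng.normal(scale=sigma, size=(n_samples, x.size))
    target = predict(Z)
    # Closer samples get larger weights (assumed exponential kernel)
    dist = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # Weighted least squares via row scaling by sqrt(weight)
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], target * sw, rcond=None)
    return coef[1:]  # drop the intercept

def black_box(Z):
    """Toy non-linear model: quadratic in feature 0, linear in feature 1."""
    return Z[:, 0] ** 2 + 3.0 * Z[:, 1]

coefs = local_linear_explanation(black_box, x=[1.0, 2.0], seed=0)
print(np.round(coefs, 2))
```

Around the point (1, 2) the toy model has local slopes of about 2 and 3, and the surrogate's coefficients should land close to those values.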

SHAP

SHAP (SHapley Additive exPlanations) is another Python library for explaining machine learning models. It provides a unified framework for interpreting a wide range of model types. SHAP uses game-theoretic concepts to compute feature importance values for each input feature and provides global and local explanations of model behaviour. It is particularly useful for understanding how different features interact with each other to affect model predictions.
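The game-theoretic quantity behind SHAP is the Shapley value. For a handful of features it can be computed exactly by enumerating every coalition, as in the sketch below. This is a brute-force illustration and not the shap library's API or its approximation algorithms; "missing" features are replaced by a fixed baseline, which is one assumption among several possible ones, and `exact_shapley_values` and `model` are hypothetical names.

```python
from itertools import combinations
from math import factorial

import numpy as np

def exact_shapley_values(predict, x, baseline):
    """Exact Shapley values of `predict` at `x` relative to `baseline`.

    Features outside a coalition take their baseline value. The cost grows
    exponentially with the number of features. Hypothetical helper.
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n = x.size

    def value(subset):
        # Prediction when only the features in `subset` are "present"
        z = baseline.copy()
        for j in subset:
            z[j] = x[j]
        return float(predict(z[None, :])[0])

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi

def model(Z):
    """Toy model with an interaction between features 1 and 2."""
    return 2.0 * Z[:, 0] + Z[:, 1] * Z[:, 2]

x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
phi = exact_shapley_values(model, x, baseline)
print(phi)
```

In this toy case the additive term is credited entirely to feature 0, while the interaction term is split evenly between features 1 and 2. The values sum to the difference between the prediction at `x` and at the baseline.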

Examples

Keras

PyTorch

Scikit-learn

Documentation

See the Wiki for documentation of the available methods.

License is MIT.

