ELI5 is a Python package which helps to debug machine learning
classifiers and explain their predictions.
It provides support for the following machine learning frameworks and packages:
- scikit-learn. Currently ELI5 can explain weights and predictions
of scikit-learn linear classifiers and regressors, print decision trees
as text or as SVG, show feature importances, and explain predictions
of decision trees and tree-based ensembles. ELI5 understands text
processing utilities from scikit-learn and can highlight text data
accordingly. It can also debug scikit-learn pipelines that contain
HashingVectorizer by undoing the hashing.
- xgboost - show feature importances and explain predictions of XGBClassifier
- lightning - explain weights and predictions of lightning classifiers and
- sklearn-crfsuite. ELI5 can inspect the weights of sklearn_crfsuite.CRF
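The idea of "undoing hashing" mentioned in the scikit-learn item can be illustrated with a minimal, standard-library sketch. It is not ELI5's implementation: the `bucket` and `build_unhasher` helpers, the MD5-based hash and the tiny `N_BUCKETS` value are all assumptions made for illustration. The sketch hashes the tokens seen in some sample documents and records which tokens land in each column, most frequent first, so that a column index can be mapped back to candidate feature names.

```python
import hashlib
from collections import Counter, defaultdict

N_BUCKETS = 16  # hypothetical; deliberately tiny so that collisions can occur


def bucket(token, n_buckets=N_BUCKETS):
    """Map a token to a column index with a stable hash (illustrative choice)."""
    digest = hashlib.md5(token.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def build_unhasher(documents, n_buckets=N_BUCKETS):
    """Return {column index: candidate tokens, most frequent first}."""
    counts = Counter(tok for doc in documents for tok in doc.lower().split())
    candidates = defaultdict(list)
    # most_common() yields tokens by decreasing frequency, so each
    # bucket's candidate list ends up ordered the same way.
    for token, _count in counts.most_common():
        candidates[bucket(token, n_buckets)].append(token)
    return dict(candidates)


docs = ["the cat sat on the mat", "the dog chased the cat"]
unhasher = build_unhasher(docs)
column = bucket("cat")
print(column, unhasher[column])
```

When several tokens collide in one column, the sketch lists all of them and leaves it to the reader to decide which one the weight most likely belongs to.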
ELI5 also provides TextExplainer, which explains predictions
of any text classifier using the LIME algorithm (Ribeiro et al., 2016).
There are also utilities for using LIME with non-text data and arbitrary
black-box classifiers, but this feature is currently experimental.
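To give a feel for the kind of explanation LIME produces for text, here is a simplified, standard-library sketch in the spirit of the algorithm. It is not TextExplainer and does not use eli5. The `black_box` classifier, the `explain` and `solve` helpers, the sampling rate and the proximity kernel are all assumptions chosen for illustration. The sketch removes random words from a text, asks the black-box classifier for a score on each perturbed text, and fits a proximity-weighted linear model whose coefficients serve as per-word weights.

```python
import math
import random


def black_box(text):
    """Hypothetical classifier: probability that a text is 'positive'."""
    words = text.split()
    score = 2.0 * words.count("great") - 2.0 * words.count("awful")
    return 1.0 / (1.0 + math.exp(-score))


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def explain(text, predict, n_samples=500, ridge=1e-3, seed=0):
    """Return [(word, weight)] from a local linear surrogate (non-empty text)."""
    rng = random.Random(seed)
    words = text.split()
    d = len(words)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        mask = [rng.random() < 0.5 for _ in range(d)]  # which words to keep
        sample = " ".join(w for w, keep in zip(words, mask) if keep)
        kept = sum(mask) / d
        rows.append([1.0] + [float(k) for k in mask])  # intercept + presence
        targets.append(predict(sample))
        # Samples closer to the original text get a larger weight.
        weights.append(math.exp(-((1.0 - kept) ** 2) / 0.25))
    # Weighted ridge normal equations: (X^T W X + ridge * I) beta = X^T W y
    k = d + 1
    xtx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
            + (ridge if i == j else 0.0) for j in range(k)] for i in range(k)]
    xty = [sum(w * r[i] * y for r, w, y in zip(rows, weights, targets))
           for i in range(k)]
    beta = solve(xtx, xty)
    return list(zip(words, beta[1:]))  # drop the intercept


explanation = explain("the food was great but service awful", black_box)
for word, weight in explanation:
    print(f"{weight:+.3f}  {word}")
```

Words that push the hypothetical classifier towards the positive class receive positive weights, and words it ignores receive weights close to zero.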
Explanation and formatting are separated; you can get a text-based explanation
to display in a console, an HTML version embeddable in an IPython notebook
or web dashboard, or a JSON version for implementing custom
rendering and formatting on a client.
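The separation between explanation and formatting can be sketched as a plain data object plus independent formatter functions. The names below (`Explanation`, `FeatureWeight`, `to_text`, `to_json`) are hypothetical stand-ins chosen for this illustration, not ELI5's own classes or functions.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class FeatureWeight:
    feature: str
    weight: float


@dataclass
class Explanation:
    """Format-independent result; it holds data only, no presentation."""
    estimator: str
    feature_weights: List[FeatureWeight]


def to_text(expl):
    """Render an explanation for a console."""
    lines = [f"Explained: {expl.estimator}"]
    for fw in expl.feature_weights:
        lines.append(f"{fw.weight:+.3f}  {fw.feature}")
    return "\n".join(lines)


def to_json(expl):
    """Render the same explanation as JSON for a client-side renderer."""
    return json.dumps(asdict(expl), sort_keys=True)


expl = Explanation(
    estimator="hypothetical linear model",
    feature_weights=[FeatureWeight("great", 1.5), FeatureWeight("awful", -1.7)],
)
print(to_text(expl))
print(to_json(expl))
```

Because the formatters only read the data object, a new output format can be added without touching the code that computes the explanation.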
License is MIT.
Check docs for more.
- bug fix: eli5 should remain importable if xgboost is available, but
not installed correctly.
- feature contribution calculation fixed
- eli5.explain_prediction: new ‘top_targets’ argument displays
only the predictions with the highest or lowest scores;
- eli5.explain_weights lets you customize the way feature importances
are computed for XGBClassifier and XGBRegressor using the importance_type
argument (see the docs for eli5 XGBoost support);
- eli5.explain_weights uses gain for XGBClassifier and XGBRegressor
feature importances by default; this method is a better indication of
what’s going on, and it makes results more compatible with feature importances
displayed for scikit-learn gradient boosting methods.
- packaging fix: scikit-learn is added to install_requires in setup.py.
- eli5.explain_prediction works for XGBClassifier, XGBRegressor
from XGBoost and for ExtraTreesClassifier, ExtraTreesRegressor,
RandomForestClassifier, RandomForestRegressor, DecisionTreeClassifier
and DecisionTreeRegressor from scikit-learn.
Explanation method is based on
- eli5.explain_weights now supports tree-based regressors from
scikit-learn: DecisionTreeRegressor, AdaBoostRegressor,
GradientBoostingRegressor, RandomForestRegressor and ExtraTreesRegressor.
- eli5.explain_weights works for XGBRegressor;
- new TextExplainer class explains predictions
of black-box text classification pipelines using the LIME algorithm;
many improvements in eli5.lime.
- better sklearn.pipeline.FeatureUnion support in
- rendering performance is improved;
- the number of remaining feature importances is shown when the feature
importance table is truncated;
- styling of feature importances tables is fixed;
- eli5.explain_weights and eli5.explain_prediction support
more linear estimators from scikit-learn: HuberRegressor, LarsCV, LassoCV,
LassoLars, LassoLarsCV, LassoLarsIC, OrthogonalMatchingPursuit,
RidgeClassifier, RidgeClassifierCV, TheilSenRegressor.
- text-based formatting of decision trees is changed: for binary
classification trees only the probability of the “true” class is printed,
rather than both probabilities as before.
- eli5.explain_weights supports feature_filter in addition
to feature_re for filtering features, and eli5.explain_prediction
now also supports both of these arguments;
- ‘Weight’ column is renamed to ‘Contribution’ in the output of
- new show_feature_values=True formatter argument displays
input feature values;
- fixed an issue with analyzer=’char_wb’ highlighting at the start of the
- XGBClassifier support (from XGBoost
- eli5.explain_weights support for sklearn OneVsRestClassifier;
- std deviation of feature importances is no longer printed as zero
if it is not available.
- packaging fixes: require attrs > 16.0.0, fixed README rendering
- HTML output;
- IPython integration;
- JSON output;
- visualization of scikit-learn text vectorizers;
- lightning support;
- eli5.show_weights and eli5.show_prediction functions;
- eli5.explain_weights and eli5.explain_prediction
- eli5.lime improvements: samplers for non-text data,
bug fixes, docs;
- HashingVectorizer is supported for regression tasks;
- performance improvements - feature names are lazy;
- sklearn ElasticNetCV and RidgeCV support;
- it is now possible to customize formatting output - show/hide sections,
- sklearn OneVsRestClassifier support;
- sklearn DecisionTreeClassifier visualization (text-based or svg-based);
- dropped support for scikit-learn < 0.18;
- basic mypy type annotations;
- feature_re argument shows only a subset of features;
- target_names argument changes the display names of targets/classes;
- targets argument shows a subset of targets/classes and
changes their display order;
- documentation, more examples.
- Candidate features in eli5.sklearn.InvertableHashingVectorizer
are ordered by frequency; the first candidate is always positive.
- HashingVectorizer support in explain_prediction;
- add an option to pass a coefficient scaling array; it is useful
if you want to compare coefficients for features whose scale or sign
differs in the input;
- bug fix: classifier weights are no longer changed by eli5 functions.
- eli5.sklearn.InvertableHashingVectorizer and
eli5.sklearn.FeatureUnhasher can recover feature names for
pipelines which use HashingVectorizer or FeatureHasher;
- added support for scikit-learn linear regression models (ElasticNet,
Lars, Lasso, LinearRegression, LinearSVR, Ridge, SGDRegressor);
- doc and vec arguments are swapped in explain_prediction function;
vec can now be omitted if an example is already vectorized;
- fixed issue with dense feature vectors;
- all class_names arguments are renamed to target_names;
- feature name guessing is fixed for scikit-learn ensemble estimators;
- testing improvements.
- support any black-box classifier using the LIME algorithm
(http://arxiv.org/abs/1602.04938); text data support is built-in;
- “vectorized” argument for sklearn.explain_prediction; it lets you pass
an example which is already vectorized;
- feature_names can be passed explicitly;
- support classifiers without get_feature_names method using auto-generated
- ‘top’ argument of explain_prediction
can be a tuple (num_positive, num_negative);
- classifier name is no longer printed by default;
- added eli5.sklearn.explain_prediction to explain individual examples;
- fixed numpy warning.
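Several entries above describe arguments that narrow down which features are shown: feature_re, and a ‘top’ value that can be either a number or a (num_positive, num_negative) tuple. The following standard-library sketch shows one plausible reading of those semantics. The `select_features` helper and its ranking rules are assumptions made for illustration; the changelog does not specify how ELI5 ranks features internally.

```python
import re


def select_features(weights, top=None, feature_re=None):
    """Filter and truncate a {feature name: weight} mapping.

    Hypothetical helper: `top` may be None, an int, or a
    (num_positive, num_negative) tuple; `feature_re` keeps only
    features whose name matches the regular expression.
    """
    items = list(weights.items())
    if feature_re is not None:
        pattern = re.compile(feature_re)
        items = [(name, w) for name, w in items if pattern.search(name)]
    # Zero weights are dropped in this sketch.
    positive = sorted((x for x in items if x[1] > 0), key=lambda x: -x[1])
    negative = sorted((x for x in items if x[1] < 0), key=lambda x: x[1])
    if top is None:
        return positive + negative
    if isinstance(top, tuple):
        num_positive, num_negative = top
        return positive[:num_positive] + negative[:num_negative]
    # A single number: keep the features with the largest absolute weight.
    ranked = sorted(positive + negative, key=lambda x: -abs(x[1]))
    return ranked[:top]


weights = {"great": 1.5, "good": 0.8, "fine": 0.2,
           "bad": -0.9, "awful": -1.7, "bias": 0.1}
print(select_features(weights, top=(2, 1)))
print(select_features(weights, top=2))
print(select_features(weights, feature_re="^g"))
```

The tuple form is useful when positive and negative evidence should both stay visible, since a single number can be dominated by one sign.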