The library for GLM and Ensemble Tree model explanation
The "Transparency" Library
Scalable and fast local (single-sample level) and global (population level) prediction explanations for:
- Ensemble trees (e.g., XGB, GBM, RF, and Decision tree)
- Generalized linear models (GLM), with support for various families, link powers, and variance powers (e.g., logistic regression)
implemented for models in:
- Python (Scikit-Learn)
- Spark (Scala and PySpark)
Installation:
pip install transparency
Additional step for Spark users:
- Add this jar to the Spark classpath: https://github.com/alvinhenrick/spark_model_explainer/releases/download/v.0.0.1/spark_model_explainer-assembly-0.0.1.jar (a Maven repository release will be supported soon: https://github.com/clarifyhealth/spark_model_explainer)
Transformer Set
- Scikit-Learn Ensemble Tree Explainer Transformer
from transparency.python.explainer.ensemble_tree import EnsembleTreeExplainerTransformer
expl = EnsembleTreeExplainerTransformer(estimator)
X_test_df = expl.transform(X_test_df)
- estimator: the trained ensemble tree estimator (e.g., random forest, GBM, or XGB)
- X_test_df: a Pandas dataframe with features as columns and samples as rows
The resulting X_test_df will have 3 added columns: 'prediction', 'feature_contributions' and 'intercept_contribution':
- 'feature_contributions': column of nested arrays with feature contributions (1 array per row)
- 'intercept_contribution': column holding the same scalar value in every row, representing the contribution of the intercept
For each row, sum(feature_contributions) + intercept_contribution equals the prediction for that row.
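The additive property described above (feature contributions plus the intercept contribution reconstruct the prediction) can be illustrated on a single hand-built regression tree. This is a minimal sketch of one common path-based attribution scheme, not necessarily the algorithm the library uses; the `Node` class and the `explain` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical tree node; `value` is the mean target of the rows reaching it."""
    value: float
    feature: Optional[int] = None   # index of the split feature; None for a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None   # taken when x[feature] <= threshold
    right: Optional["Node"] = None  # taken otherwise


def explain(root: Node, x: List[float], n_features: int) -> Tuple[float, List[float], float]:
    """Return (prediction, feature_contributions, intercept_contribution) for one row."""
    contributions = [0.0] * n_features
    node = root
    while node.feature is not None:
        child = node.left if x[node.feature] <= node.threshold else node.right
        # Credit the change in node value to the feature that was split on.
        contributions[node.feature] += child.value - node.value
        node = child
    # The root value plays the role of the intercept contribution.
    return node.value, contributions, root.value


# Toy tree: split on feature 0 at 5.0, then on feature 1 at 2.0 in the right branch.
tree = Node(value=10.0, feature=0, threshold=5.0,
            left=Node(value=4.0),
            right=Node(value=16.0, feature=1, threshold=2.0,
                       left=Node(value=12.0), right=Node(value=20.0)))

prediction, contributions, intercept = explain(tree, [7.0, 3.0], n_features=2)
# Contributions plus the intercept reconstruct the prediction.
assert abs(sum(contributions) + intercept - prediction) < 1e-9
```

For an ensemble, one plausible extension is to average (random forest) or sum (boosting) the per-tree contributions and intercepts, which preserves the same additive identity.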
- Scikit-Learn Generalized Linear Model (e.g., Logistic regression) Explainer Transformer
from transparency.python.explainer.glm import GLMExplainerTransformer
expl = GLMExplainerTransformer(estimator)
X_test_df = expl.transform(X_test_df, output_proba=False)
- estimator: the trained GLM estimator (e.g., logistic regression)
- X_test_df: a Pandas dataframe with features as columns and samples as rows
The resulting X_test_df will have 3 added columns: 'prediction', 'feature_contributions' and 'intercept_contribution':
- 'feature_contributions': column of nested arrays with feature contributions (1 array per row)
- 'intercept_contribution': column holding the same scalar value in every row, representing the contribution of the intercept
For each row, sum(feature_contributions) + intercept_contribution equals the prediction for that row.
- If output_proba is set to True, then for logistic regression the output prediction and its corresponding explanation are given as probabilities instead of the classification result.
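For a GLM, the same decomposition can be sketched on the linear-predictor scale: each feature contributes its coefficient times its value, and the intercept contributes itself. The snippet below is a hedged illustration in plain Python with made-up coefficients; `explain_linear` and `sigmoid` are hypothetical helpers, and how the library distributes contributions on the probability scale is not specified in this description, so the logistic step is applied only to the prediction.

```python
import math
from typing import List, Tuple


def explain_linear(coefs: List[float], intercept: float,
                   x: List[float]) -> Tuple[float, List[float], float]:
    """Return (linear_predictor, feature_contributions, intercept_contribution)."""
    contributions = [c * v for c, v in zip(coefs, x)]
    linear_predictor = sum(contributions) + intercept
    return linear_predictor, contributions, intercept


def sigmoid(eta: float) -> float:
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Made-up model with two features and one input row.
coefs = [0.5, -2.0]
row = [4.0, 1.0]
eta, contributions, intercept_contribution = explain_linear(coefs, 1.0, row)

# Additive identity on the linear-predictor scale.
assert abs(sum(contributions) + intercept_contribution - eta) < 1e-12

# For logistic regression, the probability is the sigmoid of the linear predictor.
proba = sigmoid(eta)
```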
- Pyspark Ensemble Tree Explainer Transformer
from transparency.spark.prediction.explainer.tree import EnsembleTreeExplainTransformer
EnsembleTreeExplainTransformer(predictionView=predictions_view,
                               featureImportanceView=features_importance_view,
                               modelPath=rf_model_path,
                               label=label_column,
                               dropPathColumn=True,
                               isClassification=classification,
                               ensembleType=ensemble_type)
- modelPath: path to load the model from
- ensembleType: supported values are
  - dct
  - gbt
  - rf
  - xgboost4j
- featureImportanceView: the feature importance extracted from the Apache Spark model metadata (reference this Python script: testutil.common.get_feature_importance), with columns
  - Feature_Index
  - Feature
  - Original_Feature
  - Importance
- The transformer appends 3 main columns to the prediction view:
  - contrib_column ==> f"prediction_{label_column}_contrib": array of contributions
  - contrib_column_sum ==> f"{contrib_column}_sum"
  - contrib_column_intercept ==> f"{contrib_column}_intercept"
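The naming scheme for the appended columns can be written out as a small helper. This is a sketch that assumes the contribution column follows the pattern `prediction_{label_column}_contrib` suggested by the f-strings above; `contribution_column_names` is a hypothetical helper and not part of the library.

```python
from typing import Dict


def contribution_column_names(label_column: str) -> Dict[str, str]:
    """Derive the three appended column names from the label column (assumed pattern)."""
    contrib_column = f"prediction_{label_column}_contrib"
    return {
        "contrib_column": contrib_column,
        "contrib_column_sum": f"{contrib_column}_sum",
        "contrib_column_intercept": f"{contrib_column}_intercept",
    }


# Example with a label column called "label".
names = contribution_column_names("label")
```

Computing the names once makes it easier to select the explanation columns from the prediction view without retyping the strings.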
- Pyspark Generalized Linear Model (GLM) Explainer Transformer
from transparency.spark.prediction.explainer.tree import GLMExplainTransformer
GLMExplainTransformer(predictionView=predictions_view,
                      coefficientView=coefficients_view,
                      linkFunctionType=link_function_type,
                      label=label_column,
                      nested=True,
                      calculateSum=True,
                      family=family,
                      variancePower=variance_power,
                      linkPower=link_power)
- linkFunctionType: supported values are
  - logLink
  - powerHalfLink
  - identityLink
  - logitLink
  - inverseLink
  - otherPowerLink
- coefficientView: the feature coefficients extracted from the Apache Spark model metadata (reference this Python script: testutil.common.get_feature_coefficients), with columns
  - Feature_Index
  - Feature
  - Original_Feature
  - Coefficient
- The transformer appends 3 main columns to the prediction view:
  - contrib_column ==> f"prediction_{label_column}_contrib": array of contributions
  - contrib_column_sum ==> f"{contrib_column}_sum"
  - contrib_column_intercept ==> f"{contrib_column}_intercept"
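To make the supported link function names more concrete, the sketch below maps each one to the inverse link that turns a linear predictor into a predicted mean. The formulas are the standard GLM definitions inferred from the names (for example, a power link of 0.5 for powerHalfLink); they are assumptions for illustration, and the library's own implementation may differ. `inverse_links` is a hypothetical helper.

```python
import math
from typing import Callable, Dict


def inverse_links(link_power: float = 1.0) -> Dict[str, Callable[[float], float]]:
    """Map each linkFunctionType name to its inverse link (linear predictor -> mean)."""
    return {
        "identityLink": lambda eta: eta,
        "logLink": lambda eta: math.exp(eta),
        "logitLink": lambda eta: 1.0 / (1.0 + math.exp(-eta)),
        "inverseLink": lambda eta: 1.0 / eta,
        # Assumed link: sqrt(mean), so the inverse squares the linear predictor.
        "powerHalfLink": lambda eta: eta ** 2,
        # Assumed link: mean ** link_power, so the inverse takes the matching root.
        "otherPowerLink": lambda eta: eta ** (1.0 / link_power),
    }


# Example: a linear predictor of 0.0 under the log and logit links.
inv = inverse_links(link_power=2.0)
mean_log = inv["logLink"](0.0)
mean_logit = inv["logitLink"](0.0)
```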
Example Notebooks
- Python (Scikit-Learn) Ensemble Tree Explain Example: https://github.com/imanbio/transparency/blob/master/examples/notebooks/python/python_ensemble_tree_explainer_samples.ipynb
- Python (Scikit-Learn) Generalized Linear Model Explain Example: https://github.com/imanbio/transparency/blob/master/examples/notebooks/python/python_glm_explainer_samples.ipynb
- PySpark GLM Explain Example: https://github.com/imanbio/transparency/blob/master/examples/notebooks/spark/pyspark_glm_explain.ipynb
- PySpark Random Forest Explain Example: https://github.com/imanbio/transparency/blob/master/examples/notebooks/spark/pyspark_random_forest_explain.ipynb
Authors
- Iman Haji https://www.linkedin.com/in/imanhaji
- Imran Qureshi https://www.linkedin.com/in/imranq2/
- Alvin Henrick https://www.linkedin.com/in/alvinhenrick/
License
Apache License Version 2.0
Source Distribution
File details
Details for the file transparency-0.0.9.tar.gz.
File metadata
- Download URL: transparency-0.0.9.tar.gz
- Upload date:
- Size: 13.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.2.0 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.7.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 835da70418f6cb6e84e04dc7e6d1892f80c750b697daea817182101510e2eea0
MD5 | 66900ebc5d38fb60d87e9c11bbce7a4a
BLAKE2b-256 | 9dde83481654bb19d5dc6e3f5dfa38dc8b2e3b6225d0d48ea3901267c2075ff0