An interpretable machine learning pipeline over knowledge graphs
Project description
InterpretME: A Tool for Interpretations of Machine Learning Models Over Knowledge Graphs
InterpretME is a publicly available library. It comprises a pipeline for enhancing the interpretability of machine learning models over knowledge graphs, an ontology to describe the main characteristics of trained machine learning models, and the InterpretME knowledge graph (KG). The InterpretME KG assists users in understanding and interpreting a model's prediction for a particular entity, aligned with the corresponding SHACL validation results. InterpretME builds on state-of-the-art machine learning models and interpretability tools. It evaluates SHACL constraints over the nodes of the input KGs and generates a validation report per constraint and target entity. Thus, InterpretME helps users understand the decisions of the predictive models and provides a platform for interpretability.
Installation
InterpretME is OS independent, i.e., it runs on Linux, macOS, and Windows. However, the current version only supports Python 3.8 and 3.9. You can install InterpretME from PyPI via pip:
pip install InterpretME
Running the InterpretME Pipeline
from InterpretME import pipeline
pipeline(path_config, lime_results, server_url, username, password, sampling, cv, imp_features, test_split, model)
`pipeline()` executes the whole pipeline: extracting data and metadata from the input KGs, validating SHACL constraints, preprocessing the data, and running the predictive models.
InterpretME collects metadata at each step of the pipeline.
The current version of InterpretME relies on interpretable surrogate tools such as LIME [1].
The user can provide a path to store the LIME results.
Model performance metrics such as accuracy and precision are also recorded as metadata.
The RDF Mapping Language (RML) is used to define mappings for the metadata collected from the predictive pipeline in order to integrate them into the InterpretME KG.
The RML mappings are used by the SDM-RDFizer [2], an efficient RML engine for creating knowledge graphs, to semantify the metadata.
The function `pipeline()` returns results from the pipeline which are later used to trace a target entity.
Parameters:
- `path_config` - Path to the input configuration file (JSON) for the input KG
- `lime_results` - Path to store the LIME results in HTML format
- `server_url` - URL of the InterpretME KG
- `username` - Username to upload data to the InterpretME KG
- `password` - Password to upload data to the InterpretME KG
- `sampling` - Optional; sampling strategy to use (undersampling or oversampling)
- `cv` - Optional; number of cross-validation folds required while performing the stratified shuffle split
- `imp_features` - Optional; number of important features
- `test_split` - Optional; train/test split of the dataset
- `model` - Optional; model used to perform the stratified shuffle split (random forest, AdaBoost classifier, gradient boosting classifier)
Returns:
A dictionary that captures all the results of the trained predictive model, stored as objects that can be used for further analysis, e.g., with `plots.sampling()`.
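The following is a minimal usage sketch. All paths, the endpoint URL, and the credentials are placeholders, and the values of the optional arguments (e.g., the sampling strategy and the model identifier) are assumptions that must be adapted to the actual setup:

```python
from InterpretME import pipeline

# All paths, URLs, and credentials below are placeholders; the optional
# argument values are assumptions and must be adapted to the actual setup.
results = pipeline(
    path_config="example/config.json",    # input configuration file (JSON) for the input KG
    lime_results="output/lime",           # where the LIME HTML reports are stored
    server_url="http://localhost:8890/",  # URL of the InterpretME KG endpoint
    username="user",                      # credentials to upload data to the InterpretME KG
    password="password",
    sampling="undersampling",             # optional; sampling strategy (assumed value)
    cv=5,                                 # optional; number of cross-validation folds
    imp_features=10,                      # optional; number of important features
    test_split=0.3,                       # optional; train/test split
    model="Random Forest"                 # optional; assumed identifier of the predictive model
)
```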
Plots
InterpretME offers plots to understand and visualize the characteristics of the trained predictive model. The following plot functions are defined in InterpretME.
Sampling
from InterpretME import plots
plots.sampling(results, path)
`plots.sampling()` saves the target class distribution plot after applying the sampling strategy.
Parameters:
- `results` - Results dictionary obtained from `pipeline()`
- `path` - Path where to store the output plot
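A short sketch, assuming `results` is the dictionary returned by the `pipeline()` call shown earlier and that the output path is a placeholder:

```python
from InterpretME import plots

# `results` comes from a prior pipeline() run; the output path is a placeholder.
plots.sampling(results, "output/plots/sampling")
```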
Feature Importance
from InterpretME import plots
plots.feature_importance(results, path)
`plots.feature_importance()` creates a bar plot of the important features with their feature weights.
Parameters:
- `results` - Results dictionary obtained from `pipeline()`
- `path` - Path where to store the output plot
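For example (again assuming `results` from the earlier `pipeline()` sketch; the path is a placeholder):

```python
from InterpretME import plots

# Bar plot of the important features and their weights.
plots.feature_importance(results, "output/plots/feature_importance")
```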
Decision Trees
from InterpretME import plots
plots.decision_trees(results, path)
`plots.decision_trees()` plots the decision trees generated from the predictive model.
Parameters:
- `results` - Results dictionary obtained from `pipeline()`
- `path` - Path where to store the output plot
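A sketch under the same assumptions as above (`results` from `pipeline()`, placeholder path):

```python
from InterpretME import plots

# Renders the decision trees learned by the predictive model.
plots.decision_trees(results, "output/plots/decision_trees")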
Decision Trees with Constraint Validation
from InterpretME import plots
plots.constraints_decision_trees(results, path)
`plots.constraints_decision_trees()` plots the decision trees including the SHACL constraint validation results.
Parameters:
- `results` - Results dictionary obtained from `pipeline()`
- `path` - Path where to store the output plot
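A sketch under the same assumptions as the previous plot examples:

```python
from InterpretME import plots

# Decision trees annotated with the SHACL constraint validation results.
plots.constraints_decision_trees(results, "output/plots/constraint_trees")
```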
Federated Querying
InterpretME assists the user in interpreting the predictive model via its ability to trace all characteristics of a target entity from the input KG and the InterpretME KG. This is achieved by using the federated query engine DeTrusty [3]. Using this module, the user's questions can be answered via SPARQL queries.
Configuration
from InterpretME.federated_query_engine import configuration
configuration(interpretme_endpoint, input_endpoint)
DeTrusty relies on collected metadata about the KGs.
`configuration()` collects the required metadata, stores it in a file, and returns it.
Parameters:
- `interpretme_endpoint` - URL of the InterpretME KG
- `input_endpoint` - URL of the input KG
Returns:
An instance of `DeTrusty.Molecule.MTManager.Config` that holds the metadata collected from the input KG and the InterpretME KG. This object is to be used for the `config` parameter of the method `federated()`.
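A sketch with placeholder endpoint URLs:

```python
from InterpretME.federated_query_engine import configuration

# Both endpoint URLs are placeholders for the actual SPARQL endpoints.
config = configuration(
    interpretme_endpoint="http://localhost:8891/sparql",  # InterpretME KG
    input_endpoint="http://localhost:8892/sparql"         # input KG
)
```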
Querying
from InterpretME.federated_query_engine import federated
federated(input_query, config)
Parameters:
- `input_query` - SPARQL query to answer the user's question
- `config` - The configuration object holding the metadata about the KGs to query (generated by `configuration()`)
Returns: A Python dictionary following the SPARQL protocol with the query result.
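The sketch below issues a generic SPARQL query using the `config` object returned by `configuration()`; the query itself is only illustrative, since real queries use terms from the InterpretME ontology and the input KG:

```python
from InterpretME.federated_query_engine import federated

# Illustrative query; replace with terms from the InterpretME ontology and the input KG.
input_query = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o . }
LIMIT 10
"""

result = federated(input_query, config)  # `config` comes from configuration()
print(result)
```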
References
[1] Marco Ribeiro, Sameer Singh, and Carlos Guestrin. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16). ACM. 2016. DOI: 10.1145/2939672.2939778.
[2] E. Iglesias, S. Jozashoori, D. Chaves-Fraga, D. Collarana and M.-E. Vidal. SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs. In: CIKM ’20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, ACM, New York, NY, USA, 2020. DOI: 10.1145/3340531.3412881.
[3] P.D. Rohde. DeTrusty v0.6.1, August 2022. DOI: 10.5281/zenodo.6998001.
File details
Details for the file InterpretME-1.3.2.tar.gz.
File metadata
- Download URL: InterpretME-1.3.2.tar.gz
- Upload date:
- Size: 38.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.17
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4e47cc7ab321dfad54b53a63f50a601e1252db0e54e7331bc30f36d63022dde7 |
| MD5 | 57991df8feb0b4a62b7dbc906a8aaf0e |
| BLAKE2b-256 | 7de9f8b9f9365939d7f7453afe8850ef02f3d9772c81e81378dd1fb3afc21867 |
File details
Details for the file InterpretME-1.3.2-py3-none-any.whl.
File metadata
- Download URL: InterpretME-1.3.2-py3-none-any.whl
- Upload date:
- Size: 44.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.17
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c470e8631a1dcd8b3c08c66a30b00f0a5a8bd86ce6ababd4f868e30c3490739b |
| MD5 | cb387de5db4e2641281f48d9b5e6f8aa |
| BLAKE2b-256 | 13a0efbf9aeae4d2387e8ae6421ca372229d21b1049b85221116ba7d64687d8b |