An interpretable machine learning pipeline over knowledge graphs

Development Status
- 5 - Production/Stable
Intended Audience
- Science/Research
License
- OSI Approved :: MIT License
Operating System
- OS Independent

Project description

InterpretME: A Tool for Interpretations of Machine Learning Models Over Knowledge Graphs

InterpretME is a publicly available library. It includes a pipeline for enhancing the interpretability of machine learning models over knowledge graphs (KGs), an ontology to describe the main characteristics of trained machine learning models, and the InterpretME knowledge graph. The InterpretME KG helps users clarify and interpret a model's prediction for a particular entity by aligning it with SHACL validation results. InterpretME uses state-of-the-art machine learning models and interpretability tools. It evaluates SHACL constraints over the nodes of the input KGs and generates a validation report per constraint and target entity. This helps the user understand the decisions of the predictive models and provides a platform for interpretability.

Installation

InterpretME is OS independent, i.e., you can run it on Linux, macOS, and Windows. However, the current version only supports Python 3.8 and 3.9. You can install InterpretME from PyPI via pip:

pip install InterpretME
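
Because only Python 3.8 and 3.9 are supported, it can help to check the interpreter version before installing. The following is a minimal sketch; the helper name is_supported_python is hypothetical and not part of InterpretME.

```python
import sys

# Minor versions listed as supported in the installation notes above.
SUPPORTED_VERSIONS = {(3, 8), (3, 9)}


def is_supported_python(version_info=sys.version_info):
    """Return True if the (major, minor) version is 3.8 or 3.9.

    Hypothetical helper for illustration; not part of InterpretME.
    """
    return tuple(version_info[:2]) in SUPPORTED_VERSIONS


if not is_supported_python():
    print("Warning: InterpretME states support for Python 3.8 and 3.9 only.")
```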

Running the InterpretME Pipeline

from InterpretME import pipeline
pipeline(path_config, lime_results, server_url, username, password, sampling, cv, imp_features, test_split, model)

pipeline() executes the whole pipeline, including extracting data and metadata from the input KGs, validating SHACL constraints, preprocessing the data, and running predictive models. InterpretME collects metadata at each step of the pipeline. The current version of InterpretME relies on interpretable surrogate tools such as LIME [1]; the user can provide a path to store the LIME results. Model performance metrics such as accuracy and precision are also recorded as metadata. The RDF Mapping Language (RML) is used to define mappings for the metadata collected from the predictive pipeline in order to integrate them into the InterpretME KG. The SDM-RDFizer [2], an efficient RML engine for creating knowledge graphs, uses these RML mappings to semantify the metadata. pipeline() returns the results of the pipeline, which are used later to trace a target entity.

Parameters:

path_config - Path to the input configuration file (JSON) for Input KG
lime_results - Path to store LIME results in HTML format
server_url - URL of InterpretME KG
username - Username to upload data to InterpretME KG
password - Password to upload data to InterpretME KG
sampling - Optional; sampling strategy to use (undersampling or oversampling)
cv - Optional; number of cross-validation folds required while performing stratified shuffle split
imp_features - Optional; number of important features
test_split - Optional; splitting of training and testing dataset
model - Optional; model used to perform stratified shuffle split (random forest, AdaBoost classifier, or gradient boosting classifier)

Returns: A dictionary that captures the results of the trained predictive model, stored as objects that can be used for further analysis, e.g., by plots.sampling().
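
As a sketch of how the parameters above might be assembled, the helper below collects the five required arguments and adds only those optional ones that are actually set. The helper names, the file paths, the endpoint URL, and the example values are all hypothetical. Passing the arguments by keyword assumes that pipeline() accepts the parameter names exactly as listed above.

```python
# Optional parameter names as documented in the list above.
OPTIONAL_PARAMS = {"sampling", "cv", "imp_features", "test_split", "model"}


def build_pipeline_kwargs(path_config, lime_results, server_url,
                          username, password, **optional):
    """Collect pipeline() arguments, leaving out unset optional ones.

    Hypothetical helper for illustration; not part of InterpretME.
    """
    unknown = set(optional) - OPTIONAL_PARAMS
    if unknown:
        raise TypeError(f"unknown optional parameter(s): {sorted(unknown)}")
    kwargs = {
        "path_config": path_config,
        "lime_results": lime_results,
        "server_url": server_url,
        "username": username,
        "password": password,
    }
    kwargs.update({k: v for k, v in optional.items() if v is not None})
    return kwargs


def run_pipeline(**kwargs):
    """Call InterpretME's pipeline(); imported lazily so that the helper
    above can be used without InterpretME installed."""
    from InterpretME import pipeline
    return pipeline(**kwargs)


# Placeholder values only; replace them with your own configuration.
example_kwargs = build_pipeline_kwargs(
    "config.json", "./lime_results", "http://localhost:8890/sparql",
    "user", "secret", cv=5, test_split=0.3,
)
# results = run_pipeline(**example_kwargs)  # requires InterpretME and a running KG
```

The commented-out call is left disabled because it needs InterpretME, an input KG, and a reachable InterpretME KG endpoint.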

Plots

InterpretME offers plots to understand and visualize the characteristics of the trained predictive model. The following plot functions are defined in InterpretME.

Sampling

from InterpretME import plots
plots.sampling(results, path)

plots.sampling() saves the target class distribution plot after applying the sampling strategy.

Parameters:

results - Results dictionary obtained from pipeline()
path - Path where to store the output plot
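
To illustrate what a sampling strategy does to the target class distribution, here is a minimal, standard-library sketch of random undersampling. It is not InterpretME's implementation, which may differ; the function name and the toy data are assumptions made for illustration.

```python
import random
from collections import Counter


def undersample(rows, labels, seed=0):
    """Randomly drop rows until every class is as small as the rarest one.

    Illustrative only; InterpretME's own sampling may work differently.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target_size = min(len(members) for members in by_class.values())

    sampled_rows, sampled_labels = [], []
    for label, members in by_class.items():
        for row in rng.sample(members, target_size):
            sampled_rows.append(row)
            sampled_labels.append(label)
    return sampled_rows, sampled_labels


# Toy data: eight entities of class 0 and two of class 1.
toy_rows = list(range(10))
toy_labels = [0] * 8 + [1] * 2
_, balanced_labels = undersample(toy_rows, toy_labels)
print(Counter(toy_labels), "->", Counter(balanced_labels))
```

Oversampling works in the opposite direction: rows of the rarer classes are repeated or synthesized until the classes are balanced.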

Feature Importance

from InterpretME import plots
plots.feature_importance(results, path)

plots.feature_importance() creates a bar plot of the important features with their feature weights.

Parameters:

results - Results dictionary obtained from pipeline()
path - Path where to store the output plot
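
The idea of keeping only a fixed number of important features (compare the imp_features parameter of pipeline()) can be sketched as ranking features by weight and keeping the top k. The function name, feature names, and weights below are made up for illustration; how InterpretME computes the weights is not shown here.

```python
def top_features(weights, k):
    """Return the k (feature, weight) pairs with the largest weights.

    Hypothetical helper for illustration; not part of InterpretME.
    """
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Made-up feature weights.
example_weights = {"age": 0.40, "smoker": 0.35, "gender": 0.05, "stage": 0.20}
print(top_features(example_weights, 2))
```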

Decision Trees

from InterpretME import plots
plots.decision_trees(results, path)

plots.decision_trees() plots the decision trees generated from the predictive model.

Parameters:

results - Results dictionary obtained from pipeline()
path - Path where to store the output plot

Decision Trees with Constraint Validation

from InterpretME import plots
plots.constraints_decision_trees(results, path)

plots.constraints_decision_trees() plots decision trees including SHACL constraint validation results.

Parameters:

results - Results dictionary obtained from pipeline()
path - Path where to store the output plot

Federated Querying

InterpretME assists the user in interpreting the predictive model by tracing all characteristics of a target entity from the input KG and the InterpretME KG. This is achieved with the federated query engine DeTrusty [3]. Using this module, the user's questions can be answered via SPARQL queries.

Configuration

from InterpretME.federated_query_engine import configuration
configuration(interpretme_endpoint, input_endpoint)

DeTrusty relies on collected metadata about the KGs. configuration() collects the required metadata, stores it in a file, and returns it.

Parameters:

interpretme_endpoint - URL of the InterpretME KG
input_endpoint - URL of the input KG

Returns: An instance of DeTrusty.Molecule.MTManager.Config that holds the metadata collected from the input KG and the InterpretME KG. Pass this object as the config parameter of federated().

Querying

from InterpretME.federated_query_engine import federated
federated(input_query, config)

Parameters:

input_query - SPARQL query to answer the user's question
config - The configuration object holding the metadata about the KGs to query (generated by configuration())

Returns: A Python dictionary following the SPARQL protocol with the query result.
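
The two steps above can be chained as in the following sketch. It assumes that the returned dictionary follows the standard SPARQL JSON results layout (head.vars and results.bindings); the exact keys returned by federated() are not confirmed here. The helper names and the sample result are hypothetical.

```python
def run_federated_query(query, interpretme_endpoint, input_endpoint):
    """Collect the KG metadata, then answer the SPARQL query.

    InterpretME is imported lazily so that the helper below can be
    used without it installed.
    """
    from InterpretME.federated_query_engine import configuration, federated
    config = configuration(interpretme_endpoint, input_endpoint)
    return federated(query, config)


def bindings_to_rows(result):
    """Flatten a SPARQL JSON result into a list of {variable: value} dicts.

    Assumes the standard layout; variables unbound in a solution are skipped.
    """
    variables = result.get("head", {}).get("vars", [])
    rows = []
    for binding in result.get("results", {}).get("bindings", []):
        rows.append({var: binding[var]["value"]
                     for var in variables if var in binding})
    return rows


# Hand-written sample in the standard SPARQL JSON results layout.
sample_result = {
    "head": {"vars": ["entity", "label"]},
    "results": {"bindings": [
        {"entity": {"type": "uri", "value": "http://example.org/e1"},
         "label": {"type": "literal", "value": "positive"}},
        {"entity": {"type": "uri", "value": "http://example.org/e2"}},
    ]},
}
print(bindings_to_rows(sample_result))
```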

References

[1] Marco Ribeiro, Sameer Singh, and Carlos Guestrin. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16). ACM. 2016. DOI: 10.1145/2939672.2939778.

[2] E. Iglesias, S. Jozashoori, D. Chaves-Fraga, D. Collarana, and M.-E. Vidal. SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management (CIKM ’20). ACM. 2020. DOI: 10.1145/3340531.3412881.

[3] P.D. Rohde. DeTrusty v0.6.1, August 2022. DOI: 10.5281/zenodo.6998001.

Release history

1.3.2 - Jul 4, 2023
1.3.1 - Jul 3, 2023
1.3.0 - Jun 29, 2023
1.2.2 - Mar 11, 2023
1.2.1 - Mar 10, 2023 (this version)
1.2.0 - Feb 28, 2023
1.1.1 - Aug 26, 2022
1.1.0 - Aug 25, 2022 (yanked)

Download files

Source Distribution

InterpretME-1.2.1.tar.gz (38.2 kB)

Uploaded Mar 10, 2023 Source

Built Distribution

InterpretME-1.2.1-py3-none-any.whl (43.5 kB)

Uploaded Mar 10, 2023 Python 3

File details

Details for the file InterpretME-1.2.1.tar.gz.

File metadata

Download URL: InterpretME-1.2.1.tar.gz
Upload date: Mar 10, 2023
Size: 38.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for InterpretME-1.2.1.tar.gz
Algorithm	Hash digest
SHA256	`f58a1ef4e531715502b0e837b04ae30c25607322639124c9ca61395341a37886`
MD5	`61f9e7f574a4d895e49068b78bdc99e4`
BLAKE2b-256	`995e63272545309c19409d9dabb7c843280c96b6d313c2b10664c093c9891515`

File details

Details for the file InterpretME-1.2.1-py3-none-any.whl.

File metadata

Download URL: InterpretME-1.2.1-py3-none-any.whl
Upload date: Mar 10, 2023
Size: 43.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for InterpretME-1.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`cda9cad01dc3e4449a5aa2ed166361ac40f538e9d392808931c9fdf74f1276f2`
MD5	`e91598a2ed0ad122a0d7cfc905c25d6f`
BLAKE2b-256	`c48230f1ce1e783144a15108fff4b912cc9e899d59fabdd2f2ec9e5367ddfb63`