Library for executable ML pipelines represented by KGs.
ExeKGLib
Library for conveniently constructing and executing Machine Learning (ML) pipelines represented by Knowledge Graphs (KGs).
Overview
The functionality of this Python library can be divided into the following two parts:
- Executable KG construction: An executable KG representing an ML pipeline is constructed from the user's input (programmatically or via CLI) based on the KG schemas. Construction proceeds by sequentially creating pairs of instances of ds:AtomicTask and ds:AtomicMethod sub-classes, together with their properties. The definitions of these sub-classes can be found in the bottom-level KG schemas. After each KG component is built, it is validated against the KG schemas and added to an RDFLib Graph object. The KG is finally saved in Turtle format.
- ML pipeline execution: The executable KG is parsed using RDFLib and queried using SPARQL to retrieve its ML pipeline. The pipeline's ordered tasks are sequentially mapped to Python objects, each of which implements a run_method() Python method that is then invoked. run_method() is an abstract method of the Task class that is implemented by its bottom-level child classes.
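The execution step above can be pictured with a minimal, standard-library-only sketch. It is not ExeKGLib's actual code: the class names MeanCalculation and Centering, the TASK_REGISTRY mapping, and the execute_pipeline helper are all hypothetical, and the ordered list of method names stands in for what the SPARQL query over the KG would return.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class Task(ABC):
    """Base class: every concrete task must implement run_method()."""

    @abstractmethod
    def run_method(self, data: Dict[str, Any]) -> Dict[str, Any]:
        """Read inputs from the shared data dict and return this task's outputs."""


class MeanCalculation(Task):  # hypothetical bottom-level task
    def run_method(self, data: Dict[str, Any]) -> Dict[str, Any]:
        values = data["values"]
        return {"mean": sum(values) / len(values)}


class Centering(Task):  # hypothetical bottom-level task
    def run_method(self, data: Dict[str, Any]) -> Dict[str, Any]:
        return {"centered": [v - data["mean"] for v in data["values"]]}


# Hypothetical mapping from method names (as they might appear in a KG)
# to the Python classes that implement them.
TASK_REGISTRY = {"MeanCalculation": MeanCalculation, "Centering": Centering}


def execute_pipeline(ordered_methods: List[str], data: Dict[str, Any]) -> Dict[str, Any]:
    """Instantiate each task in order and make its outputs visible to later tasks."""
    for name in ordered_methods:
        task = TASK_REGISTRY[name]()
        data = {**data, **task.run_method(data)}
    return data


result = execute_pipeline(["MeanCalculation", "Centering"], {"values": [1.0, 2.0, 3.0]})
```

Because run_method() is abstract, instantiating Task directly raises a TypeError; only the concrete child classes can be placed in the pipeline.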
The different implementations of run_method() correspond to each of the Method's bottom-level sub-classes that are defined in the Visualization, Statistics, and ML KG schemas. The method categories are described below.
- Visualization: This is a set of methods for visualization, including two types: (1) The plot canvas methods that define the plot size and layout. (2) The various kinds of plot methods (line plot, scatter plot, bar plot, etc.). These methods use matplotlib to visualize data.
- Statistics and Feature Engineering: This includes methods for statistical analysis and feature engineering, such as IQR calculation and mean and standard deviation calculation, which can be combined into more complex methods such as outlier detection and normalization.
- Machine Learning: This is a group of methods that support ML algorithms such as Linear Regression, MLP, and k-NN, along with helper functions for tasks such as data splitting and ML model performance calculation.
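As a rough illustration of how simple statistics can be composed into the higher-level methods mentioned above, the sketch below builds IQR-based outlier detection and z-score normalization from Python's statistics module. The function names, the 1.5 multiplier, the quartile method, and the choice of z-score normalization are assumptions made for illustration; the library's own implementations may differ.

```python
import statistics
from typing import List, Sequence, Tuple


def iqr_bounds(values: Sequence[float], k: float = 1.5) -> Tuple[float, float]:
    """Return (low, high) fences computed from the interquartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles, default method
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def detect_outliers(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Return the values that fall outside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if v < low or v > high]


def normalize(values: Sequence[float]) -> List[float]:
    """Z-score normalization using the mean and population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


sample = [10, 12, 11, 13, 12, 11, 100]  # made-up data with one extreme value
outliers = detect_outliers(sample)
z_scores = normalize(sample)
```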
Installation
See the installation page of the library's documentation site.
Usage
Creating an executable KG
Via CLI
- Run python kg_construction.py.
- Follow the input prompts.
Via code
See the provided examples.
Executing a generated KG
Run python kg_execution.py [kg_file_path].
Adding a new ML-related task and method
To perform this type of library extension, there are 3 required steps:
- Selection of a relevant bottom-level KG schema (Statistics, ML, or Visualization) according to the type of the new task and method.
- Addition of new semantic components (entities, properties, etc) to the selected KG schema.
- Addition of a Python class to the corresponding module of the exe_kg_lib.classes.tasks package.
For steps 2 and 3, refer to the relevant page of the library's documentation site.
External resources
Top-level KG schemas
Bottom-level KG schemas
The above KG schemas are included in the ExeKGOntology repository.
Breast Cancer Wisconsin (Diagnostic) Data Set
- Creators: Dr. William H. Wolberg, W. Nick Street, and Olvi L. Mangasarian.
- Copyright: This dataset is copyright of the above creators and licensed under the CC BY-NC-SA 4.0 License.
- Changes: The dataset file examples/data/breast_cancer_data.csv has the following changes compared to the original one.
  - The name of the file has been changed.
  - In the column names, the spaces have been replaced with _.
  - A new column has been added (diagnosis_binary) containing 1 for the rows in which the diagnosis column has M, and 0 for the rest.
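The column changes listed above could be reproduced along the following lines. This is only a sketch using Python's csv module on a tiny made-up in-memory sample, not the script used to prepare the shipped file; the transform function, the sample rows, and the "radius mean" column name are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def transform(source: TextIO) -> Tuple[List[str], List[Dict[str, object]]]:
    """Replace spaces in column names with "_" and add a diagnosis_binary column."""
    reader = csv.DictReader(source)
    original = list(reader.fieldnames)
    renamed = [name.replace(" ", "_") for name in original]
    rows = []
    for row in reader:
        new_row = {new: row[old] for old, new in zip(original, renamed)}
        # 1 where the diagnosis column has M, 0 for the rest
        new_row["diagnosis_binary"] = 1 if new_row["diagnosis"] == "M" else 0
        rows.append(new_row)
    return renamed + ["diagnosis_binary"], rows


# Made-up sample standing in for the original dataset file
sample_csv = io.StringIO("id,diagnosis,radius mean\n1,M,17.99\n2,B,13.54\n")
columns, rows = transform(sample_csv)
```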
License
ExeKGLib is open-sourced under the AGPL-3.0 license. See the LICENSE file for details.