Library for executable ML pipelines represented by KGs.
ExeKGLib
Python library for conveniently constructing and executing Machine Learning (ML) pipelines represented by Knowledge Graphs (KGs). It features a coding interface and a CLI, and allows the user to:
- Construct an ML pipeline that gets a CSV as input and processes the data using any of the available tasks and methods.
- Save the constructed pipeline as a KG in Turtle format.
- Execute the generated KG.
The coding interface is demonstrated with three sample Python files. The pipelines represented by the generated sample KGs are briefly explained below:
- ML pipeline: Loads features and labels from an input CSV dataset, splits the data, trains and tests a k-NN model, and visualizes the prediction errors.
- Statistics pipeline: Loads a feature from an input CSV dataset, normalizes it, and plots its values (before and after normalization) using a scatter plot.
- Visualization pipeline: Loads a feature from an input CSV dataset and plots its values using a line plot.
Under the hood, ExeKGLib uses well-known Python libraries such as pandas, matplotlib, and scikit-learn for data processing, visualization, and prediction.
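To make the sample ML pipeline more concrete, here is a minimal sketch of the same steps (load data, split, train and test k-NN, compute prediction errors) written directly against scikit-learn. It does not use ExeKGLib's API. The `run_knn_pipeline` helper, the dataset size, the 25% test split, and the number of neighbours are all assumptions for illustration, and the visualization step is replaced by printed error rates.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def run_knn_pipeline(n_samples=200, n_neighbors=5, seed=0):
    """Generate data, split it, train and test k-NN, return per-sample errors."""
    # Stand-in for loading features and labels from a CSV (sizes are assumed).
    X, y = make_classification(n_samples=n_samples, n_features=5, random_state=seed)

    # Split into training and test sets (the 25% test share is an assumption).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )

    # Train the k-NN model, then flag every misclassified sample in each split.
    model = KNeighborsClassifier(n_neighbors=n_neighbors).fit(X_train, y_train)
    train_errors = model.predict(X_train) != y_train
    test_errors = model.predict(X_test) != y_test
    return train_errors, test_errors


if __name__ == "__main__":
    train_errors, test_errors = run_knn_pipeline()
    print(f"train error rate: {train_errors.mean():.3f}")
    print(f"test error rate: {test_errors.mean():.3f}")
```

In ExeKGLib, each of these steps would instead be a task in the KG, which is what allows the pipeline to be saved and re-executed later.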
ExeKGLib is described in the following paper published as part of ESWC 2023:
Klironomos A., Zhou B., Tan Z., Zheng Z., Gad-Elrab M., Paulheim H., Kharlamov E. ExeKGLib: Knowledge Graphs-Empowered Machine Learning Analytics
Detailed information (installation, documentation, etc.) about ExeKGLib can be found on its website; basic information is given below.
Installation
To install, run `pip install exe-kg-lib`.
For detailed installation instructions, refer to the installation page of ExeKGLib's website.
Ready-to-use ML-related tasks and methods
KG schema (abbreviation) | Task | Method | Properties | Input (data structure) | Output (data structure) | Implemented by Python class |
---|---|---|---|---|---|---|
Machine Learning (ml) | Train | KNNTrain | - | DataInTrainX (Matrix or Vector), DataInTrainY (Matrix or Vector) | DataOutPredictedValueTrain (Matrix or Vector), DataOutTrainModel (SingleValue) | TrainKNNTrain |
Machine Learning (ml) | Train | MLPTrain | - | DataInTrainX (Matrix or Vector), DataInTrainY (Matrix or Vector) | DataOutPredictedValueTrain (Matrix or Vector), DataOutTrainModel (SingleValue) | TrainMLPTrain |
Machine Learning (ml) | Train | LRTrain | - | DataInTrainX (Matrix or Vector), DataInTrainY (Matrix or Vector) | DataOutPredictedValueTrain (Matrix or Vector), DataOutTrainModel (SingleValue) | TrainLRTrain |
Machine Learning (ml) | Test | KNNTest | - | DataInTestModel (SingleValue), DataInTestX (Matrix or Vector) | DataOutPredictedValueTest (Matrix or Vector) | TestKNNTest |
Machine Learning (ml) | Test | MLPTest | - | DataInTestModel (SingleValue), DataInTestX (Matrix or Vector) | DataOutPredictedValueTest (Matrix or Vector) | TestMLPTest |
Machine Learning (ml) | Test | LRTest | - | DataInTestModel (SingleValue), DataInTestX (Matrix or Vector) | DataOutPredictedValueTest (Matrix or Vector) | TestLRTest |
Machine Learning (ml) | PerformanceCalculation | PerformanceCalculationMethod | - | DataInTrainRealY (Matrix or Vector), DataInTrainPredictedY (Matrix or Vector), DataInTestPredictedY (Matrix or Vector), DataInTestRealY (Matrix or Vector) | DataOutMLTestErr (Vector), DataOutMLTrainErr (Vector) | PerformanceCalculationPerformanceCalculationMethod |
Machine Learning (ml) | Concatenation | ConcatenationMethod | - | DataInConcatenation (list of Vector) | DataOutConcatenatedData (Matrix) | ConcatenationConcatenationMethod |
Machine Learning (ml) | DataSplitting | DataSplittingMethod | - | DataInDataSplittingX (Matrix or Vector), DataInDataSplittingY (Matrix or Vector) | DataOutSplittedTestDataX (Matrix or Vector), DataOutSplittedTrainDataY (Matrix or Vector), DataOutSplittedTrainDataX (Matrix or Vector), DataOutSplittedTestDataY (Matrix or Vector) | DataSplittingDataSplittingMethod |
Visualization (visu) | CanvasTask | CanvasMethod | hasCanvasName (string), hasLayout (string) | - | - | CanvasTaskCanvasMethod |
Visualization (visu) | PlotTask | LineplotMethod | hasLineStyle (string), hasLineWidth (int), hasLegendName (string) | DataInVector (Vector) | - | PlotTaskLineplotMethod |
Visualization (visu) | PlotTask | ScatterplotMethod | hasLineStyle (string), hasLineWidth (int), hasScatterSize (int), hasLegendName (string) | DataInVector (Vector) | - | PlotTaskScatterplotMethod |
Statistics (stats) | TrendCalculationTask | TrendCalculationMethod | - | DataInTrendCalculation (Vector) | DataOutTrendCalculation (Vector) | TrendCalculationTaskTrendCalculationMethod |
Statistics (stats) | NormalizationTask | NormalizationMethod | - | DataInNormalization (Vector) | DataOutNormalization (Vector) | NormalizationTaskNormalizationMethod |
Statistics (stats) | ScatteringCalculationTask | ScatteringCalculationMethod | - | DataInScatteringCalculation (Vector) | DataOutScatteringCalculation (Vector) | ScatteringCalculationTaskScatteringCalculationMethod |
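In every row of the table, the name in the "Implemented by Python class" column is the Task name followed directly by the Method name. The short sketch below spells out that pattern. It is inferred from the table rather than from a documented rule, and `implementing_class_name` is a hypothetical helper, not part of ExeKGLib.

```python
def implementing_class_name(task: str, method: str) -> str:
    """Concatenate task and method names, as the table's last column does."""
    return f"{task}{method}"


# A few (task, method) pairs copied from the table.
pairs = [
    ("Train", "KNNTrain"),
    ("PlotTask", "LineplotMethod"),
    ("NormalizationTask", "NormalizationMethod"),
]

for task, method in pairs:
    print(implementing_class_name(task, method))
```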
Usage
Creating an ML pipeline
- Via code: See the provided examples. To fetch them to your working directory for easy access, run `typer exe_kg_lib.cli.main run get-examples`.
- Step-by-step via CLI: Run `typer exe_kg_lib.cli.main run create-pipeline`.
Executing an ML pipeline
- Via code: See example code.
- Via CLI: Run `typer exe_kg_lib.cli.main run run-pipeline <pipeline_path>`.
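If the CLI command needs to be launched from a script (for example, as part of a larger automation), one option is to wrap it with `subprocess`. The sketch below only assembles and prints the command. The helper names and the example path are hypothetical, and whether `<pipeline_path>` should point to a file or a directory is not assumed here.

```python
import shlex
import subprocess


def run_pipeline_command(pipeline_path: str) -> list[str]:
    """Build the argument list for the run-pipeline CLI command."""
    return ["typer", "exe_kg_lib.cli.main", "run", "run-pipeline", pipeline_path]


def run_pipeline(pipeline_path: str) -> int:
    """Execute the pipeline via the CLI and return the process exit code."""
    completed = subprocess.run(run_pipeline_command(pipeline_path), check=False)
    return completed.returncode


if __name__ == "__main__":
    # Hypothetical path; the command is printed here rather than executed.
    print(shlex.join(run_pipeline_command("./pipelines/MLPipeline.ttl")))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the path contains spaces.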
Adding a new ML-related task and method
Extending ExeKGLib in this way requires three steps:
1. Select the relevant bottom-level KG schema (Statistics, ML, or Visualization) according to the type of the new task and method.
2. Add the new semantic components (entities, properties, etc.) to the selected KG schema.
3. Add a Python class to the corresponding module of the `exe_kg_lib.classes.tasks` package.
For steps 2 and 3, refer to the relevant page of ExeKGLib's website.
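As a rough illustration of the Python-class step, the sketch below shows the general shape such a class might take. Everything in it is an assumption made for illustration: `Task` is a stand-in base class defined in the snippet itself, `run_method` and its dictionary-based signature are hypothetical, and `NormalizationTaskMinMaxMethod` is an invented example. Only the `DataInNormalization` and `DataOutNormalization` names come from the table of existing tasks. The actual base class and signatures are documented on the website.

```python
from abc import ABC, abstractmethod


class Task(ABC):
    """Stand-in base class (hypothetical; not ExeKGLib's actual Task)."""

    @abstractmethod
    def run_method(self, inputs: dict) -> dict:
        """Map named inputs to named outputs."""


class NormalizationTaskMinMaxMethod(Task):
    """Hypothetical new method: rescale a vector to the range [0, 1]."""

    def run_method(self, inputs: dict) -> dict:
        values = inputs["DataInNormalization"]
        low, high = min(values), max(values)
        span = high - low
        # A constant vector has no spread; map it to zeros to avoid dividing by 0.
        scaled = [(v - low) / span if span else 0.0 for v in values]
        return {"DataOutNormalization": scaled}


if __name__ == "__main__":
    task = NormalizationTaskMinMaxMethod()
    print(task.run_method({"DataInNormalization": [2.0, 4.0, 6.0]}))
```

The class name follows the Task-plus-Method pattern seen in the existing classes, so that a method added to the KG schema can be matched to its implementation by name.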
Documentation
See the Code Reference and Development sections of ExeKGLib's website.
External resources
KG schemata
- Top-level: Data Science
- Bottom-level: Visualization | Statistics | Machine Learning
The above KG schemata are included in the ExeKGOntology repository.
Dataset used in code examples
The dataset was generated using the `sklearn.datasets.make_classification()` function of the scikit-learn Python library.
License
ExeKGLib is open-sourced under the AGPL-3.0 license. See the LICENSE file for details.