Library for executable ML pipelines represented by KGs.

ExeKGLib

Python library for conveniently constructing and executing Machine Learning (ML) pipelines represented by Knowledge Graphs (KGs). It features a coding interface and a CLI, and allows the user to:

  1. Construct an ML pipeline that takes a CSV file as input and processes the data using any of the available tasks and methods.
  2. Save the constructed pipeline as a KG in Turtle format.
  3. Execute the generated KG.
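
To make step 2 more concrete, here is a minimal sketch of how a linear pipeline could be written out as Turtle text using only the Python standard library. The `ex:` namespace, the classes `ex:Pipeline` and `ex:Task`, the properties `ex:hasStartTask` and `ex:hasNextTask`, and the function name are hypothetical placeholders chosen for illustration. They are not the vocabulary of ExeKGLib's KG schemata, and ExeKGLib generates the actual KG for you.

```python
# Illustrative only: the ex: vocabulary below is hypothetical, not ExeKGLib's schema.
EX_PREFIX = "http://example.org/pipeline#"


def pipeline_to_turtle(pipeline_name: str, task_names: list[str]) -> str:
    """Serialize an ordered list of task names as a chain of Turtle triples."""
    if not task_names:
        raise ValueError("a pipeline needs at least one task")

    lines = [f"@prefix ex: <{EX_PREFIX}> .", ""]
    # The pipeline node points at its first task.
    lines.append(f"ex:{pipeline_name} a ex:Pipeline ;")
    lines.append(f"    ex:hasStartTask ex:{task_names[0]} .")

    # Each task is typed and points at its successor, if it has one.
    for i, current in enumerate(task_names):
        lines.append("")
        if i + 1 < len(task_names):
            lines.append(f"ex:{current} a ex:Task ;")
            lines.append(f"    ex:hasNextTask ex:{task_names[i + 1]} .")
        else:
            lines.append(f"ex:{current} a ex:Task .")
    return "\n".join(lines) + "\n"


turtle_text = pipeline_to_turtle("MLPipeline", ["DataSplitting1", "Train1", "Test1"])
print(turtle_text)
```

The printed text is plain Turtle, so it can be saved to a `.ttl` file and inspected with any RDF tool.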

The coding interface is demonstrated with three sample Python files. The pipelines represented by the generated sample KGs are briefly explained below:

  1. ML pipeline: Loads features and labels from an input CSV dataset, splits the data, trains and tests a k-NN model, and visualizes the prediction errors.
  2. Statistics pipeline: Loads a feature from an input CSV dataset, normalizes it, and plots its values (before and after normalization) using a scatter plot.
  3. Visualization pipeline: Loads a feature from an input CSV dataset and plots its values using a line plot.
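
The steps of the first sample pipeline (split, train, test, compute errors) can be illustrated without ExeKGLib at all. The sketch below is a plain-Python stand-in, not the library's implementation. The toy data, the helper names, and the choice of k = 3 with Euclidean distance and a misclassification rate as the error measure are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def split_data(xs, ys, test_fraction=0.25, seed=0):
    """Shuffle indices and split features/labels into train and test parts."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(xs) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [xs[i] for i in train_idx], [ys[i] for i in train_idx],
        [xs[i] for i in test_idx], [ys[i] for i in test_idx],
    )


def knn_predict(train_x, train_y, query, k=3):
    """Predict the majority label among the k nearest training points."""
    nearest = sorted(
        zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def error_rate(true_y, predicted_y):
    """Fraction of predictions that differ from the true labels."""
    wrong = sum(t != p for t, p in zip(true_y, predicted_y))
    return wrong / len(true_y)


# Toy, well-separated two-class data (hypothetical, for illustration only).
features = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),
            (5.0, 5.1), (5.2, 4.9), (4.8, 5.3), (5.1, 5.0)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

train_x, train_y, test_x, test_y = split_data(features, labels)
predictions = [knn_predict(train_x, train_y, x) for x in test_x]
test_error = error_rate(test_y, predictions)
print(f"test error: {test_error:.2f}")
```

In an ExeKGLib pipeline, each of these functions corresponds roughly to one task in the KG, with the outputs of one task wired to the inputs of the next.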

Under the hood, ExeKGLib uses well-known Python libraries such as pandas, matplotlib, and scikit-learn for data processing, visualization, and prediction.

ExeKGLib is described in the following paper published as part of ESWC 2023:
Klironomos A., Zhou B., Tan Z., Zheng Z., Gad-Elrab M., Paulheim H., Kharlamov E. ExeKGLib: Knowledge Graphs-Empowered Machine Learning Analytics

Detailed information (installation, documentation, etc.) about ExeKGLib can be found on its website; basic information is given below.

Installation

To install, run pip install exe-kg-lib.

For detailed installation instructions, refer to the installation page of ExeKGLib's website.

Ready-to-use ML-related tasks and methods

| KG schema (abbreviation) | Task | Method | Properties | Input (data structure) | Output (data structure) | Implemented by Python class |
| --- | --- | --- | --- | --- | --- | --- |
| Machine Learning (ml) | Train | KNNTrain | - | DataInTrainX (Matrix or Vector); DataInTrainY (Matrix or Vector) | DataOutPredictedValueTrain (Matrix or Vector); DataOutTrainModel (SingleValue) | TrainKNNTrain |
| Machine Learning (ml) | Train | MLPTrain | - | DataInTrainX (Matrix or Vector); DataInTrainY (Matrix or Vector) | DataOutPredictedValueTrain (Matrix or Vector); DataOutTrainModel (SingleValue) | TrainMLPTrain |
| Machine Learning (ml) | Train | LRTrain | - | DataInTrainX (Matrix or Vector); DataInTrainY (Matrix or Vector) | DataOutPredictedValueTrain (Matrix or Vector); DataOutTrainModel (SingleValue) | TrainLRTrain |
| Machine Learning (ml) | Test | KNNTest | - | DataInTestModel (SingleValue); DataInTestX (Matrix or Vector) | DataOutPredictedValueTest (Matrix or Vector) | TestKNNTest |
| Machine Learning (ml) | Test | MLPTest | - | DataInTestModel (SingleValue); DataInTestX (Matrix or Vector) | DataOutPredictedValueTest (Matrix or Vector) | TestMLPTest |
| Machine Learning (ml) | Test | LRTest | - | DataInTestModel (SingleValue); DataInTestX (Matrix or Vector) | DataOutPredictedValueTest (Matrix or Vector) | TestLRTest |
| Machine Learning (ml) | PerformanceCalculation | PerformanceCalculationMethod | - | DataInTrainRealY (Matrix or Vector); DataInTrainPredictedY (Matrix or Vector); DataInTestPredictedY (Matrix or Vector); DataInTestRealY (Matrix or Vector) | DataOutMLTestErr (Vector); DataOutMLTrainErr (Vector) | PerformanceCalculationPerformanceCalculationMethod |
| Machine Learning (ml) | Concatenation | ConcatenationMethod | - | DataInConcatenation (list of Vector) | DataOutConcatenatedData (Matrix) | ConcatenationConcatenationMethod |
| Machine Learning (ml) | DataSplitting | DataSplittingMethod | - | DataInDataSplittingX (Matrix or Vector); DataInDataSplittingY (Matrix or Vector) | DataOutSplittedTestDataX (Matrix or Vector); DataOutSplittedTrainDataY (Matrix or Vector); DataOutSplittedTrainDataX (Matrix or Vector); DataOutSplittedTestDataY (Matrix or Vector) | DataSplittingDataSplittingMethod |
| Visualization (visu) | CanvasTask | CanvasMethod | hasCanvasName (string); hasLayout (string) | - | - | CanvasTaskCanvasMethod |
| Visualization (visu) | PlotTask | LineplotMethod | hasLineStyle (string); hasLineWidth (int); hasLegendName (string) | DataInVector (Vector) | - | PlotTaskLineplotMethod |
| Visualization (visu) | PlotTask | ScatterplotMethod | hasLineStyle (string); hasLineWidth (int); hasScatterSize (int); hasLegendName (string) | DataInVector (Vector) | - | PlotTaskScatterplotMethod |
| Statistics (stats) | TrendCalculationTask | TrendCalculationMethod | - | DataInTrendCalculation (Vector) | DataOutTrendCalculation (Vector) | TrendCalculationTaskTrendCalculationMethod |
| Statistics (stats) | NormalizationTask | NormalizationMethod | - | DataInNormalization (Vector) | DataOutNormalization (Vector) | NormalizationTaskNormalizationMethod |
| Statistics (stats) | ScatteringCalculationTask | ScatteringCalculationMethod | - | DataInScatteringCalculation (Vector) | DataOutScatteringCalculation (Vector) | ScatteringCalculationTaskScatteringCalculationMethod |

Usage

Creating an ML pipeline

  • Via code: See the provided examples. To fetch them to your working directory for easy access, run typer exe_kg_lib.cli.main run get-examples.
  • Step-by-step via CLI: Run typer exe_kg_lib.cli.main run create-pipeline.

Executing an ML pipeline

  • Via code: See example code.
  • Via CLI: Run typer exe_kg_lib.cli.main run run-pipeline <pipeline_path>.
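
For scripting, the CLI commands from the two lists above can also be launched from Python. The sketch below only assembles the argument lists quoted in this section and leaves the actual call commented out. The helper name `cli_command` and the example path `my_pipeline.ttl` are hypothetical, and running the commands assumes that the `typer` executable and ExeKGLib are installed in the active environment.

```python
import subprocess  # used only in the commented-out invocation at the bottom

# Common prefix of the three CLI commands quoted in this section.
CLI_PREFIX = ["typer", "exe_kg_lib.cli.main", "run"]


def cli_command(subcommand, *args):
    """Build the argument list for one ExeKGLib CLI subcommand."""
    return [*CLI_PREFIX, subcommand, *map(str, args)]


get_examples_cmd = cli_command("get-examples")
create_pipeline_cmd = cli_command("create-pipeline")
run_pipeline_cmd = cli_command("run-pipeline", "my_pipeline.ttl")  # hypothetical path

print(" ".join(run_pipeline_cmd))

# To actually execute a command (requires ExeKGLib to be installed):
# subprocess.run(run_pipeline_cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems when the pipeline path contains spaces.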

Adding a new ML-related task and method

Extending ExeKGLib in this way requires three steps:

  1. Select the relevant bottom-level KG schema (Statistics, ML, or Visualization) according to the type of the new task and method.
  2. Add the new semantic components (entities, properties, etc.) to the selected KG schema.
  3. Add a Python class to the corresponding module of the exe_kg_lib.classes.tasks package.

For steps 2 and 3, refer to the relevant page of ExeKGLib's website.
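
As a rough picture of step 3: the table of ready-to-use tasks suggests that each implementing class is named by concatenating the task name and the method name (for example, NormalizationTask and NormalizationMethod give NormalizationTaskNormalizationMethod). The sketch below follows that naming pattern for a made-up task. The base class `HypotheticalTaskBase`, its `run_method` signature, and the `ClippingTask`, `ClippingMethod`, `DataInClipping`, and `DataOutClipping` names are hypothetical stand-ins written for this example; the real base class and required interface are described on ExeKGLib's website.

```python
from abc import ABC, abstractmethod


class HypotheticalTaskBase(ABC):
    """Stand-in for a task base class; NOT ExeKGLib's actual API."""

    @abstractmethod
    def run_method(self, inputs: dict) -> dict:
        """Consume named inputs and return named outputs."""


def implementing_class_name(task: str, method: str) -> str:
    """Naming pattern observed in the table: task name followed by method name."""
    return task + method


class ClippingTaskClippingMethod(HypotheticalTaskBase):
    """Made-up task: clip every value of a vector into [lower, upper]."""

    def __init__(self, lower: float, upper: float):
        self.lower = lower
        self.upper = upper

    def run_method(self, inputs: dict) -> dict:
        vector = inputs["DataInClipping"]  # hypothetical input name
        clipped = [min(max(v, self.lower), self.upper) for v in vector]
        return {"DataOutClipping": clipped}  # hypothetical output name


task = ClippingTaskClippingMethod(lower=0.0, upper=1.0)
result = task.run_method({"DataInClipping": [-0.5, 0.3, 1.7]})
print(result["DataOutClipping"])
```

Keeping the class name aligned with the task and method names added to the KG schema in step 2 is what lets a class be looked up from the entities in a pipeline KG, assuming the naming pattern in the table holds for new tasks as well.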

Documentation

See the Code Reference and Development sections of ExeKGLib's website.

External resources

KG schemata

The KG schemata used by ExeKGLib are included in the ExeKGOntology repository.

Dataset used in code examples

The dataset was generated using the sklearn.datasets.make_classification() function of the scikit-learn Python library.

License

ExeKGLib is open-sourced under the AGPL-3.0 license. See the LICENSE file for details.
