Library for executable ML pipelines represented by KGs.

ExeKGLib

Python library for conveniently constructing and executing Machine Learning (ML) pipelines represented by Knowledge Graphs (KGs).

Detailed information (installation, documentation, etc.) about the library can be found on its website; basic information is given below.

Overview

The functionality of this Python library can be divided into the following two parts:

  1. Executable KG construction: An executable KG representing an ML pipeline is constructed from the user's input (given programmatically or via CLI) based on the KG schemas. The construction is done by sequentially creating pairs of instances of ds:AtomicTask and ds:AtomicMethod sub-classes, together with their properties. The definitions of these sub-classes can be found in the bottom-level KG schemas. After each KG component is built, it is validated against the KG schemas and added to an RDFLib Graph object. Finally, the KG is saved in Turtle format.
  2. ML pipeline execution: The executable KG is parsed using RDFLib and queried using SPARQL to retrieve its ML pipeline. The pipeline's ordered tasks are sequentially mapped to Python objects, each of which implements a run_method() Python method that is then invoked. run_method() is an abstract method of the Task class and is implemented by its bottom-level child classes.
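The execution step can be pictured with a minimal sketch. It relies only on what is described above: an abstract Task class whose run_method() is implemented by bottom-level child classes and invoked task by task, in order. Everything else is an assumption made for illustration and is not ExeKGLib's actual API: the child classes MeanCalculation and Centering, the run_pipeline helper, and the use of a plain dict to pass data between tasks are all hypothetical.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from statistics import mean


class Task(ABC):
    """Hypothetical stand-in for a pipeline task (not ExeKGLib's real class)."""

    @abstractmethod
    def run_method(self, data: dict) -> dict:
        """Read inputs from the shared dict and return new entries for it."""


class MeanCalculation(Task):
    """Hypothetical bottom-level task: computes the mean of data["values"]."""

    def run_method(self, data: dict) -> dict:
        return {"mean": mean(data["values"])}


class Centering(Task):
    """Hypothetical bottom-level task: subtracts the stored mean from each value."""

    def run_method(self, data: dict) -> dict:
        return {"centered": [v - data["mean"] for v in data["values"]]}


def run_pipeline(tasks: list[Task], data: dict) -> dict:
    """Invoke run_method() on each task in order, accumulating the outputs."""
    data = dict(data)  # work on a copy so the caller's dict is untouched
    for task in tasks:
        data.update(task.run_method(data))
    return data


result = run_pipeline([MeanCalculation(), Centering()], {"values": [1.0, 2.0, 3.0]})
print(result["centered"])
```

In this sketch the task order matters: Centering reads the "mean" entry that MeanCalculation wrote, which mirrors the idea of an ordered pipeline in which later tasks consume the outputs of earlier ones.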

The different implementations of run_method() correspond to the ds:AtomicMethod bottom-level sub-classes that are defined in the Visualization, Statistics, and ML KG schemas. The method categories are described below.

  1. Visualization: A set of methods for visualization, of two types: (1) plot canvas methods, which define the plot size and layout, and (2) plot methods of various kinds (line plot, scatter plot, bar plot, etc.). These methods use matplotlib to visualize data.
  2. Statistics and Feature Engineering: Methods for statistical analysis and feature engineering, such as IQR calculation and mean and standard deviation calculation, which can then be combined into more complex methods such as outlier detection and normalization.
  3. Machine Learning: A group of methods that support ML algorithms such as Linear Regression, MLP, and k-NN, together with helper functions that perform, e.g., data splitting and ML model performance calculation.
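To illustrate how simple statistics can be combined into the more complex methods mentioned in the second category, here is a small standard-library sketch. It is not taken from ExeKGLib: the function names, the conventional 1.5 x IQR fence for outliers, and the choice of z-score normalization are all assumptions made for this example.

```python
from statistics import mean, pstdev, quantiles


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences derived from the interquartile range."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def detect_outliers(values, k=1.5):
    """Outlier detection built on top of the IQR calculation."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if v < lower or v > upper]


def normalize(values):
    """Z-score normalization built on mean and standard deviation.

    Assumes the values are not all identical (non-zero standard deviation).
    """
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


data = [1.0, 2.0, 3.0, 4.0, 100.0]
print(detect_outliers(data))
```

For the sample data, the quartiles are 2.0 and 4.0, so the fences are -1.0 and 7.0 and only 100.0 is flagged.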

This library is part of the following paper submitted to ESWC 2023:
Klironomos A., Zhou B., Tan Z., Zheng Z., Gad-Elrab M., Paulheim H., Kharlamov E.: ExeKGLib: A Python Library for Machine Learning Analytics based on Knowledge Graphs

Getting started

The library is available as a PyPI package.

To install it, run pip install exe-kg-lib.

Usage

Fetching examples to working directory

Run typer exe_kg_lib.cli.main run get-examples.

Creating an ML pipeline

Via CLI

  1. Run typer exe_kg_lib.cli.main run create-pipeline.
  2. Follow the input prompts.

Via code

See the provided examples.

Executing an ML pipeline

Run typer exe_kg_lib.cli.main run run-pipeline <pipeline_path>.

Installation

See the installation page of the library's website.

Adding a new ML-related task and method

To perform this type of library extension, there are 3 required steps:

  1. Selection of a relevant bottom-level KG schema (Statistics, ML, or Visualization) according to the type of the new task and method.
  2. Addition of new semantic components (entities, properties, etc) to the selected KG schema.
  3. Addition of a Python class to the corresponding module of the exe_kg_lib.classes.tasks package.

For steps 2 and 3, refer to the relevant page of the library's website.

Documentation

See the Code Reference and Development sections of the library's website.

External resources

Top-level KG schemas

Bottom-level KG schemas

The above KG schemas are included in the ExeKGOntology repository.

Dataset used in code examples

This dataset (located in exe_kg_lib/examples/data/dummy_data.csv) was generated using the sklearn.datasets.make_classification() function of the scikit-learn Python library.

License

ExeKGLib is open-sourced under the AGPL-3.0 license. See the LICENSE file for details.
