
Library for executable ML pipelines represented by KGs.


ExeKGLib: A Python Library for Knowledge Graphs-Empowered Machine Learning Analytics 🚀


ExeKGLib is a Python library that simplifies the construction and execution of Machine Learning (ML) pipelines represented by Executable Knowledge Graphs (ExeKGs). It features a coding interface and a CLI, and allows the user to:

🌟 Features

  1. 🔨 Construct data analytics pipelines that take tabular files (e.g. CSV) as input and process the data using a variety of available tasks and methods.
  2. 💾 Save the constructed pipelines as ExeKGs in RDF Turtle format.
  3. ▶️ Execute the generated ExeKGs.

🌟 Key Benefits of ExeKGLib

  1. 🚀 No-code ML Pipeline Creation: With ExeKGLib, the user can specify the pipeline's structure and the operations to be performed in a simple JSON file (see Creating an ML pipeline), which is then automatically converted to an ExeKG. This ExeKG can be executed to perform the specified operations on the input data (see Executing an ML pipeline).
  2. 📦 Batch Pipeline Creation and Editing: ExeKGLib allows users to create and edit pipelines in batches through its simple coding interface (see Creating an ML pipeline and Editing an ML pipeline). This enables automatic creation of multiple pipelines as ExeKGs, which can then be queried and analyzed.
  3. 🔗 Linked Open Data Integration: ExeKGLib leverages linked open data (LOD) in several ways:
    • 📚 Pipeline Creation Guidance: It guides the user through the pipeline creation process, using a predefined hierarchy of tasks along with their compatible inputs, outputs, methods, and method parameters (see available tasks and methods).
    • 🧠 Enhancing User Understanding: It enhances the user's understanding of Data Science and of the pipeline's functionality by linking the generated pipelines to Knowledge Graph (KG) schemata that encapsulate various Data Science concepts (see KG schemata).
    • ✅ Validation of ExeKGs: It validates the generated ExeKGs to ensure their executability.
    • 🔄 Automatic Conversion and Execution: It automatically converts the ExeKGs to Python code and executes them.

Under the hood, ExeKGLib uses well-known Python libraries such as pandas, matplotlib, and scikit-learn for data processing, visualization, and prediction.

ExeKGLib is described in the following paper published as part of ESWC 2023:
Klironomos A., Zhou B., Tan Z., Zheng Z., Gad-Elrab M., Paulheim H., Kharlamov E. ExeKGLib: Knowledge Graphs-Empowered Machine Learning Analytics

Detailed information (installation, documentation, etc.) about ExeKGLib can be found on its website; basic information is shown below.

📦 Installation

To install, run pip install exe-kg-lib.

For detailed installation instructions, refer to the installation page of ExeKGLib's website.
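
After installing, it can be useful to confirm that the distribution is visible to the current interpreter. The following is a minimal sketch that relies only on the standard library; the helper name `installed_version` is hypothetical and is not part of ExeKGLib.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="exe-kg-lib"):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("exe-kg-lib is not installed; run: pip install exe-kg-lib")
    else:
        print(f"exe-kg-lib {found} is installed")
```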

🚀 Getting started

We provide example Python and JSON files that can be used to create the following pipelines:

  1. 🧠 ML pipeline:
    1. MLPipelineSimple: Loads a CSV dataset, concatenates selected features, splits the data into training and testing sets, trains a Support Vector Classifier (SVC) model, tests the model, calculates performance metrics (accuracy, F1 score, precision, and recall), and visualizes the results in bar plots.
    2. MLPipelineCrossValidation: An extended version of MLPipelineSimple that adds a data splitting step for Stratified K-Fold Cross-Validation. Then, it trains and tests the model using the cross-validation technique and visualizes the validation and test F1 scores in bar plots.
    3. MLPipelineModelSelection: A modified version of MLPipelineSimple that replaces the training step with a model selection step. Rather than using a fixed model, this pipeline involves training and cross-validating a Support Vector Classifier (SVC) model with various hyperparameters to optimize performance.
  2. 📊 Statistics pipeline:
    • StatsPipeline: Loads a specific feature from a CSV dataset, calculates its mean and standard deviation, and visualizes the feature's values using a line plot and the calculated statistics using a bar plot.
  3. 📈 Visualization pipeline:
    • VisuPipeline: Loads two numerical features from a CSV dataset and visualizes each feature's values using separate line plots.
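
For orientation, the computation at the core of StatsPipeline (the mean and standard deviation of one CSV column) can be sketched in plain Python, without ExeKGLib and without the plotting steps. This is an illustration only: the function name and the column names are hypothetical, and the sketch uses the sample standard deviation, whereas the source does not state which variant ExeKGLib computes.

```python
import csv
import statistics
import tempfile
from pathlib import Path


def feature_stats(csv_path, feature):
    """Return (mean, sample standard deviation) of one numeric CSV column."""
    with Path(csv_path).open(newline="") as handle:
        values = [float(row[feature]) for row in csv.DictReader(handle)]
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    # Demo on a throwaway CSV with hypothetical column names.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.csv"
        sample.write_text("feature_1,feature_2\n1,10\n2,20\n3,30\n4,40\n")
        mean, std = feature_stats(sample, "feature_1")
        print(f"mean={mean:.3f} std={std:.3f}")
```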

💡 Tip: To fetch the examples into your working directory for easy access, run typer exe_kg_lib.cli.main run get-examples.

🗒️ Note: The naming convention for output names (used as inputs for subsequent tasks) in .json files can be found in exe_kg_lib/utils/string_utils.py. Look for TASK_OUTPUT_NAME_REGEX.

🧪 Supported ML-related tasks and methods

See the relevant page of ExeKGLib's website.

🛠️ Usage

🚀 Creating an ML pipeline

💻 Via code

See the Python files in the provided examples.

📄 Using JSON

Run typer exe_kg_lib.cli.main run create-pipeline <json_path> after replacing <json_path> with the path to a pipeline's JSON file. See the provided example JSONs.

🗒️ Note: Replace input_data_path with the path to a dataset and output_plots_dir with the directory path where the plots will be saved.
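
When several example JSON files need the same paths, the replacement can be scripted. The sketch below assumes that input_data_path and output_plots_dir are top-level keys of the pipeline JSON, which the source does not confirm; the helper name `set_pipeline_paths` is hypothetical.

```python
import json
import tempfile
from pathlib import Path


def set_pipeline_paths(json_path, input_data_path, output_plots_dir):
    """Rewrite the two path fields of a pipeline JSON file in place; return the updated dict."""
    path = Path(json_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    # Assumption: both fields are top-level keys of the JSON object.
    for key in ("input_data_path", "output_plots_dir"):
        if key not in config:
            raise KeyError(f"{key!r} not found at the top level of {path}")
    config["input_data_path"] = str(input_data_path)
    config["output_plots_dir"] = str(output_plots_dir)
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config


if __name__ == "__main__":
    # Demo on a throwaway file with hypothetical contents.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "pipeline.json"
        demo.write_text(
            json.dumps({"input_data_path": "", "output_plots_dir": ""}),
            encoding="utf-8",
        )
        print(set_pipeline_paths(demo, "data/dataset.csv", "plots"))
```

Failing loudly on a missing key is deliberate: if the real files nest these fields differently, the script stops instead of silently adding unused keys.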

🖥️ Step-by-step via CLI

Run typer exe_kg_lib.cli.main run create-pipeline.

🚀 Editing an ML pipeline

💻 Via code

See the provided sample script.

🚀 Executing an ML pipeline

💻 Via code

See example code.

🖥️ Via CLI

Run typer exe_kg_lib.cli.main run run-pipeline <pipeline_path>. The pipeline_path can be either a .ttl or a .json file.
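
To run several pipelines from a script, the same command can be launched through the standard library. This is a minimal sketch that only wraps the CLI invocation quoted above and checks the file extension first; the function names are hypothetical, and it assumes the typer executable is on the PATH.

```python
import subprocess
from pathlib import Path

SUPPORTED_SUFFIXES = {".ttl", ".json"}


def run_pipeline_command(pipeline_path):
    """Build the CLI invocation quoted in this section for one pipeline file."""
    path = Path(pipeline_path)
    if path.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"expected a .ttl or .json file, got {path.name!r}")
    return ["typer", "exe_kg_lib.cli.main", "run", "run-pipeline", str(path)]


def run_pipeline(pipeline_path):
    """Run the command and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(run_pipeline_command(pipeline_path), check=True)


if __name__ == "__main__":
    # Only prints the command; call run_pipeline(...) to actually execute it.
    print(" ".join(run_pipeline_command("pipeline.ttl")))
```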

📝 Adding a new ML-related task and method

For detailed guidelines, refer to the relevant page of ExeKGLib's website.

In summary, these are the steps:

  1. Selecting a bottom-level KG schema (Statistics, ML, or Visualization) based on the type of the new task and method.
  2. Adding new semantic components (entities, properties, etc.) to the selected KG schema and the corresponding SHACL shapes graph.
  3. Modifying the Python code in the corresponding file of the exe_kg_lib.classes.tasks package.

📚 Documentation

See the Code Reference and Development sections of ExeKGLib's website.

🌐 External resources

📜 KG schemata

The KG schemata used by ExeKGLib are included in the ExeKGOntology repository.

📊 Dataset used in code examples

The dataset was generated using the sklearn.datasets.make_classification() function of the scikit-learn Python library.
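
A dataset of this kind can be regenerated along the following lines. This is a sketch, not the script used by the authors: the sample count, feature count, seed, column names, and output file name are all hypothetical, since the source does not state the arguments that were actually passed to make_classification().

```python
import csv
import tempfile
from pathlib import Path

from sklearn.datasets import make_classification


def write_classification_csv(csv_path, n_samples=200, n_features=5, seed=0):
    """Generate a synthetic binary classification dataset and save it as CSV.

    Requires n_features >= 3 because three informative features are requested.
    """
    X, y = make_classification(
        n_samples=n_samples,
        n_features=n_features,
        n_informative=3,
        n_redundant=0,
        random_state=seed,
    )
    # Hypothetical column names: feature_1 ... feature_N plus a label column.
    header = [f"feature_{i}" for i in range(1, n_features + 1)] + ["label"]
    with Path(csv_path).open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        for row, label in zip(X, y):
            writer.writerow([*row.tolist(), int(label)])
    return header


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "dataset.csv"
        print(write_classification_csv(out))
```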

📜 License

ExeKGLib is open-sourced under the AGPL-3.0 license. See the LICENSE file for details.
