Skip to main content

STELAR KLMS deployment tooling

Project description

Overview

This repository provides instructions, documentation, and examples regarding deployment of the Knowledge Lake Management System (KLMS) developed by the STELAR project. The STELAR KLMS supports and facilitates a holistic approach for FAIR (Findable, Accessible, Interoperable, Reusable) and AI-ready (high-quality, reliably labeled) data. It allows to (semi-)automatically turn a raw data lake into a knowledge lake by: (a) enhancing the data lake with a knowledge layer; and (b) developing and integrating a set of data management tools and workflows. The knowledge layer comprises: (a) a data catalog that offers automatically enhanced metadata for the raw data assets in the lake; and (b) a knowledge graph that semantically describes and interlinks these data assets using suitable domain ontologies and vocabularies. The provided STELAR tools and workflows offer novel functionalities for: (a) data discovery and quality management; (b) data linking and alignment, and (c) data annotation and synthetic data generation.

alt text

Deployment CLI

This repository includes stelarctl, a CLI for initializing KLMS lake workspaces, generating product fullspecs, and running cluster preparation preflight checks. See docs/stelarctl.md for the supported commands and deployment workflow.

The intended operator install path is:

pipx install stelar-deploy

KLMS core components

  • STELAR API. The main entry point to the KLMS system, exposing RESTful endpoints for managing and searching resources in the KLMS. Houses the core services of the KLMS, including user management, dataset management, metadata extraction, and search functionalities, task and workflow invocation. Exposes a GUI for interacting with the KLMS system, the STELAR KLMS Console, supporting the full spectrum of KLMS functionalities.

  • Data Catalog of datasets in KLMS, deployed as a CKAN instance, mainly utilized under the hood.

  • Keycloak is used for Identity and Access Management;

  • PostgreSQL serves as the main relational database backbone for storing KLMS metadata and user information.

  • Ontop a knowledge graph, employing mappings from the database to a virtual RDF graph according to the KLMS ontology.

  • QUAY Registry, via a custom distribution for managing STELAR Data Analysis Tools container images.

  • MinIO serves as a storage layer for the data assets tracked by the Data Catalog as well as for tool images.

  • Redis is used as an in-memory data structure store for caching.

  • LLM-powered Semantic Dataset Search Facility is a tool for enhancing dataset search capabilities using large language models. It is integrated into the KLMS Console and implemented as a FastAPI service under the hood.

  • STELAR Resource Previewer is a streamlit-based tool for visualizing and exploring the resources of the data catalog artifacts. It is exposed via the central ingress controller of the KLMS deployment and embedded in the KLMS Console.

  • STELAR Profile Visualizer is a tool for visualizing and exploring the profiles of the data catalog artifacts. It is also exposed via the central ingress controller of the KLMS deployment and embedded in the KLMS Console.

The STELAR KLMS supports two alternative workflow engines:

  • In its Community Edition, it supports Apache Airflow, which is a very popular open-source platform for this purpose.

  • In its Professional and Enterprise editions, it supports the RapidMiner Studio & AI Hub, which is a widely used commercial platform for machine learning and data science workflows.

While both options have been well-tested in regards with their compatibility, the range of open-source tools and systems STELAR can integrate with is limitless. Integration can be achieved by the STELAR API directly or indirectly through the STELAR Python SDK.

Access to the STELAR API is provided either directly via its RESTful endpoints or via the STELAR Python SDK, a client library for interacting with the STELAR API. The SDK is available via PyPI and can be installed via pip:

pip install stelar_client

The source code of the SDK is available at its GitHub repository.

STELAR Toolkit — Tools Index

Discovery

  • Synopsis Data Engine (SDE) — data stream summarization with persistent synopses.
    Lang: Java · Integration: In-cluster via client · Partners: ARC, TUE
    GitHub: stelar-eu/Synopses-Data-Engine

  • Correlation Detective — scalable multivariate correlation mining for vector datasets.
    Lang: Java · Integration: In-cluster · Partner: TUE
    GitHub: stelar-eu/correlation-detective · Docker: stelareu/correlation-detective

  • Forecasting Model Orchestrator (FOMO) — orchestrates/optimizes time-series forecasting models under a compute budget.
    Lang: Python 3.10 · Integration: In-cluster · Partner: TUE
    GitHub: stelar-eu/fomo · Docker: stelareu/fomo

  • TableSage — LLM-powered tabular profiling, summarization, and metadata enrichment.
    Lang: Python 3.10 · Integration: In-cluster · Partner: ARC
    GitHub: stelar-eu/TableSage-Docker · Docker: stelareu/tablesage

  • Data Profiler — automatic profiling for tabular, time-series, raster, text, hierarchical, and RDF data.
    Lang: Python 3.8 · Integration: In-cluster · Partner: ARC
    GitHub: stelar-eu/stelardataprofiler-docker · Docker: stelareu/data-profiler

Interlinking

  • pyJedAI Entity Matching (pyJedAI EM) — duplicate detection across datasets via multi-stage pipelines.
    Lang: Python 3.9 · Integration: In-cluster · Partner: UoA
    GitHub: stelar-eu/pyjedai-em · Docker: stelareu/pyjedai-em

  • pyJedAI Schema Matching (pyJedAI SM) — schema alignment for highly heterogeneous datasets.
    Lang: Python 3.9 · Integration: In-cluster · Partner: UoA
    GitHub: stelar-eu/pyjedai-sm · Docker: stelareu/pyjedai-sm

  • JedAI-spatial — interlinking for geospatial RDF; computes DE9IM topological relations.
    Lang: Java · Integration: In-cluster · Partner: UoA
    GitHub: stelar-eu/jedai-spatial · Docker: stelareu/jedai-spatial

  • Spatio-Temporal Time Series Extraction (TS-Extraction) — extracts per-pixel/field LAI statistics over time from satellite imagery.
    Lang: Python 3.10 · Integration: In-cluster · Partner: TUE
    GitHub: stelar-eu/spatiotemporal_timeseries_extraction · Docker: stelareu/ts-extract

  • Time Series Imputation (TS-Imputation) — SOTA imputation for time series (DL, statistical, and LLM-based methods).
    Lang: C#, Python 3.12 · Integration: In-cluster / Remote · Partner: ARC
    GitHub: stelar-eu/TS-Impute · Docker: stelareu/ts-impute

  • Missing Data Interpolation — fills gaps in daily weather data via inverse-distance weighted interpolation.
    Lang: Python 3.8 · Integration: In-cluster · Partner: ABACO
    GitHub: stelar-eu/missing-data-interpolation · Docker: stelareu/missing-data-interpolation

Annotation

  • Field Segmentation — automatic agricultural field boundary extraction from satellite imagery (RGB/NIR).
    Lang: Python 3.9 · Integration: In-cluster · Partner: TUE
    GitHub: stelar-eu/field_segmentation · Docker: stelareu/field-segmentation

  • AvengER — LLM ensembling/fine-tuning for entity resolution with configurable workflows and evaluation.
    Lang: Python 3.8 · Integration: In-cluster · Partner: ARC
    GitHub: stelar-eu/AvengER-Docker · Docker: stelareu/avenger

  • Generic NER — translation, summarization, NER, main-entity selection, and entity linking pipeline.
    Lang: Python 3.12 · Integration: In-cluster · Partner: ARC
    GitHub: stelar-eu/GenericNER · Docker: stelareu/generic-ner

  • Crop Classification — DL pipeline on LAI-derived time series for crop type & growth prediction.
    Lang: Python 3.11 · Integration: In-cluster / Remote · Partner: UniBwM
    GitHub: stelar-eu/crop_prediction_tool · Docker: stelareu/crop-prediction

  • Vocational Score Raster — generates rasterized vocational-skill score maps across regions.
    Lang: Python 3.8 · Integration: In-cluster · Partner: ABACO
    GitHub: stelar-eu/vocational-score-raster · Docker: stelareu/vocational-score-raster

  • Agri Products Match — matches fertilizers/pesticides to reference products via NPK/active substances; multilingual.
    Lang: Python 3.8 · Integration: In-cluster · Partner: ABACO
    GitHub: stelar-eu/agri-products-match · Docker: stelareu/agri-products-match

  • Hazard Classification — incident reporting for the agri-food domain.
    Lang: Python · Integration: None · Partner: UniBwM
    GitHub: stelar-eu/Hazard-classification · Docker: stelareu/hazard-classification

License

The contents of this project are licensed under the GPL-2.0 license.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

stelar_deploy-0.1.4.tar.gz (55.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

stelar_deploy-0.1.4-py3-none-any.whl (58.1 kB view details)

Uploaded Python 3

File details

Details for the file stelar_deploy-0.1.4.tar.gz.

File metadata

  • Download URL: stelar_deploy-0.1.4.tar.gz
  • Upload date:
  • Size: 55.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for stelar_deploy-0.1.4.tar.gz
Algorithm Hash digest
SHA256 6077eb0927204a45179d3ab9960ab56bdcde37f2993ea13ff56b71e6a08f5f1e
MD5 34b0dacf6437d9214d93d04c28370b36
BLAKE2b-256 070f6f8bf0c3a454189c80809eef22a8466209cf75c52b0b14894c8b7b23d5ce

See more details on using hashes here.

Provenance

The following attestation bundles were made for stelar_deploy-0.1.4.tar.gz:

Publisher: publish-pypi.yml on stelar-eu/klms-deploy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file stelar_deploy-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: stelar_deploy-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 58.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for stelar_deploy-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 e1abfa20801aa6a99db59b8968705e1e6e82d119b6a659bd973d66beac3edf64
MD5 8de9d1570c65762c87a620fd6bf04671
BLAKE2b-256 60fa85750f506fcc83e02ce4889c6c83571088d155a309b0de6ce5e2ffaad77e

See more details on using hashes here.

Provenance

The following attestation bundles were made for stelar_deploy-0.1.4-py3-none-any.whl:

Publisher: publish-pypi.yml on stelar-eu/klms-deploy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page