STELAR KLMS deployment tooling
Overview
This repository provides instructions, documentation, and examples for deploying the Knowledge Lake Management System (KLMS) developed by the STELAR project. The STELAR KLMS supports and facilitates a holistic approach for FAIR (Findable, Accessible, Interoperable, Reusable) and AI-ready (high-quality, reliably labeled) data. It (semi-)automatically turns a raw data lake into a knowledge lake by: (a) enhancing the data lake with a knowledge layer; and (b) developing and integrating a set of data management tools and workflows. The knowledge layer comprises: (a) a data catalog that offers automatically enhanced metadata for the raw data assets in the lake; and (b) a knowledge graph that semantically describes and interlinks these data assets using suitable domain ontologies and vocabularies. The provided STELAR tools and workflows offer novel functionalities for: (a) data discovery and quality management; (b) data linking and alignment; and (c) data annotation and synthetic data generation.
Deployment CLI
This repository includes stelarctl, the operator CLI for preparing a STELAR
KLMS lake deployment before running Tanka. It creates the workspace layout,
creates Tanka environment directories, validates product specs, generates
<productName>_fullspec.json, records the active fullspec at
spec.stelar.active_product, writes spec.json, runs cluster preflight checks,
and creates missing deployment Secrets. The lake add subcommand copies the
managed main.jsonnet entrypoint from the vendored deployment library.
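As a rough illustration of the generated-file naming and the active-product record described above, the following Python sketch builds the fullspec file name and looks up spec.stelar.active_product in an already-parsed spec.json document. It assumes that the dotted path corresponds to nested JSON objects; that layout, and both helper names, are assumptions made for this example and are not part of stelarctl.

```python
import json
from pathlib import Path


def fullspec_filename(product_name):
    """Return the generated fullspec file name: <productName>_fullspec.json."""
    return f"{product_name}_fullspec.json"


def active_product_from(spec_doc):
    """Return the value stored at spec.stelar.active_product, or None.

    Assumption: the dotted path maps to nested JSON objects in spec.json.
    """
    node = spec_doc
    for key in ("spec", "stelar", "active_product"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def read_active_product(spec_path):
    """Load a spec.json file and return its active product record, if any."""
    spec_doc = json.loads(Path(spec_path).read_text(encoding="utf-8"))
    return active_product_from(spec_doc)
```

Calling read_active_product on the spec.json of a Tanka environment directory would then return whatever stelarctl recorded there, or None if nothing is recorded under that path.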
The intended operator install path is:
pipx install stelar-deploy
stelarctl --help
A minimal deployment flow is:
stelarctl workspace init ./lake-workspace
cd ./lake-workspace
jb install
stelarctl lake add dev --context my-kube-context --namespace stelar-dev
stelarctl lake create --minimal minimal dev --namespace stelar-dev
stelarctl lake verify dev --context my-kube-context --namespace stelar-dev
stelarctl lake bootstrap dev
tk apply dev
stelarctl lake status dev
The first created product is activated automatically. Use
stelarctl lake activate PRODUCT_NAME ENV only when switching to a different
generated product or after regenerating an existing active product.
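For operators who prefer to script the flow, the sketch below assembles the same commands as argument lists and runs them in order, stopping at the first failure. It uses only the commands shown above; the function names are hypothetical, the dry-run default is a choice made for this example, and treating the second `minimal` in `lake create --minimal minimal dev` as the product name is an assumption.

```python
import shlex
import subprocess


def deployment_commands(env, context, namespace, product="minimal"):
    """Return the commands that run inside the workspace, in order.

    Assumption: the word after --minimal is the product name.
    """
    return [
        ["jb", "install"],
        ["stelarctl", "lake", "add", env,
         "--context", context, "--namespace", namespace],
        ["stelarctl", "lake", "create", "--minimal", product, env,
         "--namespace", namespace],
        ["stelarctl", "lake", "verify", env,
         "--context", context, "--namespace", namespace],
        ["stelarctl", "lake", "bootstrap", env],
        ["tk", "apply", env],
        ["stelarctl", "lake", "status", env],
    ]


def run_flow(workspace, env, context, namespace, dry_run=True):
    """Print each step and, unless dry_run is set, execute it.

    The workspace is initialized first; all later steps run with the
    workspace as their working directory (the equivalent of `cd`).
    """
    steps = [(["stelarctl", "workspace", "init", workspace], None)]
    steps += [(cmd, workspace)
              for cmd in deployment_commands(env, context, namespace)]
    for cmd, cwd in steps:
        prefix = "(dry run) " if dry_run else ""
        print(prefix + shlex.join(cmd))
        if not dry_run:
            # check=True aborts the flow as soon as one step fails.
            subprocess.run(cmd, cwd=cwd, check=True)
    return [cmd for cmd, _ in steps]


if __name__ == "__main__":
    run_flow("./lake-workspace", "dev", "my-kube-context", "stelar-dev")
```

With the default dry_run=True the script only prints the commands, which makes it easy to review the context and namespace before anything touches the cluster.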
Cleanup is split by ownership: use tk delete dev for Tanka-rendered
resources, and stelarctl lake purge-secrets dev when you also want to delete
the bootstrap Secrets created by stelarctl.
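The ownership split can be captured in a small helper, sketched here in Python. It returns the Tanka deletion and adds the Secret purge only when asked to. The helper name is hypothetical, and running `tk delete` before `purge-secrets` is an assumption of this example rather than a documented requirement.

```python
import shlex


def cleanup_commands(env, purge_secrets=False):
    """Return the cleanup commands for an environment as argument lists.

    Tanka-rendered resources are always deleted; the bootstrap Secrets
    created by stelarctl are purged only when purge_secrets is True.
    """
    commands = [["tk", "delete", env]]
    if purge_secrets:
        commands.append(["stelarctl", "lake", "purge-secrets", env])
    return commands


if __name__ == "__main__":
    for command in cleanup_commands("dev", purge_secrets=True):
        print(shlex.join(command))
```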
See docs/stelarctl.md for the full command reference, TLS modes, generated files, preflight behavior, cleanup behavior, and troubleshooting notes.
KLMS core components
- STELAR API is the main entry point to the KLMS, exposing RESTful endpoints for managing and searching resources. It houses the core services of the KLMS, including user management, dataset management, metadata extraction, search, and task and workflow invocation. It also exposes the STELAR KLMS Console, a GUI that supports the full spectrum of KLMS functionalities.
- Data Catalog is the catalog of datasets in the KLMS, deployed as a CKAN instance and used mainly under the hood.
- Keycloak is used for Identity and Access Management.
- PostgreSQL serves as the main relational database backbone for storing KLMS metadata and user information.
- Ontop provides a knowledge graph, using mappings from the database to a virtual RDF graph according to the KLMS ontology.
- QUAY Registry, in a custom distribution, manages the container images of the STELAR Data Analysis Tools.
- MinIO serves as a storage layer for the data assets tracked by the Data Catalog as well as for tool images.
- Redis is used as an in-memory data structure store for caching.
- LLM-powered Semantic Dataset Search Facility is a tool for enhancing dataset search capabilities using large language models. It is integrated into the KLMS Console and implemented as a FastAPI service under the hood.
- STELAR Resource Previewer is a Streamlit-based tool for visualizing and exploring the resources of the data catalog artifacts. It is exposed via the central ingress controller of the KLMS deployment and embedded in the KLMS Console.
- STELAR Profile Visualizer is a tool for visualizing and exploring the profiles of the data catalog artifacts. It is also exposed via the central ingress controller of the KLMS deployment and embedded in the KLMS Console.
The STELAR KLMS supports two alternative workflow engines:
- In its Community Edition, it supports Apache Airflow, a very popular open-source platform for this purpose.
- In its Professional and Enterprise editions, it supports RapidMiner Studio & AI Hub, a widely used commercial platform for machine learning and data science workflows.
Both options have been well tested for compatibility with the KLMS. Other open-source tools and systems can also be integrated, either directly through the STELAR API or indirectly through the STELAR Python SDK.
Access to the STELAR API is provided either directly via its RESTful endpoints or via the STELAR Python SDK, a client library for interacting with the STELAR API. The SDK is available via PyPI and can be installed via pip:
pip install stelar_client
The source code of the SDK is available at its GitHub repository.
STELAR Toolkit — Tools Index
Discovery
- Synopses Data Engine (SDE): data stream summarization with persistent synopses.
  Lang: Java · Integration: In-cluster via client · Partners: ARC, TUE
  GitHub: stelar-eu/Synopses-Data-Engine
- Correlation Detective: scalable multivariate correlation mining for vector datasets.
  Lang: Java · Integration: In-cluster · Partner: TUE
  GitHub: stelar-eu/correlation-detective · Docker: stelareu/correlation-detective
- Forecasting Model Orchestrator (FOMO): orchestrates and optimizes time-series forecasting models under a compute budget.
  Lang: Python 3.10 · Integration: In-cluster · Partner: TUE
  GitHub: stelar-eu/fomo · Docker: stelareu/fomo
- TableSage: LLM-powered tabular profiling, summarization, and metadata enrichment.
  Lang: Python 3.10 · Integration: In-cluster · Partner: ARC
  GitHub: stelar-eu/TableSage-Docker · Docker: stelareu/tablesage
- Data Profiler: automatic profiling for tabular, time-series, raster, text, hierarchical, and RDF data.
  Lang: Python 3.8 · Integration: In-cluster · Partner: ARC
  GitHub: stelar-eu/stelardataprofiler-docker · Docker: stelareu/data-profiler
Interlinking
- pyJedAI Entity Matching (pyJedAI EM): duplicate detection across datasets via multi-stage pipelines.
  Lang: Python 3.9 · Integration: In-cluster · Partner: UoA
  GitHub: stelar-eu/pyjedai-em · Docker: stelareu/pyjedai-em
- pyJedAI Schema Matching (pyJedAI SM): schema alignment for highly heterogeneous datasets.
  Lang: Python 3.9 · Integration: In-cluster · Partner: UoA
  GitHub: stelar-eu/pyjedai-sm · Docker: stelareu/pyjedai-sm
- JedAI-spatial: interlinking for geospatial RDF; computes DE9IM topological relations.
  Lang: Java · Integration: In-cluster · Partner: UoA
  GitHub: stelar-eu/jedai-spatial · Docker: stelareu/jedai-spatial
- Spatio-Temporal Time Series Extraction (TS-Extraction): extracts per-pixel/field LAI statistics over time from satellite imagery.
  Lang: Python 3.10 · Integration: In-cluster · Partner: TUE
  GitHub: stelar-eu/spatiotemporal_timeseries_extraction · Docker: stelareu/ts-extract
- Time Series Imputation (TS-Imputation): state-of-the-art imputation for time series (DL, statistical, and LLM-based methods).
  Lang: C#, Python 3.12 · Integration: In-cluster / Remote · Partner: ARC
  GitHub: stelar-eu/TS-Impute · Docker: stelareu/ts-impute
- Missing Data Interpolation: fills gaps in daily weather data via inverse-distance weighted interpolation.
  Lang: Python 3.8 · Integration: In-cluster · Partner: ABACO
  GitHub: stelar-eu/missing-data-interpolation · Docker: stelareu/missing-data-interpolation
Annotation
- Field Segmentation: automatic agricultural field boundary extraction from satellite imagery (RGB/NIR).
  Lang: Python 3.9 · Integration: In-cluster · Partner: TUE
  GitHub: stelar-eu/field_segmentation · Docker: stelareu/field-segmentation
- AvengER: LLM ensembling/fine-tuning for entity resolution with configurable workflows and evaluation.
  Lang: Python 3.8 · Integration: In-cluster · Partner: ARC
  GitHub: stelar-eu/AvengER-Docker · Docker: stelareu/avenger
- Generic NER: translation, summarization, NER, main-entity selection, and entity linking pipeline.
  Lang: Python 3.12 · Integration: In-cluster · Partner: ARC
  GitHub: stelar-eu/GenericNER · Docker: stelareu/generic-ner
- Crop Classification: DL pipeline on LAI-derived time series for crop type and growth prediction.
  Lang: Python 3.11 · Integration: In-cluster / Remote · Partner: UniBwM
  GitHub: stelar-eu/crop_prediction_tool · Docker: stelareu/crop-prediction
- Vocational Score Raster: generates rasterized vocational-skill score maps across regions.
  Lang: Python 3.8 · Integration: In-cluster · Partner: ABACO
  GitHub: stelar-eu/vocational-score-raster · Docker: stelareu/vocational-score-raster
- Agri Products Match: matches fertilizers/pesticides to reference products via NPK/active substances; multilingual.
  Lang: Python 3.8 · Integration: In-cluster · Partner: ABACO
  GitHub: stelar-eu/agri-products-match · Docker: stelareu/agri-products-match
- Hazard Classification: incident reporting for the agri-food domain.
  Lang: Python · Integration: None · Partner: UniBwM
  GitHub: stelar-eu/Hazard-classification · Docker: stelareu/hazard-classification
License
The contents of this project are licensed under the GPL-2.0 license.