Convert MLflow tracking data into MLSO-aligned RDF Knowledge Graphs
mlflow2rdf
A Python package to convert MLflow tracking data into MLSO-aligned RDF Knowledge Graphs.
Overview
mlflow2rdf transforms your MLflow experiment metadata (parameters, metrics, tags) into RDF triples that conform to MLSO, the Machine Learning Sailor Ontology. This enables:
- Semantic querying via SPARQL instead of imperative MLflow API loops
- Cross-platform interoperability with public ML knowledge graphs like MLSea
- FAIR principles for ML experiments: Findable, Accessible, Interoperable, Reusable
Installation
pip install mlflow2rdf
Quick Start
Command Line Interface
# Convert MLflow runs to RDF
mlflow2rdf --mlruns /path/to/mlruns --output output.ttl
# With SHACL validation
mlflow2rdf --mlruns /path/to/mlruns --output output.ttl --validate
Python API
from mlflow2rdf import MLflow2RDFConverter
# Initialize converter
converter = MLflow2RDFConverter(mlruns_path="/path/to/mlruns")
# Convert to RDF
graph = converter.convert()
# Serialize to Turtle format
graph.serialize("output.ttl", format="turtle")
# Validate with SHACL
results = converter.validate(graph)
print(f"SHACL violations: {results}")
Features
- Multi-modal support: Handles runs trained on tabular, image, text, time-series, and mixed-modality data
- Paradigm-aware routing: Automatic parameter routing based on learning paradigm
- Pipeline relationship inference: Detects knowledge distillation, LoRA adapters, self-supervised learning chains
- SHACL validation: Comprehensive shape validation against MLSO constraints
- Blind spot analysis: Completeness verification of metadata extraction
Supported Learning Paradigms
| Paradigm | Key Properties |
|---|---|
| Supervised Classification | Standard hyperparameters, accuracy metrics |
| Self-Supervised Learning | Pretext/downstream run partitioning |
| Contrastive Learning | Temperature, distance metrics |
| Knowledge Distillation | Teacher-student relationships, distillation temperature |
| Parameter-Efficient Fine-tuning (LoRA) | Adapter rank, alpha, target modules |
| Multi-Modal Fusion | Fusion strategy, image/text encoders |
| Time-Series Forecasting | Forecasting horizon, lookback window |
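To make the idea of paradigm-aware routing concrete, here is a minimal sketch that guesses a paradigm from the parameter names logged on a run. It is not the package's implementation: the function `infer_paradigm`, the `PARADIGM_MARKERS` table, and every parameter key in it (for example `lora_rank` or `forecast_horizon`) are hypothetical names chosen to mirror the "Key Properties" column above. Self-supervised learning is left out because it depends on relationships between runs rather than on parameter names.

```python
from typing import Dict

# Hypothetical marker keys per paradigm, loosely based on the table above.
# These are illustrative assumptions, not mlflow2rdf's actual routing rules.
PARADIGM_MARKERS = {
    "Knowledge Distillation": {"distillation_temperature", "teacher_run_id"},
    "Parameter-Efficient Fine-tuning (LoRA)": {"lora_rank", "lora_alpha", "target_modules"},
    "Multi-Modal Fusion": {"fusion_strategy", "image_encoder", "text_encoder"},
    "Time-Series Forecasting": {"forecast_horizon", "lookback_window"},
    "Contrastive Learning": {"temperature", "distance_metric"},
}

# Fallback when no marker key is present on the run.
DEFAULT_PARADIGM = "Supervised Classification"


def infer_paradigm(params: Dict[str, str]) -> str:
    """Return the paradigm whose marker keys overlap most with the run's parameter names."""
    keys = {name.lower() for name in params}
    best, best_hits = DEFAULT_PARADIGM, 0
    for paradigm, markers in PARADIGM_MARKERS.items():
        hits = len(keys & markers)
        # Strict comparison: on a tie, the paradigm listed first wins.
        if hits > best_hits:
            best, best_hits = paradigm, hits
    return best


lora_run = {"learning_rate": "1e-4", "lora_rank": "8", "lora_alpha": "16"}
plain_run = {"learning_rate": "0.1", "batch_size": "64"}
print(infer_paradigm(lora_run))
print(infer_paradigm(plain_run))
```

A key-overlap heuristic like this is easy to audit and extend, but it is only as good as the naming conventions used when logging to MLflow; a real router would likely also consult tags and metrics.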
Output Format
The package generates RDF triples in Turtle format, using MLSO/MLST vocabulary:
@prefix mls: <http://www.w3.org/ns/mls#> .
@prefix mlso: <http://w3id.org/mlso/> .
<run/abc123> a mls:Run ;
    mls:hasInput <dataset/cifar10> ;
    mls:hasOutput <evaluation/acc_0.95> ;
    mlso:hasParadigm "Supervised Classification" .
Requirements
- Python >= 3.8
- MLflow >= 2.0.0
- RDFLib >= 6.0.0
- pySHACL >= 0.25.0
License
MIT License
Citation
If you use this package in your research, please cite:
@mastersthesis{jia2026mlflow2rdf,
  author = {Jia, Sijie},
  title  = {Bridging ML Tracking and Semantic Interoperability: Transforming MLflow Experiment Metadata to MLSO-Aligned RDF Knowledge Graphs},
  school = {KU Leuven},
  year   = {2026}
}