Convert MLflow tracking data into MLSO-aligned RDF Knowledge Graphs

mlflow2rdf

A Python package to convert MLflow tracking data into MLSO-aligned RDF Knowledge Graphs.

Overview

mlflow2rdf transforms your MLflow experiment metadata (parameters, metrics, tags) into RDF triples that conform to MLSO (the Machine Learning Sailor Ontology). This enables:

  • Semantic querying via SPARQL instead of imperative MLflow API loops
  • Cross-platform interoperability with public ML knowledge graphs like MLSea
  • FAIR principles for ML experiments: Findable, Accessible, Interoperable, Reusable

Installation

pip install mlflow2rdf

Quick Start

Command Line Interface

# Convert MLflow runs to RDF
mlflow2rdf --mlruns /path/to/mlruns --output output.ttl

# With SHACL validation
mlflow2rdf --mlruns /path/to/mlruns --output output.ttl --validate

Python API

from mlflow2rdf import MLflow2RDFConverter

# Initialize converter
converter = MLflow2RDFConverter(mlruns_path="/path/to/mlruns")

# Convert to RDF
graph = converter.convert()

# Serialize to Turtle format
graph.serialize("output.ttl", format="turtle")

# Validate with SHACL
results = converter.validate(graph)
print(f"SHACL violations: {results}")

Features

  • Multiple data modalities: tabular, image, text, time-series, and multi-modal data
  • Paradigm-aware routing: parameters are routed automatically according to the run's learning paradigm
  • Pipeline relationship inference: detects knowledge distillation, LoRA adapters, and self-supervised learning chains
  • SHACL validation: checks the generated graph against MLSO shape constraints
  • Blind spot analysis: verifies how completely the metadata was extracted

Supported Learning Paradigms

| Paradigm | Key Properties |
|---|---|
| Supervised Classification | Standard hyperparameters, accuracy metrics |
| Self-Supervised Learning | Pretext/downstream run partitioning |
| Contrastive Learning | Temperature, distance metrics |
| Knowledge Distillation | Teacher-student relationships, distillation temperature |
| Parameter-Efficient Fine-tuning (LoRA) | Adapter rank, alpha, target modules |
| Multi-Modal Fusion | Fusion strategy, image/text encoders |
| Time-Series Forecasting | Forecasting horizon, lookback window |
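
To make the idea of paradigm detection concrete, here is a minimal sketch that guesses a paradigm from the parameter names logged with a run. It is an illustration only: the marker parameter names (`lora_rank`, `teacher_run_id`, and so on) and the overlap-counting rule are assumptions, not the rules mlflow2rdf actually applies.

```python
# Hypothetical marker parameters per paradigm; not the package's actual rules.
PARADIGM_MARKERS = {
    "Parameter-Efficient Fine-tuning (LoRA)": {
        "lora_rank", "lora_alpha", "target_modules",
    },
    "Knowledge Distillation": {"teacher_run_id", "distillation_temperature"},
    "Contrastive Learning": {"temperature", "distance_metric"},
    "Time-Series Forecasting": {"forecast_horizon", "lookback_window"},
}

# Fallback when no marker parameter is present.
DEFAULT_PARADIGM = "Supervised Classification"


def infer_paradigm(params):
    """Pick the paradigm whose marker parameters overlap most with the run's."""
    keys = set(params)
    best, best_hits = DEFAULT_PARADIGM, 0
    for paradigm, markers in PARADIGM_MARKERS.items():
        hits = len(keys & markers)
        # Strict comparison: on a tie, the earlier entry in the dict wins.
        if hits > best_hits:
            best, best_hits = paradigm, hits
    return best


lora_run = {"lora_rank": "8", "lora_alpha": "16", "learning_rate": "1e-4"}
plain_run = {"learning_rate": "0.01", "batch_size": "64"}
print(infer_paradigm(lora_run))
print(infer_paradigm(plain_run))
```

A lookup table like this keeps the detection rules in data rather than in branching code, so adding a paradigm means adding one entry.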

Output Format

The package generates RDF triples in Turtle format, using MLSO/MLST vocabulary:

@prefix mls: <http://www.w3.org/ns/mls#> .
@prefix mlso: <http://w3id.org/mlso/> .

<run/abc123> a mls:Run ;
    mls:hasInput <dataset/cifar10> ;
    mls:hasOutput <evaluation/acc_0.95> ;
    mlso:hasParadigm "Supervised Classification" .

Requirements

  • Python >= 3.8
  • MLflow >= 2.0.0
  • RDFLib >= 6.0.0
  • pySHACL >= 0.25.0

License

MIT License

Citation

If you use this package in your research, please cite:

@mastersthesis{jia2026mlflow2rdf,
  author = {Jia, Sijie},
  title = {Bridging ML Tracking and Semantic Interoperability: Transforming MLflow Experiment Metadata to MLSO-Aligned RDF Knowledge Graphs},
  school = {KU Leuven},
  year = {2026}
}
