A library for loading datasets and models whose metadata is provided in the DCAT-AP format.
DCAT-AP Hub
dcat-ap-hub is a Python library for working with datasets and pretrained models described using DCAT-AP metadata.
It is built around a practical workflow that resolves metadata, downloads artifacts, and loads datasets or models through a single interface.
Currently, metadata parsing supports JSON-LD only. The JSON-LD can be fetched from a direct URL, obtained through HTTP content negotiation, or read from a local file.
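To illustrate what content negotiation means here, the sketch below builds an HTTP request that asks a server for a JSON-LD representation through the `Accept` header. It uses only the Python standard library and does not reflect how dcat-ap-hub performs the request internally; the helper name `build_jsonld_request` and the example.org URL are placeholders chosen for this illustration.

```python
from urllib.request import Request

# Media type registered for JSON-LD documents.
JSON_LD = "application/ld+json"


def build_jsonld_request(url: str) -> Request:
    """Build a GET request that asks the server for JSON-LD (hypothetical helper)."""
    return Request(url, headers={"Accept": JSON_LD})


# Placeholder URL; the request is only constructed here, not sent.
req = build_jsonld_request("https://example.org/dataset/demo")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would send the request; a server that supports content negotiation can then answer with JSON-LD instead of an HTML page.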
Typical Workflow
- Retrieve dataset metadata in DCAT-AP from:
  - remote JSON-LD URLs (Dataset.from_url(...))
  - local metadata files (Dataset.from_file(...))
  - local directories that contain metadata files (Dataset.from_directory(...))
- Download files referenced by distributions and related resources (dcat:downloadURL) into a local dataset directory.
- Load files or models for use in code:
  - Load files as a lazy FileCollection with built-in loaders for common formats such as CSV, Excel, JSON, Parquet, images, PDF, text, HTML/XML, and NumPy arrays.
  - Load pretrained models through Hugging Face, ONNX, or sklearn-style model scripts.
Benchmarking With Catalogues
Optionally, a processor script can be attached to a dataset as a related resource. The script is detected automatically and applied to transform the raw files. This makes it possible to define multi-dataset benchmarks as DCAT-AP catalogues: benchmarking requires each dataset to provide a fixed train-test split, and a processor script can generate that split.
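The interface a processor script must implement is not described in this README, so the following is only a sketch of the core idea behind a fixed train-test split: the result must not depend on input order or on the run. The function name `fixed_split`, its parameters, and the sample file names are assumptions made for illustration.

```python
import random


def fixed_split(items, test_fraction=0.2, seed=42):
    """Return (train, test) lists that are identical on every run (hypothetical helper)."""
    # Sort first so the split does not depend on the order in which files were listed.
    ordered = sorted(items)
    # A dedicated, seeded generator keeps the shuffle reproducible and isolated
    # from the global random state.
    rng = random.Random(seed)
    rng.shuffle(ordered)
    n_test = int(len(ordered) * test_fraction)
    return ordered[n_test:], ordered[:n_test]


samples = [f"sample_{i}.csv" for i in range(10)]
train, test = fixed_split(samples)
print(len(train), "training files,", len(test), "test files")
```

Because the input is sorted and the seed is fixed, every user who runs this on the same file names gets the same partition, which is the property a benchmark needs.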
Requirements for Metadata
- Each dataset metadata record must include a dcat:Dataset entry.
- Entries with @type set to mls:Model are treated as models.
- Roles for distributions (dcat:Distribution) and related resources (rdfs:Resource) can be defined through dct:conformsTo and/or dct:format, allowing the specification of model types or processors.
- The dcat:downloadURL field identifies the files to be downloaded.
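These requirements can be pictured with a small checker that groups the entries of a record by @type and rejects records without a dcat:Dataset entry. This is an illustrative sketch only, not the library's parser: it assumes compact prefixes, and the `ex:` identifiers and the dct:format and dct:conformsTo values in the sample are invented.

```python
ROLE_BY_TYPE = {
    "dcat:Dataset": "datasets",
    "mls:Model": "models",
    "dcat:Distribution": "distributions",
    "rdfs:Resource": "resources",
}


def classify_entries(graph):
    """Group entry ids by role; raise if no dcat:Dataset entry exists (hypothetical helper)."""
    groups = {role: [] for role in ROLE_BY_TYPE.values()}
    for node in graph:
        types = node.get("@type", [])
        if isinstance(types, str):  # JSON-LD allows a single string or a list
            types = [types]
        for type_name in types:
            role = ROLE_BY_TYPE.get(type_name)
            if role is not None:
                groups[role].append(node.get("@id"))
    if not groups["datasets"]:
        raise ValueError("record has no dcat:Dataset entry")
    return groups


# Invented sample; the role values below are placeholders, not documented vocabulary.
graph = [
    {"@id": "ex:demo", "@type": "dcat:Dataset"},
    {
        "@id": "ex:data",
        "@type": "dcat:Distribution",
        "dct:format": "csv",
        "dcat:downloadURL": "https://example.org/files/data.csv",
    },
    {
        "@id": "ex:processor",
        "@type": "rdfs:Resource",
        "dct:conformsTo": "processor",
        "dcat:downloadURL": "https://example.org/files/processor.py",
    },
]

groups = classify_entries(graph)
print(groups)
```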
How To Install
# Base install (datasets, processing)
pip install dcat-ap-hub
# Install with ONNX model loading support
pip install "dcat-ap-hub[onnx]"
# Install with Hugging Face model loading support
pip install "dcat-ap-hub[huggingface]"
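After installing with an extra, it can be useful to check whether the optional backend is importable before calling a model loader. The sketch below uses `importlib.util.find_spec` from the standard library. Which modules each extra actually installs is not stated in this README; `onnxruntime` and `transformers` are assumptions and may need adjusting.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return find_spec(module_name) is not None


# Assumed mapping from extras to top-level module names (not confirmed by the README).
ASSUMED_BACKENDS = {"onnx": "onnxruntime", "huggingface": "transformers"}

for extra, module in ASSUMED_BACKENDS.items():
    state = "available" if is_installed(module) else "missing"
    print(f"dcat-ap-hub[{extra}] -> {module}: {state}")
```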
Example of Loading a Dataset
from dcat_ap_hub import Dataset
url = "https://ki-daten.hlrs.de/de/dataset/https-piveau-io-set-data-predictive-maintenance-ttl"
ds = Dataset.from_url(url)
files = ds.download(data_dir="./data")
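The structure of the value returned by `ds.download(...)` is not described here. To see what a download placed on disk independently of the library, a plain directory listing is enough. The helper `list_downloaded` is a hypothetical name, and the demo runs against a temporary directory rather than a real download.

```python
import tempfile
from pathlib import Path


def list_downloaded(data_dir):
    """Return the relative paths of all files below data_dir, sorted (hypothetical helper)."""
    root = Path(data_dir)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
    )


# Demo on a throwaway directory that mimics a downloaded dataset.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "a.csv").write_text("x,y\n1,2\n")
    (Path(tmp) / "sub" / "b.json").write_text("{}")
    listing = list_downloaded(tmp)

print(listing)
```

In practice, `list_downloaded("./data")` would be called after the download in the example above.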
Example of Loading a Hugging Face Model
from dcat_ap_hub import Dataset
url = "https://ki-daten.hlrs.de/de/model/prajjwal1-bert-tiny"
ds = Dataset.from_url(url)
files = ds.download(data_dir="./data")
model, processor, metadata = ds.load_model(model_dir="./models")
Example of Loading a scikit-learn Model
from dcat_ap_hub import Dataset
url = "https://ki-daten.hlrs.de/de/model/https-piveau-io-set-data-pre-trained-transformer"
ds = Dataset.from_url(url)
files = ds.download(data_dir="./data")
model = ds.load_model(model_dir="./models")
Example of Processing a Dataset When a Processor Is Available
from dcat_ap_hub import Dataset
url = "https://ki-daten.hlrs.de/de/dataset/https-piveau-io-set-data-predictive-maintenance-ttl"
ds = Dataset.from_url(url)
files = ds.download(data_dir="./data")
processed = ds.process(processed_dir="./processed")
Funding
This project was developed using resources from the HammerHAI project, an EU co-funded AI Factory initiative operated by the High-Performance Computing Center Stuttgart and supported by the European Commission as well as German federal and state ministries. It is funded by the European High Performance Computing Joint Undertaking under Grant Agreement No. 101234027.
File details
Details for the file dcat_ap_hub-0.1.4.tar.gz.
File metadata
- Download URL: dcat_ap_hub-0.1.4.tar.gz
- Size: 135.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f79d74985e6a05834c0e301c9334cd9b580d5885842c14b9eebbd2d57efff79a |
| MD5 | 724b78bee24689eb7df9b68e01858e7f |
| BLAKE2b-256 | ed8cc0143951e25f1ecc155d93b1bb57b7c767465687f356e99edf3064fd90c4 |
File details
Details for the file dcat_ap_hub-0.1.4-py3-none-any.whl.
File metadata
- Download URL: dcat_ap_hub-0.1.4-py3-none-any.whl
- Size: 22.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9a0992e2b2eaebcbb347b6f3c606b4ec15c05918f0d4ccbe3fb1437ce2ff7511 |
| MD5 | 8462e3230342a49ed23dcde52602e7dc |
| BLAKE2b-256 | c5c231d797c5e5be7522f80cf0e94c625c62639ce9a1d7a7c43b77a5b5f28a4b |