CELINE Utils

A collection of shared utilities, libraries, and command-line tools that form the technical backbone of the CELINE data platform. It provides reusable building blocks for data pipelines, governance, lineage, metadata management, and platform integrations.

CELINE Utils is not an end-user application. It is a platform utility layer that is embedded into CELINE applications and executed within environments orchestrated with Meltano, dbt, Prefect, and OpenLineage.


Scope and goals

  • Centralise cross-cutting platform logic used by multiple CELINE projects
  • Provide opinionated but extensible tooling for data pipelines
  • Enforce consistent governance and lineage semantics
  • Reduce duplication across pipeline applications
  • Act as a stable foundation for CELINE-compatible services and workflows

Key capabilities

Governance framework

A declarative governance.yaml specification defines the metadata, access control, and dataspace exposure rules for each dataset.

The GovernanceRule model covers:

  • Dataset ownership (owner, attribution)
  • License and access level (open, internal, restricted, secret)
  • Data classification (pii, green, yellow, red) and retention
  • Tags, documentation links, and source system
  • user_filter_column — the column used for per-subject consent-based row filtering
  • expose: true — controls whether the dataset appears in the DCAT catalogue and is registered as an EDC asset
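To make the shape of a rule concrete, here is a minimal sketch using a plain dataclass. It is not the real GovernanceRule model: the class name RuleSketch, the field names access_level and classification, the default values, and the filter_rows helper are all assumptions made for illustration. Only the allowed values and user_filter_column come from the list above.

```python
from dataclasses import dataclass, field
from typing import Optional

# Allowed values are taken from the list above; everything else is assumed.
ACCESS_LEVELS = {"open", "internal", "restricted", "secret"}
CLASSIFICATIONS = {"pii", "green", "yellow", "red"}


@dataclass
class RuleSketch:
    """Hypothetical stand-in for GovernanceRule, not the real model."""

    owner: Optional[str] = None
    license: Optional[str] = None
    access_level: str = "internal"      # assumed default
    classification: str = "green"       # assumed default
    tags: list[str] = field(default_factory=list)
    user_filter_column: Optional[str] = None
    expose: bool = False                # assumed default

    def __post_init__(self) -> None:
        # Reject values outside the documented vocabularies.
        if self.access_level not in ACCESS_LEVELS:
            raise ValueError(f"unknown access level: {self.access_level!r}")
        if self.classification not in CLASSIFICATIONS:
            raise ValueError(f"unknown classification: {self.classification!r}")


def filter_rows(rows, rule, consenting_subjects):
    """Keep only rows whose subject appears in consenting_subjects.

    Rows pass through unchanged when the rule names no filter column.
    """
    if rule.user_filter_column is None:
        return list(rows)
    col = rule.user_filter_column
    return [row for row in rows if row.get(col) in consenting_subjects]


rule = RuleSketch(owner="grid-team", access_level="restricted",
                  classification="pii", user_filter_column="user_id")
rows = [{"user_id": "u1", "kwh": 3.2}, {"user_id": "u2", "kwh": 1.1}]
visible = filter_rows(rows, rule, consenting_subjects={"u1"})
```

In this sketch the consent check is a simple set membership test. How the platform actually obtains and evaluates consent is not described here.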

Extended blocks for DCAT-AP 3.0 and dataspace integration:

dcat: block — propagated to the DCAT-AP catalogue by dataset-api:

  • publisher_uri — overrides the API-level fallback publisher
  • themes — EU Publications Office data-theme URIs
  • language_uris — dct:language URIs
  • spatial_uris — dct:spatial URIs
  • accrual_periodicity — dct:accrualPeriodicity URI
  • conforms_to — dct:conformsTo URI
  • temporal.start / temporal.end — dct:temporal coverage
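The mapping from dcat: keys to catalogue properties can be sketched as a small function. This is an illustration, not the dataset-api implementation: the function name is hypothetical, the JSON-LD shape is an assumption, and so are the mappings of publisher_uri to dct:publisher, themes to dcat:theme, and temporal.start / temporal.end to dcat:startDate / dcat:endDate. The remaining property names come from the list above.

```python
def dcat_block_to_jsonld(dcat: dict) -> dict:
    """Map a dcat: block to DCAT-AP style properties (illustrative only)."""
    single_uri = {
        "publisher_uri": "dct:publisher",          # assumed target property
        "accrual_periodicity": "dct:accrualPeriodicity",
        "conforms_to": "dct:conformsTo",
    }
    multi_uri = {
        "themes": "dcat:theme",                    # assumed target property
        "language_uris": "dct:language",
        "spatial_uris": "dct:spatial",
    }
    out: dict = {}
    for key, prop in single_uri.items():
        if dcat.get(key):
            out[prop] = {"@id": dcat[key]}
    for key, prop in multi_uri.items():
        if dcat.get(key):
            out[prop] = [{"@id": uri} for uri in dcat[key]]

    # Temporal coverage becomes one period with optional start and end.
    temporal = dcat.get("temporal") or {}
    if temporal.get("start") or temporal.get("end"):
        period = {"@type": "dct:PeriodOfTime"}
        if temporal.get("start"):
            period["dcat:startDate"] = temporal["start"]
        if temporal.get("end"):
            period["dcat:endDate"] = temporal["end"]
        out["dct:temporal"] = period
    return out


props = dcat_block_to_jsonld({
    "publisher_uri": "https://example.org/org/celine",  # placeholder URI
    "themes": [
        "http://publications.europa.eu/resource/authority/data-theme/ENER",
    ],
    "accrual_periodicity":
        "http://publications.europa.eu/resource/authority/frequency/DAILY",
    "temporal": {"start": "2024-01-01"},
})
```

Keys that are absent or empty produce no property, so a partially filled block yields a partially filled catalogue entry.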

dataspace: block — consumed by export_governance.py when registering datasets in EDC:

  • contract_required — enables ds:contractRequired ODRL constraint
  • consent_required — enables ds:consentStatus ODRL constraint and consent-based row filtering
  • odrl_action — default ODRL action (default use)
  • purpose — ODRL purpose values
  • medallion — data quality level (gold / silver / bronze)
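As a rough illustration of how these flags could translate into an ODRL policy, the sketch below builds a policy dictionary from a dataspace: block. It does not reproduce export_governance.py. The function name, the right-operand values (True and "granted"), the use of isAnyOf for purposes, and the assumption that purpose is a list are all hypothetical; only the ds:contractRequired and ds:consentStatus operand names and the use default come from the list above.

```python
def build_policy(dataspace: dict) -> dict:
    """Build an ODRL-style policy dict from a dataspace: block (sketch)."""
    constraints = []
    if dataspace.get("contract_required"):
        constraints.append({
            "leftOperand": "ds:contractRequired",
            "operator": "eq",
            "rightOperand": True,           # assumed value
        })
    if dataspace.get("consent_required"):
        constraints.append({
            "leftOperand": "ds:consentStatus",
            "operator": "eq",
            "rightOperand": "granted",      # assumed value
        })
    purposes = list(dataspace.get("purpose") or [])
    if purposes:
        # One constraint that is satisfied by any of the listed purposes.
        constraints.append({
            "leftOperand": "purpose",
            "operator": "isAnyOf",
            "rightOperand": purposes,
        })

    permission: dict = {"action": dataspace.get("odrl_action", "use")}
    if constraints:
        permission["constraint"] = constraints
    return {"@type": "Set", "permission": [permission]}


policy = build_policy({
    "contract_required": True,
    "purpose": ["research"],                # hypothetical purpose value
})
```

An empty block yields a single unconstrained permission with the default use action.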

Governance rules are resolved by pattern matching in GovernanceResolver. Values in the defaults: block cascade into each source entry, and per-source values take precedence. For expose and the dcat/dataspace fields, booleans are OR-merged and objects are override-merged.
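The resolution and merge behaviour can be sketched as follows. This is one possible reading, not the GovernanceResolver code: glob-style matching with fnmatch, a first-match-wins order, a sources mapping keyed by pattern, and a key-by-key merge for the dcat and dataspace objects are all assumptions.

```python
from fnmatch import fnmatchcase

OBJECT_FIELDS = ("dcat", "dataspace")


def merge_rule(defaults: dict, source: dict) -> dict:
    """Cascade defaults into one source entry."""
    merged = {**defaults, **source}          # per-source values win
    # expose is OR-merged: true on either side keeps the dataset exposed.
    merged["expose"] = bool(defaults.get("expose")) or bool(source.get("expose"))
    # Objects are merged key by key, with source keys overriding defaults.
    for name in OBJECT_FIELDS:
        merged[name] = {**(defaults.get(name) or {}), **(source.get(name) or {})}
    return merged


def resolve(config: dict, dataset: str):
    """Return the merged rule for the first matching pattern, else None."""
    defaults = config.get("defaults") or {}
    for pattern, source in (config.get("sources") or {}).items():
        if fnmatchcase(dataset, pattern):
            return merge_rule(defaults, source)
    return None


config = {
    "defaults": {
        "access_level": "internal",
        "expose": False,
        "dcat": {"publisher_uri": "https://example.org/org/celine"},
    },
    "sources": {
        "raw.*": {"access_level": "restricted"},
        "gold.meters_*": {
            "expose": True,
            "dcat": {"conforms_to": "https://example.org/spec/meters"},
        },
    },
}
rule = resolve(config, "gold.meters_hourly")
```

Here the resolved rule inherits access_level and publisher_uri from the defaults, while expose and conforms_to come from the matching source entry.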

Both celine-utils (pipeline side) and dataset-api/cli/export_governance.py (catalogue side) parse the same governance.yaml format. EDC-specific sub-objects in the dataspace: block are silently ignored by celine-utils via model_config = ConfigDict(extra="ignore").
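The effect of ignoring unknown keys can be imitated with the standard library alone. The sketch below is not Pydantic and not the real model: it uses a plain dataclass that drops keys it does not declare, which approximates what extra="ignore" achieves. The class name and the edc_asset key are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DataspaceSketch:
    """Hypothetical pipeline-side view of the dataspace: block."""

    contract_required: bool = False
    consent_required: bool = False
    odrl_action: str = "use"
    medallion: Optional[str] = None

    @classmethod
    def from_dict(cls, raw: dict) -> "DataspaceSketch":
        # Keep declared fields only; anything else is dropped silently.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in raw.items() if k in known})


block = {
    "consent_required": True,
    "medallion": "gold",
    "edc_asset": {"id": "meters-hourly"},   # hypothetical EDC-only key
}
parsed = DataspaceSketch.from_dict(block)
```

Because the extra key is dropped rather than rejected, the catalogue side can extend the block without breaking pipeline-side parsing.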

Pipeline orchestration

Structured execution layer for:

  • Meltano ingestion pipelines
  • dbt transformations and tests
  • Prefect-based Python flows

The PipelineRunner coordinates execution, logging, error handling, and lineage emission consistently across tools.
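The coordination idea can be illustrated with a small stand-in. This is not the PipelineRunner API: run_steps, the step tuples, and the emit callback are hypothetical names, and the sketch only shows how ordered execution, logging, error handling, and lifecycle events might fit together.

```python
import logging
from typing import Callable, Iterable, Optional, Tuple

log = logging.getLogger("pipeline_sketch")

Step = Tuple[str, Callable[[], None]]


def run_steps(steps: Iterable[Step],
              emit: Callable[[str, Optional[str]], None]) -> bool:
    """Run steps in order and report lifecycle events through emit."""
    emit("START", None)
    for name, step in steps:
        log.info("running step %s", name)
        try:
            step()
        except Exception as exc:
            # Stop at the first failure and report which step broke.
            log.error("step %s failed: %s", name, exc)
            emit("FAIL", f"{name}: {exc}")
            return False
    emit("COMPLETE", None)
    return True


events = []
ran = []


def failing_tests() -> None:
    raise RuntimeError("2 tests failed")


ok = run_steps(
    [
        ("ingest", lambda: ran.append("ingest")),
        ("dbt test", failing_tests),
        ("publish", lambda: ran.append("publish")),
    ],
    emit=lambda kind, detail: events.append((kind, detail)),
)
```

In this run the second step raises, so a FAIL event is recorded and the publish step never executes.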

See the pipeline tutorial.

OpenLineage integration

  • Automatic emission of START, COMPLETE, FAIL, and ABORT events
  • Dataset-level schema facets
  • Data quality assertions from dbt tests
  • Custom CELINE governance facets (including userFilterColumn, medallion, classification)
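To show what such an event might look like on the wire, the sketch below assembles an OpenLineage-style run event as a plain dictionary. It does not use the OpenLineage client library or the package's own emitter. The producer URI, the job namespace, the facet key celineGovernance, and both schema URLs are placeholders chosen for illustration and should be checked against the spec version in use.

```python
import json
import uuid
from datetime import datetime, timezone

PRODUCER = "https://example.org/celine-utils-sketch"        # placeholder
FACET_SCHEMA = "https://example.org/schemas/governance.json"  # placeholder
EVENT_TYPES = {"START", "COMPLETE", "FAIL", "ABORT"}


def governance_facet(user_filter_column, medallion, classification) -> dict:
    """Custom facet carrying the governance fields named above."""
    return {
        "_producer": PRODUCER,
        "_schemaURL": FACET_SCHEMA,
        "userFilterColumn": user_filter_column,
        "medallion": medallion,
        "classification": classification,
    }


def run_event(event_type: str, run_id: str, job_name: str,
              outputs=()) -> dict:
    """Assemble one run event; the caller serialises and sends it."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unsupported event type: {event_type!r}")
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "producer": PRODUCER,
        "schemaURL": "https://example.org/schemas/run-event.json",  # placeholder
        "run": {"runId": run_id},
        "job": {"namespace": "celine", "name": job_name},
        "inputs": [],
        "outputs": list(outputs),
    }


dataset = {
    "namespace": "postgres://warehouse",      # hypothetical namespace
    "name": "gold.meters_hourly",
    "facets": {
        "celineGovernance": governance_facet("user_id", "gold", "pii"),
    },
}
event = run_event("COMPLETE", str(uuid.uuid4()), "meters_pipeline", [dataset])
payload = json.dumps(event)
```

The same run identifier would be reused across the START and the terminal event of a run, so that consumers can pair them.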

Dataset tooling

The DatasetClient enables:

  • Schema and table introspection
  • Column metadata inspection
  • Safe query construction
  • Export to Pandas
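Introspection and safe query construction can be illustrated with the standard sqlite3 module. This is not the DatasetClient API: the helper names are hypothetical, and SQLite stands in for whatever database the platform uses. The idea shown is to check identifiers against introspected metadata and to pass values as bound parameters.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def list_tables(conn: sqlite3.Connection) -> list:
    cur = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return [row[0] for row in cur]


def list_columns(conn: sqlite3.Connection, table: str) -> list:
    """Return column names, refusing tables that do not exist."""
    if table not in list_tables(conn):
        raise ValueError(f"unknown table: {table!r}")
    cur = conn.execute(f"PRAGMA table_info({quote_ident(table)})")
    return [row[1] for row in cur]   # index 1 holds the column name


def select_where(conn: sqlite3.Connection, table: str, column: str, value):
    """Select rows where column equals value, without string-built values."""
    if column not in list_columns(conn, table):
        raise ValueError(f"unknown column: {column!r}")
    sql = (f"SELECT * FROM {quote_ident(table)} "
           f"WHERE {quote_ident(column)} = ?")
    return conn.execute(sql, (value,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (meter_id TEXT, kwh REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [("m1", 3.2), ("m2", 1.1)])
matches = select_where(conn, "readings", "meter_id", "m1")
```

Identifiers cannot be bound as parameters, which is why they are validated against the introspected schema before being quoted into the statement.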

Platform integrations

  • Keycloak for identity and access management
  • Apache Superset for analytics
  • MQTT for lightweight messaging

CLI

celine-utils governance generate   # generate governance.yaml template
celine-utils pipeline init         # scaffold a new pipeline
celine-utils pipeline run          # run a pipeline

Repository structure

celine/
  admin/
  cli/
  common/
  datasets/
  pipelines/
schemas/
tests/

Configuration

Environment-driven via pydantic-settings:

  • Environment variables first
  • Optional .env files
  • Typed validation with container-friendly defaults
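The precedence described above can be sketched without any third-party dependency. This is a standard-library imitation, not pydantic-settings and not the package's settings classes: the variable names CELINE_DB_HOST and CELINE_DB_PORT, the defaults, and the helper names are hypothetical.

```python
import os
from dataclasses import dataclass
from pathlib import Path


def read_dotenv(path) -> dict:
    """Parse simple KEY=VALUE lines; a missing file yields no values."""
    values = {}
    file = Path(path)
    if not file.is_file():
        return values
    for line in file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


@dataclass(frozen=True)
class SettingsSketch:
    db_host: str = "localhost"   # container-friendly default (assumed)
    db_port: int = 5432


def load_settings(environ=None, dotenv_path=".env") -> SettingsSketch:
    """Environment variables override .env values, which override defaults."""
    environ = os.environ if environ is None else environ
    merged = {**read_dotenv(dotenv_path), **environ}
    return SettingsSketch(
        db_host=merged.get("CELINE_DB_HOST", SettingsSketch.db_host),
        # int() rejects non-numeric input, a minimal form of typed validation
        db_port=int(merged.get("CELINE_DB_PORT", SettingsSketch.db_port)),
    )


settings = load_settings(environ={"CELINE_DB_PORT": "6543"},
                         dotenv_path="no-such-file.env")
```

Passing environ explicitly, as in the example, keeps the sketch easy to test. Calling load_settings() with no arguments reads the real process environment instead.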

Documentation

  • Pipeline Tutorial — end-to-end pipeline setup guide
  • Governance — governance.yaml format, access levels, pattern matching, dcat/dataspace blocks
  • Schemas — JSON Schema definitions including governance.schema.json
  • CLI — full CLI reference

Installation

pip install celine-utils

Intended audience

  • Data engineers
  • Platform engineers
  • CELINE application developers

License

Copyright © 2025 Spindox Labs

Licensed under the Apache License, Version 2.0.
