CELINE Utils

A collection of shared utilities, libraries, and command-line tools that form the technical backbone of the CELINE data platform. It provides reusable building blocks for data pipelines, governance, lineage, metadata management, and platform integrations.

CELINE Utils is not an end-user application. It is a platform utility layer embedded into CELINE applications and executed within orchestrated environments that use Meltano, dbt, Prefect, and OpenLineage.


Scope and goals

  • Centralise cross-cutting platform logic used by multiple CELINE projects
  • Provide opinionated but extensible tooling for data pipelines
  • Enforce consistent governance and lineage semantics
  • Reduce duplication across pipeline applications
  • Act as a stable foundation for CELINE-compatible services and workflows

Key capabilities

Governance framework

A declarative governance.yaml specification defines the metadata, access control, and dataspace exposure rules for each dataset.

The GovernanceRule model covers:

  • Dataset ownership (owner, attribution)
  • License and access level (open, internal, restricted, secret)
  • Data classification (pii, green, yellow, red) and retention
  • Tags, documentation links, and source system
  • user_filter_column — the column used for per-subject consent-based row filtering
  • expose: true — controls whether the dataset appears in the DCAT catalogue and is registered as an EDC asset
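
To make the shape of a rule concrete, here is a minimal sketch built only on the standard library. It is not the actual GovernanceRule model: the class name RuleSketch, the field names access_level, classification and retention_days, and the default values are all assumptions chosen to mirror the list above.

```python
from dataclasses import dataclass, field
from typing import Optional

# Allowed values taken from the list above.
ACCESS_LEVELS = {"open", "internal", "restricted", "secret"}
CLASSIFICATIONS = {"pii", "green", "yellow", "red"}


@dataclass
class RuleSketch:
    """Hypothetical stand-in for GovernanceRule; field names are assumptions."""

    owner: Optional[str] = None
    attribution: Optional[str] = None
    license: Optional[str] = None
    access_level: str = "internal"          # assumed default
    classification: Optional[str] = None
    retention_days: Optional[int] = None    # assumed unit
    tags: list[str] = field(default_factory=list)
    user_filter_column: Optional[str] = None
    expose: bool = False                    # assumed default

    def __post_init__(self) -> None:
        # Reject values outside the documented vocabularies.
        if self.access_level not in ACCESS_LEVELS:
            raise ValueError(f"unknown access level: {self.access_level!r}")
        if self.classification is not None and self.classification not in CLASSIFICATIONS:
            raise ValueError(f"unknown classification: {self.classification!r}")


rule = RuleSketch(
    owner="energy-team",
    access_level="restricted",
    classification="pii",
    user_filter_column="user_id",
    expose=True,
)
```

Validating the vocabularies at construction time means a typo in an access level fails early, before the rule reaches a catalogue or lineage event.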

Extended blocks for DCAT-AP 3.0 and dataspace integration:

dcat: block — propagated to the DCAT-AP catalogue by dataset-api:

  • publisher_uri — overrides the API-level fallback publisher
  • themes — EU Publications Office data-theme URIs
  • language_uris — dct:language URIs
  • spatial_uris — dct:spatial URIs
  • accrual_periodicity — dct:accrualPeriodicity URI
  • conforms_to — dct:conformsTo URI
  • temporal.start / temporal.end — dct:temporal coverage

dataspace: block — consumed by export_governance.py when registering datasets in EDC:

  • contract_required — enables ds:contractRequired ODRL constraint
  • consent_required — enables ds:consentStatus ODRL constraint and consent-based row filtering
  • odrl_action — default ODRL action (default use)
  • purpose — ODRL purpose values
  • medallion — data quality level (gold / silver / bronze)
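
The interplay between consent_required and user_filter_column can be illustrated with a small sketch. The source does not describe how row filtering is implemented, so the function name filter_rows_by_consent, the in-memory row format, and the set of consenting subjects are all assumptions made for illustration.

```python
def filter_rows_by_consent(rows, user_filter_column, consented_subjects):
    """Keep only rows whose subject identifier is in the consenting set.

    Rows missing the filter column are dropped, which is the conservative choice.
    """
    return [
        row for row in rows
        if row.get(user_filter_column) in consented_subjects
    ]


# Hypothetical dataset with user_filter_column = "user_id".
rows = [
    {"user_id": "u1", "kwh": 3.2},
    {"user_id": "u2", "kwh": 1.7},
    {"user_id": "u1", "kwh": 0.9},
    {"kwh": 5.0},  # no subject identifier, so it is never exposed
]

visible = filter_rows_by_consent(rows, "user_id", consented_subjects={"u1"})
```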

Governance rules are resolved by pattern matching via GovernanceResolver. Defaults cascade from the defaults: block into each source entry, and per-source values take precedence. The expose, dcat, and dataspace fields are merged differently: booleans use an OR-merge and objects use an override-merge.
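
The resolution rules can be sketched as follows. This is one reading of the paragraph above, not the GovernanceResolver implementation: it assumes glob-style patterns (via fnmatch), a top-level sources mapping keyed by pattern, first-match-wins ordering, an OR-merge applied only to expose, and wholesale replacement of the dcat and dataspace objects.

```python
from fnmatch import fnmatch

OR_MERGED_KEYS = {"expose"}  # assumption: only this boolean is OR-merged


def merge_rule(defaults, overrides):
    """Overlay per-source values on the defaults."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if key in OR_MERGED_KEYS and isinstance(merged.get(key), bool):
            # OR-merge: True on either side wins.
            merged[key] = merged[key] or bool(value)
        else:
            # Override-merge: scalars and whole objects are replaced.
            merged[key] = value
    return merged


def resolve(config, dataset_name):
    """Return the effective rule for a dataset (first matching pattern wins)."""
    defaults = config.get("defaults", {})
    for pattern, overrides in config.get("sources", {}).items():
        if fnmatch(dataset_name, pattern):
            return merge_rule(defaults, overrides)
    return dict(defaults)


config = {
    "defaults": {
        "access_level": "internal",
        "expose": True,
        "dcat": {"themes": ["http://publications.europa.eu/resource/authority/data-theme/ENER"]},
    },
    "sources": {
        "raw.meters_*": {
            "access_level": "restricted",
            "expose": False,
            "dcat": {"language_uris": ["http://publications.europa.eu/resource/authority/language/ENG"]},
        },
    },
}

rule = resolve(config, "raw.meters_2024")
```

Under these assumptions the resolved rule is restricted, stays exposed because the default was true, and carries only the per-source dcat object.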

Both celine-utils (pipeline side) and dataset-api/cli/export_governance.py (catalogue side) parse the same governance.yaml format. EDC-specific sub-objects in the dataspace: block are silently ignored by celine-utils via model_config = ConfigDict(extra="ignore").

Pipeline orchestration

A structured execution layer for:

  • Meltano ingestion pipelines
  • dbt transformations and tests
  • Prefect-based Python flows

The PipelineRunner coordinates execution, logging, error handling, and lineage emission consistently across tools.

See the pipeline tutorial.

OpenLineage integration

  • Automatic emission of START, COMPLETE, FAIL, and ABORT events
  • Dataset-level schema facets
  • Data quality assertions from dbt tests
  • Custom CELINE governance facets (including userFilterColumn, medallion, classification)
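
The event lifecycle can be sketched with plain dictionaries. This is a simplified illustration and does not use the OpenLineage client or the PipelineRunner API: the helper names, the celine-demo namespace, the producer URL, and the governance facet key are hypothetical, while eventType, eventTime, run, job, and outputs follow the general shape of an OpenLineage run event.

```python
import uuid
from datetime import datetime, timezone


def make_event(event_type, run_id, job_name, outputs=None, namespace="celine-demo"):
    """Build a simplified run event as a plain dict."""
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": namespace, "name": job_name},
        "outputs": outputs or [],
        "producer": "https://example.org/celine-sketch",  # placeholder URL
    }


def run_with_lineage(job_name, step, emit, outputs=None):
    """Run a step, emitting START first and then COMPLETE or FAIL."""
    run_id = str(uuid.uuid4())
    emit(make_event("START", run_id, job_name))
    try:
        result = step()
    except Exception:
        emit(make_event("FAIL", run_id, job_name))
        raise
    emit(make_event("COMPLETE", run_id, job_name, outputs=outputs))
    return result


# Hypothetical output dataset carrying a governance facet.
outputs = [{
    "namespace": "celine-demo",
    "name": "gold.meter_readings",
    "facets": {
        "governance": {  # facet key is an assumption
            "userFilterColumn": "user_id",
            "medallion": "gold",
            "classification": "pii",
        }
    },
}]

events = []
run_with_lineage("dbt.build", lambda: "ok", events.append, outputs=outputs)
```

Passing emit as a callable keeps the sketch testable: a list's append method collects events here, where a real deployment would send them to a lineage backend.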

Dataset tooling

The DatasetClient enables:

  • Schema and table introspection
  • Column metadata inspection
  • Safe query construction
  • Export to Pandas
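
As an illustration of what safe query construction and column inspection involve, the sketch below uses the standard-library sqlite3 module. It does not show the DatasetClient API: the helper names and the readings table are hypothetical, and the real client may target a different database.

```python
import re
import sqlite3

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_identifier(name):
    """Allow only plain identifiers, then double-quote them."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return f'"{name}"'


def build_select(table, columns, filter_column=None):
    """Build a SELECT whose filter value is bound as a parameter."""
    cols = ", ".join(quote_identifier(c) for c in columns)
    sql = f"SELECT {cols} FROM {quote_identifier(table)}"
    if filter_column is not None:
        sql += f" WHERE {quote_identifier(filter_column)} = ?"
    return sql


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (user_id TEXT, kwh REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", [("u1", 3.2), ("u2", 1.7)])

# Values never enter the SQL text; they are passed separately.
sql = build_select("readings", ["user_id", "kwh"], filter_column="user_id")
rows = conn.execute(sql, ("u1",)).fetchall()

# Column metadata inspection: the second field of each row is the column name.
column_names = [row[1] for row in conn.execute("PRAGMA table_info(readings)")]
```

Identifiers cannot be bound as parameters, which is why the sketch validates them against a strict pattern instead of interpolating arbitrary strings.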

Platform integrations

  • Keycloak for identity and access management
  • Apache Superset for analytics
  • MQTT for lightweight messaging

CLI

celine-utils governance generate   # generate governance.yaml template
celine-utils pipeline init         # scaffold a new pipeline
celine-utils pipeline run          # run a pipeline

Repository structure

celine/
  admin/
  cli/
  common/
  datasets/
  pipelines/
schemas/
tests/

Configuration

Configuration is environment-driven via pydantic-settings:

  • Environment variables first
  • Optional .env files
  • Typed validation with container-friendly defaults
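
The precedence order above can be sketched with the standard library alone. This is a stand-in for illustration and does not use pydantic-settings: the setting names DB_HOST and DB_PORT, their defaults, and the simplified .env parser are assumptions.

```python
import os
from dataclasses import dataclass


def parse_dotenv(text):
    """Parse KEY=VALUE lines, skipping blanks and comments (simplified)."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


@dataclass(frozen=True)
class SettingsSketch:
    db_host: str
    db_port: int


def load_settings(environ=None, dotenv_text=""):
    """Resolve each setting: environment variable, then .env, then default."""
    environ = os.environ if environ is None else environ
    dotenv = parse_dotenv(dotenv_text)

    def lookup(key, default):
        return environ.get(key, dotenv.get(key, default))

    return SettingsSketch(
        db_host=lookup("DB_HOST", "localhost"),
        # int() raises ValueError on a malformed port: a minimal typed check.
        db_port=int(lookup("DB_PORT", "5432")),
    )


settings = load_settings(
    environ={"DB_PORT": "6543"},
    dotenv_text="# local overrides\nDB_HOST=postgres\nDB_PORT=5432\n",
)
```

Here DB_PORT comes from the environment and DB_HOST from the .env text, so a container can override individual values without editing files.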

Documentation

  • Pipeline Tutorial — end-to-end pipeline setup guide
  • Governance — governance.yaml format, access levels, pattern matching, dcat/dataspace blocks
  • Schemas — JSON Schema definitions including governance.schema.json
  • CLI — full CLI reference

Installation

pip install celine-utils

Intended audience

  • Data engineers
  • Platform engineers
  • CELINE application developers

License

Copyright © 2025 Spindox Labs

Licensed under the Apache License, Version 2.0.
