CELINE Utils

A collection of shared utilities, libraries, and command-line tools that form the technical backbone of the CELINE data platform. Provides reusable building blocks for data pipelines, governance, lineage, metadata management, and platform integrations.

CELINE Utils is not an end-user application. It is a platform utility layer that is embedded into CELINE applications and executed within orchestrated environments using Meltano, dbt, Prefect, and OpenLineage.


Scope and goals

  • Centralise cross-cutting platform logic used by multiple CELINE projects
  • Provide opinionated but extensible tooling for data pipelines
  • Enforce consistent governance and lineage semantics
  • Reduce duplication across pipeline applications
  • Act as a stable foundation for CELINE-compatible services and workflows

Key capabilities

Governance framework

A declarative governance.yaml specification defines the metadata, access control, and dataspace exposure rules for each dataset.

The GovernanceRule model covers:

  • Dataset ownership (owner, attribution)
  • License and access level (open, internal, restricted, secret)
  • Data classification (pii, green, yellow, red) and retention
  • Tags, documentation links, and source system
  • user_filter_column — the column used for per-subject consent-based row filtering
  • expose: true — controls whether the dataset appears in the DCAT catalogue and is registered as an EDC asset

Extended blocks for DCAT-AP 3.0 and dataspace integration:

dcat: block — propagated to the DCAT-AP catalogue by dataset-api:

  • publisher_uri — overrides the API-level fallback publisher
  • themes — EU Publications Office data-theme URIs
  • language_uris — dct:language URIs
  • spatial_uris — dct:spatial URIs
  • accrual_periodicity — dct:accrualPeriodicity URI
  • conforms_to — dct:conformsTo URI
  • temporal.start / temporal.end — dct:temporal coverage

dataspace: block — consumed by export_governance.py when registering datasets in EDC:

  • contract_required — enables ds:contractRequired ODRL constraint
  • consent_required — enables ds:consentStatus ODRL constraint and consent-based row filtering
  • odrl_action: the ODRL action applied to the dataset (defaults to use)
  • purpose — ODRL purpose values
  • medallion — data quality level (gold / silver / bronze)

GovernanceResolver resolves governance rules by pattern matching. Values in the defaults: block cascade into each source entry, and per-source values take precedence. For expose and the dcat/dataspace fields, booleans are OR-merged and objects are override-merged.
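The cascade described above can be sketched with the standard library alone. This is an illustration rather than the GovernanceResolver implementation: the sources key, the glob-style patterns, the first-match-wins order, and the key-by-key reading of "override-merge" are all assumptions, as are the helper names merge_rule and resolve.

```python
from copy import deepcopy
from fnmatch import fnmatchcase

# Assumption: these field groupings mirror the merge rules described in the text.
BOOLEAN_OR_FIELDS = {"expose"}          # true if defaults OR the source say true
OBJECT_FIELDS = {"dcat", "dataspace"}   # merged key by key, source keys win


def merge_rule(defaults: dict, source: dict) -> dict:
    """Cascade defaults into one source entry (hypothetical helper)."""
    merged = deepcopy(defaults)
    for key, value in source.items():
        if key in BOOLEAN_OR_FIELDS:
            merged[key] = bool(merged.get(key, False)) or bool(value)
        elif key in OBJECT_FIELDS and isinstance(value, dict):
            merged[key] = {**(merged.get(key) or {}), **deepcopy(value)}
        else:
            # Scalars and lists: the per-source value takes precedence.
            merged[key] = deepcopy(value)
    return merged


def resolve(config: dict, dataset_name: str) -> dict:
    """Return the rule for a dataset; the first matching pattern wins (assumed)."""
    defaults = config.get("defaults", {})
    for pattern, rule in config.get("sources", {}).items():
        if fnmatchcase(dataset_name, pattern):
            return merge_rule(defaults, rule)
    return deepcopy(defaults)


# Illustrative data only; the dataset names and URIs are made up.
config = {
    "defaults": {
        "access_level": "internal",
        "expose": True,
        "dcat": {"conforms_to": "https://example.org/spec", "themes": ["ENER"]},
    },
    "sources": {
        "gold.*": {
            "access_level": "open",
            "expose": False,
            "dcat": {"themes": ["ENVI"]},
        },
    },
}

rule = resolve(config, "gold.meter_readings")
print(rule)
```

In this sketch the resolved rule keeps conforms_to from the defaults, takes themes and access_level from the source entry, and leaves expose set to true because the OR-merge never turns a default of true back off.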

Both celine-utils (pipeline side) and dataset-api/cli/export_governance.py (catalogue side) parse the same governance.yaml format. EDC-specific sub-objects in the dataspace: block are silently ignored by celine-utils via model_config = ConfigDict(extra="ignore").

Pipeline orchestration

A structured execution layer for:

  • Meltano ingestion pipelines
  • dbt transformations and tests
  • Prefect-based Python flows

The PipelineRunner coordinates execution, logging, error handling, and lineage emission consistently across tools.
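As a rough picture of what "coordinating execution, logging, error handling, and lineage emission" means, the sketch below wraps a list of steps in a single lifecycle. It is not the PipelineRunner API: run_pipeline, its parameters, and the emit callback are hypothetical names chosen for illustration.

```python
import logging
from typing import Callable, List, Tuple

logger = logging.getLogger("pipeline-sketch")

Step = Tuple[str, Callable[[], None]]


def run_pipeline(name: str, steps: List[Step], emit: Callable[[str, str], None]) -> bool:
    """Run steps in order and report START, then COMPLETE or FAIL (hypothetical)."""
    emit("START", name)
    try:
        for step_name, step in steps:
            logger.info("running step %s of pipeline %s", step_name, name)
            step()
    except Exception:
        # One place for error handling, whatever tool the failing step wrapped.
        logger.exception("pipeline %s failed", name)
        emit("FAIL", name)
        return False
    emit("COMPLETE", name)
    return True


def failing_step() -> None:
    raise RuntimeError("simulated dbt test failure")


events: List[Tuple[str, str]] = []
record = lambda event_type, job: events.append((event_type, job))

ok = run_pipeline("demo", [("ingest", lambda: None), ("transform", lambda: None)], record)
failed = run_pipeline("broken", [("ingest", lambda: None), ("test", failing_step)], record)
```

Keeping the lifecycle in one function means each tool-specific step only has to do its own work and raise on failure.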

See the pipeline tutorial.

OpenLineage integration

  • Automatic emission of START, COMPLETE, FAIL, and ABORT events
  • Dataset-level schema facets
  • Data quality assertions from dbt tests
  • Custom CELINE governance facets (including userFilterColumn, medallion, classification)
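To make the shape of such an event concrete, the following sketch builds an OpenLineage-style run event as a plain dictionary, with a custom governance facet attached to the output dataset. The facet key celineGovernance, the producer and schema URLs, and the celine namespace are placeholders, not the names celine-utils actually emits. The top-level fields and the _producer/_schemaURL facet keys follow the OpenLineage specification as commonly documented.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Optional

# Placeholder URLs: assumptions for illustration only.
PRODUCER = "https://example.org/celine-utils"
FACET_SCHEMA = "https://example.org/schemas/CelineGovernanceFacet.json"
# Illustrative spec version; check the OpenLineage release you target.
RUN_EVENT_SCHEMA = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent"

EVENT_TYPES = {"START", "COMPLETE", "FAIL", "ABORT"}


def build_run_event(event_type: str, job_name: str, dataset_name: str,
                    governance: dict, run_id: Optional[str] = None) -> dict:
    """Build a run event with a governance facet on the output (hypothetical helper)."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unsupported event type: {event_type}")
    facet = {"_producer": PRODUCER, "_schemaURL": FACET_SCHEMA, **governance}
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "producer": PRODUCER,
        "schemaURL": RUN_EVENT_SCHEMA,
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": "celine", "name": job_name},
        "inputs": [],
        "outputs": [
            {
                "namespace": "celine",
                "name": dataset_name,
                "facets": {"celineGovernance": facet},
            }
        ],
    }


event = build_run_event(
    "COMPLETE",
    "dbt_run",
    "gold.meter_readings",
    {"userFilterColumn": "user_id", "medallion": "gold", "classification": "pii"},
)
print(json.dumps(event, indent=2))
```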

Dataset tooling

The DatasetClient enables:

  • Schema and table introspection
  • Column metadata inspection
  • Safe query construction
  • Export to Pandas
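The idea behind introspection and safe query construction can be shown with sqlite3 from the standard library standing in for the platform database. None of the function names below come from DatasetClient. They are hypothetical helpers that show the usual pattern: validate identifiers, and bind values as parameters instead of formatting them into the SQL string.

```python
import re
import sqlite3
from typing import List, Tuple

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_identifier(name: str) -> str:
    """Accept only simple identifiers, then double-quote them."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return f'"{name}"'


def list_columns(conn: sqlite3.Connection, table: str) -> List[Tuple[str, str]]:
    """Return (column name, declared type) pairs for a table."""
    rows = conn.execute(f"PRAGMA table_info({quote_identifier(table)})").fetchall()
    return [(row[1], row[2]) for row in rows]


def select_where(conn: sqlite3.Connection, table: str, column: str, value) -> list:
    """Select rows where column equals value; the value is bound, never interpolated."""
    sql = (
        f"SELECT * FROM {quote_identifier(table)} "
        f"WHERE {quote_identifier(column)} = ?"
    )
    return conn.execute(sql, (value,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (user_id TEXT, kwh REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", [("u1", 1.5), ("u2", 2.0)])

columns = list_columns(conn, "readings")
rows = select_where(conn, "readings", "user_id", "u1")
```

Table and column names cannot be bound as parameters, which is why this sketch validates them against a strict pattern before quoting.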

Platform integrations

  • Keycloak for identity and access management
  • Apache Superset for analytics platform integration
  • MQTT for lightweight messaging

CLI

celine-utils governance generate   # generate governance.yaml template
celine-utils pipeline init         # scaffold a new pipeline
celine-utils pipeline run          # run a pipeline

Repository structure

celine/
  admin/
  cli/
  common/
  datasets/
  pipelines/
schemas/
tests/

Configuration

Environment-driven via pydantic-settings:

  • Environment variables first
  • Optional .env files
  • Typed validation with container-friendly defaults
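The precedence listed above (environment variables, then an optional .env file, then defaults) can be illustrated without any dependencies. This is a standard-library stand-in that mimics the behaviour, not the pydantic-settings API and not the settings classes shipped with celine-utils. The variable names DB_HOST and DB_PORT are made up.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Mapping, Optional


def read_dotenv(path: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; a missing file yields an empty mapping."""
    values: Dict[str, str] = {}
    file = Path(path)
    if not file.is_file():
        return values
    for line in file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


@dataclass(frozen=True)
class Settings:
    # Hypothetical fields with container-friendly defaults.
    db_host: str = "localhost"
    db_port: int = 5432


def load_settings(environ: Optional[Mapping[str, str]] = None,
                  dotenv_path: str = ".env") -> Settings:
    """Environment variables override .env values, which override defaults."""
    environ = os.environ if environ is None else environ
    merged = {**read_dotenv(dotenv_path), **environ}
    return Settings(
        db_host=merged.get("DB_HOST", Settings.db_host),
        # int() raises ValueError on a malformed port: a crude typed validation.
        db_port=int(merged.get("DB_PORT", Settings.db_port)),
    )


settings = load_settings({"DB_HOST": "warehouse"}, dotenv_path="does-not-exist.env")
```

Passing the environment as an argument keeps the sketch easy to test, while calling load_settings() with no arguments reads the real process environment.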

Documentation

  • Pipeline Tutorial — end-to-end pipeline setup guide
  • Governance — governance.yaml format, access levels, pattern matching, dcat/dataspace blocks
  • Schemas — JSON Schema definitions including governance.schema.json
  • CLI — full CLI reference

Installation

pip install celine-utils

Intended audience

  • Data engineers
  • Platform engineers
  • CELINE application developers

License

Copyright © 2025 Spindox Labs

Licensed under the Apache License, Version 2.0.
