
CELINE Utils

A collection of shared utilities, libraries, and command-line tools that form the technical backbone of the CELINE data platform. It provides reusable building blocks for data pipelines, governance, lineage, metadata management, and platform integrations.

It is not an end-user application: it is a platform utility layer that is embedded into CELINE applications and executed within orchestrated environments built on Meltano, dbt, Prefect, and OpenLineage.


Scope and goals

  • Centralise cross-cutting platform logic used by multiple CELINE projects
  • Provide opinionated but extensible tooling for data pipelines
  • Enforce consistent governance and lineage semantics
  • Reduce duplication across pipeline applications
  • Act as a stable foundation for CELINE-compatible services and workflows

Key capabilities

Governance framework

A declarative governance.yaml specification defines the metadata, access control, and dataspace exposure rules for each dataset.

The GovernanceRule model covers:

  • Dataset ownership (owner, attribution)
  • License and access level (open, internal, restricted, secret)
  • Data classification (pii, green, yellow, red) and retention
  • Tags, documentation links, and source system
  • user_filter_column — the column used for per-subject consent-based row filtering
  • expose: true — controls whether the dataset appears in the DCAT catalogue and is registered as an EDC asset
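To make the field list concrete, here is a minimal sketch of a rule object with the validation those vocabularies imply. It is not the celine-utils GovernanceRule: the class name GovernanceRuleSketch, the field names access_level, retention_days and documentation_url, and all defaults are assumptions, and a plain dataclass stands in for the real pydantic model.

```python
from dataclasses import dataclass, field
from typing import Optional

# Vocabularies as listed in the GovernanceRule description.
ACCESS_LEVELS = {"open", "internal", "restricted", "secret"}
CLASSIFICATIONS = {"pii", "green", "yellow", "red"}


@dataclass
class GovernanceRuleSketch:
    """Hypothetical stand-in for GovernanceRule; field names and defaults are assumptions."""

    owner: str
    attribution: Optional[str] = None
    license: Optional[str] = None
    access_level: str = "internal"
    classification: str = "green"
    retention_days: Optional[int] = None      # hypothetical encoding of "retention"
    tags: list[str] = field(default_factory=list)
    documentation_url: Optional[str] = None
    source_system: Optional[str] = None
    user_filter_column: Optional[str] = None  # column for per-subject consent filtering
    expose: bool = False                      # publish to DCAT and register as an EDC asset

    def __post_init__(self) -> None:
        # Reject values outside the documented vocabularies as early as possible.
        if self.access_level not in ACCESS_LEVELS:
            raise ValueError(f"unknown access level: {self.access_level!r}")
        if self.classification not in CLASSIFICATIONS:
            raise ValueError(f"unknown classification: {self.classification!r}")
        if self.retention_days is not None and self.retention_days < 0:
            raise ValueError("retention_days must be non-negative")


rule = GovernanceRuleSketch(
    owner="energy-analytics",
    access_level="restricted",
    classification="pii",
    user_filter_column="user_id",
    expose=True,
)
```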

Extended blocks for DCAT-AP 3.0 and dataspace integration:

dcat: block — propagated to the DCAT-AP catalogue by dataset-api:

  • publisher_uri — overrides the API-level fallback publisher
  • themes — EU Publications Office data-theme URIs
  • language_uris — dct:language URIs
  • spatial_uris — dct:spatial URIs
  • accrual_periodicity — dct:accrualPeriodicity URI
  • conforms_to — dct:conformsTo URI
  • temporal.start / temporal.end — dct:temporal coverage
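The dcat: keys map fairly directly onto DCAT-AP properties. The hypothetical helper below sketches that mapping as a JSON-LD-style dict; the function name, the placeholder fallback publisher and the exact output shape are assumptions, not the dataset-api implementation. The example URIs follow the EU Publications Office authority tables for data themes, languages and frequencies.

```python
from typing import Any

EU = "http://publications.europa.eu/resource/authority"


def dcat_block_to_jsonld(dcat: dict[str, Any], fallback_publisher: str) -> dict[str, Any]:
    """Map a governance.yaml dcat: block onto DCAT-AP 3.0 properties (hypothetical helper)."""
    ds: dict[str, Any] = {"@type": "dcat:Dataset"}
    # publisher_uri, when present, overrides the API-level fallback publisher.
    ds["dct:publisher"] = {"@id": dcat.get("publisher_uri") or fallback_publisher}
    # Multi-valued URI lists become lists of node references.
    for key, prop in (
        ("themes", "dcat:theme"),
        ("language_uris", "dct:language"),
        ("spatial_uris", "dct:spatial"),
    ):
        if dcat.get(key):
            ds[prop] = [{"@id": uri} for uri in dcat[key]]
    # Single-valued URIs.
    for key, prop in (
        ("accrual_periodicity", "dct:accrualPeriodicity"),
        ("conforms_to", "dct:conformsTo"),
    ):
        if dcat.get(key):
            ds[prop] = {"@id": dcat[key]}
    # temporal.start / temporal.end become a dct:PeriodOfTime.
    temporal = dcat.get("temporal") or {}
    if temporal:
        period: dict[str, Any] = {"@type": "dct:PeriodOfTime"}
        if temporal.get("start"):
            period["dcat:startDate"] = temporal["start"]
        if temporal.get("end"):
            period["dcat:endDate"] = temporal["end"]
        ds["dct:temporal"] = period
    return ds


example = dcat_block_to_jsonld(
    {
        "themes": [f"{EU}/data-theme/ENER"],
        "language_uris": [f"{EU}/language/ENG"],
        "accrual_periodicity": f"{EU}/frequency/DAILY",
        "temporal": {"start": "2024-01-01"},
    },
    fallback_publisher="https://example.org/publisher/celine",  # placeholder URI
)
```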

dataspace: block — consumed by export_governance.py when registering datasets in EDC:

  • contract_required — enables ds:contractRequired ODRL constraint
  • consent_required — enables ds:consentStatus ODRL constraint and consent-based row filtering
  • odrl_action: default ODRL action (defaults to use)
  • purpose — ODRL purpose values
  • medallion — data quality level (gold / silver / bronze)
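As a rough illustration of how these flags could become policy terms, the sketch below builds an ODRL Set policy from a dataspace: block. The ds:contractRequired and ds:consentStatus left operands come from the list above; the right-operand values, the function name and the overall policy layout are assumptions rather than what export_governance.py actually emits. medallion is descriptive metadata, so it is not turned into a constraint here.

```python
from typing import Any


def dataspace_to_odrl(block: dict[str, Any]) -> dict[str, Any]:
    """Turn a governance.yaml dataspace: block into an ODRL Set policy (hypothetical shape)."""
    constraints: list[dict[str, Any]] = []
    if block.get("contract_required"):
        constraints.append(
            {"leftOperand": "ds:contractRequired", "operator": "eq", "rightOperand": True}
        )
    if block.get("consent_required"):
        # The right-operand value is an assumption; the real constraint may differ.
        constraints.append(
            {"leftOperand": "ds:consentStatus", "operator": "eq", "rightOperand": "granted"}
        )
    purpose = block.get("purpose") or []
    if isinstance(purpose, str):
        purpose = [purpose]
    if purpose:
        # "purpose" and "isAnyOf" are standard ODRL 2.2 vocabulary terms.
        constraints.append(
            {"leftOperand": "purpose", "operator": "isAnyOf", "rightOperand": list(purpose)}
        )
    # odrl_action falls back to "use", matching the documented default.
    permission: dict[str, Any] = {"action": block.get("odrl_action") or "use"}
    if constraints:
        permission["constraint"] = constraints
    return {
        "@context": "http://www.w3.org/ns/odrl.jsonld",
        "@type": "Set",
        "permission": [permission],
    }


policy = dataspace_to_odrl({"contract_required": True, "purpose": "research", "medallion": "gold"})
```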

Governance rules are resolved by pattern matching in GovernanceResolver. Defaults cascade from the defaults: block into each source entry, and per-source values take precedence. Booleans (such as expose) are OR-merged, while objects (such as dcat and dataspace) are override-merged.
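The cascade can be sketched in a few lines. This approximates GovernanceResolver under stated assumptions: a sources: key holding glob patterns, shell-style matching via fnmatchcase, a specificity ordering in which more literal characters win, and shallow key-by-key merging of objects are all guesses rather than confirmed behaviour.

```python
from fnmatch import fnmatchcase
from typing import Any


def merge_rule(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Merge one rule layer onto another using the strategy described above."""
    merged = dict(base)
    for key, value in override.items():
        current = merged.get(key)
        if isinstance(value, bool) and isinstance(current, bool):
            merged[key] = current or value      # OR-merge for booleans such as expose
        elif isinstance(value, dict) and isinstance(current, dict):
            merged[key] = {**current, **value}  # override-merge: later keys win (assumed shallow)
        else:
            merged[key] = value                 # scalars: the per-source value takes precedence
    return merged


def resolve(dataset: str, config: dict[str, Any]) -> dict[str, Any]:
    """Resolve the effective rule for one dataset (hypothetical config layout)."""
    rule = dict(config.get("defaults", {}))
    patterns = [p for p in config.get("sources", {}) if fnmatchcase(dataset, p)]
    # Assumption: more literal characters means more specific, so it is applied last and wins.
    patterns.sort(key=lambda p: len(p.replace("*", "").replace("?", "")))
    for pattern in patterns:
        rule = merge_rule(rule, config["sources"][pattern])
    return rule


config = {
    "defaults": {"access_level": "internal", "expose": False, "dcat": {"themes": ["ENER"]}},
    "sources": {
        "raw.*": {"access_level": "restricted"},
        "raw.meters": {"expose": True, "dcat": {"conforms_to": "https://example.org/profile"}},
    },
}
effective = resolve("raw.meters", config)
```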

Both celine-utils (pipeline side) and dataset-api/cli/export_governance.py (catalogue side) parse the same governance.yaml format. EDC-specific sub-objects in the dataspace: block are silently ignored by celine-utils via model_config = ConfigDict(extra="ignore").
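The effect of extra="ignore" can be mimicked with a dataclass that drops undeclared keys before construction. This is a stdlib analogue for illustration only; DataspaceSketch, its field set and the edc sub-object in the example are hypothetical names, and the real package relies on pydantic for this.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Optional


@dataclass
class DataspaceSketch:
    """Pipeline-side view of the dataspace: block (hypothetical field set)."""

    contract_required: bool = False
    consent_required: bool = False
    odrl_action: str = "use"
    purpose: list[str] = field(default_factory=list)
    medallion: Optional[str] = None

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "DataspaceSketch":
        # Mirror pydantic's extra="ignore": silently drop keys this model does not declare.
        known = {f.name for f in fields(cls)}
        return cls(**{key: value for key, value in raw.items() if key in known})


parsed = DataspaceSketch.from_dict(
    {"contract_required": True, "medallion": "silver", "edc": {"asset_id": "meters-v1"}}
)
```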

Pipeline orchestration

Structured execution layer for:

  • Meltano ingestion pipelines
  • dbt transformations and tests
  • Prefect-based Python flows

The PipelineRunner coordinates execution, logging, error handling, and lineage emission consistently across tools.

See the pipeline tutorial.
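The coordination PipelineRunner performs can be pictured as a wrapper around each step. The sketch below is hypothetical (run_step and its emit callback are not celine-utils APIs); it shows one way to keep logging, error handling and START/COMPLETE/FAIL/ABORT lineage events consistent regardless of which tool a step invokes.

```python
import logging
import uuid
from typing import Callable

logger = logging.getLogger("pipeline_sketch")

# Hypothetical lineage hook: (event_type, run_id, job_name) -> None
EmitFn = Callable[[str, str, str], None]


def run_step(job_name: str, step: Callable[[], None], emit: EmitFn) -> bool:
    """Run one step with uniform logging, error handling and lineage events."""
    run_id = str(uuid.uuid4())
    emit("START", run_id, job_name)
    logger.info("starting %s (run %s)", job_name, run_id)
    try:
        step()
    except KeyboardInterrupt:
        # Operator interruption is reported as ABORT, then re-raised.
        emit("ABORT", run_id, job_name)
        raise
    except Exception:
        logger.exception("step %s failed", job_name)
        emit("FAIL", run_id, job_name)
        return False
    emit("COMPLETE", run_id, job_name)
    logger.info("finished %s", job_name)
    return True


events: list[tuple[str, str]] = []


def record(event: str, run_id: str, job: str) -> None:
    events.append((event, job))


def broken_step() -> None:
    raise RuntimeError("simulated dbt failure")


ok = run_step("ingest", lambda: None, record)
failed = run_step("transform", broken_step, record)
```

In a real pipeline a step would typically shell out with subprocess.run([...], check=True) to a Meltano or dbt command; check=True raises CalledProcessError on a non-zero exit, which this wrapper reports as FAIL.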

OpenLineage integration

  • Automatic emission of START, COMPLETE, FAIL, and ABORT events
  • Dataset-level schema facets
  • Data quality assertions from dbt tests
  • Custom CELINE governance facets (including userFilterColumn, medallion, classification)
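For orientation, this is roughly what an emitted event with a custom governance facet looks like as plain JSON-compatible data. The RunEvent fields (eventType, eventTime, run.runId, job, inputs, outputs, producer, schemaURL) and the _producer/_schemaURL facet fields follow the OpenLineage spec; the facet key celineGovernance, the producer URI, the namespaces and the spec version in the schema URL are assumptions.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Optional

# Assumed values: producer URI, spec version and the custom facet schema URL are placeholders.
PRODUCER = "https://github.com/celine-eu/celine-utils"
RUN_EVENT_SCHEMA = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent"
GOVERNANCE_FACET_SCHEMA = "https://example.org/schemas/CelineGovernanceDatasetFacet.json"


def governance_facet(
    classification: str, medallion: Optional[str], user_filter_column: Optional[str]
) -> dict[str, Any]:
    """Custom dataset facet; every OpenLineage facet carries _producer and _schemaURL."""
    return {
        "_producer": PRODUCER,
        "_schemaURL": GOVERNANCE_FACET_SCHEMA,
        "classification": classification,
        "medallion": medallion,
        "userFilterColumn": user_filter_column,
    }


def run_event(
    event_type: str, run_id: str, namespace: str, job_name: str, outputs: list[dict[str, Any]]
) -> dict[str, Any]:
    """Build a RunEvent payload (START, COMPLETE, FAIL or ABORT) as a plain dict."""
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [],
        "outputs": outputs,
        "producer": PRODUCER,
        "schemaURL": RUN_EVENT_SCHEMA,
    }


output = {
    "namespace": "postgres://warehouse:5432",  # hypothetical dataset namespace
    "name": "silver.meter_readings",
    "facets": {"celineGovernance": governance_facet("pii", "silver", "user_id")},
}
event = run_event("START", str(uuid.uuid4()), "celine-pipelines", "dbt.meter_readings", [output])
```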

Dataset tooling

The DatasetClient enables:

  • Schema and table introspection
  • Column metadata inspection
  • Safe query construction
  • Export to Pandas
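Safe query construction usually means two things: identifiers are checked against introspected metadata, and values are bound as parameters rather than interpolated. The sketch below shows that pattern with the standard-library sqlite3 module so it runs anywhere; the helper names are hypothetical, and the real DatasetClient targets the platform database rather than SQLite.

```python
import sqlite3
from typing import Any


def quote_ident(name: str) -> str:
    # Standard SQL identifier quoting: wrap in double quotes and double any inner quote.
    return '"' + name.replace('"', '""') + '"'


def table_columns(conn: sqlite3.Connection, table: str) -> list[str]:
    """Introspect column names, refusing tables that are not in the catalogue."""
    tables = {row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table not in tables:
        raise ValueError(f"unknown table: {table!r}")
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in conn.execute(f"PRAGMA table_info({quote_ident(table)})")]


def safe_select(
    conn: sqlite3.Connection, table: str, columns: list[str], filters: dict[str, Any]
) -> list[tuple[Any, ...]]:
    """Build a SELECT whose identifiers are whitelisted and whose values are bound."""
    if not columns:
        raise ValueError("at least one column is required")
    available = set(table_columns(conn, table))
    unknown = [c for c in [*columns, *filters] if c not in available]
    if unknown:
        raise ValueError(f"unknown columns: {unknown}")
    sql = f"SELECT {', '.join(quote_ident(c) for c in columns)} FROM {quote_ident(table)}"
    if filters:
        sql += " WHERE " + " AND ".join(f"{quote_ident(c)} = ?" for c in filters)
    return conn.execute(sql, list(filters.values())).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (meter_id TEXT, kwh REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", [("m1", 1.5), ("m2", 2.0), ("m1", 0.5)])
rows = safe_select(conn, "readings", ["kwh"], {"meter_id": "m1"})
```

For the Pandas export, the same query string and bound parameters could be passed to pandas.read_sql_query(sql, conn, params=...).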

Platform integrations

  • Keycloak for identity and access management
  • Apache Superset for analytics platform integration
  • MQTT for lightweight messaging

CLI

celine-utils governance generate   # generate governance.yaml template
celine-utils pipeline init         # scaffold a new pipeline
celine-utils pipeline run          # run a pipeline

Repository structure

celine/
  admin/
  cli/
  common/
  datasets/
  pipelines/
schemas/
tests/

Configuration

Environment-driven via pydantic-settings:

  • Environment variables first
  • Optional .env files
  • Typed validation with container-friendly defaults
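The precedence rules above can be illustrated without pydantic-settings. This stdlib sketch reads environment variables first, falls back to an optional .env file and then to defaults, and rejects values that do not match the declared type. The CELINE_ prefix, the setting names and the defaults are hypothetical; in the real package, pydantic-settings does this work.

```python
import os
from dataclasses import dataclass, fields
from pathlib import Path
from typing import Any


@dataclass
class SettingsSketch:
    """Hypothetical settings; names, prefix and defaults are illustrative only."""

    db_host: str = "localhost"
    db_port: int = 5432
    mqtt_port: int = 1883
    lineage_enabled: bool = True


def read_dotenv(path: Path) -> dict[str, str]:
    """Minimal KEY=VALUE parser: skips blanks and comments, strips matching quotes."""
    values: dict[str, str] = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def coerce(raw: str, target: Any) -> Any:
    """Typed validation: reject values that do not fit the declared type."""
    if target is bool:
        lowered = raw.strip().lower()
        if lowered in {"1", "true", "yes", "on"}:
            return True
        if lowered in {"0", "false", "no", "off"}:
            return False
        raise ValueError(f"not a boolean: {raw!r}")
    if target is int:
        return int(raw)  # raises ValueError for non-numeric input
    return raw


def load_settings(prefix: str = "CELINE_", env_file: str = ".env") -> SettingsSketch:
    path = Path(env_file)
    file_values = read_dotenv(path) if path.is_file() else {}  # the .env file is optional
    kwargs: dict[str, Any] = {}
    for f in fields(SettingsSketch):
        key = prefix + f.name.upper()
        # Environment variables take precedence over .env entries; defaults fill the rest.
        raw = os.environ.get(key, file_values.get(key))
        if raw is not None:
            kwargs[f.name] = coerce(raw, f.type)
    return SettingsSketch(**kwargs)
```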

Documentation

  • Pipeline Tutorial — end-to-end pipeline setup guide
  • Governance — governance.yaml format, access levels, pattern matching, dcat/dataspace blocks
  • Schemas — JSON Schema definitions including governance.schema.json
  • CLI — full CLI reference

Installation

pip install celine-utils

Intended audience

  • Data engineers
  • Platform engineers
  • CELINE application developers

License

Copyright © 2025 Spindox Labs

Licensed under the Apache License, Version 2.0.

Download files


Source Distribution

celine_utils-1.16.0.tar.gz (53.0 kB)

Uploaded Source

Built Distribution


celine_utils-1.16.0-py3-none-any.whl (67.7 kB)

Uploaded Python 3

File details

Details for the file celine_utils-1.16.0.tar.gz.

File metadata

  • Download URL: celine_utils-1.16.0.tar.gz
  • Size: 53.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for celine_utils-1.16.0.tar.gz:

  • SHA256: 09c2512e074b435a708e1eb7349db118142bf09b046ac7deb25e6ac09e47a022
  • MD5: 75f51e4c3d667f121451a7e126ce6726
  • BLAKE2b-256: ba5d4c38d22c487fe74c9382b284e8e9b20bff00695d08aee9a8823abf3e411b


Provenance

The following attestation bundles were made for celine_utils-1.16.0.tar.gz:

Publisher: release.yaml on celine-eu/celine-utils

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file celine_utils-1.16.0-py3-none-any.whl.

File metadata

  • Download URL: celine_utils-1.16.0-py3-none-any.whl
  • Size: 67.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for celine_utils-1.16.0-py3-none-any.whl:

  • SHA256: f969748f5147ef7abd3c3c55a126cdbbd3bbe3a844ced46b055717aa4fe850f8
  • MD5: c0076a0003c5bd4a64b342c7add38eab
  • BLAKE2b-256: d69d947c54acb6c824cfdcb61263da963ec905e51a4584cd1e9decca43b6802d


Provenance

The following attestation bundles were made for celine_utils-1.16.0-py3-none-any.whl:

Publisher: release.yaml on celine-eu/celine-utils

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
