CELINE Utils

CELINE Utils is a collection of shared utilities, libraries, and command-line tools that form the technical backbone of the CELINE data platform. It provides reusable building blocks for data pipelines, governance, lineage, metadata management, and platform integrations.

It is not an end-user application. It is a platform utility layer embedded into CELINE applications and executed within orchestrated environments using Meltano, dbt, Prefect, and OpenLineage.


Scope and goals

  • Centralise cross-cutting platform logic used by multiple CELINE projects
  • Provide opinionated but extensible tooling for data pipelines
  • Enforce consistent governance and lineage semantics
  • Reduce duplication across pipeline applications
  • Act as a stable foundation for CELINE-compatible services and workflows

Key capabilities

Governance framework

A declarative governance.yaml specification defines the metadata, access control, and dataspace exposure rules for each dataset.

The GovernanceRule model covers:

  • Dataset ownership (owner, attribution)
  • License and access level (open, internal, restricted, secret)
  • Data classification (pii, green, yellow, red) and retention
  • Tags, documentation links, and source system
  • user_filter_column — the column used for per-subject consent-based row filtering
  • expose: true — controls whether the dataset appears in the DCAT catalogue and is registered as an EDC asset
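
To make the shape of a rule concrete, here is a minimal sketch of the fields listed above. It uses a standard-library dataclass rather than the library's own GovernanceRule model. The class name, the field names access_level and retention_days, and the defaults are assumptions for illustration; only the allowed values come from the list above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Allowed values copied from the list above.
ACCESS_LEVELS = {"open", "internal", "restricted", "secret"}
CLASSIFICATIONS = {"pii", "green", "yellow", "red"}


@dataclass
class GovernanceRuleSketch:
    """Hypothetical stand-in for GovernanceRule, not the library's class."""

    owner: Optional[str] = None
    attribution: Optional[str] = None
    license: Optional[str] = None
    access_level: str = "internal"          # assumed field name and default
    classification: Optional[str] = None
    retention_days: Optional[int] = None    # assumed field name
    tags: List[str] = field(default_factory=list)
    user_filter_column: Optional[str] = None
    expose: bool = False                    # assumed default

    def __post_init__(self) -> None:
        # Reject values outside the documented vocabularies.
        if self.access_level not in ACCESS_LEVELS:
            raise ValueError(f"unknown access level: {self.access_level!r}")
        if (
            self.classification is not None
            and self.classification not in CLASSIFICATIONS
        ):
            raise ValueError(f"unknown classification: {self.classification!r}")


rule = GovernanceRuleSketch(
    owner="energy-team",
    access_level="restricted",
    classification="pii",
    user_filter_column="user_id",
    expose=True,
)
```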

Extended blocks for DCAT-AP 3.0 and dataspace integration:

dcat: block — propagated to the DCAT-AP catalogue by dataset-api:

  • publisher_uri — overrides the API-level fallback publisher
  • themes — EU Publications Office data-theme URIs
  • language_uris — dct:language URIs
  • spatial_uris — dct:spatial URIs
  • accrual_periodicity — dct:accrualPeriodicity URI
  • conforms_to — dct:conformsTo URI
  • temporal.start / temporal.end — dct:temporal coverage

dataspace: block — consumed by export_governance.py when registering datasets in EDC:

  • contract_required — enables ds:contractRequired ODRL constraint
  • consent_required — enables ds:consentStatus ODRL constraint and consent-based row filtering
  • odrl_action — ODRL action applied to the dataset (defaults to use)
  • purpose — ODRL purpose values
  • medallion — data quality level (gold / silver / bronze)

GovernanceResolver resolves governance rules by pattern matching. Defaults cascade from the defaults: block into each source entry, with per-source values taking precedence. For the expose, dcat, and dataspace fields, booleans are OR-merged and objects are override-merged.
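
The resolution rules can be sketched as follows. This is not GovernanceResolver itself. It assumes glob-style patterns matched with fnmatch, a top-level sources mapping, first-match-wins ordering, and a key-by-key reading of "override-merge"; none of these details are confirmed by this page.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, Optional

Rule = Dict[str, Any]


def merge_object(base: Rule, override: Rule) -> Rule:
    """Merge a dcat/dataspace object: booleans are OR-ed, other keys overridden."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, bool) and isinstance(merged.get(key), bool):
            merged[key] = merged[key] or value
        else:
            merged[key] = value
    return merged


def merge_rule(defaults: Rule, source: Rule) -> Rule:
    """Cascade defaults into one source entry; source values win elsewhere."""
    merged = dict(defaults)
    for key, value in source.items():
        if key == "expose":
            merged[key] = bool(merged.get(key, False)) or bool(value)
        elif key in ("dcat", "dataspace"):
            merged[key] = merge_object(merged.get(key, {}), value)
        else:
            merged[key] = value
    return merged


def resolve(config: Rule, dataset_name: str) -> Optional[Rule]:
    """Return the merged rule of the first pattern matching dataset_name."""
    defaults = config.get("defaults", {})
    for pattern, rule in config.get("sources", {}).items():
        if fnmatchcase(dataset_name, pattern):
            return merge_rule(defaults, rule)
    return None


config = {
    "defaults": {
        "access_level": "internal",
        "expose": False,
        "dataspace": {"contract_required": True, "medallion": "bronze"},
    },
    "sources": {
        "raw.meters.*": {
            "access_level": "restricted",
            "dataspace": {"medallion": "gold", "consent_required": True},
        },
        "raw.*": {"expose": True},
    },
}

meters = resolve(config, "raw.meters.readings")
weather = resolve(config, "raw.weather")
```

In this sketch, raw.meters.readings inherits contract_required from the defaults while its own medallion value replaces the default, and raw.weather becomes exposed because the OR-merge turns the default false into true.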

Both celine-utils (pipeline side) and dataset-api/cli/export_governance.py (catalogue side) parse the same governance.yaml format. EDC-specific sub-objects in the dataspace: block are silently ignored by celine-utils via model_config = ConfigDict(extra="ignore").

Pipeline orchestration

A structured execution layer for:

  • Meltano ingestion pipelines
  • dbt transformations and tests
  • Prefect-based Python flows

The PipelineRunner coordinates execution, logging, error handling, and lineage emission consistently across tools.
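
As a rough illustration of that coordination, the sketch below runs named steps in order, logs each one, and reports lifecycle events through a callback. The function run_pipeline, the Step type, and the emit callback are hypothetical and do not reflect the actual PipelineRunner API; in a real pipeline each step would wrap a Meltano, dbt, or Prefect invocation.

```python
import logging
from typing import Callable, List, Tuple

logger = logging.getLogger("pipeline_sketch")

# A step is a name plus a zero-argument callable (hypothetical shape).
Step = Tuple[str, Callable[[], None]]
Emit = Callable[[str, str], None]


def run_pipeline(name: str, steps: List[Step], emit: Emit) -> bool:
    """Run steps in order, emitting START and then COMPLETE or FAIL."""
    emit("START", name)
    for step_name, step in steps:
        logger.info("pipeline %s: running step %s", name, step_name)
        try:
            step()
        except Exception:
            # Log the traceback, report the failure, and stop the run.
            logger.exception("pipeline %s: step %s failed", name, step_name)
            emit("FAIL", name)
            return False
    emit("COMPLETE", name)
    return True


def failing_step() -> None:
    raise RuntimeError("simulated test failure")


events_ok: List[Tuple[str, str]] = []
ok = run_pipeline(
    "demo",
    [("ingest", lambda: None), ("transform", lambda: None)],
    lambda event, job: events_ok.append((event, job)),
)

events_failed: List[Tuple[str, str]] = []
failed = run_pipeline(
    "demo",
    [("ingest", lambda: None), ("test", failing_step)],
    lambda event, job: events_failed.append((event, job)),
)
```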

See the pipeline tutorial.

OpenLineage integration

  • Automatic emission of START, COMPLETE, FAIL, and ABORT events
  • Dataset-level schema facets
  • Data quality assertions from dbt tests
  • Custom CELINE governance facets (including userFilterColumn, medallion, classification)
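
For orientation, the sketch below builds an OpenLineage-style run event as a plain dictionary, with a governance facet attached to an output dataset. The top-level keys follow the public OpenLineage RunEvent layout. The producer URL, the facet schema URL, the facet key celine_governance, the pinned spec version, and the dataset names are placeholders and assumptions, not values taken from celine-utils.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, List

PRODUCER = "https://example.org/celine-utils-sketch"  # placeholder URL
EVENT_TYPES = {"START", "COMPLETE", "FAIL", "ABORT"}


def governance_facet(
    user_filter_column: str, medallion: str, classification: str
) -> Dict[str, Any]:
    """Custom facet body; OpenLineage facets carry _producer and _schemaURL."""
    return {
        "_producer": PRODUCER,
        "_schemaURL": "https://example.org/schemas/governance-facet.json",  # placeholder
        "userFilterColumn": user_filter_column,
        "medallion": medallion,
        "classification": classification,
    }


def build_run_event(
    event_type: str,
    run_id: str,
    job_namespace: str,
    job_name: str,
    outputs: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Assemble a run event dictionary for one lifecycle transition."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unsupported event type: {event_type!r}")
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "producer": PRODUCER,
        # Spec version is an assumption; align it with the version in use.
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
        "run": {"runId": run_id},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [],
        "outputs": outputs,
    }


output_dataset = {
    "namespace": "postgres://warehouse",  # hypothetical dataset namespace
    "name": "gold.meter_readings",        # hypothetical dataset name
    "facets": {
        "celine_governance": governance_facet("user_id", "gold", "pii"),
    },
}

event = build_run_event(
    "START", str(uuid.uuid4()), "celine", "demo_pipeline", [output_dataset]
)
payload = json.dumps(event)
```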

Dataset tooling

The DatasetClient enables:

  • Schema and table introspection
  • Column metadata inspection
  • Safe query construction
  • Export to Pandas
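
The ideas behind introspection and safe query construction can be sketched with the standard-library sqlite3 module. This is not the DatasetClient API. The helper names are hypothetical, and the approach shown (checking identifiers against introspected metadata, quoting them, and binding values as parameters) is one common technique rather than a description of what the library does.

```python
import sqlite3
from typing import List, Optional


def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def list_columns(conn: sqlite3.Connection, table: str) -> List[str]:
    """Return the column names of an existing table."""
    tables = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    if table not in tables:
        raise ValueError(f"unknown table: {table!r}")
    # table_info rows are (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in conn.execute(f"PRAGMA table_info({quote_ident(table)})")]


def build_select(
    conn: sqlite3.Connection,
    table: str,
    columns: List[str],
    where_column: Optional[str] = None,
) -> str:
    """Build a SELECT using only known identifiers and a bound parameter."""
    known = list_columns(conn, table)
    requested = columns + ([where_column] if where_column else [])
    for column in requested:
        if column not in known:
            raise ValueError(f"unknown column: {column!r}")
    sql = (
        f"SELECT {', '.join(quote_ident(c) for c in columns)} "
        f"FROM {quote_ident(table)}"
    )
    if where_column:
        sql += f" WHERE {quote_ident(where_column)} = ?"
    return sql


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (user_id TEXT, kwh REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)", [("u1", 1.5), ("u2", 2.5)]
)

sql = build_select(conn, "readings", ["kwh"], where_column="user_id")
rows = conn.execute(sql, ("u1",)).fetchall()
```

The same statement and parameters could be handed to pandas.read_sql_query to obtain a DataFrame; that step is left out here to keep the sketch dependency-free.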

Platform integrations

  • Keycloak for identity and access management
  • Apache Superset for analytics
  • MQTT for lightweight messaging

CLI

celine-utils governance generate   # generate governance.yaml template
celine-utils pipeline init         # scaffold a new pipeline
celine-utils pipeline run          # run a pipeline

Repository structure

celine/
  admin/
  cli/
  common/
  datasets/
  pipelines/
schemas/
tests/

Configuration

Environment-driven via pydantic-settings:

  • Environment variables first
  • Optional .env files
  • Typed validation with container-friendly defaults
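
The precedence described above can be illustrated with a standard-library sketch. It does not use pydantic-settings, which is what the library relies on. It only mimics the lookup order: environment variables, then an optional .env file, then defaults. The setting names DB_HOST and DB_PORT and the class SettingsSketch are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Mapping, Optional


def read_dotenv(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; a missing file yields no values."""
    values: Dict[str, str] = {}
    if not path.is_file():
        return values
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


@dataclass(frozen=True)
class SettingsSketch:
    db_host: str = "localhost"  # container-friendly default (assumed)
    db_port: int = 5432


def load_settings(
    environ: Mapping[str, str], dotenv_path: Optional[Path] = None
) -> SettingsSketch:
    """Resolve each setting from environ, then the .env file, then defaults."""
    file_values = read_dotenv(dotenv_path) if dotenv_path else {}

    def pick(name: str, default: str) -> str:
        return environ.get(name, file_values.get(name, default))

    # int() raises ValueError on malformed input, acting as a typed check.
    return SettingsSketch(
        db_host=pick("DB_HOST", "localhost"),
        db_port=int(pick("DB_PORT", "5432")),
    )


settings = load_settings({"DB_PORT": "6543"})
```

Passing os.environ as the first argument gives the usual behaviour; taking a mapping keeps the function easy to test.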

Documentation

  • Pipeline Tutorial — end-to-end pipeline setup guide
  • Governance — governance.yaml format, access levels, pattern matching, dcat/dataspace blocks
  • Schemas — JSON Schema definitions including governance.schema.json
  • CLI — full CLI reference

Installation

pip install celine-utils

Intended audience

  • Data engineers
  • Platform engineers
  • CELINE application developers

License

Copyright © 2025 Spindox Labs

Licensed under the Apache License, Version 2.0.
