CELINE utils
A collection of shared utilities, libraries, and command-line tools that form the technical backbone of the CELINE data platform. Provides reusable building blocks for data pipelines, governance, lineage, metadata management, and platform integrations.
CELINE Utils is not an end-user application. It is a platform utility layer embedded into CELINE applications and executed within orchestrated environments that use Meltano, dbt, Prefect, and OpenLineage.
Scope and goals
- Centralise cross-cutting platform logic used by multiple CELINE projects
- Provide opinionated but extensible tooling for data pipelines
- Enforce consistent governance and lineage semantics
- Reduce duplication across pipeline applications
- Act as a stable foundation for CELINE-compatible services and workflows
Key capabilities
Governance framework
A declarative governance.yaml specification defines the metadata, access control, and dataspace exposure rules for each dataset.
The GovernanceRule model covers:
- Dataset ownership (owner, attribution)
- License and access level (open, internal, restricted, secret)
- Data classification (pii, green, yellow, red) and retention
- Tags, documentation links, and source system
- user_filter_column: the column used for per-subject consent-based row filtering
- expose: true, which controls whether the dataset appears in the DCAT catalogue and is registered as an EDC asset
Extended blocks for DCAT-AP 3.0 and dataspace integration:
dcat: block — propagated to the DCAT-AP catalogue by dataset-api:
- publisher_uri: overrides the API-level fallback publisher
- themes: EU Publications Office data-theme URIs
- language_uris: dct:language URIs
- spatial_uris: dct:spatial URIs
- accrual_periodicity: dct:accrualPeriodicity URI
- conforms_to: dct:conformsTo URI
- temporal.start / temporal.end: dct:temporal coverage
dataspace: block — consumed by export_governance.py when registering datasets in EDC:
- contract_required: enables the ds:contractRequired ODRL constraint
- consent_required: enables the ds:consentStatus ODRL constraint and consent-based row filtering
- odrl_action: default ODRL action (default: use)
- purpose: ODRL purpose values
- medallion: data quality level (gold / silver / bronze)
Governance rules are resolved with pattern matching via GovernanceResolver — defaults cascade from the defaults: block into each source entry, with per-source values taking precedence. The expose and dcat/dataspace fields use an OR-merge for booleans and override-merge for objects.
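The cascade described above can be sketched in plain Python. This is an illustrative reading of the merge rules, not the GovernanceResolver implementation: the `sources` key, the `access_level` field name, the first-match-wins ordering, and the key-by-key interpretation of "override-merge" are all assumptions made for the example.

```python
from fnmatch import fnmatchcase
from typing import Any

# Hypothetical in-memory form of a parsed governance.yaml.
# Only `defaults`, `expose`, `dcat` and `dataspace` are named in the text above;
# the `sources` key and `access_level` field name are assumptions.
GOVERNANCE: dict[str, Any] = {
    "defaults": {
        "access_level": "internal",
        "expose": False,
        "dataspace": {"odrl_action": "use", "consent_required": False},
    },
    "sources": {
        "gold.meter_readings": {
            "access_level": "restricted",
            "user_filter_column": "user_id",
            "dataspace": {"consent_required": True, "medallion": "gold"},
        },
        "gold.*": {"expose": True, "dataspace": {"medallion": "gold"}},
    },
}

# Fields that get the special merge treatment; everything else is a plain override.
SPECIAL_FIELDS = {"expose", "dcat", "dataspace"}


def merge_value(base: Any, override: Any) -> Any:
    """OR-merge booleans, merge dicts key by key, otherwise take the override."""
    if isinstance(base, bool) and isinstance(override, bool):
        return base or override
    if isinstance(base, dict) and isinstance(override, dict):
        merged = dict(base)
        for key, value in override.items():
            merged[key] = merge_value(base[key], value) if key in base else value
        return merged
    return override


def merge_rule(defaults: dict[str, Any], entry: dict[str, Any]) -> dict[str, Any]:
    """Cascade defaults into one source entry; per-source values take precedence."""
    merged = dict(defaults)
    for key, value in entry.items():
        if key in SPECIAL_FIELDS and key in merged:
            merged[key] = merge_value(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve(dataset_name: str, config: dict[str, Any]) -> dict[str, Any]:
    """Return the rule for a dataset: first matching pattern wins (assumption)."""
    defaults = config.get("defaults", {})
    for pattern, entry in config.get("sources", {}).items():
        if fnmatchcase(dataset_name, pattern):
            return merge_rule(defaults, entry)
    return dict(defaults)
```

With this sketch, `resolve("gold.weather", GOVERNANCE)` picks up `expose: True` from the `gold.*` entry while keeping the default `odrl_action`, and a dataset that matches no pattern falls back to the defaults alone.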
Both celine-utils (pipeline side) and dataset-api/cli/export_governance.py (catalogue side) parse the same governance.yaml format. EDC-specific sub-objects in the dataspace: block are silently ignored by celine-utils via model_config = ConfigDict(extra="ignore").
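To show what "silently ignored" means in practice, here is a standard-library stand-in for that behaviour. It deliberately does not use pydantic; the `DataspaceBlock` class, its defaults other than `odrl_action`, and the `edc_policy` key are hypothetical names chosen for the illustration.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Mapping, Optional


@dataclass
class DataspaceBlock:
    """Illustrative subset of the dataspace: block (not the real model)."""

    contract_required: bool = False   # default is an assumption
    consent_required: bool = False    # default is an assumption
    odrl_action: str = "use"          # default stated in the field list above
    purpose: list[str] = field(default_factory=list)
    medallion: Optional[str] = None

    @classmethod
    def from_mapping(cls, raw: Mapping[str, Any]) -> "DataspaceBlock":
        # Keep only keys the model declares; unknown keys are dropped
        # without raising, mimicking an "ignore extras" policy.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in raw.items() if k in known})


block = DataspaceBlock.from_mapping(
    {
        "consent_required": True,
        "medallion": "gold",
        "edc_policy": {"id": "example"},  # hypothetical EDC-only sub-object
    }
)
```

The practical consequence is that one governance.yaml file can carry catalogue-side detail without the pipeline side needing to know about it.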
Pipeline orchestration
Structured execution layer for:
- Meltano ingestion pipelines
- dbt transformations and tests
- Prefect-based Python flows
The PipelineRunner coordinates execution, logging, error handling, and lineage emission consistently across tools.
See the pipeline tutorial.
OpenLineage integration
- Automatic emission of START, COMPLETE, FAIL, and ABORT events
- Dataset-level schema facets
- Data quality assertions from dbt tests
- Custom CELINE governance facets (including userFilterColumn, medallion, classification)
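The event lifecycle listed above can be pictured as a wrapper around a unit of work. The following is a minimal sketch, assuming a caller-supplied `emit` callback and a simplified event dictionary; it does not use the OpenLineage client or the PipelineRunner API, and mapping ABORT to `KeyboardInterrupt` is an assumption made for the example.

```python
from contextlib import contextmanager
from datetime import datetime, timezone
from typing import Any, Callable, Iterator
from uuid import uuid4


@contextmanager
def lineage_run(job_name: str, emit: Callable[[dict[str, Any]], None]) -> Iterator[str]:
    """Emit START on entry, then exactly one of COMPLETE, FAIL or ABORT on exit."""
    run_id = str(uuid4())

    def send(event_type: str) -> None:
        # Simplified event shape; real OpenLineage events carry more fields.
        emit(
            {
                "eventType": event_type,
                "eventTime": datetime.now(timezone.utc).isoformat(),
                "job": {"name": job_name},
                "run": {"runId": run_id},
            }
        )

    send("START")
    try:
        yield run_id
    except KeyboardInterrupt:
        send("ABORT")  # interrupted from outside (assumed mapping)
        raise
    except Exception:
        send("FAIL")   # the wrapped step raised
        raise
    else:
        send("COMPLETE")
```

A caller would wrap each pipeline step in `with lineage_run("my_job", emit):` so that every run produces a matched pair of events sharing one run ID, whether the step succeeds or not.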
Dataset tooling
The DatasetClient enables:
- Schema and table introspection
- Column metadata inspection
- Safe query construction
- Export to Pandas
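As an illustration of what safe query construction usually involves, the sketch below validates identifiers against an allow-list pattern and passes values as bound parameters. The function names are hypothetical and are not the DatasetClient API; the `?` placeholder follows the sqlite3 convention, and other drivers use different markers.

```python
import re
import sqlite3
from typing import Any, Mapping, Sequence

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_identifier(name: str) -> str:
    """Reject anything that is not a plain identifier, then double-quote it."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return f'"{name}"'


def build_select(
    table: str, columns: Sequence[str], filters: Mapping[str, Any]
) -> tuple[str, list[Any]]:
    """Build a SELECT whose values travel as parameters, never as SQL text."""
    column_sql = ", ".join(quote_identifier(c) for c in columns)
    sql = f"SELECT {column_sql} FROM {quote_identifier(table)}"
    params: list[Any] = []
    if filters:
        clauses = []
        for column, value in filters.items():
            clauses.append(f"{quote_identifier(column)} = ?")
            params.append(value)
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


if __name__ == "__main__":
    # Small in-memory demonstration with a made-up table.
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE "readings" ("id" INTEGER, "user_id" TEXT, "value" REAL)')
    conn.execute('INSERT INTO "readings" VALUES (1, ?, 2.5)', ("u1",))
    query, args = build_select("readings", ["id", "value"], {"user_id": "u1"})
    print(conn.execute(query, args).fetchall())
```

Keeping identifiers and values on separate paths means a filter value such as `"u1' OR 1=1"` is compared literally instead of being interpreted as SQL.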
Platform integrations
- Keycloak for identity and access management
- Apache Superset for analytics platform integration
- MQTT for lightweight messaging
CLI
celine-utils governance generate # generate governance.yaml template
celine-utils pipeline init # scaffold a new pipeline
celine-utils pipeline run # run a pipeline
Repository structure
celine/
admin/
cli/
common/
datasets/
pipelines/
schemas/
tests/
Configuration
Environment-driven via pydantic-settings:
- Environment variables first
- Optional .env files
- Typed validation with container-friendly defaults
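The precedence described above (environment variables over .env values over defaults, with typed validation) can be mimicked with the standard library alone. This stand-in does not use pydantic-settings, and the `CELINE_` prefix, the `DemoSettings` fields, and their defaults are hypothetical.

```python
import os
from dataclasses import dataclass, fields
from pathlib import Path
from typing import Mapping, Optional


def read_dotenv(path: str) -> dict[str, str]:
    """Parse KEY=VALUE lines from an optional .env file; missing file means {}."""
    file = Path(path)
    if not file.is_file():
        return {}
    values: dict[str, str] = {}
    for raw in file.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def coerce(raw: str, target: type):
    """Convert a string to the declared field type, raising ValueError if invalid."""
    if target is bool:
        lowered = raw.strip().lower()
        if lowered in {"1", "true", "yes", "on"}:
            return True
        if lowered in {"0", "false", "no", "off"}:
            return False
        raise ValueError(f"not a boolean: {raw!r}")
    return target(raw)


@dataclass(frozen=True)
class DemoSettings:
    db_host: str = "localhost"  # container-friendly defaults (made up)
    db_port: int = 5432
    debug: bool = False


def load_settings(
    env: Optional[Mapping[str, str]] = None,
    dotenv_path: str = ".env",
    prefix: str = "CELINE_",
) -> DemoSettings:
    env = os.environ if env is None else env
    # Later mappings win, so real environment variables override .env values.
    merged = {**read_dotenv(dotenv_path), **env}
    kwargs = {}
    for f in fields(DemoSettings):
        key = prefix + f.name.upper()
        if key in merged:
            kwargs[f.name] = coerce(merged[key], f.type)
    return DemoSettings(**kwargs)
```

Calling `load_settings(env={"CELINE_DB_PORT": "6543"})` yields a settings object with the port overridden and every other field at its default, while a non-numeric port fails early with a `ValueError`.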
Documentation
- Pipeline Tutorial: end-to-end pipeline setup guide
- Governance: governance.yaml format, access levels, pattern matching, dcat/dataspace blocks
- Schemas: JSON Schema definitions including governance.schema.json
- CLI: full CLI reference
Installation
pip install celine-utils
Intended audience
- Data engineers
- Platform engineers
- CELINE application developers
License
Copyright © 2025 Spindox Labs
Licensed under the Apache License, Version 2.0.