Airflow operator for Turbine data quality checks

airflow-turbine

Apache Airflow operator for Turbine. Drop TurbineCheckOperator into a DAG, point it at a Turbine server with an Airflow Connection, and your data-quality checks run as native Airflow tasks. The operator is deferrable — it releases the worker slot while Turbine evaluates, then resumes to log every Check Outcome and fail the task on any breach.

Install

uv add airflow-turbine

Requires Python 3.12 or newer. The package pulls in only apache-airflow and turbine-client; the Turbine engine never enters your Airflow environment.

Minimal example

from airflow_turbine import TurbineCheckOperator

quality = TurbineCheckOperator(
    task_id="quality",
    turbine_conn_id="turbine",
)

The Connection's host / port / schema build the base URL; password (when set) becomes a static bearer token. With no other configuration the operator runs every contract registered with the server.
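As a rough illustration of that mapping, the sketch below assembles a base URL and token from Connection-like fields. It is not the operator's actual code: ConnFields, base_url, and bearer_token are hypothetical names, and treating schema as the URL scheme with an http fallback is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConnFields:
    """Hypothetical stand-in for the Airflow Connection fields the operator reads."""

    host: str
    port: Optional[int] = None
    schema: Optional[str] = None
    password: Optional[str] = None


def base_url(conn: ConnFields) -> str:
    # Assumption: schema carries the URL scheme; fall back to http when unset.
    scheme = conn.schema or "http"
    netloc = conn.host if conn.port is None else f"{conn.host}:{conn.port}"
    return f"{scheme}://{netloc}"


def bearer_token(conn: ConnFields) -> Optional[str]:
    # An empty or missing password means no static bearer token.
    return conn.password or None


conn = ConnFields(host="turbine.internal", port=8443, schema="https", password="s3cret")
print(base_url(conn))
print(bearer_token(conn))
```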

Contract selection

from airflow_turbine import TurbineCheckOperator

TurbineCheckOperator(
    task_id="quality",
    turbine_conn_id="turbine",
    contracts=["orders", "customers"],   # list, single id, or None to fan out across every contract
)

contracts=None (the default) fans out across every registered contract; contracts=[] is a no-op that succeeds without running anything (useful when the list is templated from an upstream task).
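The three accepted shapes can be told apart as in the sketch below. The helper resolve_contracts and the ALL_CONTRACTS sentinel are hypothetical names used for illustration; the operator's internal handling may differ.

```python
from typing import List, Optional, Sequence, Union

# Hypothetical sentinel meaning "every contract registered with the server".
ALL_CONTRACTS = object()


def resolve_contracts(contracts: Union[None, str, Sequence[str]]):
    """Normalise the contracts argument into a sentinel or a list of ids."""
    if contracts is None:
        # Default: fan out across every registered contract.
        return ALL_CONTRACTS
    if isinstance(contracts, str):
        # A single id is treated as a one-element list.
        return [contracts]
    # Any other sequence is copied as-is; an empty list stays empty (no-op).
    return list(contracts)


print(resolve_contracts(None) is ALL_CONTRACTS)
print(resolve_contracts("orders"))
print(resolve_contracts([]))
```

Note the explicit `is None` test: an empty list is falsy in Python, so a plain `if not contracts` check would conflate the no-op case with the fan-out default.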

Check Window from the DAG's data interval

Opt in to forward data_interval_start / data_interval_end as the Check Window:

from airflow_turbine import TurbineCheckOperator

TurbineCheckOperator(
    task_id="quality",
    turbine_conn_id="turbine",
    use_data_interval=True,
)

Off by default: every scheduled DAG has a data interval, and forwarding it unconditionally would trigger warnings on contracts whose SLAs do not declare a latency element. Setting use_data_interval=True together with incremental=True is rejected at task construction.
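A minimal sketch of both rules follows, assuming the rejection surfaces as a ValueError (the exception type is not stated here) and that the window is taken straight from the data_interval_start and data_interval_end keys of the Airflow task context. The helper names validate_run_options and check_window are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def validate_run_options(use_data_interval: bool, incremental: bool) -> None:
    # Assumption: the real operator may raise a different exception type.
    if use_data_interval and incremental:
        raise ValueError("use_data_interval and incremental are mutually exclusive")


def check_window(context: dict, use_data_interval: bool) -> Optional[Tuple[datetime, datetime]]:
    # Forward the interval only when the task opted in.
    if not use_data_interval:
        return None
    return context["data_interval_start"], context["data_interval_end"]


# A hand-built context standing in for what Airflow passes at run time.
context = {
    "data_interval_start": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "data_interval_end": datetime(2024, 1, 2, tzinfo=timezone.utc),
}
validate_run_options(use_data_interval=True, incremental=False)
print(check_window(context, use_data_interval=True))
print(check_window(context, use_data_interval=False))
```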

Run options

from airflow_turbine import TurbineCheckOperator

TurbineCheckOperator(
    task_id="quality",
    turbine_conn_id="turbine",
    incremental=True,    # check rows newer than the last watermark; mutually exclusive with use_data_interval
    flag_rows=True,      # persist failing-row PKs to the flag matrix
)

Auth

Three modes, picked from the Connection:

Connection shape                                                   | Resolved auth
password set, no OAuth extra                                       | BearerAuth(token=password)
OAuth fields in extra (tenant_id, client_id, client_secret, scope) | AzureADClientCredentials(...)
No password, no OAuth extra                                        | None (in-cluster, unauthenticated)
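The decision table can be read as the sketch below. Bearer and ClientCredentials are hypothetical stand-ins for the turbine-client auth classes, and resolve_auth is an illustrative helper, not the operator's code. Two assumptions are baked in: all four OAuth fields must be present for the OAuth mode to apply, and OAuth fields win when a password is also set.

```python
from dataclasses import dataclass
from typing import Optional, Union

OAUTH_KEYS = ("tenant_id", "client_id", "client_secret", "scope")


@dataclass
class Bearer:
    """Stand-in for a static bearer-token auth object."""

    token: str


@dataclass
class ClientCredentials:
    """Stand-in for an Azure AD client-credentials auth object."""

    tenant_id: str
    client_id: str
    client_secret: str
    scope: str


def resolve_auth(password: Optional[str], extra: dict) -> Union[Bearer, ClientCredentials, None]:
    # Assumption: OAuth mode needs every field and takes precedence over password.
    if all(extra.get(key) for key in OAUTH_KEYS):
        return ClientCredentials(**{key: extra[key] for key in OAUTH_KEYS})
    if password:
        return Bearer(token=password)
    # Neither shape matched: unauthenticated, e.g. in-cluster traffic.
    return None


print(resolve_auth("s3cret", {}))
print(resolve_auth(None, {}))
```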

Override the Connection-derived auth explicitly:

from airflow_turbine import TurbineCheckOperator
from turbine_client import BearerAuth

TurbineCheckOperator(
    task_id="quality",
    turbine_conn_id="turbine",
    auth=BearerAuth(token="explicit"),
)

Any explicit auth=... requires turbine_conn_id to be set — the trigger re-reads the connection on the triggerer process to rebuild the auth without carrying credentials across the deferral boundary.

Behaviour on failure

Each Check Outcome is logged on resume. The task raises RuntimeError when:

  • The Check Run finishes with RunStatus.FAILED.
  • Any Check Outcome is FAILED or ERRORED.

WARNED outcomes log at WARNING level but do not fail the task — advisory checks (severity: warn in the Quality Spec) surface as a soft signal.
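The failure rules above can be sketched as follows. The Outcome enum and raise_on_breach helper are hypothetical stand-ins; in particular the PASSED member and the message wording are assumptions, since only FAILED, ERRORED, and WARNED are named here.

```python
import logging
from enum import Enum
from typing import Dict

log = logging.getLogger("turbine_sketch")


class Outcome(Enum):
    PASSED = "passed"  # assumed member, not named in this README
    WARNED = "warned"
    FAILED = "failed"
    ERRORED = "errored"


def raise_on_breach(run_failed: bool, outcomes: Dict[str, Outcome]) -> None:
    """Log every outcome, then raise if the run or any check breached."""
    breaches = []
    for check, outcome in outcomes.items():
        if outcome in (Outcome.FAILED, Outcome.ERRORED):
            log.error("%s: %s", check, outcome.value)
            breaches.append(check)
        elif outcome is Outcome.WARNED:
            # Advisory checks surface as a soft signal only.
            log.warning("%s: %s", check, outcome.value)
        else:
            log.info("%s: %s", check, outcome.value)
    if run_failed or breaches:
        raise RuntimeError(f"Check Run breached: run_failed={run_failed}, checks={breaches}")


# A warned outcome alone does not fail the task.
raise_on_breach(False, {"row_count": Outcome.PASSED, "freshness": Outcome.WARNED})
```

Logging every outcome before raising keeps the full picture in the task log even when the first check in the list is the one that fails.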

Compatibility

airflow-turbine >= 0.5.12 requires turbine-data >= 0.5.12 on the server.
