Skip to main content

Airflow operator for Turbine data quality checks

Project description

airflow-turbine

Apache Airflow operator for Turbine. Drop TurbineCheckOperator into a DAG, point it at a Turbine server with an Airflow Connection, and your data-quality checks run as native Airflow tasks. The operator is deferrable — it releases the worker slot while Turbine evaluates, then resumes to log every Check Outcome and fail the task on any breach.

Install

uv add airflow-turbine

Python 3.12 or newer. Pulls only apache-airflow and turbine-client — the Turbine engine never enters your Airflow environment.

Minimal example

from airflow_turbine import TurbineCheckOperator

quality = TurbineCheckOperator(
    task_id="quality",
    turbine_conn_id="turbine",
)

The Connection's host / port / schema build the base URL; password (when set) becomes a static bearer token. With no other configuration the operator runs every contract registered with the server.

Contract Selection

TurbineCheckOperator(
    task_id="quality",
    turbine_conn_id="turbine",
    contracts=["orders", "customers"],   # list, single id, or None to fan out across every contract
)

contracts=None (the default) fans out across every registered contract; contracts=[] is a no-op that succeeds without running anything (useful when the list is templated from an upstream task).

Check Window from the DAG's data interval

Opt in to forward data_interval_start / data_interval_end as the Check Window:

TurbineCheckOperator(
    task_id="quality",
    turbine_conn_id="turbine",
    use_data_interval=True,
)

Off by default — every scheduled DAG has a data interval, and forwarding it unconditionally would spray warnings on contracts whose SLAs do not declare a latency element. Setting use_data_interval=True together with incremental=True is rejected at task construction.

Run options

TurbineCheckOperator(
    task_id="quality",
    turbine_conn_id="turbine",
    incremental=True,    # check rows newer than the last watermark; mutually exclusive with use_data_interval
    flag_rows=True,      # persist failing-row PKs to the flag matrix
)

Auth

Three modes, picked from the Connection:

Connection shape Resolved auth
password set, no OAuth extra BearerAuth(token=password)
OAuth fields in extra (tenant_id, client_id, client_secret, scope) AzureADClientCredentials(...)
No password, no OAuth extra None (in-cluster, unauthenticated)

Override the Connection-derived auth explicitly:

from turbine_client import BearerAuth

TurbineCheckOperator(
    task_id="quality",
    turbine_conn_id="turbine",
    auth=BearerAuth(token="explicit"),
)

Any explicit auth=... requires turbine_conn_id to be set — the trigger re-reads the connection on the triggerer process to rebuild the auth without carrying credentials across the deferral boundary.

Behaviour on failure

Each Check Outcome is logged on resume. The task raises RuntimeError when:

  • The Check Run finishes with RunStatus.FAILED.
  • Any Check Outcome is FAILED or ERRORED.

WARNED outcomes log at WARNING level but do not fail the task — advisory checks (severity: warn in the Quality Spec) surface as a soft signal.

Compatibility

airflow-turbine >= 0.5.12 requires turbine-data >= 0.5.12 on the server.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

airflow_turbine-0.5.12.tar.gz (11.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

airflow_turbine-0.5.12-py3-none-any.whl (9.9 kB view details)

Uploaded Python 3

File details

Details for the file airflow_turbine-0.5.12.tar.gz.

File metadata

  • Download URL: airflow_turbine-0.5.12.tar.gz
  • Upload date:
  • Size: 11.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.0 {"installer":{"name":"uv","version":"0.10.0","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for airflow_turbine-0.5.12.tar.gz
Algorithm Hash digest
SHA256 cc6fc12ef973206f1cbca1e79b6a0ffb3bccf09c32fa89413748fe650a871132
MD5 5d038c3c87a0e5a7fdd379f9a8de8663
BLAKE2b-256 f167c8e41fe5b8ddc1d3b4f2da458a201686e8a0f0d397b8a9be30c6b080e4f2

See more details on using hashes here.

File details

Details for the file airflow_turbine-0.5.12-py3-none-any.whl.

File metadata

  • Download URL: airflow_turbine-0.5.12-py3-none-any.whl
  • Upload date:
  • Size: 9.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.0 {"installer":{"name":"uv","version":"0.10.0","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for airflow_turbine-0.5.12-py3-none-any.whl
Algorithm Hash digest
SHA256 f53a014a25d5186ad2e2fd4440c4e9a27003340539b483f3d354fe78795a5107
MD5 ea04f35e6245e239b9ad7e369f288e46
BLAKE2b-256 14de64311c118a34fffd367fe5c4f041ba60310ea17c72a28b64bef6e59f9310

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page