Orchestration Pipelines

Orchestration-pipelines

A library for defining and generating Apache Airflow DAGs declaratively using YAML. Currently focused on orchestration of GCP resources (Dataproc, BigQuery, Dataform) and DBT.

[!NOTE] This library is currently in Preview.

Overview

orchestration-pipelines allows you to define complex data workflows in simple, human-readable YAML files. It abstracts away the boilerplate of writing Airflow DAGs in Python, making it easier for non-Python experts to create and manage pipelines.

Supported Python Versions

Python >= 3.9

Features

  • Declarative DAGs: Define your pipeline structure, triggers, and actions in YAML.
  • Rich Actions Support: Built-in support for:
    • Python Scripts
    • Google Cloud BigQuery
    • Google Cloud Dataproc (Serverless, Ephemeral and existing clusters)
    • Google Cloud Dataform
    • DBT
  • Automatic Generation: A simple Python call generates the full Airflow DAG.
  • Versioning: Supports versioning of pipelines via a manifest file (as of Preview, on Google Cloud Composer).

Installation

You can install orchestration-pipelines from PyPI:

pip install orchestration-pipelines

[!IMPORTANT] Ensure that your apache-airflow-client version is fully compatible with Airflow 3 to prevent critical DAG parsing or runtime errors. This package uses Airflow Client API calls to interact with the metadata database. Because newer versions of the apache-airflow-client library introduce significant architectural changes, a version mismatch will likely break that communication and disrupt your pipelines. Always verify that your client version matches your Airflow environment.
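
One way to act on the note above is to compare the installed versions before deploying. The following is a minimal sketch, not part of this library: the helper names (major, same_major, check_installed) are hypothetical, and treating "same major version" as the compatibility signal is an assumption of the sketch. Matching majors is only a first sanity check, so consult the Airflow release notes for the exact pairing.

```python
from importlib.metadata import PackageNotFoundError, version


def major(version_string: str) -> int:
    """Return the leading major component of a version string such as '3.0.2'."""
    return int(version_string.split(".")[0])


def same_major(airflow_version: str, client_version: str) -> bool:
    """True when both version strings share the same major number (sketch assumption)."""
    return major(airflow_version) == major(client_version)


def check_installed() -> str:
    """Compare the installed apache-airflow and apache-airflow-client versions."""
    try:
        airflow_version = version("apache-airflow")
        client_version = version("apache-airflow-client")
    except PackageNotFoundError as exc:
        # One of the two distributions is missing from this environment.
        return f"package not installed: {exc}"
    if same_major(airflow_version, client_version):
        return f"ok: airflow {airflow_version}, client {client_version}"
    return f"mismatch: airflow {airflow_version}, client {client_version}"


if __name__ == "__main__":
    print(check_installed())
```

Running the script inside the Airflow environment prints a one-line summary that can be reviewed before any pipelines are deployed.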

Quick Start

1. Define your pipeline in YAML

Create a file named my_pipeline.yml:

modelVersion: "1.0"
pipelineId: "my_pipeline"
description: "A simple example pipeline"
runner: "airflow"

defaults:
  projectId: "your-gcp-project"
  location: "us-central1"

triggers:
  - schedule:
      interval: "0 4 * * *"
      startTime: "2026-01-01T00:00:00"
      catchup: false

actions:
  - sql:
      name: "create_table"
      query:
        inline: "CREATE TABLE IF NOT EXISTS `your-gcp-project.my_dataset.my_table` (id INT64, name STRING);"
      engine:
        bigquery:
          location: "US"

2. Generate the Airflow DAG

Create a Python file named my_pipeline.py in your Airflow DAGs folder:

from orchestration_pipelines_lib.api import generate

# airflow | dag
# Generate the Airflow DAG from the pipeline definition file created in step 1.
# The path is relative to the "dags" directory in the Composer bucket.
generate("my_pipeline.yml")

Airflow will parse this Python file and automatically generate the DAG based on your YAML definition.
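
A note on the `# airflow | dag` comment in the snippet above: when Airflow's DAG discovery safe mode is enabled (the default), the scheduler only parses Python files whose contents mention both "airflow" and "dag", case-insensitively. The comment presumably exists to satisfy that check, although this README does not say so. The sketch below approximates the heuristic so you can test a file locally. The function is a simplified stand-in written for illustration, not Airflow's own implementation.

```python
import tempfile
from pathlib import Path


def might_contain_dag(path: Path) -> bool:
    """Approximate Airflow's safe-mode check: both words must appear in the file."""
    content = path.read_bytes().lower()
    return b"airflow" in content and b"dag" in content


with tempfile.TemporaryDirectory() as tmp:
    # A loader file without any comments mentions neither word.
    bare = Path(tmp) / "bare.py"
    bare.write_text(
        "from orchestration_pipelines_lib.api import generate\n"
        'generate("my_pipeline.yml")\n'
    )
    # Adding the marker comment makes the file eligible for parsing.
    marked = Path(tmp) / "marked.py"
    marked.write_text("# airflow | dag\n" + bare.read_text())
    print(might_contain_dag(bare), might_contain_dag(marked))  # prints False True
```

If a generated DAG does not show up in the Airflow UI, checking that the loader file still contains both words is a cheap first step.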

Advanced Features

Versioning and Manifests

You can manage multiple versions of your pipelines using a manifest.yaml file. This allows you to specify which version of a pipeline should be active.

See the examples/ directory for a sample manifest.yaml and how to use it.

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

Download files

Source Distribution

orchestration_pipelines-0.2.0.tar.gz (55.4 kB view details)

Uploaded Source

Built Distribution

orchestration_pipelines-0.2.0-py3-none-any.whl (92.2 kB view details)

Uploaded Python 3

File details

Details for the file orchestration_pipelines-0.2.0.tar.gz.

File metadata

  • Download URL: orchestration_pipelines-0.2.0.tar.gz
  • Size: 55.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.2.0 CPython/3.11.2

File hashes

Hashes for orchestration_pipelines-0.2.0.tar.gz
Algorithm Hash digest
SHA256 35de49dadbdf84a5c0731bb2b04bda7fd20e5687fcbaef9d92c924a900e9b913
MD5 5da00fe91eeecc143d76ab931256bcc6
BLAKE2b-256 27bb6ed089c84e60e5c3df05ec6fc0dfcf27d0855d0bb542e02f095fd96fc152

Provenance

The following attestation bundles were made for orchestration_pipelines-0.2.0.tar.gz:

Publisher: orchestration-pipelines-py@oss-exit-gate-prod.iam.gserviceaccount.com

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
  • Statement: Publication detail:
    • Token Issuer: https://accounts.google.com
    • Service Account: orchestration-pipelines-py@oss-exit-gate-prod.iam.gserviceaccount.com

File details

Details for the file orchestration_pipelines-0.2.0-py3-none-any.whl.

File hashes

Hashes for orchestration_pipelines-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b228c2b0d3d456602682b3c2ff159c170cec9a1ac4f85e0d5b6b4182c27df5a6
MD5 2691c8bce65b9d8f1b6bc18e60b2754a
BLAKE2b-256 10a13d36ffc18b8740976c4e847acb66a0b5b1893218e2560c3d24ce5d0c3ac2

Provenance

The following attestation bundles were made for orchestration_pipelines-0.2.0-py3-none-any.whl:

Publisher: orchestration-pipelines-py@oss-exit-gate-prod.iam.gserviceaccount.com

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
  • Statement: Publication detail:
    • Token Issuer: https://accounts.google.com
    • Service Account: orchestration-pipelines-py@oss-exit-gate-prod.iam.gserviceaccount.com
