Skip to main content

A Fivetran async provider for Apache Airflow

Project description

Fivetran Async Provider for Apache Airflow

This package provides an async operator, sensor and hook that integrates Fivetran into Apache Airflow. FivetranSensor allows you to monitor a Fivetran sync job for completion before running downstream processes. FivetranOperator submits a Fivetran sync job and polls for its status on the triggerer. Since an async sensor or operator frees up worker slot while polling is happening on the triggerer, they consume less resources when compared to traditional "sync" sensors and operators.

Fivetran automates your data pipeline, and Airflow automates your data processing.

Installation

Prerequisites: An environment running apache-airflow.

pip install airflow-provider-fivetran-async

Configuration

In the Airflow user interface, configure a Connection for Fivetran. Most of the Connection config fields will be left blank. Configure the following fields:

  • Conn Id: fivetran
  • Conn Type: Fivetran
  • Login: Fivetran API Key
  • Password: Fivetran API Secret

Find the Fivetran API Key and Secret in the Fivetran Account Settings, under the API Config section. See our documentation for more information on Fivetran API Authentication.

The sensor assumes the Conn Id is set to fivetran, however if you are managing multiple Fivetran accounts, you can set this to anything you like. See the DAG in examples to see how to specify a custom Conn Id.

Modules

Fivetran Operator Async

from fivetran_provider_async.operators import FivetranOperator

FivetranOperator submits a Fivetran sync job and monitors it on trigger for completion.

FivetranOperator requires that you specify the connector_id of the Fivetran connector you wish to trigger. You can find connector_id in the Settings page of the connector you configured in the Fivetran dashboard.

The FivetranOperator will wait for the sync to complete so long as wait_for_completion=True (this is the default). It is recommended that you run in deferrable mode (this is also the default). If wait_for_completion=False, the operator will return the timestamp for the last sync.

Import into your DAG via:

Fivetran Sensor Async

from fivetran_provider_async.sensors import FivetranSensor

FivetranSensor monitors a Fivetran sync job for completion. Monitoring with FivetranSensor allows you to trigger downstream processes only when the Fivetran sync jobs have completed, ensuring data consistency.

FivetranSensor requires that you specify the connector_id of the Fivetran connector you want to wait for. You can find connector_id in the Settings page of the connector you configured in the Fivetran dashboard.

You can use multiple instances of FivetranSensor to monitor multiple Fivetran connectors.

FivetranSensor is most commonly useful in two scenarios:

  1. Fivetran is using a separate scheduler than the Airflow scheduler.
  2. You set wait_for_completion=False in the FivetranOperator, and you need to await the FivetranOperator task later. (You may want to do this if you want to arrange your DAG such that some tasks are dependent on starting a sync and other tasks are dependent on completing a sync).

If you are doing the 1st pattern, you may find it useful to set the completed_after_time to data_interval_end, or data_interval_end with some buffer:

fivetran_sensor = FivetranSensor(
    task_id="wait_for_fivetran_externally_scheduled_sync",
    connector_id="bronzing_largely",
    poke_interval=5,
    completed_after_time="{{ data_interval_end + macros.timedelta(minutes=1) }}",
)

If you are doing the 2nd pattern, you can use XComs to pass the target completed time to the sensor:

fivetran_op = FivetranOperator(
    task_id="fivetran_sync_my_db",
    connector_id="bronzing_largely",
    wait_for_completion=False,
)

fivetran_sensor = FivetranSensor(
    task_id="wait_for_fivetran_db_sync",
    connector_id="bronzing_largely",
    poke_interval=5,
    completed_after_time="{{ task_instance.xcom_pull('fivetran_sync_op', key='return_value') }}",
)

fivetran_op >> fivetran_sensor

You may also specify the FivetranSensor without a completed_after_time. In this case, the sensor will make note of when the last completed time was, and will wait for a new completed time.

Examples

See the examples directory for an example DAG.

Issues

Please submit issues and pull requests in our official repo: https://github.com/astronomer/airflow-provider-fivetran-async

We are happy to hear from you. Please email any feedback to the authors at humans@astronomer.io.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

airflow_provider_fivetran_async-2.1.0.tar.gz (29.8 kB view details)

Uploaded Source

Built Distribution

File details

Details for the file airflow_provider_fivetran_async-2.1.0.tar.gz.

File metadata

File hashes

Hashes for airflow_provider_fivetran_async-2.1.0.tar.gz
Algorithm Hash digest
SHA256 7f0cf1066e00d4f36c20f17b5a2f07235939f3763af25d6c2de5848f2632a4e0
MD5 e97dbcbfb3db4a3fa5b59e9b05d4c05c
BLAKE2b-256 fc16cb254fbe5055c0678bab3cada8e941e5b0d772f76768845f00ff7bc3a3e7

See more details on using hashes here.

File details

Details for the file airflow_provider_fivetran_async-2.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for airflow_provider_fivetran_async-2.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4dbba63b2b93114b2d0a84869b9eaa2e5da5225c9fa1c407417cffc4bba35b6f
MD5 d3283d8272d9f204e92615283e687ba3
BLAKE2b-256 eefa08c0c977f6224630c72ac8e77620e7d16b5bd166f4777a232b853c17ebd5

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page