
Airflow provider for RudderStack


The Customer Data Platform for Developers

Website · Documentation · Slack Community


RudderStack Airflow Provider

The RudderStack Airflow Provider lets you programmatically schedule and trigger your Reverse ETL syncs from outside RudderStack and integrate them with your existing Airflow workflows.

For more information on using the Airflow Provider utility, refer to the documentation.

Installation

```shell
pip install rudderstack-airflow-provider
```
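The operators authenticate through an Airflow connection (`rudderstack_default` unless you pass `connection_id`). Airflow can read a connection from an environment variable named `AIRFLOW_CONN_<CONN_ID>` (upper case), and since Airflow 2.3 that variable may hold a JSON value. The sketch below builds such a variable. It rests on assumptions: that the provider reads the API base URL from the connection's `host` and the access token from its `password`, that `conn_type` `http` is acceptable, and that `https://api.rudderstack.com` is the right base URL. Check the provider documentation before relying on these field mappings. The helper name `rudderstack_conn_env` is hypothetical.

```python
import json
import shlex


def rudderstack_conn_env(conn_id: str, access_token: str,
                         host: str = "https://api.rudderstack.com") -> tuple[str, str]:
    """Build the environment variable Airflow reads for connection `conn_id`.

    Assumption: the RudderStack hook takes the API base URL from `host` and
    the access token from `password`; conn_type and host are also assumed.
    """
    name = f"AIRFLOW_CONN_{conn_id.upper()}"
    value = json.dumps({
        "conn_type": "http",
        "host": host,
        "password": access_token,
    })
    return name, value


if __name__ == "__main__":
    name, value = rudderstack_conn_env("rudderstack_conn", "<access-token>")
    # Print a shell line to add to the scheduler's and workers' environment.
    print(f"export {name}={shlex.quote(value)}")
```

The UI, the CLI, or a secrets backend work equally well; the environment-variable route simply keeps the token out of DAG files.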

Usage

RudderstackOperator

Note: Use RudderstackRETLOperator for Reverse ETL connections.

A simple DAG for triggering syncs for a RudderStack source:

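```python
from datetime import datetime, timedelta

from airflow import DAG
# Module path assumed from the provider's package layout; adjust if your version differs.
from rudder_airflow_provider.operators.rudderstack import RudderstackOperator

# Arguments applied to every task in the DAG.
default_args = {'owner': 'airflow'}

with DAG(
    'rudderstack-sample',
    default_args=default_args,
    description='A simple tutorial DAG',
    schedule_interval=timedelta(days=1),
    start_date=datetime(2021, 1, 1),
    catchup=False,
    tags=['rs']
) as dag:
    # Trigger a sync for the given RudderStack source.
    rs_operator = RudderstackOperator(
        source_id='<source-id>',
        task_id='<any-task-id>',
        connection_id='rudderstack_conn'
    )
```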

For the complete code, refer to this example.

Operator Parameters

| Parameter | Description | Type | Default |
| --- | --- | --- | --- |
| source_id | Valid RudderStack source ID | String | None |
| task_id | A unique task ID within a DAG | String | None |
| wait_for_completion | If `True`, the task waits for the sync to complete. | Boolean | `False` |
| connection_id | The Airflow connection used to connect to the RudderStack API | String | `rudderstack_default` |
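With `wait_for_completion=True`, the task does not finish until the sync does. The provider's actual polling interval, status names, and failure handling are not described here, so the following is only a generic sketch of the poll-until-done pattern such an option implies. `wait_for_sync`, `get_status`, and the status strings are hypothetical stand-ins for a call to the RudderStack API.

```python
import time
from typing import Callable

# Hypothetical terminal states; the real API may report different values.
TERMINAL_STATES = frozenset({"succeeded", "failed"})


def wait_for_sync(get_status: Callable[[], str],
                  poll_interval: float = 10.0,
                  timeout: float = 3600.0) -> str:
    """Call get_status() until it reports a terminal state.

    Returns the terminal state, or raises TimeoutError once `timeout`
    seconds have passed without reaching one.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"sync still {status!r} after {timeout} s")
        time.sleep(poll_interval)


# Usage with a fake status source that finishes on the third poll.
statuses = iter(["running", "running", "succeeded"])
print(wait_for_sync(lambda: next(statuses), poll_interval=0))  # prints: succeeded
```

A monotonic clock keeps the deadline correct even if the system clock is adjusted. Inside an Airflow task, a caller would typically raise an exception on a `failed` result so that the task itself is marked as failed.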

The RudderStack operator also accepts all parameters supported by Airflow's BaseOperator, such as retries and retry_delay.
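Because the operators accept the standard BaseOperator arguments, retry and timeout behaviour can be set through the DAG's `default_args` (as the examples do) or per task. The values below are illustrative only.

```python
from datetime import timedelta

# Standard Airflow BaseOperator arguments. Every task in a DAG created with
# default_args=default_args receives them unless the task overrides them.
default_args = {
    "owner": "airflow",
    "retries": 2,                             # re-run a failed task up to twice
    "retry_delay": timedelta(minutes=5),      # pause between retries
    "execution_timeout": timedelta(hours=2),  # fail the task if it runs longer
}
```

The same keys can also be passed directly to `RudderstackOperator(...)` or `RudderstackRETLOperator(...)` as keyword arguments to override them for a single task. With `wait_for_completion=True` the task stays running for the whole sync, so `execution_timeout` should leave room for your longest expected sync.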

For details on how to run the DAG in Airflow, refer to the documentation.

RudderstackRETLOperator

A simple DAG for triggering a sync for a RudderStack Reverse ETL (RETL) connection:

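```python
from datetime import datetime, timedelta

from airflow import DAG
# Module path assumed from the provider's package layout; adjust if your version differs.
from rudder_airflow_provider.operators.rudderstack import RudderstackRETLOperator

default_args = {'owner': 'airflow'}

with DAG(
    'rudderstack-retl-sample',
    default_args=default_args,
    description='A simple tutorial DAG',
    schedule_interval=timedelta(days=1),
    start_date=datetime(2021, 1, 1),
    catchup=False,
    tags=['rs']
) as dag:
    # Trigger a full sync of a Reverse ETL connection and wait for it to finish.
    rs_operator = RudderstackRETLOperator(
        retl_connection_id='<retl-connection-id>',
        task_id='retl-test-sync',
        connection_id='rudderstack_conn',
        sync_type='full',
        wait_for_completion=True
    )
```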

Operator Parameters

| Parameter | Description | Type | Default |
| --- | --- | --- | --- |
| retl_connection_id | Valid RudderStack RETL connection ID | String (templatable) | None |
| task_id | A unique task ID within a DAG | String | None |
| wait_for_completion | If `True`, the task waits for the sync to complete. | Boolean | `False` |
| connection_id | The Airflow connection used to connect to the RudderStack API | String | `rudderstack_default` |
| sync_type | Type of sync to trigger: `incremental` or `full` | String (templatable) | `incremental` |

For details on how to run the DAG in Airflow, refer to the documentation.

Contribute

We would love to see you contribute to this project. Get more information on how to contribute here.

License

The RudderStack Airflow Provider is released under the MIT License.

Contact Us

For more information or queries on this feature, you can contact us or start a conversation in our Slack community.



Download files

Source Distribution

rudderstack_airflow_provider-1.1.0.tar.gz (7.5 kB)
