Project description

Granulate Airflow-Databricks Integration

Overview

The Granulate Airflow-Databricks Integration is an open-source plugin for Apache Airflow. It's specifically designed to set environment variables that allow Granulate's performance monitoring agent to identify and integrate with Databricks jobs orchestrated by Airflow. This plugin tags Databricks jobs, aiding the Granulate optimizing agent.
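The tagging idea can be pictured as merging an identifying environment variable into the Databricks job specification before it is submitted, so the agent running on the cluster can associate the run with its Airflow task. The sketch below is illustrative only: the variable name GRANULATE_AIRFLOW_TASK and the helper function are hypothetical, not the plugin's actual implementation.

```python
# Illustrative sketch: inject a Granulate-style identifying env var into a
# Databricks run's new_cluster.spark_env_vars. The key name is a stand-in.

def tag_databricks_json(job_json: dict, dag_id: str, task_id: str) -> dict:
    tagged = dict(job_json)                          # shallow copies so the
    cluster = dict(tagged.get("new_cluster", {}))    # caller's dict is untouched
    env_vars = dict(cluster.get("spark_env_vars", {}))
    env_vars["GRANULATE_AIRFLOW_TASK"] = f"{dag_id}.{task_id}"
    cluster["spark_env_vars"] = env_vars
    tagged["new_cluster"] = cluster
    return tagged

job = {"new_cluster": {"spark_version": "13.3.x-scala2.12", "num_workers": 2}}
tagged = tag_databricks_json(job, dag_id="etl_daily", task_id="run_job")
print(tagged["new_cluster"]["spark_env_vars"]["GRANULATE_AIRFLOW_TASK"])
# prints etl_daily.run_job
```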

Modes of Operation

The Granulate plugin can operate in three different modes:

  1. Passive Mode: In this mode, you manually replace DatabricksSubmitRunOperator and DatabricksSubmitRunDeferrableOperator with GranulateDatabricksSubmitRunOperator and GranulateDatabricksSubmitRunDeferrableOperator in your DAGs. This lets you apply Granulate selectively to specific operators. The Granulate operators are drop-in replacements for their respective operators, so you can simply swap them in the relevant DAGs.

  2. Auto-Patch for Specific DAGs: To enable auto-patching on specific DAGs, import and invoke the patch function from the plugin at the beginning of your DAG file:

    from apache_airflow_granulate_databricks.granulate_plugin import patch
    patch()
    

    This method patches the Databricks operators in the DAG where it's called, enabling the Granulate environment variable.

  3. Auto-Patch for All DAGs: For an all-encompassing approach, install the plugin with the auto-patch extra: pip install apache-airflow-granulate-databricks[auto-patch]. This mode patches all Databricks operators across all DAGs in your Airflow environment, automatically applying Granulate enhancements.

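Conceptually, the auto-patch modes work by replacing methods on the Databricks operator classes at import time, so every DAG that uses those operators picks up the change without code edits. The sketch below shows the general monkey-patching pattern with stand-in names; it is not the plugin's actual code.

```python
# Minimal monkey-patching sketch: wrap an operator class's execute method so
# extra setup (e.g. setting env vars) runs before the original logic.
# DatabricksSubmitRunOperator here is a stand-in, not the real Airflow class.

class DatabricksSubmitRunOperator:
    def execute(self):
        return "submitted"

def patch():
    original = DatabricksSubmitRunOperator.execute

    def patched(self):
        self.granulate_tagged = True   # stand-in for the plugin's env-var setup
        return original(self)

    DatabricksSubmitRunOperator.execute = patched

patch()
op = DatabricksSubmitRunOperator()
print(op.execute(), op.granulate_tagged)
# prints submitted True
```

In the real plugin, calling patch() in a DAG file scopes this rewiring to that file's operators, while the auto-patch extra applies it environment-wide.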
Installation

To install the apache-airflow-granulate-databricks package, choose the method that best fits your setup:

  • Using pip:
    • For a standard installation, run: pip install apache-airflow-granulate-databricks
    • To enable automatic DAG patching, include the auto-patch extra: pip install apache-airflow-granulate-databricks[auto-patch]
  • Using _PIP_ADDITIONAL_REQUIREMENTS in Airflow:
    • Append the following line to your _PIP_ADDITIONAL_REQUIREMENTS: apache-airflow-granulate-databricks
    • For auto-patching, use: apache-airflow-granulate-databricks[auto-patch]
    • Restart your Airflow services to apply these changes.

Package Removal

  • If you used pip to install, run: pip uninstall apache-airflow-granulate-databricks
  • If you used _PIP_ADDITIONAL_REQUIREMENTS, remove apache-airflow-granulate-databricks
  • Make sure to revert your code if you used Granulate's operators or the patch function.
  • Restart your Airflow services to apply these changes.

Requirements

  • Tested on Databricks Airflow Provider (4.2.0 <= version <= 6.0.0)
  • Python version 3.7 or higher

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Support

For support, questions, or issues, please open an issue in the GitHub repository.


File details

Details for the file apache_airflow_granulate_databricks-0.1.0.tar.gz.

File hashes

Hashes for apache_airflow_granulate_databricks-0.1.0.tar.gz:

  • SHA256: 173e3e1329e8a884ddf27ef1b6afe763b6b890bacc7deea3a9505a62efe8fc9c
  • MD5: 9b8a4d03c386191215892ab08c27dbe0
  • BLAKE2b-256: 341e8fff27750f448d616ba1a8a0acc3f654dbd57be16e3530b4b488a6a1714f


File details

Details for the file apache_airflow_granulate_databricks-0.1.0-py3-none-any.whl.

File hashes

Hashes for apache_airflow_granulate_databricks-0.1.0-py3-none-any.whl:

  • SHA256: 3f9256c65c03f239208ffc49537509bdb9cf371c90600dba745184d1af4aa6fb
  • MD5: bf5d26efa81e3972837b85fcea2d26d0
  • BLAKE2b-256: ee732de0c8985dc9cc0e986daeaf9ca602a2feb2fb54e5b909fb73a7ea00871b

