Granulate Airflow-Databricks Integration
Overview
The Granulate Airflow-Databricks Integration is an open-source plugin for Apache Airflow. It's specifically designed to set environment variables that allow Granulate's performance monitoring agent to identify and integrate with Databricks jobs orchestrated by Airflow. This plugin tags Databricks jobs, aiding the Granulate optimizing agent.
Modes of Operation
The Granulate plugin can operate in three different modes:
- Passive Mode: In this mode, you manually replace DatabricksSubmitRunOperator and DatabricksSubmitRunDeferrableOperator with GranulateDatabricksSubmitRunOperator and GranulateDatabricksSubmitRunDeferrableOperator, respectively, in your DAGs. This lets you apply Granulate selectively to specific operators. The Granulate operators are drop-in replacements for their respective operators, so you can simply swap them in the code of the relevant DAGs.
- Auto-Patch for Specific DAGs: To enable auto-patching on specific DAGs, import and invoke the patch function from the plugin at the beginning of your DAG file:
  from apache_airflow_granulate_databricks.granulate_plugin import patch
  patch()
  This method patches the Databricks operators in the DAG where it's called, so that the Granulate environment variable is set for them.
- Auto-Patch for All DAGs: For an all-encompassing approach, install the plugin with the 'auto-patch' extra:
  pip install apache-airflow-granulate-databricks[auto-patch]
  This mode patches all Databricks operators across all DAGs in your Airflow environment, automatically applying Granulate enhancements.
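The sketch below illustrates the general idea behind a drop-in operator and a patch function. It is not the plugin's actual implementation. It uses a stand-in class instead of the real Databricks operators so that it runs without Airflow installed, and the names SubmitRunOperator, TaggedSubmitRunOperator, patch_module and EXAMPLE_GRANULATE_JOB_TAG are all hypothetical. The only real identifier it relies on is spark_env_vars, the Databricks cluster-spec field that holds environment variables.

```python
import copy
import types
from typing import Any, Dict, Optional

# Hypothetical variable name, chosen only for this sketch.
EXAMPLE_ENV_VAR = "EXAMPLE_GRANULATE_JOB_TAG"


class SubmitRunOperator:
    """Stand-in for DatabricksSubmitRunOperator so the sketch runs without Airflow."""

    def __init__(self, task_id: str, json: Optional[Dict[str, Any]] = None) -> None:
        self.task_id = task_id
        self.json = json or {}


class TaggedSubmitRunOperator(SubmitRunOperator):
    """Drop-in subclass: same constructor, plus one environment variable on the cluster."""

    def __init__(self, task_id: str, json: Optional[Dict[str, Any]] = None) -> None:
        payload = copy.deepcopy(json or {})  # never mutate the caller's dict
        cluster = payload.get("new_cluster")
        if cluster is not None:
            # Add the tag to the cluster's environment variables.
            cluster.setdefault("spark_env_vars", {})[EXAMPLE_ENV_VAR] = task_id
        super().__init__(task_id, payload)


def patch_module(module: types.ModuleType) -> None:
    """Rebind the operator name in `module` to the tagged subclass (idempotent)."""
    current = getattr(module, "SubmitRunOperator")
    if not issubclass(current, TaggedSubmitRunOperator):
        module.SubmitRunOperator = TaggedSubmitRunOperator


# Simulate a provider module that DAG files would look operators up in.
fake_provider = types.ModuleType("fake_provider")
fake_provider.SubmitRunOperator = SubmitRunOperator

patch_module(fake_provider)
task = fake_provider.SubmitRunOperator("nightly_etl", {"new_cluster": {"num_workers": 2}})
print(task.json["new_cluster"]["spark_env_vars"])
```

In this sketch, rebinding a module attribute only affects lookups made after the rebinding. A name that was already imported with a from-import keeps pointing at the original class, which is one plausible reason for calling a patch function at the very beginning of a DAG file.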
Installation
To install the apache-airflow-granulate-databricks package, choose the method that best fits your setup:
- Using pip:
  - For a standard installation, run:
    pip install apache-airflow-granulate-databricks
  - To enable automatic DAG patching, include the auto-patch extra:
    pip install apache-airflow-granulate-databricks[auto-patch]
- Using _PIP_ADDITIONAL_REQUIREMENTS in Airflow:
  - Append the following line to your _PIP_ADDITIONAL_REQUIREMENTS:
    apache-airflow-granulate-databricks
  - For auto-patching, use:
    apache-airflow-granulate-databricks[auto-patch]
  - Restart your Airflow services to apply these changes.
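After installing by either method, it can be useful to confirm that the package is visible to the Python interpreter that Airflow runs under. The following is a minimal sketch using only the standard library; the helper name installed_version is an assumption made for this example and is not part of the plugin.

```python
from importlib import metadata
from typing import Optional

PACKAGE = "apache-airflow-granulate-databricks"


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version(PACKAGE)
    if version is None:
        print(f"{PACKAGE} is not installed in this environment")
    else:
        print(f"{PACKAGE} {version} is installed")
```

Run it with the same interpreter (for example, inside the Airflow container or virtual environment) that executes your DAGs; a different interpreter may report a different result.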
Package Removal
- If you used pip to install, run:
  pip uninstall apache-airflow-granulate-databricks
- If you used _PIP_ADDITIONAL_REQUIREMENTS, remove apache-airflow-granulate-databricks from it.
- Make sure to revert your code if you used Granulate's operators or the patch function.
- Restart your Airflow services to apply these changes.
Requirements
- Tested on Databricks Airflow Provider (4.2.0 <= version <= 6.5.0)
- Python version 3.8 or higher
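The requirements above can be checked programmatically. The sketch below is one way to do so, assuming the provider is distributed as apache-airflow-providers-databricks and that its version string starts with three dot-separated numbers; the helper names are hypothetical. The bounds only reflect the range the plugin was tested on, so a version outside it is not necessarily incompatible.

```python
import re
import sys
from importlib import metadata
from typing import Sequence, Tuple

PROVIDER = "apache-airflow-providers-databricks"
MIN_PROVIDER = (4, 2, 0)  # lowest tested provider version
MAX_PROVIDER = (6, 5, 0)  # highest tested provider version
MIN_PYTHON = (3, 8)


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse the leading 'X.Y.Z' of a version string; any suffix is ignored."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, patch = (int(group) for group in match.groups())
    return (major, minor, patch)


def provider_in_tested_range(version: str) -> bool:
    """True if the provider version lies within the tested range, inclusive."""
    return MIN_PROVIDER <= parse_version(version) <= MAX_PROVIDER


def python_supported(version_info: Sequence[int] = sys.version_info) -> bool:
    """True if the interpreter is Python 3.8 or higher."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    print("Python supported:", python_supported())
    try:
        found = metadata.version(PROVIDER)
        print(f"{PROVIDER} {found} in tested range:", provider_in_tested_range(found))
    except metadata.PackageNotFoundError:
        print(f"{PROVIDER} is not installed")
```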
License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Support
For support, questions, or issues, please open an issue in the GitHub repository.