
About the Airflow Helper


It’s pretty fresh, and the docs may not be fully clear yet. Keep calm! I will update them soon :)


Airflow Helper is a tool that currently allows setting up Airflow Variables, Connections, and Pools from a YAML configuration file. It supports YAML inheritance and can obtain all settings from an existing Airflow server!

In the future, it can be extended with other helpful features. I’m open to any suggestions and feature requests, so just open an issue and describe what you want.

Motivation

This project allows you to set up Connections, Variables, and Pools for Airflow from a YAML config, and to export them into a single config file.

Yeah, I know, I know… secrets backend …

But I want to have all variables on my local machine too, without needing to connect to any secrets backend. And in tests as well!

So I want a tool with which I can define all the needed connections & variables once in a config file, and then forget about them when initializing a new environment on a local machine or running tests in CI.

Some of the functionality may look like it ‘duplicates’ the normal Airflow CLI, but it doesn’t. I tried to use, for example, the airflow connections export command, but it exports dozens of default connections that I’m not interested in. I don’t want them; I want only the connections that I created.

Supported Airflow Versions

You can see the GitHub pipeline that tests the library against each Airflow version. I can only guarantee that the library works 100% with the Apache Airflow versions included in the CI/CD pipeline, but there is a good chance it works with all 2.x Apache Airflow versions.

How to use

Installation

  1. With Python in a virtualenv, from PyPI: https://pypi.org/project/airflow-helper/

pip install airflow-helper
airflow-helper --version
  2. With the Docker image from Docker Hub: https://hub.docker.com/repository/docker/xnuinside/airflow-helper/

# pull image
docker pull xnuinside/airflow-helper:latest

# example of how to run a command

docker run -it xnuinside/airflow-helper:latest --help
  3. An example of how to use it in docker-compose: example/docker-compose-example.yaml (see the sketch below)
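
For reference, here is a minimal docker-compose sketch of running Airflow Helper as a one-shot init service. The service names, mounted paths, and webserver address are illustrative assumptions, not a copy of the repo’s example file:

services:
  airflow-helper:
    image: xnuinside/airflow-helper:latest
    # load the mounted config into the Airflow webserver once it is up
    command: load /config/airflow_settings.yaml --host http://airflow-webserver --port 8080 -u airflow -p airflow
    volumes:
      - ./airflow_settings.yaml:/config/airflow_settings.yaml
    depends_on:
      - airflow-webserver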

Default settings

All arguments required in the CLI or in Python code have a ‘default’ setting; you can check all of them in the file ‘airflow_helper/settings.py’.

Airflow Helper settings & flags

You can configure how the config is applied: overwrite existing variables/connections/pools with values from the config, skip them, or raise an error if they already exist.

In the CLI (or as arguments to the Python main class, if you use the helper directly from Python) there are several useful flags:

    airflow-helper load [OPTIONS] [FILE_PATH]

# options:
  --url               TEXT  Apache Airflow full URL to connect to. You can provide it, or host & port separately. [default: None]
  --host              TEXT  Apache Airflow server host from which to obtain existing settings [default: http://localhost]
  --port              TEXT  Apache Airflow server port from which to obtain existing settings [default: 8080]
  --user          -u  TEXT  Apache Airflow user with read rights [default: airflow]
  --password      -p  TEXT  Apache Airflow user password [default: airflow]
  --overwrite     -o        Overwrite Connections & Pools if they already exist
  --skip-existed  -se       Skip `already exists` errors
  --help          -h        Show this message and exit.

    airflow-helper create [OPTIONS] COMMAND [ARGS]

# commands:
  from-server                Create a config with values from an existing Airflow Server
  new                        Create a new empty config
# options:
  --help          -h         Show this message and exit.
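
For example, using the flags documented above (file path and credentials are placeholders):

# overwrite existing Variables, Connections & Pools with values from the config
airflow-helper load airflow_settings.yaml --overwrite -u airflow -p airflow

# or keep what already exists on the server and only add the new entries
airflow-helper load airflow_settings.yaml --skip-existed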

What if I already have an Airflow server with dozens of variables?

Obtain current Variables, Connections, and Pools from an existing server

Note: you should provide the host URL with a protocol, like ‘https://path-to-your-airflow-server.com’. If the protocol is not in the URL, ‘http://’ will be added as the default.

Generating a config from an existing Airflow server is simple: just provide credentials with read access to the existing server. Airflow Helper uses the Airflow REST API under the hood, so it needs:

- the server host & port, or just a URL in the format 'http://path-to-airflow:8080'
- a user login
- the user password

And use Airflow Helper:

  1. From the CLI

# to get help
airflow-helper create -h

# to use the command
airflow-helper create from-server path/where/to/save/airflow_settings.yaml --host https://your-airflow-host --port 8080 -u airflow-user -p airflow-password
  2. From Python code

from airflow_helper import RemoteConfigObtainter

# by default it will save the config to the file airflow_settings.yaml
RemoteConfigObtainter(
    user='airflow_user', password='airflow_user_pass', url='https://path-to-airflow:8080',
).dump_config()

# but you can provide your own path, like:
RemoteConfigObtainter(
    user='airflow_user', password='airflow_user_pass', url='https://path-to-airflow:8080',
).dump_config(file_path='any/path/to/future/airflow_config.yaml')

It will create airflow_settings.yaml with all Variables, Pools & Connections inside!

Define a config from scratch

  1. You can init an empty config with the CLI

airflow-helper create new path/airflow_settings.yaml

It will create an empty sample file with pre-defined config values.

  2. Define the airflow_settings.yaml file. You can check example files in the example/ folder of this git repo (see ‘Config keys’ for which keys are allowed, or check the example/ folder).

About connections: note that ‘type’ is not the name of the connection type, it is the type id. Check the ids here: https://github.com/search?q=repo%3Aapache%2Fairflow%20conn_type&type=code

airflow:
  connections:
  - conn_type: fs
    connection_id: fs_default
    host: localhost
    login: fs_default
    port: null
  pools:
  - description: Default pool
    include_deferred: false
    name: default_pool
    slots: 120
  - description: ''
    include_deferred: true
    name: deferred
    slots: 0
  variables:
  - description: null
    key: variable-name
    value: "variable-value"
  3. Run Airflow Helper to load the config

    Required settings:

    • path to the config file (by default it searches for the file airflow_settings.yaml)

    • Airflow server address (by default it tries to connect to localhost:8080)

    • Airflow user login (with admin rights that allow setting up Pools, Variables, and Connections)

    • Airflow user password (for the login above)

    3.1 Run Airflow Helper from the CLI

  # to get help
  airflow-helper load -h

  # to load the config
  airflow-helper load path/to/airflow_settings.yaml --host https://your-airflow-host --port 8080 -u airflow-user -p airflow-password

    3.2 Run Airflow Helper from Python code

from airflow_helper import ConfigUploader


# you can provide either only the url, or host & port
ConfigUploader(
    file_path=file_path, url=url, host=host, port=port, user=user, password=password,
).upload_config_to_server()

Inheritance (include one config in another)

I love inheritance, so you can use it too. If you have some base vars/pools/connections shared by all environments and you don’t want to copy-paste the same settings into multiple files, just use the include: property at the start of your config.

Note that include accepts a list of files; they will be inherited one by one, in the order you define them under the include arg, from top to bottom.

Example:

  1. Define your ‘base’ config, for example airflow_settings_base.yaml:

connections:
- conn_type: fs
  connection_id: fs_default
  host: localhost
  login: fs_default
  port: null
pools:
- description: Default pool
  include_deferred: false
  name: default_pool
  slots: 120
  2. Now create your dev-env config, airflow_settings_dev.yaml (the name can be anything you want), and use the ‘include:’ property inside it:

include:
  - "airflow_settings_base.yaml"

# here put only dev-specific variables/connections/pools
airflow:
  variables:
  - key: dev-only-variable   # an illustrative entry in place of the original placeholder
    value: "dev-value"

This means that the final config uploaded to the server will contain the base settings plus the settings you defined directly in the airflow_settings_dev.yaml config, as sketched below.
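
With the two files above, the final uploaded config would be equivalent to something like this (a sketch of the merge result, assuming the dev entries are simply appended to the base ones):

airflow:
  connections:
  - conn_type: fs               # from airflow_settings_base.yaml
    connection_id: fs_default
    host: localhost
    login: fs_default
    port: null
  pools:
  - description: Default pool   # from airflow_settings_base.yaml
    include_deferred: false
    name: default_pool
    slots: 120
  variables:
  - key: dev-only-variable      # from airflow_settings_dev.yaml
    value: "dev-value"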

Library Configuration

Airflow Helper uses a bunch of ‘default’ settings under the hood. Because the library uses pydantic-settings, you can also overwrite those settings with environment variables or by monkey-patching from Python code.

To get the full list of possible default settings, check the file airflow_helper/settings.py.

If you have never heard of pydantic-settings, check https://docs.pydantic.dev/latest/concepts/pydantic_settings/.

For example, to overwrite the default Airflow host you should provide an environment variable with the prefix AIRFLOW_HELPER_ and the name HOST, so the variable name should look like AIRFLOW_HELPER_HOST.
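
A minimal shell sketch (AIRFLOW_HELPER_HOST comes from the text above; other names, such as AIRFLOW_HELPER_PORT, are assumptions that follow the same AIRFLOW_HELPER_<FIELD> pattern over the defaults in airflow_helper/settings.py):

# override the default host (and, presumably, the port) before running the tool
export AIRFLOW_HELPER_HOST=https://my-airflow-server.com
export AIRFLOW_HELPER_PORT=8080

airflow-helper load airflow_settings.yaml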

TODO

  1. Documentation website

  2. Getting Variables, Pools, and Connections directly from the Airflow DB (currently available only via the Airflow REST API)

  3. Load configs from S3 and other cloud object storages

  4. Load configs from git

  5. Create overwrite mode for settings upload

Changelog

0.2.0

  1. Added a check for variables: now, if a variable already exists on the server, Airflow Helper will raise an error if you try to overwrite it from the config. To overwrite existing Variables, Connections, and Pools, use the ‘--overwrite’ flag (or the argument with the same name if you use Airflow Helper from Python).

  2. Added the --skip-existed flag to avoid raising an error if variables/connections/pools already exist on the Airflow server; it will just add the new ones from the config file.

0.1.2

  1. Do not fail if some sections do not exist in the config

0.1.1

  1. Overwrite option added to the airflow-helper load command
