Airflow ECR plugin

Airflow AWS ECR Plugin

This plugin exposes an operator that refreshes the ECR login token at regular intervals.

About

Amazon ECR is an AWS-managed Docker registry for hosting private Docker container images. Access to Docker repositories hosted on ECR can be controlled with resource-based permissions using AWS IAM.

To push or pull images, the Docker client must authenticate to the ECR registry as an AWS user. An authorization token can be generated with the AWS CLI get-login-password command and passed to the docker login command to authenticate to the ECR registry. For instructions on setting up ECR and obtaining a login token to authenticate the Docker client, see the Amazon ECR documentation.
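As a rough illustration of what the token looks like programmatically, the sketch below uses boto3's ECR client call get_authorization_token and decodes the returned base64 token into a username and password. The helper names decode_ecr_token and fetch_ecr_login are hypothetical and are not part of this plugin.

```python
import base64


def decode_ecr_token(authorization_token):
    """Decode an ECR authorizationToken, which is base64 of 'user:password'."""
    decoded = base64.b64decode(authorization_token).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password


def fetch_ecr_login(region_name):
    """Fetch Docker login details for the default registry in one region.

    Hypothetical helper: needs boto3 and valid AWS credentials. boto3 is
    imported lazily so decode_ecr_token can be used without it installed.
    """
    import boto3

    client = boto3.client("ecr", region_name=region_name)
    auth = client.get_authorization_token()["authorizationData"][0]
    username, password = decode_ecr_token(auth["authorizationToken"])
    return {
        "username": username,          # "AWS" for ECR tokens
        "password": password,          # value to pass to docker login
        "registry": auth["proxyEndpoint"],
        "expires_at": auth["expiresAt"],
    }
```

The values returned by fetch_ecr_login correspond to what would be passed to docker login as the username, password and registry address.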

The authorization token obtained with the get-login-password command is valid for only 12 hours, so the Docker client must re-authenticate with a fresh token at least every 12 hours to keep access to Docker images hosted on ECR. Moreover, ECR registries are region-specific, and a separate token must be obtained to authenticate to each registry.
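To make the 12-hour limit concrete, here is a minimal sketch of an expiry check. The function name needs_refresh and the one-hour safety margin are assumptions chosen for illustration; the expiry timestamp is assumed to be a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone


def needs_refresh(expires_at, now=None, safety_margin=timedelta(hours=1)):
    """Return True when the token expires within the safety margin.

    expires_at must be timezone-aware. The one-hour default margin is an
    arbitrary illustrative choice, not a value taken from the plugin.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return expires_at - now <= safety_margin
```

A token issued for 12 hours would therefore be reported as due for refresh after roughly 11 hours with the default margin.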

This process can be quite cumbersome when combined with Apache Airflow. Airflow's DockerOperator accepts a docker_conn_id parameter that it uses to authenticate to and pull images from private registries. If the private registry is ECR, a connection can be created with the login token obtained from the get-login-password command, and the corresponding connection ID can be passed to DockerOperator. However, since the token is valid for only 12 hours, DockerOperator will fail to fetch images from ECR once the token has expired.

This plugin implements RefreshEcrDockerConnectionOperator, an Airflow operator that automatically updates the ECR login token at regular intervals.
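Conceptually, refreshing the connection means mapping a fresh ECR authorization entry onto the fields of an Airflow Docker connection. The sketch below shows one plausible mapping as a plain dictionary. It is an assumption made for illustration and not the plugin's actual implementation; the function name build_docker_connection_fields and the dictionary keys are hypothetical.

```python
import base64


def build_docker_connection_fields(conn_id, authorization_entry):
    """Map one ECR authorizationData entry to Docker connection fields.

    Illustrative only: the real operator may store these values differently,
    for example by normalising the registry URL.
    """
    decoded = base64.b64decode(
        authorization_entry["authorizationToken"]
    ).decode("utf-8")
    login, _, password = decoded.partition(":")
    return {
        "conn_id": conn_id,
        "conn_type": "docker",
        "host": authorization_entry["proxyEndpoint"],  # registry address
        "login": login,                                # "AWS" for ECR
        "password": password,                          # short-lived token
    }
```

Only the password changes between refreshes; the connection ID, host and login stay the same, which is why DockerOperator can keep using one docker_conn_id.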

Installation

PyPI

pip install airflow-ecr-plugin

Poetry

poetry add airflow-ecr-plugin@latest

Getting Started

Once installed, the plugin can be loaded via the setuptools entry point mechanism.

Update your package's setup.py as below:

from setuptools import setup

setup(
    name="my-package",
    ...
    entry_points = {
        'airflow.plugins': [
            'aws_ecr = airflow_ecr_plugin:AwsEcrPlugin'
        ]
    }
)

If you are using Poetry, the plugin can be loaded by adding it under the [tool.poetry.plugins."airflow.plugins"] section as below:

[tool.poetry.plugins."airflow.plugins"]
"aws_ecr" = "airflow_ecr_plugin:AwsEcrPlugin"

Once the plugin is loaded, it is available for import in Python modules.
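One way to check that the entry point was registered is to list the airflow.plugins entry point group of the current environment. This is a minimal sketch using the standard library's importlib.metadata (Python 3.8 or later); the function name list_airflow_plugin_entry_points is hypothetical.

```python
from importlib.metadata import entry_points


def list_airflow_plugin_entry_points(group="airflow.plugins"):
    """Return sorted (name, value) pairs for entry points in the group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a selectable collection.
        selected = eps.select(group=group)
    else:
        # Python 3.8 and 3.9 return a dict keyed by group name.
        selected = eps.get(group, [])
    return sorted((ep.name, ep.value) for ep in selected)
```

If the package declaring the entry point above is installed, the returned list should include a pair whose name is aws_ecr and whose value is airflow_ecr_plugin:AwsEcrPlugin.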

Now create a DAG to refresh the ECR tokens:

from datetime import timedelta

import airflow
import airflow.utils.dates  # imported explicitly because days_ago is used below
from airflow.operators import aws_ecr


DEFAULT_ARGS = {
    "depends_on_past": False,
    "retries": 0,
    "owner": "airflow",
}

REFRESH_ECR_TOKEN_DAG = airflow.DAG(
    dag_id="Refresh_ECR_Login_Token",
    description=(
        "Fetches the latest token from ECR and updates the docker "
        "connection info."
    ),
    default_args=DEFAULT_ARGS,
    schedule_interval=<token_refresh_interval>,
    # Set start_date to past date to make sure airflow picks up the tasks for
    # execution.
    start_date=airflow.utils.dates.days_ago(2),
    catchup=False,
)

# Add below operator for each ECR connection to be refreshed.
aws_ecr.RefreshEcrDockerConnectionOperator(
    task_id=<task_id>,
    ecr_docker_conn_id=<docker_conn_id>,
    ecr_region=<ecr_region>,
    aws_conn_id=<aws_conn_id>,
    dag=REFRESH_ECR_TOKEN_DAG,
)

The placeholder parameters in the above code snippet are defined below:

  • token_refresh_interval: Time interval at which to refresh ECR login tokens. This should be less than 12 hours to prevent access issues.
  • task_id: Unique ID for this task.
  • docker_conn_id: The Airflow Docker connection ID corresponding to the ECR registry; this connection is updated when the operator runs. The same connection ID should be passed to the DockerOperator that pulls images from the ECR registry. If the connection does not exist in the Airflow DB, the operator creates it automatically.
  • ecr_region: AWS region of the ECR registry.
  • aws_conn_id: Airflow connection ID corresponding to the AWS user credentials used to authenticate and retrieve a new login token from ECR. This user should have at least the ecr:GetAuthorizationToken permission.
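For the minimum permission mentioned for aws_conn_id, an IAM policy along the following lines could be attached to the AWS user. This is a sketch, not a policy shipped with the plugin; the names ECR_TOKEN_POLICY and render_policy are hypothetical. Pulling images typically needs further ECR permissions on the repository, which are outside this example.

```python
import json

# Minimal policy allowing only the token request. GetAuthorizationToken is
# not scoped to a repository, so the resource is "*".
ECR_TOKEN_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "ecr:GetAuthorizationToken",
            "Resource": "*",
        }
    ],
}


def render_policy(policy=ECR_TOKEN_POLICY):
    """Serialize the policy to a JSON document suitable for IAM."""
    return json.dumps(policy, indent=2)
```

Calling render_policy() yields the JSON text that can be pasted into the IAM console or passed to the AWS CLI.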

Known Issues

If you are running Airflow v1.10.7 or earlier, the operator will fail due to AIRFLOW-3014.

The workaround is to increase the length of the password column in the Airflow connection table to 5000 characters.
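As a hedged sketch of that workaround, the helper below builds the schema change as a SQL string for two common Airflow metadata databases. It assumes the table is named connection and the column password, as in Airflow's default schema; the function name password_column_ddl is hypothetical. Review the statement and back up the metadata database before running anything like it.

```python
def password_column_ddl(dialect, length=5000):
    """Return an ALTER TABLE statement widening connection.password.

    Assumes Airflow's default table and column names. Only PostgreSQL and
    MySQL syntax is covered in this sketch.
    """
    templates = {
        "postgresql": (
            "ALTER TABLE connection "
            "ALTER COLUMN password TYPE VARCHAR({length});"
        ),
        "mysql": "ALTER TABLE connection MODIFY password VARCHAR({length});",
    }
    if dialect not in templates:
        raise ValueError("unsupported dialect: {}".format(dialect))
    return templates[dialect].format(length=int(length))
```

The resulting string can be executed with the database's own client, for example psql or mysql, against the Airflow metadata database.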

Acknowledgements

The operator is inspired by Brian Campbell's post, Using Airflow's Docker operator with ECR.
