
Hop Airflow plugin

This is an Apache Hop plugin for Apache Airflow that lets you orchestrate Apache Hop pipelines and workflows from Airflow.

Requirements

Before setting up the plugin, you must have fully set up your Hop environment:

  • Configured Hop Server
  • Configured remote pipeline configuration
  • Configured remote workflow configuration

To do so, open the metadata window in Hop UI by pressing Ctrl+Shift+M or clicking the metadata icon.

Hop Server

Double click Hop server to create a new configuration.

Remote Pipeline Configuration

Double click Pipeline Run Configuration to create a new configuration.

Remote Workflow Configuration

Double click Workflow Run Configuration to create a new configuration.

Set up guide

The following is a guide to setting up the plugin, along with some requirements and constraints on its usage.

1. Generate metadata.json

For the plugin to work correctly, a file containing all of the project's Hop metadata must be created inside each project directory. This can be done by exporting it from Hop UI.

Please note that this process must be repeated each time the metadata of a project is modified.

(Screenshot: Hop UI metadata export option)

2. Install the plugin

Install the package using the following command:

pip install airflow-hop-plugin
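
The plugin must be installed in the same Python environment that runs Airflow. As a quick sanity check (a usage sketch, not a required step), you can confirm the package is visible to pip:

pip show airflow-hop-plugin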

3. Hop Directory Structure

Due to technical limitations, the Hop home directory must have the following structure:

hop # This is the hop home directory
├── ...
├── config
│   ├── hop-config.json
│   ├── example_environment.json # This is where you should save your environment files
│   ├── metadata
│   │   └── ...
│   └── projects
│       ├── ...
│       └── example_project # This is how your project's directory should look
│           ├── metadata.json
│           ├── metadata
│           │   └── ...
│           ├── example_directory
│           │   └── example_workflow.hwl
│           └── example_pipeline.hpl
├── ...

Moreover, please remember to save all projects inside the "projects" directory and, when configuring them, set a project home path that is relative to the Hop home directory, as in the sketch below.

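As a rough illustration, the project entry that Hop stores in hop-config.json could then look like the following (a sketch only; key names may vary between Hop versions, and example_project is a placeholder):

"projectsConfig": {
  "projectConfigurations": [ {
    "projectName": "example_project",
    "projectHome": "config/projects/example_project",
    "configFilename": "project-config.json"
  } ]
}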

4. Create an Airflow Connection

To use the operators correctly you must create a new Airflow connection. There are multiple ways to do so, and any of them can be used, but the connection should have the following attribute values:

  • Connection ID: 'hop_default'
  • Connection Type: 'http'
  • Login: apache_hop_username
  • Password: apache_hop_password
  • Host: apache_hop_server
  • Port: apache_hop_port
  • Extra: "hop_home": "/path/to/hop-home/"

Example of a new Airflow connection using Airflow's CLI:

airflow connections add 'hop_default' \
    --conn-json '{
        "conn_type": "http",
        "login": "cluster",
        "password": "cluster",
        "host": "0.0.0.0",
        "port": 8080,
        "schema": "",
        "extra": {
            "hop_home": "/home/user/hop"
        }
    }'
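
Alternatively, the same connection can be supplied through an environment variable (a sketch assuming Airflow 2.3 or later, where JSON-formatted connection environment variables are supported):

export AIRFLOW_CONN_HOP_DEFAULT='{
    "conn_type": "http",
    "login": "cluster",
    "password": "cluster",
    "host": "0.0.0.0",
    "port": 8080,
    "extra": {"hop_home": "/home/user/hop"}
}'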

5. Creating a DAG

Here's an example of a DAG:

from datetime import datetime

from airflow import DAG
from airflow_hop.operators import HopPipelineOperator
from airflow_hop.operators import HopWorkflowOperator

# ... #

with DAG('sample_dag', start_date=datetime(2022,7,26),
        schedule_interval='@daily', catchup=False) as dag:

    # Define a pipeline
    first_pipe = HopPipelineOperator(
        task_id='first_pipe',
        pipeline='pipelines/first_pipeline.hpl',
        pipe_config='remote hop server',
        project_name='default',
        log_level='Basic')

    # Define a pipeline with parameters
    second_pipe = HopPipelineOperator(
        task_id='second_pipe',
        pipeline='pipelines/second_pipeline.hpl',
        pipe_config='remote hop server',
        project_name='default',
        log_level='Basic',
        params={'DATE':'{{ ds }}'}) # Date in yyyy-mm-dd format

    # Define a workflow with parameters
    work_test = HopWorkflowOperator(
        task_id='work_test',
        workflow='workflows/workflow_example.hwf',
        project_name='default',
        log_level='Basic',
        params={'DATE':'{{ ds }}'}) # Date in yyyy-mm-dd format

    first_pipe >> second_pipe >> work_test

It's important to point out that the workflow and pipeline parameters within their respective operators must be paths relative to the project's directory.
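
Once the DAG file is placed in your Airflow dags folder, it can be checked and smoke-tested from the command line (a usage sketch; the DAG and task IDs match the example above):

airflow dags list-import-errors
airflow tasks test sample_dag first_pipe 2022-07-26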

Development

Deploy Apache Hop Server using Docker

Requirements: Docker and Docker Compose.

If you want to use Docker to create the server you can use the following docker-compose configuration as a template:

services:
  apache-hop:
    image: apache/hop:latest
    ports:
      - 8080:8080
    volumes:
      - hop_path:/home/hop
    environment:
      HOP_SERVER_USER: cluster
      HOP_SERVER_PASS: cluster
      HOP_SERVER_PORT: 8080
      HOP_SERVER_HOSTNAME: 0.0.0.0

# Top-level declaration of the named volume referenced above
volumes:
  hop_path:

Once done, the Hop server can be started using docker compose.
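
For example, from the directory containing the compose file:

docker compose up -d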

License

Copyright 2022 Aneior Studio, SL

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

