Kinetica Airflow provider package

Kinetica Provider for Apache Airflow

The airflow-provider-kinetica package provides a SQL operator and hook for Kinetica.

1. Overview

Features included in this package are:

  • Airflow hook KineticaSqlHook
  • Airflow operator KineticaSqlOperator
  • Custom connection type with customized connection UI.

Relevant files are:

File                                      Description
kinetica_provider/get_provider_info.py    Provider info
example_dags/kinetica_sql_example.py      Example DAG with operator and hook
kinetica_provider/operator/sql.py         Contains KineticaSqlOperator
kinetica_provider/hooks/sql.py            Contains KineticaSqlHook
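
Airflow discovers installed providers through the apache_airflow_provider entry-point group, and the provider info file is presumably what this package registers there. As a rough check that the package is visible to the Python environment, the sketch below lists whatever is registered in that group. The helper name list_provider_entry_points is hypothetical and not part of this package.

```python
from importlib import metadata

# Entry-point group that Airflow scans to discover provider packages.
PROVIDER_GROUP = "apache_airflow_provider"


def list_provider_entry_points(group=PROVIDER_GROUP):
    """Return sorted 'name = value' strings for entry points in the group."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        selected = eps.select(group=group)  # Python 3.10 and later
    else:
        selected = eps.get(group, [])       # Python 3.8 / 3.9 return a dict
    return sorted(f"{ep.name} = {ep.value}" for ep in selected)


if __name__ == "__main__":
    # Prints nothing when no provider package is installed.
    for line in list_provider_entry_points():
        print(line)
```

After installing the package, an entry whose value points into kinetica_provider should appear in the output.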

2. Installation

Note: Before proceeding, make sure Airflow is installed according to the official installation docs.

The airflow-provider-kinetica package is available on PyPI. You can install it with:

$ pip install airflow-provider-kinetica
Successfully installed airflow-provider-kinetica-1.0.0

You will need to create a default connection named kinetica_default. You can do this in the web UI or with the following command:

$ airflow connections add 'kinetica_default' \
    --conn-type 'kinetica' \
    --conn-login 'admin' \
    --conn-password '???'  \
    --conn-host 'http://hostname:9191/'
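
Airflow can also read a connection from an environment variable named AIRFLOW_CONN_ followed by the upper-cased connection ID, and since Airflow 2.3 the value may be a JSON document. The following is a minimal sketch that builds such a variable for the same settings as the command above. The helper name connection_env is hypothetical, the password is a placeholder, and whether the Kinetica connection type expects additional fields is not covered here.

```python
import json


def connection_env(conn_id, conn_type, login, password, host):
    """Return (variable name, JSON value) describing an Airflow connection."""
    name = "AIRFLOW_CONN_" + conn_id.upper()
    value = json.dumps(
        {
            "conn_type": conn_type,
            "login": login,
            "password": password,
            "host": host,
        }
    )
    return name, value


if __name__ == "__main__":
    name, value = connection_env(
        "kinetica_default", "kinetica", "admin", "???", "http://hostname:9191/"
    )
    # Emit a line that can be pasted into a shell profile or env file.
    print(f"export {name}='{value}'")
```

Connections defined through environment variables are not stored in the metadata database, so they do not appear in the Admin->Connections list.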

Note: You will need to restart Airflow to complete the installation.

2.1 Optional: Manual Install

As an alternative, you can download the .whl file from the assets section of the GitHub release and install it manually:

$ pip install ./airflow_provider_kinetica-1.0.0-py3-none-any.whl
[...]
Successfully installed airflow-provider-kinetica-1.0.0

3. Testing

This section explains how to set up an environment for building and testing the package.

3.1. Configure Conda environment

To run Airflow we need a specific version of Python with its dependencies, so we will use Miniconda.

The following steps show how to install Miniconda on Linux. You should check the Miniconda documentation for the most recent install instructions.

[~]$ wget https://repo.anaconda.com/miniconda/Miniconda3-py38_23.3.1-0-Linux-x86_64.sh
[~]$ bash Miniconda3-py38_23.3.1-0-Linux-x86_64.sh

After installing, make sure you are in the base conda environment. Next we create an airflow conda environment.

(base) [~]$ conda create --name airflow python=3.8
(base) [~]$ conda activate airflow
(airflow) [~]$ 

3.2. Install Airflow

These steps will show how to configure a standalone Airflow environment.

Note: Before starting, make sure you have activated the airflow conda environment.

Determine the URL of the constraints file that matches the Airflow and Python versions.

(airflow) [~]$ AIRFLOW_VERSION=2.6.1
(airflow) [~]$ PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
(airflow) [~]$ CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
(airflow) [~]$ echo $CONSTRAINT_URL
https://raw.githubusercontent.com/apache/airflow/constraints-2.6.1/constraints-3.8.txt
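
The same URL can be derived in Python, which may be convenient in a setup script. This is a small sketch that mirrors the shell variables above; the function name constraint_url is hypothetical.

```python
import sys


def constraint_url(airflow_version, python_version=None):
    """Build the Airflow constraints URL for the given versions.

    python_version is 'major.minor'; it defaults to the running interpreter.
    """
    if python_version is None:
        python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


if __name__ == "__main__":
    print(constraint_url("2.6.1"))
```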

Install the Airflow package.

(airflow) [~]$ pip install --upgrade pip
(airflow) [~]$ pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"

3.4. Install the package in editable mode

When a package is installed in editable mode, the contents of the specified directory are registered with the Python environment. This allows changes to be made without reinstalling the package.

Change to the location of the package and install it as editable.

(airflow) [~]$ cd ~/fsq-airflow/airflow/airflow-provider-kinetica
(airflow) [airflow-provider-kinetica]$ pip install --editable .
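
One way to confirm that the editable install took effect is to check where Python would import the package from; the path should point into the working directory rather than site-packages. The sketch below assumes the top-level module is named kinetica_provider, as the file paths in the overview suggest. The helper name module_location is hypothetical.

```python
import importlib.util


def module_location(module_name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None  # module is not installed in this environment
    return spec.origin


if __name__ == "__main__":
    # Assumed module name; prints None if the package is not installed.
    print(module_location("kinetica_provider"))
```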

Now you can restart Airflow to see the installed provider. Uninstall the package when you are done.

(airflow) [airflow-provider-kinetica]$ python setup.py develop --uninstall

You will need to create the default Kinetica connection. You can modify this in the Admin->Connections dialog.

airflow connections add 'kinetica_default' \
    --conn-type 'kinetica' \
    --conn-login '_default_login' \
    --conn-password '_default_password'  \
    --conn-host 'http://g-p100-300-301-u29.tysons.kinetica.com:9191/'

3.3. Start Airflow in Standalone mode

You must provide a location to use as $AIRFLOW_HOME. We set this as a variable in the conda environment and then reactivate the environment so that the variable takes effect.

(airflow) $ mkdir ./home
(airflow) $ conda env config vars set AIRFLOW_HOME=$PWD/home
(airflow) $ conda activate airflow
(airflow) [home] $ echo $AIRFLOW_HOME
~/fsq-airflow/airflow/standalone
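
If AIRFLOW_HOME is not set, Airflow falls back to ~/airflow. The sketch below reproduces that lookup so a script can find the directory Airflow will use; the function name resolve_airflow_home is hypothetical and the code does not import Airflow itself.

```python
import os
from pathlib import Path


def resolve_airflow_home(environ=None):
    """Return the directory Airflow would use as its home.

    Uses AIRFLOW_HOME when set, otherwise the documented default ~/airflow.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("AIRFLOW_HOME", "~/airflow")
    return Path(raw).expanduser()


if __name__ == "__main__":
    print(resolve_airflow_home())
```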

When you start Airflow in standalone mode, it copies files into $AIRFLOW_HOME if they do not already exist. When startup is complete, it shows the admin username and password for the webserver.

(airflow) [~]$ cd $AIRFLOW_HOME
(airflow) [standalone]$ airflow standalone
[...]
 webserver | [2024-03-07 22:00:34 -0600] [18240] [INFO] Listening at: http://0.0.0.0:8080 (18240)
standalone | Airflow is ready
standalone | Login with username: admin  password: 39FrRzqzRYTK3pc9
standalone | Airflow Standalone is for development purposes only. Do not use this in production!

You can edit the airflow.cfg file if you need to change any ports.
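
Any airflow.cfg option can also be overridden with an environment variable of the form AIRFLOW__SECTION__KEY, which avoids editing the file. As a sketch, assuming the Airflow 2.x layout where the webserver port is the web_server_port option in the [webserver] section, the variable name can be derived like this. The helper name airflow_env_var is hypothetical.

```python
def airflow_env_var(section, key):
    """Return the environment variable that overrides [section] key."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


if __name__ == "__main__":
    name = airflow_env_var("webserver", "web_server_port")
    # Example: run the webserver on port 8081 instead of the default 8080.
    print(f"export {name}=8081")
```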

3.5. Example DAGs

Run the example DAGs to verify the installation. See the comments in the code for more details.

5. See Also

5.1 Kinetica Docs

5.2 Airflow Docs

5.3 Building a Provider
