
airtunnel – tame your Airflow!



Airtunnel is a means of supplementing Apache Airflow, a Python platform for workflow automation aimed at analytics/data pipelining. It was born out of years of project experience in data science and the hardships of running large data platforms in real-life businesses. Hence, Airtunnel is both a set of principles (read more about them in the Airtunnel introduction article) and a lightweight Python library to tame your Airflow!

Why choose airtunnel?

Because you will…

❤️ …stop worrying and love the uncompromised consistency

🚀 …need a clean codebase with separated concerns to be scalable

📝 …get metadata for ingested files, load status and lineage out-of-the-box

🏃 …have it up and running in minutes

🍺 …spend less time debugging Airflow DAGs and do worthwhile things instead

Getting started

To get started, we warmly recommend reading the Airtunnel introduction article and the Airtunnel tutorial.

Installation

  1. We assume you have Apache Airflow installed in some kind of Python virtual environment. From there, simply do a pip install airtunnel to get the package.
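
     For example, a minimal sketch (the environment name is illustrative):

     ```bash
     # create and activate a virtual environment (name is illustrative)
     python3 -m venv airflow-venv
     source airflow-venv/bin/activate

     # install airtunnel into the same environment as Apache Airflow
     pip install airtunnel
     ```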

  2. Configure your codebase according to the Airtunnel principles: you need to add three folders for a declaration store, a scripts store, and finally a data store (see the layout sketch after these sub-steps):

    2.1) The declaration store folder has no subfolders. It is where your data asset declarations (YAML files) will reside.

    2.2) The scripts store folder is where all your Python & SQL scripts to process data assets will reside. It should be broken down into the subfolders py for Python scripts and sql for SQL scripts. Within the sql folder, please further add the subfolders dml and ddl.

    2.3) The data store folder follows a convention as well.
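
     Putting steps 2.1–2.3 together, your project layout could look like the following sketch (the top-level folder names are illustrative, not prescribed):

     ```text
     project-root/
     ├── declarations/      # declaration store: data asset declarations (YAML), no subfolders
     ├── scripts/           # scripts store
     │   ├── py/            # Python scripts
     │   └── sql/
     │       ├── dml/       # SQL DML scripts
     │       └── ddl/       # SQL DDL scripts
     └── data/              # data store (follows the Airtunnel convention)
     ```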

  3. Configure Airtunnel by extending your existing airflow.cfg

    3.1) Add the configuration section [airtunnel], in which you need to add three configuration keys (see the example snippet after these sub-steps):

    3.2) Add declarations_folder, which takes the absolute path to the folder you set up in 2.1.

    3.3) Add scripts_folder, which takes the absolute path to the folder you set up in 2.2.

    3.4) Add data_store_folder, which takes the absolute path to the folder you set up in 2.3 for your data store.
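
     As an example, the resulting addition to airflow.cfg could look like this (the paths are placeholders; use the absolute paths of the folders you set up in step 2):

     ```ini
     [airtunnel]
     declarations_folder = /opt/my-project/declarations
     scripts_folder = /opt/my-project/scripts
     data_store_folder = /opt/my-project/data
     ```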

Installation requirements

  • Python >= 3.6, Airflow >= 1.10, and Pandas >= 0.23

    We assume Airtunnel is best adopted early on in a project, which is why going with recent Python and Airflow versions makes the most sense. In the future, we might run more tests and add coverage for older Airflow versions.

  • PySpark is supported from version 2.3 onwards

Documentation

Airtunnel's documentation is on GitHub Pages.
