airtunnel – tame your Airflow!

Airtunnel supplements Apache Airflow, a Python platform for workflow automation that is geared towards analytics and data pipelines. It was born out of years of project experience in data science and the hardships of running large data platforms in real-life businesses. Hence, Airtunnel is both a set of principles (read more about them in the Airtunnel introduction article) and a lightweight Python library to tame your Airflow!

Why choose airtunnel?

Because you will…

❤️ …stop worrying and love the uncompromised consistency

🚀 …need a clean codebase with separated concerns to be scalable

📝 …get metadata for ingested files, load status and lineage out-of-the-box

🏃 …have it up and running in minutes

🍺 …spend less time debugging Airflow DAGs and more time doing worthwhile things instead

Getting started

To get started, we warmly recommend reading the Airtunnel introduction article and the Airtunnel tutorial.

Installation

  1. We assume you have installed Apache Airflow in a Python virtual environment. From there, run pip install airtunnel to get the package.

  2. Configure your codebase according to the Airtunnel principles by adding three folders: a declaration store, a scripts store and a data store:

    2.1) The declaration store folder has no subfolders. It is where your data asset declarations (YAML files) reside.

    2.2) The scripts store folder is where all your Python and SQL scripts that process data assets reside. It should be split into the subfolders py for Python scripts and sql for SQL scripts. Additionally, add the subfolders dml and ddl inside the sql folder.

    2.3) The data store folder follows a convention as well.

  3. Configure Airtunnel by extending your existing airflow.cfg:

    3.1) Add the configuration section [airtunnel], in which you need to add three configuration keys.

    3.2) Add declarations_folder, which takes the absolute path to the folder you set up in 2.1.

    3.3) Add scripts_folder, which takes the absolute path to the folder you set up in 2.2.

    3.4) Add data_store_folder, which takes the absolute path to the folder you set up in 2.3 for your data store.
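
As a rough illustration of steps 2 and 3, the following Python sketch creates the three store folders and renders an [airtunnel] section that could be appended to airflow.cfg. The folder names declarations, scripts and data and both helper function names are assumptions made for this example. Only the py, sql, dml and ddl subfolders and the three configuration keys come from the steps above. The data store is created empty here, since its internal convention is not spelled out in this section.

```python
import configparser
import io
import tempfile
from pathlib import Path


def create_airtunnel_layout(base_dir):
    """Create the three store folders and return their absolute paths.

    The top-level folder names are hypothetical; pick whatever suits
    your project.
    """
    base = Path(base_dir).resolve()
    declarations = base / "declarations"  # assumed name, flat folder (2.1)
    scripts = base / "scripts"            # assumed name (2.2)
    data_store = base / "data"            # assumed name (2.3)

    declarations.mkdir(parents=True, exist_ok=True)
    # The scripts store needs py, sql, and sql/dml plus sql/ddl.
    for sub in ("py", "sql/dml", "sql/ddl"):
        (scripts / sub).mkdir(parents=True, exist_ok=True)
    data_store.mkdir(parents=True, exist_ok=True)

    return {
        "declarations_folder": str(declarations),
        "scripts_folder": str(scripts),
        "data_store_folder": str(data_store),
    }


def render_airtunnel_section(folders):
    """Render the [airtunnel] section as text in airflow.cfg (INI) syntax."""
    # interpolation=None keeps any '%' in a path from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser["airtunnel"] = folders
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Demonstrate against a throwaway directory instead of a real project.
    with tempfile.TemporaryDirectory() as tmp:
        print(render_airtunnel_section(create_airtunnel_layout(tmp)))
```

The sketch only prints the section. Appending it to your actual airflow.cfg is left as a manual step, so that an existing configuration is never overwritten by accident.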

Installation requirements

  • Python >= 3.6, Airflow >= 1.10 and Pandas >= 0.23

    We assume Airtunnel is best adopted early on in a project, which is why going with a recent Python and Airflow version makes the most sense. In the future we might run more tests and extend coverage to older Airflow versions.

  • PySpark is supported from version 2.3 onwards
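
To check an environment against the minimum versions listed above, a small script along the following lines may help. It is a sketch, not part of Airtunnel: the helper names are made up for this example, the distribution names apache-airflow, pandas and pyspark are the usual PyPI names, and the lookup relies on importlib.metadata, which is only available from Python 3.8.

```python
import re
import sys

# Minimum versions taken from the requirements list above.
MINIMUMS = {
    "apache-airflow": (1, 10),
    "pandas": (0, 23),
    "pyspark": (2, 3),
}


def parse_version(text):
    """Return the leading dotted numeric part of a version string as a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Compare an installed version string against a minimum version tuple."""
    return parse_version(installed) >= tuple(minimum)


def check_requirements():
    """Map each requirement to True, False, or None (not installed)."""
    from importlib import metadata  # assumption: running on Python 3.8+

    report = {"python": sys.version_info[:2] >= (3, 6)}
    for dist, minimum in MINIMUMS.items():
        try:
            report[dist] = meets_minimum(metadata.version(dist), minimum)
        except metadata.PackageNotFoundError:
            report[dist] = None
    return report


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(name, "not installed" if ok is None else ("ok" if ok else "too old"))
```

Pre-release suffixes such as rc1 are ignored by this comparison, which is good enough for a quick sanity check but not a substitute for a real version parser.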

Documentation

Airtunnel's documentation is hosted on GitHub Pages.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

airtunnel-1.0.0rc2.tar.gz (1.5 MB)

Uploaded Source

Built Distribution

airtunnel-1.0.0rc2-py3-none-any.whl (123.0 kB)

Uploaded Python 3

File details

Details for the file airtunnel-1.0.0rc2.tar.gz.

File metadata

  • Download URL: airtunnel-1.0.0rc2.tar.gz
  • Size: 1.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-requests/2.22.0

File hashes

Hashes for airtunnel-1.0.0rc2.tar.gz
Algorithm Hash digest
SHA256 1105a3103c4c1c3539b2e77d6d6c57fb2cff7acce419a47e17b03aeba7ec8acc
MD5 f5d09206609efb6b805a00af073000c7
BLAKE2b-256 af14f3c9e1b8ada4fe21ecf40f1fe8ab9479421a1e09445d797d1a5cad8410aa

See more details on using hashes here.

File details

Details for the file airtunnel-1.0.0rc2-py3-none-any.whl.

File hashes

Hashes for airtunnel-1.0.0rc2-py3-none-any.whl
Algorithm Hash digest
SHA256 51433e1d022533c5090459e0b88af2c27d0101c845e689ef2cbd0f44940732a7
MD5 aa848d10994c60938a0a54192f7b6b2b
BLAKE2b-256 874db90aaf7ef39d5a75570e426e9eb7b4d7f074bb5c3980d983c8e687fbb4dc
