airtunnel – tame your Airflow!

Project description

Airtunnel is a means of supplementing Apache Airflow, a platform for workflow automation in Python that is geared toward analytics/data pipelining. It was born out of years of project experience in data science and the hardships of running large data platforms in real-life businesses. Hence, Airtunnel is both a set of principles (read more about them in the Airtunnel introduction article) and a lightweight Python library to tame your Airflow!

Why choose airtunnel?

Because you will…

❤️ …stop worrying and love the uncompromised consistency

🚀 …need a clean codebase with separated concerns to be scalable

📝 …get metadata for ingested files, load status and lineage out-of-the-box

🏃 …have it up and running in minutes

🍺 …spend less time debugging Airflow DAGs and more time doing worthwhile things instead

Getting started

To get started, we warmly recommend reading the Airtunnel introduction article and the Airtunnel tutorial. Also check out the demo project.

Installation

  1. We assume you have installed Apache Airflow in some kind of Python virtual environment. From there, simply run pip install airtunnel to get the package.

  2. Configure your codebase according to the Airtunnel principles: you need to add three folders for a declaration store, a scripts store, and finally the data store (an example layout follows this list):

    2.1) The declaration store folder has no subfolders. It is where your data asset declarations (YAML files) will reside.

    2.2) The scripts store folder is where all your Python & SQL scripts to process data assets will reside. It should be broken down into the subfolders py for Python scripts and sql for SQL scripts; within sql, please further add the subfolders dml and ddl.

    2.3) The data store folder follows a convention as well, refer to the docs on how to structure it.
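
    As an example, a project codebase with the three stores could be laid out as follows (the top-level folder names are illustrative and up to you; only the subfolder conventions described above are prescribed):

        my_project/
            declarations/        data asset declarations (YAML files), no subfolders
            scripts/
                py/              Python scripts
                sql/
                    dml/         SQL DML scripts
                    ddl/         SQL DDL scripts
            data/                data store (structure as per the docs)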

  3. Configure Airtunnel by extending your existing airflow.cfg (as documented here):

    3.1) Add the configuration section [airtunnel], in which you need to add three configuration keys:

    3.2) Add declarations_folder, which takes the absolute path to the folder you set up in 2.1.

    3.3) Add scripts_folder, which takes the absolute path to the folder you set up in 2.2.

    3.4) Add data_store_folder, which takes the absolute path to the folder you set up in 2.3 for your data store.
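
    Put together, a minimal [airtunnel] section in airflow.cfg could look like this (the paths are illustrative placeholders; use the absolute paths of the folders you created in step 2):

        [airtunnel]
        declarations_folder = /opt/my_project/declarations
        scripts_folder = /opt/my_project/scripts
        data_store_folder = /opt/my_project/data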

Installation requirements

  • Python >= 3.6, Airflow >= 1.10, and Pandas >= 0.23

    We assume Airtunnel is best adopted early on in a project, which is why going with a recent Python and Airflow version makes the most sense. In the future, we might run more tests and include coverage for older Airflow versions.

  • PySpark is supported from version 2.3 onwards
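
    As a sketch, a compatible environment could be assembled like this (the version pins simply mirror the minimums above; pyspark is only needed if you process data assets with Spark):

        pip install "apache-airflow>=1.10" "pandas>=0.23" airtunnel
        pip install "pyspark>=2.3"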

Documentation

Airtunnel’s documentation is on GitHub pages.
