An opinionated framework for ETL built on top of Airflow

Project description

gusty

Gusty is an opinionated framework for data ETL built on top of Airflow, where every task is represented by one YAML file, and each task creates a view in a database. Check out the gusty demo for an example of a fully dockerized data pipeline using gusty!

Structure

The .yml approach to generating jobs within Airflow DAGs is not a new idea, but it is useful, and it comes with a few built-in benefits here.

  • Dependencies - Dependencies can quickly be set in .yml files through one of three means:

    1. Using the dependencies specification, you can set dependencies between jobs in the same DAG.
    2. Using the external_dependencies specification, you can set dependencies between jobs in different DAGs.
    3. For the MaterializedPostgresOperator, dependencies in the same DAG that are a part of the views schema are automatically registered.
  • Operator configuration - After you build an operator, you can pass parameters to it in each .yml job definition file. For example, if you need to call several endpoints of the same API, you can build a single ingestion operator and specify the endpoint to call in each job's .yml file.

  • Support for popular notebook formats - There are currently two notebook operators, RmdOperator and JupyterOperator, which let you write RMarkdown or Jupyter Notebook files and deploy them as jobs in your data pipeline. More importantly, RmdOperator and JupyterOperator jobs are executed in separate, dedicated Docker containers that interact with the Airflow container via SSH, which is useful if you want to deploy these services separately in the cloud!
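
To make the dependency and operator-configuration ideas above concrete, here is a minimal sketch in plain Python. It does not use gusty's own API. The job specs are written as the dictionaries a YAML loader might produce from one .yml file per job, and every name other than dependencies, external_dependencies, and the three operator names mentioned above (ApiIngestOperator, endpoint, sql, the job names, billing_dag.export_done) is a hypothetical chosen for illustration. The regex-based detection of views.<name> references is only one assumed way the automatic registration could work.

```python
import re
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical job specs, one dict per .yml job definition file.
JOB_SPECS = {
    "raw_users": {
        "operator": "ApiIngestOperator",  # hypothetical custom operator
        "endpoint": "/v1/users",          # operator parameter set per job
    },
    "raw_orders": {
        "operator": "ApiIngestOperator",  # same operator, different endpoint
        "endpoint": "/v1/orders",
        "external_dependencies": ["billing_dag.export_done"],  # hypothetical
    },
    "user_orders": {
        "operator": "MaterializedPostgresOperator",
        # The `sql` field name is an assumption made for this sketch.
        "sql": "SELECT * FROM views.raw_users u "
               "JOIN views.raw_orders o ON o.user_id = u.id",
    },
    "report": {
        "operator": "RmdOperator",
        "dependencies": ["user_orders"],  # explicit same-DAG dependency
    },
}

# Keys that describe scheduling rather than operator behaviour.
RESERVED = {"operator", "dependencies", "external_dependencies"}


def operator_kwargs(spec):
    """Return the spec entries that would be passed to the operator."""
    return {k: v for k, v in spec.items() if k not in RESERVED}


def view_references(sql, job_names):
    """Find `views.<name>` references in SQL that match known job names."""
    found = set(re.findall(r"\bviews\.(\w+)", sql))
    return found & set(job_names)


def same_dag_dependencies(specs):
    """Combine explicit dependencies with ones inferred from SQL."""
    deps = {}
    for name, spec in specs.items():
        explicit = set(spec.get("dependencies", []))
        inferred = set()
        if spec["operator"] == "MaterializedPostgresOperator":
            inferred = view_references(spec.get("sql", ""), specs) - {name}
        deps[name] = explicit | inferred
    return deps


deps = same_dag_dependencies(JOB_SPECS)
# Each job appears after everything it depends on.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

In this sketch, both ingestion jobs share one operator and differ only in the endpoint parameter, while user_orders picks up its two upstream jobs from the SQL alone. External dependencies are left out of the ordering because they refer to jobs in other DAGs.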

Download files

Source Distribution

gusty-0.0.6.tar.gz (9.1 kB view details)

Uploaded Source

Built Distribution

gusty-0.0.6-py3-none-any.whl (11.7 kB view details)

Uploaded Python 3

File details

Details for the file gusty-0.0.6.tar.gz.

File metadata

  • Download URL: gusty-0.0.6.tar.gz
  • Size: 9.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.8.5

File hashes

Hashes for gusty-0.0.6.tar.gz
Algorithm Hash digest
SHA256 c68f370fbdfe9a14b693d45e8f60ed26c194be34b44e9c747fdd5cc676f7f13f
MD5 e489b80d34bad312565b0c9459551150
BLAKE2b-256 a069faf95ef2ee42bf2256a2c3370645966b24e134706eaabdda744fe734b6fb

File details

Details for the file gusty-0.0.6-py3-none-any.whl.

File metadata

  • Download URL: gusty-0.0.6-py3-none-any.whl
  • Size: 11.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.8.5

File hashes

Hashes for gusty-0.0.6-py3-none-any.whl
Algorithm Hash digest
SHA256 e17f0d9e5ceebcfb6eeb2fb3eddd4e875c5467bc9b0ca3ea2cdd15ff96b36fac
MD5 cafd118f73a30e1ebee6115c5b28e4d5
BLAKE2b-256 65aaf240b311a52fcf4be50a908eef5c5377a67d5d036bdfe75f26ca75609dd5
