An opinionated framework for ETL built on top of Airflow
Project description
gusty
Gusty is an opinionated framework for data ETL built on top of Airflow, where every task is represented by one YAML file, and each task creates a view in a database.
Structure
The .yml approach to generating jobs within Airflow DAGs is not a new idea, but it is useful, and it comes with a few built-in benefits here.
- Dependencies - Dependencies can quickly be set in `.yml` files through one of three means:
  - Using the `dependencies` specification, you can set dependencies between jobs in the same DAG.
  - Using the `external_dependencies` specification, you can set dependencies between jobs in different DAGs.
  - For the `MaterializedPostgresOperator`, dependencies in the same DAG that are a part of the `views` schema are automatically registered.
- Operator configuration - After you build an operator, you can pass parameters to it in each `.yml` job definition file. This means that, for example, if you have to call different API endpoints, you may only need to build one operator to ingest data from this API, and then can specify the endpoint to call in the `.yml` job definition file.
- Support for popular notebook formats - There are currently two notebook operators, `RmdOperator` and `JupyterOperator`, which enable you to simply write RMarkdown or Jupyter Notebook files and deploy them as jobs in your data pipeline. More importantly, `RmdOperator` and `JupyterOperator` are actually executed on separate dedicated Docker containers, and interact with the Airflow container via SSH, which is useful if you want to deploy these services separately in the cloud!
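
To make the dependency and operator-configuration ideas above concrete, here is a minimal sketch in plain Python. It is not gusty's implementation. The job specs are written as dictionaries standing in for parsed `.yml` files, and the `operator` and `endpoint` keys, the `ApiOperator` name, and both helper functions are hypothetical names chosen for illustration. Only `dependencies` and `external_dependencies` come from the description above. It assumes Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical job specs, one per job, as they might look after parsing
# one .yml file each. "operator" and "endpoint" are illustrative keys.
job_specs = {
    "fetch_users": {"operator": "ApiOperator", "endpoint": "/users"},
    "fetch_orders": {"operator": "ApiOperator", "endpoint": "/orders"},
    "user_orders": {
        "operator": "MaterializedPostgresOperator",
        "dependencies": ["fetch_users", "fetch_orders"],
    },
}

# Keys that describe the job itself rather than the operator's parameters
# (an assumption made for this sketch).
RESERVED_KEYS = {"operator", "dependencies", "external_dependencies"}


def operator_kwargs(spec):
    """Return the spec entries that would be passed to the operator."""
    return {key: value for key, value in spec.items() if key not in RESERVED_KEYS}


def execution_order(specs):
    """Order jobs so that every job comes after its same-DAG dependencies."""
    graph = {name: set(spec.get("dependencies", [])) for name, spec in specs.items()}
    return list(TopologicalSorter(graph).static_order())


order = execution_order(job_specs)
for name in order:
    print(name, job_specs[name]["operator"], operator_kwargs(job_specs[name]))
```

In this sketch a single hypothetical `ApiOperator` serves two jobs that differ only in their `endpoint` value, and `user_orders` is always scheduled after both of the jobs it lists under `dependencies`.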
Download files
Source Distribution
File details
Details for the file gusty-0.0.2.tar.gz.
File metadata
- Download URL: gusty-0.0.2.tar.gz
- Upload date:
- Size: 8.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.22.0 setuptools/39.2.0 requests-toolbelt/0.8.0 tqdm/4.42.0 CPython/3.6.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d54dbfead71f43124406d62487a33692242a02ca5ec2df43e456f467c92452b0 |
| MD5 | 6d58cc51ee187763e239824d50f79c90 |
| BLAKE2b-256 | 47b535dbc9881bf9c3fbd08b0876ecb9d3c093dd5839e1a9aae1e54c1ae502bf |