Project description

Airflow Tools

Collection of Operators, Hooks and utility functions aimed at facilitating ELT pipelines.

Data Lake Facade

The Data Lake Facade serves as an abstraction over the different Hooks that can be used as a backend, such as:

  • Azure Data Lake Storage (ADLS)
  • Simple Storage Service (S3)

Operators can create the correct hook at runtime by passing a connection ID with a connection type of aws or adls. Example code:

from airflow.hooks.base import BaseHook

conn = BaseHook.get_connection(conn_id)  # conn_id references an 'aws' or 'adls' connection
hook = conn.get_hook()  # returns the hook instance matching the connection's conn_type
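
For illustration, here is a minimal sketch of how such a facade can dispatch on the connection type. This is a sketch of the idea, not necessarily the package's own implementation, and S3Hook is simply the standard Amazon provider hook used as an example:

from airflow.hooks.base import BaseHook
from airflow.providers.amazon.aws.hooks.s3 import S3Hook


def get_data_lake_hook(conn_id: str):
    """Return a backend hook chosen from the connection's type."""
    conn = BaseHook.get_connection(conn_id)
    if conn.conn_type == 'aws':
        # Be explicit for S3-backed data lakes
        return S3Hook(aws_conn_id=conn_id)
    # For other backends (e.g. adls), defer to the hook Airflow registers for that conn_type
    return conn.get_hook()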

Operators

HTTP to Data Lake

Creates a file in the data lake with the response of an HTTP request. Example usage:

HttpToDataLake(
    task_id='test_http_to_data_lake',
    http_conn_id='http_test',
    data_lake_conn_id='data_lake_test',
    data_lake_path=s3_bucket + '/source1/entity1/{{ ds }}/',
    endpoint='/api/users',
    method='GET',
    jmespath_expression='data[:2].{id: id, email: email}',
)
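
The Jinja template {{ ds }} in data_lake_path is rendered at runtime, so the operator is normally declared inside a DAG. Below is a minimal sketch, assuming Airflow 2.x and that HttpToDataLake has been imported from this package (the exact module path is not documented here); the DAG id and s3_bucket value are made up for illustration:

from datetime import datetime

from airflow import DAG

s3_bucket = 'data_lake'  # hypothetical bucket name

with DAG(
    dag_id='http_to_data_lake_example',
    start_date=datetime(2023, 1, 1),
    schedule=None,  # Airflow 2.4+ style; older versions use schedule_interval
    catchup=False,
):
    HttpToDataLake(
        task_id='test_http_to_data_lake',
        http_conn_id='http_test',
        data_lake_conn_id='data_lake_test',
        data_lake_path=s3_bucket + '/source1/entity1/{{ ds }}/',  # {{ ds }} renders to the run's logical date
        endpoint='/api/users',
        method='GET',
        jmespath_expression='data[:2].{id: id, email: email}',
    )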

JMESPath expressions

APIs often return the response we are interested in wrapped in a key. JMESPath is a query language that we can use to select only the part of the response we care about. You can find more information on JMESPath expressions, and test them interactively, on the JMESPath website (jmespath.org).

The above expression selects the first two objects inside the key data, and then only the id and email attributes in each object. An example response can be found here.
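
To see what the expression does outside of Airflow, you can evaluate it with the jmespath Python package against a made-up response (the payload below is illustrative, not the real API's):

import jmespath

response = {
    'data': [
        {'id': 1, 'email': 'a@example.com', 'name': 'A'},
        {'id': 2, 'email': 'b@example.com', 'name': 'B'},
        {'id': 3, 'email': 'c@example.com', 'name': 'C'},
    ]
}

print(jmespath.search('data[:2].{id: id, email: email}', response))
# [{'id': 1, 'email': 'a@example.com'}, {'id': 2, 'email': 'b@example.com'}]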

Tests

Integration tests

To guarantee that the library works as intended, we have an integration test that attempts to install it in a fresh virtual environment, and we aim to have a test for each Operator.
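
As a rough illustration (not the project's actual test code), such a test can run the operator once against the S3 mock described in the next section and assert that a file landed in the bucket. It assumes the http_test and data_lake_test connections are defined via AIRFLOW_CONN_* environment variables, and it hard-codes the date so no templating is needed:

import boto3

# HttpToDataLake is assumed to be importable from the airflow_tools package.


def test_http_to_data_lake_writes_a_file():
    HttpToDataLake(
        task_id='test_http_to_data_lake',
        http_conn_id='http_test',
        data_lake_conn_id='data_lake_test',
        data_lake_path='data_lake/source1/entity1/2023-01-01/',
        endpoint='/api/users',
        method='GET',
        jmespath_expression='data[:2].{id: id, email: email}',
    ).execute(context={})

    s3 = boto3.client('s3', endpoint_url='http://localhost:9090')
    listing = s3.list_objects_v2(Bucket='data_lake', Prefix='source1/entity1/2023-01-01/')
    assert listing['KeyCount'] > 0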

Running integration tests locally

The lint-and-test.yml workflow sets up the necessary environment, but if you want to run the tests locally you will need the following environment variables:

AIRFLOW_CONN_DATA_LAKE_TEST='{"conn_type": "aws", "extra": {"endpoint_url": "http://localhost:9090"}}'
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
AWS_DEFAULT_REGION=us-east-1
TEST_BUCKET=data_lake
S3_ENDPOINT_URL=http://localhost:9090

With those variables set, the full test command looks like this:

AIRFLOW_CONN_DATA_LAKE_TEST='{"conn_type": "aws", "extra": {"endpoint_url": "http://localhost:9090"}}' \
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE \
AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY \
TEST_BUCKET=data_lake \
S3_ENDPOINT_URL=http://localhost:9090 \
poetry run pytest tests/ --doctest-modules --junitxml=junit/test-results.xml --cov=com --cov-report=xml --cov-report=html

You also need to run Adobe's S3 mock container, like this:

docker run --rm -p 9090:9090 -e initialBuckets=data_lake -e debug=true -t adobe/s3mock
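
Once the mock is up, a quick sanity check (separate from the test suite) is to list the bucket with boto3, using the same credentials and endpoint as the environment variables above:

import boto3

s3 = boto3.client(
    's3',
    endpoint_url='http://localhost:9090',
    aws_access_key_id='AKIAIOSFODNN7EXAMPLE',
    aws_secret_access_key='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
    region_name='us-east-1',
)
print(s3.list_objects_v2(Bucket='data_lake'))  # the mock creates this bucket via initialBuckets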

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

airflow_tools-0.1.1.tar.gz (4.9 kB)

Uploaded Source

Built Distribution

airflow_tools-0.1.1-py3-none-any.whl (6.9 kB)

Uploaded Python 3

File details

Details for the file airflow_tools-0.1.1.tar.gz.

File metadata

  • Download URL: airflow_tools-0.1.1.tar.gz
  • Upload date:
  • Size: 4.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.6.1 CPython/3.10.13 Darwin/23.0.0

File hashes

Hashes for airflow_tools-0.1.1.tar.gz

  • SHA256: c2c24eac9535dc3366c0eba7235a32296f1244203a89a12155c39bad7db10b2b
  • MD5: c0d5118e5c6e5b4f1ad5b7b4b04c16c6
  • BLAKE2b-256: 6529c082d4636dcbcf692d296cec1b66a4055dd355fd9bb87bfde972797a48d9

See more details on using hashes here.

File details

Details for the file airflow_tools-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: airflow_tools-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 6.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.6.1 CPython/3.10.13 Darwin/23.0.0

File hashes

Hashes for airflow_tools-0.1.1-py3-none-any.whl

  • SHA256: 167412a63e0da12fd3993d45216c638d51d3182529c182c45a2c58b5dfa29a69
  • MD5: 06f427190292c5b4803ec40c29ad1fdd
  • BLAKE2b-256: eb9590e026b12a6a73a0e38e0c02176605395e057d71ae2c5f51010d8ddda78c

See more details on using hashes here.
