Airflow Tools
Collection of Operators, Hooks and utility functions aimed at facilitating ELT pipelines.
Data Lake Facade
The Data Lake Facade serves as an abstraction over the different Hooks that can be used as a backend, such as:
- Azure Data Lake Storage (ADLS)
- Simple Storage Service (S3)
Operators can create the correct hook at runtime by passing a connection ID whose connection type is aws or adls. Example code:
from airflow.hooks.base import BaseHook

conn = BaseHook.get_connection(conn_id)
hook = conn.get_hook()  # resolves to the hook registered for the connection type
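For local experiments, a connection of either type can be supplied through an environment variable using Airflow's JSON connection format (Airflow 2.3+), the same mechanism the integration tests use for AIRFLOW_CONN_DATA_LAKE_TEST further below. A minimal sketch, assuming the Amazon provider is installed; the connection name and endpoint are illustrative:

import json
import os

from airflow.hooks.base import BaseHook

# Illustrative connection: conn_type "aws" with a local endpoint override.
os.environ["AIRFLOW_CONN_DATA_LAKE_TEST"] = json.dumps(
    {"conn_type": "aws", "extra": {"endpoint_url": "http://localhost:9090"}}
)

conn = BaseHook.get_connection("data_lake_test")
hook = conn.get_hook()  # resolves to the Amazon provider's AWS hook because conn_type is "aws"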
Operators
HTTP to Data Lake
Saves the response of an HTTP request to the data lake. Example usage:
HttpToDataLake(
task_id='test_http_to_data_lake',
http_conn_id='http_test',
data_lake_conn_id='data_lake_test',
data_lake_path=s3_bucket + '/source1/entity1/{{ ds }}/',
endpoint='/api/users',
method='GET',
jmespath_expression='data[:2].{id: id, email: email}',
)
JMESPath expressions
APIs often return the response we are interested in wrapped in a key. JMESPath is a query language that we can use to select just the part of the response we care about. You can find more information on JMESPath expressions, and test them interactively, at jmespath.org.
The above expression selects the first two objects inside the data key, keeping only the id and email attributes of each object. An illustrative response is shown in the sketch below.
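As a quick illustration, here is the expression evaluated with the jmespath Python package against a made-up response shaped like the one described above (the payload values are invented for the example):

import jmespath

# Made-up API response wrapped in a "data" key.
response = {
    "data": [
        {"id": 1, "email": "george@example.com", "first_name": "George"},
        {"id": 2, "email": "janet@example.com", "first_name": "Janet"},
        {"id": 3, "email": "emma@example.com", "first_name": "Emma"},
    ]
}

result = jmespath.search("data[:2].{id: id, email: email}", response)
print(result)
# [{'id': 1, 'email': 'george@example.com'}, {'id': 2, 'email': 'janet@example.com'}]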
Tests
Integration tests
To guarantee that the library works as intended we have an integration test that attempts to install it in a fresh virtual environment, and we aim to have a test for each Operator.
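As an illustration of the kind of assertion these tests make, the sketch below writes an object to the local S3 mock (described in the next section) and checks that it is listed under the expected prefix. The bucket and endpoint match the local test environment; the object key and the test itself are invented for the example and are not the library's actual test code.

import boto3

def test_object_is_listed_in_the_data_lake_bucket():
    # Client pointed at the local S3 mock; credentials come from the AWS_*
    # environment variables listed in the next section.
    s3 = boto3.client("s3", endpoint_url="http://localhost:9090")
    s3.put_object(
        Bucket="data_lake",
        Key="source1/entity1/2023-01-01/part0001.json",
        Body=b"[]",
    )
    listing = s3.list_objects_v2(Bucket="data_lake", Prefix="source1/entity1/")
    keys = [obj["Key"] for obj in listing["Contents"]]
    assert "source1/entity1/2023-01-01/part0001.json" in keys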
Running integration tests locally
The lint-and-test.yml workflow sets up the necessary environment variables, but if you want to run the tests locally you will need the following environment variables:
AIRFLOW_CONN_DATA_LAKE_TEST='{"conn_type": "aws", "extra": {"endpoint_url": "http://localhost:9090"}}'
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
AWS_DEFAULT_REGION=us-east-1
TEST_BUCKET=data_lake
S3_ENDPOINT_URL=http://localhost:9090
With those variables set, run the test suite:
AIRFLOW_CONN_DATA_LAKE_TEST='{"conn_type": "aws", "extra": {"endpoint_url": "http://localhost:9090"}}' \
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE \
AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY \
TEST_BUCKET=data_lake \
S3_ENDPOINT_URL=http://localhost:9090 \
poetry run pytest tests/ --doctest-modules --junitxml=junit/test-results.xml --cov=com --cov-report=xml --cov-report=html
You also need to run Adobe's S3 mock container:
docker run --rm -p 9090:9090 -e initialBuckets=data_lake -e debug=true -t adobe/s3mock