# Airflow Providers for Wherobots
Airflow providers to bring Wherobots Cloud's spatial compute to your data workflows and ETLs.
## Installation
If you use Poetry in your project, add the dependency with `poetry add`:

```sh
$ poetry add airflow-providers-wherobots
```

Otherwise, just `pip install` it:

```sh
$ pip install airflow-providers-wherobots
```
## Create an HTTP connection
Create a Connection in Airflow. This can be done from Apache Airflow's
Web UI, or from the command line. The default Wherobots connection name is
`wherobots_default`; if you use another name, you must specify that name with the
`wherobots_conn_id` parameter when initializing Wherobots operators.

The only required fields for the connection are:

- the Wherobots API endpoint in the `host` field;
- your Wherobots API key in the `password` field.
```sh
$ airflow connections add "wherobots_default" \
    --conn-type "generic" \
    --conn-host "api.cloud.wherobots.com" \
    --conn-password "$(< api.key)"
```
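As an alternative to the CLI, Airflow (2.3 and later) also picks up connections defined as JSON in `AIRFLOW_CONN_<CONN_ID>` environment variables. A minimal sketch of the same connection built that way (`your-api-key` is a placeholder for your actual key):

```python
import json
import os

# Sketch: define the Wherobots connection via an environment variable
# instead of `airflow connections add`. Airflow 2.3+ accepts a JSON
# document in AIRFLOW_CONN_<CONN_ID>; "your-api-key" is a placeholder.
conn = {
    "conn_type": "generic",
    "host": "api.cloud.wherobots.com",
    "password": "your-api-key",
}
os.environ["AIRFLOW_CONN_WHEROBOTS_DEFAULT"] = json.dumps(conn)
```

Environment-variable connections are convenient in containerized deployments, where the API key can be injected as a secret rather than stored in the Airflow metadata database.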
## Usage
### Execute a Run on Wherobots Cloud
Wherobots allows users to upload their code (`.py`, `.jar`), execute it on the
cloud, and monitor the status of the execution. Each execution is called a Run.
The `WherobotsRunOperator` allows you to execute a Run on Wherobots Cloud. It
triggers the run according to the parameters you provide, and waits for the
run to finish before completing the task.

Refer to the Wherobots Managed Storage Documentation to learn more about how to upload and manage your code on Wherobots Cloud.

Below is an example of `WherobotsRunOperator`:
```python
from airflow_providers_wherobots.operators.run import WherobotsRunOperator
from wherobots.db.region import Region
from wherobots.db.runtime import Runtime

operator = WherobotsRunOperator(
    task_id="your_task_id",
    name="airflow_operator_test_run_{{ ts_nodash }}",
    region=Region.AWS_US_WEST_2,
    runtime=Runtime.TINY_A10_GPU,
    run_python={
        "uri": "s3://wbts-wbc-m97rcg45xi/42ly7mi0p1/data/shared/classification.py"
    },
    dag=dag,
    poll_logs=True,
)
```
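The `{{ ts_nodash }}` in the run name above is a standard Airflow template macro, rendered at task execution time. As a rough sketch of what it expands to (the logical timestamp with dashes and colons stripped, assuming the documented Airflow format):

```python
import datetime

# Sketch of Airflow's `ts_nodash` macro: the DAG run's logical timestamp
# with dashes and colons stripped, e.g. 20240101T000000.
logical_date = datetime.datetime(2024, 1, 1, 0, 0, 0)
ts_nodash = logical_date.strftime("%Y%m%dT%H%M%S")

run_name = f"airflow_operator_test_run_{ts_nodash}"
print(run_name)  # airflow_operator_test_run_20240101T000000
```

Using a templated timestamp in the run name keeps run names unique across scheduled executions, which makes them easier to find in the Wherobots Cloud UI.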
#### Arguments

The arguments for the `WherobotsRunOperator` constructor:
- `region: Region`: The Wherobots region where runs are hosted. The available values can be found in `wherobots.db.region.Region`. The default value is `Region.AWS_US_WEST_2`, which is the only region in which Wherobots Cloud operates workloads today.

  > [!IMPORTANT]
  > To prepare for the expansion of Wherobots Cloud to new regions and cloud providers, the `region` parameter will become mandatory in a future SDK version. Before this support for new regions is added, we will release an updated version of the SDK. If you continue using an older SDK version, your existing Airflow tasks will still work. However, any new or existing job runs you create without specifying the `region` parameter will be hosted in the `aws-us-west-2` region.

- `name: str`: The name of the run. If not specified, a default name will be generated.
- `runtime: Runtime`: The runtime dictates the size and amount of resources powering the run. The default value is `Runtime.TINY`; the available values can be found in `wherobots.db.runtime.Runtime`.
- `poll_logs: bool`: If `True`, the operator will poll and `Logger.info()` the run logs until the run finishes. If `False`, the operator will not poll the logs, and only track the status of the run.
- `polling_interval`: The interval, in seconds, at which to poll the status of the run. The default value is `30`.
- `timeout_seconds: int`: Sets a maximum run time (in seconds) to prevent runaway processes. If the specified value exceeds the Max Workload Alive Hours, the timeout will be capped at the maximum permissible limit. Defaults to `3600` seconds (1 hour).
- `run_python: dict`: A dictionary with the following keys:
  - `uri: str`: The URI of the Python file to run.
  - `args: list[str]`: A list of arguments to pass to the Python file.
- `run_jar: dict`: A dictionary with the following keys:
  - `uri: str`: The URI of the JAR file to run.
  - `args: list[str]`: A list of arguments to pass to the JAR file.
  - `mainClass: str`: The main class to run in the JAR file.
- `environment: dict`: A dictionary with the following keys:
  - `sparkDriverDiskGB: int`: The disk size, in GB, for the Spark driver.
  - `sparkExecutorDiskGB: int`: The disk size, in GB, for each Spark executor.
  - `sparkConfigs: dict`: A dictionary of Spark configurations.
  - `dependencies: list[dict]`: A list of dependent libraries to install.
- `wait_post_run_logs_timeout_seconds: int`: Maximum duration (in seconds) the system waits for logs to become queryable after job completion. The default value is `60` seconds.

For more detailed information about the `environment` parameter, refer to Get Run Logs in the Wherobots Documentation.

> [!IMPORTANT]
> Today Wherobots Cloud offers free access to the "Tiny" runtime through the Community Edition Organization. If you need access to larger runtimes (including Memory-Optimized and GPU-Optimized runtimes), consider upgrading to a Professional Edition Organization, Wherobots Cloud's pay-as-you-go plan. For more information, refer to the Upgrade Organization guidance in the Wherobots Documentation.

> [!WARNING]
> The `run_*` arguments are mutually exclusive; you can only specify one of them.
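To illustrate that constraint, here is a small, hypothetical validation helper (not part of the provider; the function name and error message are our own) that checks exactly one `run_*` payload is supplied:

```python
# Hypothetical helper (not part of airflow-providers-wherobots): verify
# that exactly one of the mutually exclusive run_* payloads is provided.
def validate_run_payload(run_python=None, run_jar=None):
    provided = [p for p in (run_python, run_jar) if p is not None]
    if len(provided) != 1:
        raise ValueError("Specify exactly one of run_python or run_jar")
    return provided[0]

# Valid: one payload.
payload = validate_run_payload(run_python={"uri": "s3://bucket/job.py"})

# Invalid: both payloads at once would raise ValueError.
```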
The `dependencies` argument is a list of dictionaries. There are two
types of dependencies supported.

`PYPI` dependencies:

```json
{
  "sourceType": "PYPI",
  "libraryName": "package_name",
  "libraryVersion": "package_version"
}
```

`FILE` dependencies:

```json
{
  "sourceType": "FILE",
  "filePath": "s3://bucket/path/to/dependency.whl"
}
```

The file types supported are `.whl`, `.zip`, and `.jar`.
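Putting the pieces together, here is a sketch of a complete `environment` payload combining both dependency types (the package name, version, Spark setting, and S3 path are illustrative placeholders, not recommendations):

```python
# Sketch of an `environment` payload for WherobotsRunOperator, combining
# both dependency types. The library name/version, the Spark config, and
# the S3 path are illustrative placeholders.
environment = {
    "sparkDriverDiskGB": 50,
    "sparkExecutorDiskGB": 50,
    "sparkConfigs": {"spark.sql.shuffle.partitions": "200"},
    "dependencies": [
        {
            "sourceType": "PYPI",
            "libraryName": "shapely",
            "libraryVersion": "2.0.4",
        },
        {
            "sourceType": "FILE",
            "filePath": "s3://bucket/path/to/dependency.whl",
        },
    ],
}
```

This dictionary would be passed as the `environment=` argument of `WherobotsRunOperator`, alongside one `run_*` payload.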
### Execute a SQL query
The `WherobotsSqlOperator` allows you to run SQL queries on Wherobots Cloud,
from which you can build your ETLs and data transformation workflows by
querying, manipulating, and producing datasets with WherobotsDB.

Refer to the Wherobots Documentation and this guidance to learn how to read data, transform data, and write results in Spatial SQL with WherobotsDB.

Refer to the Wherobots Apache Airflow Provider Documentation for more detailed guidance on using the Wherobots Apache Airflow Provider.
#### Example
Below is an example Airflow DAG that executes a SQL query on Wherobots Cloud:

```python
import datetime

from airflow import DAG
from airflow_providers_wherobots.operators.sql import WherobotsSqlOperator
from wherobots.db.region import Region
from wherobots.db.runtime import Runtime

with DAG(
    dag_id="example_wherobots_sql_dag",
    start_date=datetime.datetime.now(),
    schedule="@hourly",
    catchup=False,
):
    # Create a `wherobots.test.airflow_example` table with 100 records
    # from the OMF `places_place` dataset.
    operator = WherobotsSqlOperator(
        task_id="execute_query",
        return_last=False,
        runtime=Runtime.TINY,
        region=Region.AWS_US_WEST_2,
        sql="""
        INSERT INTO wherobots.test.airflow_example
        SELECT id, geometry, confidence, geohash
        FROM wherobots_open_data.overture.places_place
        LIMIT 100
        """,
    )
```
#### Arguments
- `region: Region`: The Wherobots Compute Region where the SQL query executions are hosted. The available values can be found in `wherobots.db.region.Region`. The default value is `Region.AWS_US_WEST_2`, which is the only region in which Wherobots Cloud operates workloads today.

  > [!IMPORTANT]
  > To prepare for the expansion of Wherobots Cloud to new regions and cloud providers, the `region` parameter will become mandatory in a future SDK version. Before this support for new regions is added, we will release an updated version of the SDK. If you continue using an older SDK version, your existing Airflow tasks will still work. However, any new or existing SQL queries that don't specify the `region` parameter will be hosted in the `aws-us-west-2` region.

- `runtime: Runtime`: The runtime dictates the size and amount of resources powering the run. The default value is `Runtime.TINY`; the available values can be found in `wherobots.db.runtime.Runtime`.
- `sql: str`: The Spatial SQL query to execute.