An Airflow Plugin to Add a Partition As Select(APAS) on Presto that uses Glue Data Catalog as a Hive metastore.
Project description
airflow-plugin-glue_presto_apas
An Airflow Plugin to Add a Partition As Select(APAS) on Presto that uses Glue Data Catalog as a Hive metastore.
Usage
from datetime import timedelta
import airflow
from airflow.models import DAG
from airflow.operators.glue_add_partition import GlueAddPartitionOperator
from airflow.operators.glue_presto_apas import GluePrestoApasOperator
args = {
'owner': 'airflow',
'depends_on_past': False,
'start_date': airflow.utils.dates.days_ago(2),
'email_on_failure': False,
'email_on_retry': False,
'retries': 1,
'retry_delay': timedelta(minutes=5),
}
dag = DAG(
dag_id='example-dag',
schedule_interval='0 0 * * *',
default_args=args,
)
GluePrestoApasOperator(task_id='example-task-1',
db='example_db',
table='example_table',
sql='example.sql',
partition_kv={
'table_schema': 'example_db',
'table_name': 'example_table'
},
catalog_region_name='ap-northeast-1',
dag=dag,
)
GlueAddPartitionOperator(task_id='example-task-2',
db='example_db',
table='example_table',
partition_kv={
'table_schema': 'example_db',
'table_name': 'example_table'
},
catalog_region_name='ap-northeast-1',
dag=dag,
)
if __name__ == "__main__":
dag.cli()
Configuration
glue_presto_apas.GluePrestoApasOperator
- db: database name for parititioning (string, required)
- table: table name for parititioning (string, required)
- sql: sql file name for selecting data (string, required)
- fmt: data format when storing data (string, default =
parquet
) - additional_properties: additional properties for creating table. (dict[string, string], optional)
- location: location for the data (string, default = auto generated by hive repairable way)
- partition_kv: key values for partitioning (dict[string, string], required)
- save_mode: mode when storing data (string, default =
overwrite
, available values areskip_if_exists
,error_if_exists
,ignore
,overwrite
) - catalog_id: glue data catalog id if you use a catalog different from account/region default catalog. (string, optional)
- catalog_region_name: glue data catalog region if you use a catalog different from account/region default catalog. (string, us-east-1 )
- presto_conn_id: connection id for presto (string, default = 'presto_default')
- aws_conn_id: connection id for aws (string, default = 'aws_default')
Templates can be used in the options[db, table, sql, location, partition_kv].
glue_add_partition.GlueAddPartitionOperator
- db: database name for parititioning (string, required)
- table: table name for parititioning (string, required)
- location: location for the data (string, default = auto generated by hive repairable way)
- partition_kv: key values for partitioning (dict[string, string], required)
- mode: mode when storing data (string, default =
overwrite
, available values areskip_if_exists
,error_if_exists
,overwrite
) - follow_location: Skip to add a partition and drop the partition if the location does not exist. (boolean, default =
True
) - catalog_id: glue data catalog id if you use a catalog different from account/region default catalog. (string, optional)
- catalog_region_name: glue data catalog region if you use a catalog different from account/region default catalog. (string, us-east-1 )
- aws_conn_id: connection id for aws (string, default = 'aws_default')
Templates can be used in the options[db, table, location, partition_kv].
Development
Run Example
PRESTO_HOST=${YOUR PRESTO HOST} PRESTO_PORT=${YOUR PRESTO PORT} ./run-example.sh
Release
poetry publish --build
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Close
Hashes for airflow-plugin-glue_presto_apas-0.0.9.tar.gz
Algorithm | Hash digest | |
---|---|---|
SHA256 | 633ae9d490d94e351dcfe279c7b00a6bbd62a886d6e02ff784a696b9692d7c7a |
|
MD5 | b1d2e97dc0f8b4be089bd78f8be97db9 |
|
BLAKE2b-256 | d8ed6d5a0204ad8b0b357ec678f520837d849570b2cb354963d19e55e3206847 |
Close
Hashes for airflow_plugin_glue_presto_apas-0.0.9-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 397fb5ed8ea97cd3740b4e8422c478af80626b00f0e61ee7bad443e11885de2a |
|
MD5 | 7d43990fed9198690e414429328a68d3 |
|
BLAKE2b-256 | 510b166287ea9707f6f0a8e6fe2b0ee4a8b26d40453f61f0b8d222b6a2ccd197 |