An Airflow Plugin to Add a Partition As Select(APAS) on Presto that uses Glue Data Catalog as a Hive metastore.
Project description
airflow-plugin-glue_presto_apas
An Airflow Plugin to Add a Partition As Select(APAS) on Presto that uses Glue Data Catalog as a Hive metastore.
Usage
from datetime import timedelta
import airflow
from airflow.models import DAG
from airflow.operators.glue_add_partition import GlueAddPartitionOperator
from airflow.operators.glue_presto_apas import GluePrestoApasOperator
args = {
'owner': 'airflow',
'depends_on_past': False,
'start_date': airflow.utils.dates.days_ago(2),
'email_on_failure': False,
'email_on_retry': False,
'retries': 1,
'retry_delay': timedelta(minutes=5),
}
dag = DAG(
dag_id='example-dag',
schedule_interval='0 0 * * *',
default_args=args,
)
GluePrestoApasOperator(task_id='example-task-1',
db='example_db',
table='example_table',
sql='example.sql',
partition_kv={
'table_schema': 'example_db',
'table_name': 'example_table'
},
catalog_region_name='ap-northeast-1',
dag=dag,
)
GlueAddPartitionOperator(task_id='example-task-2',
db='example_db',
table='example_table',
partition_kv={
'table_schema': 'example_db',
'table_name': 'example_table'
},
catalog_region_name='ap-northeast-1',
dag=dag,
)
if __name__ == "__main__":
dag.cli()
Configuration
glue_presto_apas.GluePrestoApasOperator
- db: database name for parititioning (string, required)
- table: table name for parititioning (string, required)
- sql: sql file name for selecting data (string, required)
- fmt: data format when storing data (string, default =
parquet
) - additional_properties: additional properties for creating table. (dict[string, string], optional)
- location: location for the data (string, default = auto generated by hive repairable way)
- partition_kv: key values for partitioning (dict[string, string], required)
- save_mode: mode when storing data (string, default =
overwrite
, available values areskip_if_exists
,error_if_exists
,ignore
,overwrite
) - catalog_id: glue data catalog id if you use a catalog different from account/region default catalog. (string, optional)
- catalog_region_name: glue data catalog region if you use a catalog different from account/region default catalog. (string, us-east-1 )
- presto_conn_id: connection id for presto (string, default = 'presto_default')
- aws_conn_id: connection id for aws (string, default = 'aws_default')
Templates can be used in the options[db, table, sql, location, partition_kv].
glue_add_partition.GlueAddPartitionOperator
- db: database name for parititioning (string, required)
- table: table name for parititioning (string, required)
- location: location for the data (string, default = auto generated by hive repairable way)
- partition_kv: key values for partitioning (dict[string, string], required)
- mode: mode when storing data (string, default =
overwrite
, available values areskip_if_exists
,error_if_exists
,overwrite
) - follow_location: Skip to add a partition and drop the partition if the location does not exist. (boolean, default =
True
) - catalog_id: glue data catalog id if you use a catalog different from account/region default catalog. (string, optional)
- catalog_region_name: glue data catalog region if you use a catalog different from account/region default catalog. (string, us-east-1 )
- aws_conn_id: connection id for aws (string, default = 'aws_default')
Templates can be used in the options[db, table, location, partition_kv].
Development
Run Example
PRESTO_HOST=${YOUR PRESTO HOST} PRESTO_PORT=${YOUR PRESTO PORT} ./run-example.sh
Release
poetry publish --build
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Close
Hashes for airflow-plugin-glue_presto_apas-0.0.11.tar.gz
Algorithm | Hash digest | |
---|---|---|
SHA256 | 8846a69bf63f76d4368f7de3aae12557bd6e27bb977a914464f07c4951c5e5a7 |
|
MD5 | 569ebebeaa15d8e6e05f2d017c51cd6b |
|
BLAKE2b-256 | 73056f689e4c1e9e21b5563b6c6a976491e75a0404afb5080db3e3da8298ce58 |
Close
Hashes for airflow_plugin_glue_presto_apas-0.0.11-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | a398a67d1d973178a3cdd9ea23e057eaf68e2bf7608b95faf79f0f65b696ac9c |
|
MD5 | d9aefa8ff14ae8046016a5d826b3a7fa |
|
BLAKE2b-256 | 07a5b544b0c80b229247a433e3cc68786fd5a9972c51f8a2af98112040f96b4d |