An Airflow plugin to Add a Partition As Select (APAS) on Presto that uses the Glue Data Catalog as a Hive metastore.
airflow-plugin-glue_presto_apas
Usage
```python
from datetime import timedelta

import airflow
from airflow.models import DAG
# Airflow 1.x exposes plugin operators as airflow.operators.<plugin name>.
from airflow.operators.glue_add_partition import GlueAddPartitionOperator
from airflow.operators.glue_presto_apas import GluePrestoApasOperator

args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': airflow.utils.dates.days_ago(2),
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG(
    dag_id='example-dag',
    schedule_interval='0 0 * * *',  # daily at 00:00
    default_args=args,
)

# Run the SELECT in example.sql on Presto and store the result
# as a partition of example_db.example_table.
GluePrestoApasOperator(task_id='example-task-1',
                       db='example_db',
                       table='example_table',
                       sql='example.sql',
                       partition_kv={
                           'table_schema': 'example_db',
                           'table_name': 'example_table'
                       },
                       catalog_region_name='ap-northeast-1',
                       dag=dag,
                       )

# Add the partition to the Glue Data Catalog without running a query.
GlueAddPartitionOperator(task_id='example-task-2',
                         db='example_db',
                         table='example_table',
                         partition_kv={
                             'table_schema': 'example_db',
                             'table_name': 'example_table'
                         },
                         catalog_region_name='ap-northeast-1',
                         dag=dag,
                         )

if __name__ == "__main__":
    dag.cli()
```
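Conceptually, Add a Partition As Select runs the query in the `sql` file on Presto and stores its result as a new partition of `db`.`table`. The snippet below is a minimal sketch of one way to express the storing step, assuming the SELECT is wrapped in a Presto Hive-connector `CREATE TABLE ... WITH (...) AS` statement. The helpers `quote_literal` and `build_ctas_sql`, the temporary table name, and the S3 path are hypothetical and not part of this plugin; rendering every `additional_properties` value as a string literal is also an assumption.

```python
from typing import Dict, Optional


def quote_literal(value: str) -> str:
    """Render a Python string as a Presto string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_ctas_sql(table: str, select_sql: str, fmt: str = "parquet",
                   location: Optional[str] = None,
                   additional_properties: Optional[Dict[str, str]] = None) -> str:
    """Wrap a SELECT in CREATE TABLE ... WITH (...) AS (hypothetical helper)."""
    # Hive connector table properties: storage format and, optionally, where the files go.
    props = {"format": quote_literal(fmt.upper())}
    if location is not None:
        props["external_location"] = quote_literal(location)
    # Assumption: every additional property is a string-valued table property.
    for key, value in (additional_properties or {}).items():
        props[key] = quote_literal(value)
    with_clause = ",\n".join(f"    {key} = {value}" for key, value in props.items())
    return f"CREATE TABLE {table}\nWITH (\n{with_clause}\n)\nAS\n{select_sql.strip()}"


if __name__ == "__main__":
    # Hypothetical temporary table and S3 path, for illustration only.
    print(build_ctas_sql(
        "example_db.tmp_example_table",
        "SELECT column_name, data_type FROM information_schema.columns",
        location="s3://example-bucket/example_db/example_table/",
    ))
```

Pointing `external_location` at a caller-chosen path is what would let the written files be registered as a partition afterwards; whether Presto accepts `external_location` in a CTAS depends on your Hive connector configuration.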
Configuration
glue_presto_apas.GluePrestoApasOperator
- db: database name for partitioning (string, required)
- table: table name for partitioning (string, required)
- sql: SQL file name for selecting data (string, required)
- fmt: data format when storing data (string, default = `parquet`)
- additional_properties: additional properties for creating the table (dict[string, string], optional)
- location: location of the data (string, default = auto-generated in a Hive-repairable way)
- partition_kv: key/value pairs for partitioning (dict[string, string], required)
- save_mode: mode when storing data (string, default = `overwrite`; available values are `skip_if_exists`, `error_if_exists`, `ignore`, `overwrite`)
- catalog_id: Glue Data Catalog ID, if you use a catalog different from the account/region default catalog (string, optional)
- catalog_region_name: Glue Data Catalog region, if you use a catalog different from the account/region default catalog (string, `us-east-1`)
- presto_conn_id: connection ID for Presto (string, default = `presto_default`)
- aws_conn_id: connection ID for AWS (string, default = `aws_default`)
Templates can be used in the options `db`, `table`, `sql`, `location`, and `partition_kv`.
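The default `location` is described as generated in a Hive-repairable way, which usually means the `key=value` directory layout that Hive's `MSCK REPAIR TABLE` can discover. Below is a minimal sketch of building such a path, assuming the base is the table's own location and the keys are given in partition-column order. `escape_path_name` and `hive_partition_path` are hypothetical helpers, and the escaped character set only approximates Hive's own rules.

```python
from typing import Dict

# Approximation (assumption) of the characters Hive percent-escapes in partition path names.
_HIVE_ESCAPED = set('"#%\'*/:=?\\{[]^')


def escape_path_name(value: str) -> str:
    """Percent-encode characters that would break a key=value path segment."""
    return "".join(
        f"%{ord(ch):02X}" if ch in _HIVE_ESCAPED or ord(ch) < 0x20 or ch == "\x7f" else ch
        for ch in value
    )


def hive_partition_path(table_location: str, partition_kv: Dict[str, str]) -> str:
    """Return <table_location>/k1=v1/k2=v2/ in the order the keys are given."""
    base = table_location.rstrip("/")
    segments = [f"{escape_path_name(k)}={escape_path_name(v)}" for k, v in partition_kv.items()]
    return "/".join([base] + segments) + "/"
```

With the keys from the usage example and a hypothetical bucket, `hive_partition_path("s3://example-bucket/example_db/example_table", {"table_schema": "example_db", "table_name": "example_table"})` returns `s3://example-bucket/example_db/example_table/table_schema=example_db/table_name=example_table/`.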
glue_add_partition.GlueAddPartitionOperator
- db: database name for partitioning (string, required)
- table: table name for partitioning (string, required)
- location: location of the data (string, default = auto-generated in a Hive-repairable way)
- partition_kv: key/value pairs for partitioning (dict[string, string], required)
- mode: mode when storing data (string, default = `overwrite`; available values are `skip_if_exists`, `error_if_exists`, `overwrite`)
- follow_location: when the location does not exist, skip adding the partition and drop it instead (boolean, default = `True`)
- catalog_id: Glue Data Catalog ID, if you use a catalog different from the account/region default catalog (string, optional)
- catalog_region_name: Glue Data Catalog region, if you use a catalog different from the account/region default catalog (string, `us-east-1`)
- aws_conn_id: connection ID for AWS (string, default = `aws_default`)
Templates can be used in the options `db`, `table`, `location`, and `partition_kv`.
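The interaction between `mode` and `follow_location` can be summarized as a small decision function. This is a sketch under stated assumptions, not the plugin's implementation: the names `Action`, `PartitionAlreadyExists`, and `decide_action` are hypothetical, and giving `follow_location` precedence over `mode` is an assumption the option list does not spell out.

```python
from enum import Enum


class Action(Enum):
    """What to do with the partition (hypothetical names)."""
    SKIP = "skip"
    CREATE = "create"
    REPLACE = "replace"
    DROP = "drop"


class PartitionAlreadyExists(Exception):
    """Raised for mode='error_if_exists' when the partition is already registered."""


VALID_MODES = ("skip_if_exists", "error_if_exists", "overwrite")


def decide_action(mode: str, partition_exists: bool, location_exists: bool,
                  follow_location: bool = True) -> Action:
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    # Assumption: follow_location is checked before mode.
    if follow_location and not location_exists:
        # Do not add a partition that points at missing data; drop a stale one if present.
        return Action.DROP if partition_exists else Action.SKIP
    if not partition_exists:
        return Action.CREATE
    if mode == "skip_if_exists":
        return Action.SKIP
    if mode == "error_if_exists":
        raise PartitionAlreadyExists("partition already exists")
    return Action.REPLACE  # mode == "overwrite"
```

Keeping the decision free of AWS calls makes it easy to unit-test; a real operator would then map each action onto Glue client calls such as boto3's `create_partition`, `update_partition`, or `delete_partition`.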
Development
Run Example
PRESTO_HOST=${YOUR PRESTO HOST} PRESTO_PORT=${YOUR PRESTO PORT} ./run-example.sh
Release
poetry publish --build