An Airflow plugin to Add a Partition As Select (APAS) on Presto that uses Glue Data Catalog as a Hive metastore.
airflow-plugin-glue_presto_apas
Usage
from datetime import timedelta

import airflow
from airflow.models import DAG
from airflow.operators.glue_add_partition import GlueAddPartitionOperator
from airflow.operators.glue_presto_apas import GluePrestoApasOperator

args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': airflow.utils.dates.days_ago(2),
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG(
    dag_id='example-dag',
    schedule_interval='0 0 * * *',
    default_args=args,
)

GluePrestoApasOperator(task_id='example-task-1',
                       db='example_db',
                       table='example_table',
                       sql='example.sql',
                       partition_kv={
                           'table_schema': 'example_db',
                           'table_name': 'example_table'
                       },
                       catalog_region_name='ap-northeast-1',
                       dag=dag,
                       )

GlueAddPartitionOperator(task_id='example-task-2',
                         db='example_db',
                         table='example_table',
                         partition_kv={
                             'table_schema': 'example_db',
                             'table_name': 'example_table'
                         },
                         catalog_region_name='ap-northeast-1',
                         dag=dag,
                         )

if __name__ == "__main__":
    dag.cli()
Configuration
glue_presto_apas.GluePrestoApasOperator
- db: database name for partitioning (string, required)
- table: table name for partitioning (string, required)
- sql: SQL file name for selecting data (string, required)
- fmt: data format when storing data (string, default = parquet)
- additional_properties: additional properties for creating the table (dict[string, string], optional)
- location: location for the data (string, default = auto-generated in a Hive-repairable way)
- partition_kv: key values for partitioning (dict[string, string], required)
- save_mode: mode when storing data (string, default = overwrite; available values are skip_if_exists, error_if_exists, ignore, overwrite)
- catalog_id: Glue Data Catalog ID, if you use a catalog different from the account/region default catalog (string, optional)
- catalog_region_name: Glue Data Catalog region, if you use a catalog different from the account/region default catalog (string, default = us-east-1)
- presto_conn_id: connection ID for Presto (string, default = 'presto_default')
- aws_conn_id: connection ID for AWS (string, default = 'aws_default')
Templates can be used in the options [db, table, sql, location, partition_kv].
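The default for location is described only as auto-generated in a Hive-repairable way. The sketch below shows the usual Hive partition layout (one key=value directory per partition key), which is what tools such as MSCK REPAIR TABLE can discover. The helper name hive_partition_location and the S3 bucket are hypothetical, and the exact path the plugin generates is not confirmed by this README.

```python
def hive_partition_location(table_location, partition_kv):
    """Build a Hive-style partition path: <table_location>/<key>=<value>/..."""
    # Hypothetical helper for illustration only; not part of the plugin's API.
    # Real Hive also escapes special characters in values; that is omitted here.
    parts = ["{}={}".format(key, value) for key, value in partition_kv.items()]
    return "/".join([table_location.rstrip("/")] + parts) + "/"


location = hive_partition_location(
    "s3://example-bucket/example_db/example_table",  # assumed table location
    {"table_schema": "example_db", "table_name": "example_table"},
)
print(location)
```

Key order matters in this layout: the directories follow the order of the keys in partition_kv, so it should match the order of the table's partition columns.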
glue_add_partition.GlueAddPartitionOperator
- db: database name for partitioning (string, required)
- table: table name for partitioning (string, required)
- location: location for the data (string, default = auto-generated in a Hive-repairable way)
- partition_kv: key values for partitioning (dict[string, string], required)
- mode: mode when storing data (string, default = overwrite; available values are skip_if_exists, error_if_exists, overwrite)
- follow_location: if the location does not exist, skip adding the partition and drop the partition (boolean, default = True)
- catalog_id: Glue Data Catalog ID, if you use a catalog different from the account/region default catalog (string, optional)
- catalog_region_name: Glue Data Catalog region, if you use a catalog different from the account/region default catalog (string, default = us-east-1)
- aws_conn_id: connection ID for AWS (string, default = 'aws_default')
Templates can be used in the options [db, table, location, partition_kv].
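One way to read how mode and follow_location might interact is sketched below. This is an interpretation of the option names and descriptions above, not the plugin's actual implementation: the function plan_partition_action, its return labels, and the choice to check follow_location before mode are all assumptions made for illustration.

```python
def plan_partition_action(mode, partition_exists,
                          follow_location=True, location_exists=True):
    """Return the action a run would take: 'add', 'skip', 'replace' or 'drop'."""
    # Hypothetical decision table; precedence of follow_location is assumed.
    if follow_location and not location_exists:
        # No data at the location: do not add, and drop a stale partition.
        return "drop" if partition_exists else "skip"
    if not partition_exists:
        return "add"
    if mode == "skip_if_exists":
        return "skip"
    if mode == "error_if_exists":
        raise RuntimeError("partition already exists")
    if mode == "overwrite":
        return "replace"
    raise ValueError("unknown mode: {!r}".format(mode))


print(plan_partition_action("overwrite", partition_exists=True))
print(plan_partition_action("skip_if_exists", partition_exists=True))
print(plan_partition_action("overwrite", partition_exists=True,
                            location_exists=False))
```

Under these assumptions, the default settings (mode = overwrite, follow_location = True) keep the catalog in step with the data: partitions are replaced when data exists and dropped when it does not.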
Development
Run Example
PRESTO_HOST=${YOUR PRESTO HOST} PRESTO_PORT=${YOUR PRESTO PORT} ./run-example.sh
Release
poetry publish --build