An Airflow Plugin to Add a Partition As Select(APAS) on Presto that uses Glue Data Catalog as a Hive metastore.
Project description
airflow-plugin-glue_presto_apas
An Airflow Plugin to Add a Partition As Select(APAS) on Presto that uses Glue Data Catalog as a Hive metastore.
Usage
from datetime import timedelta
import airflow
from airflow.models import DAG
from airflow.operators.glue_add_partition import GlueAddPartitionOperator
from airflow.operators.glue_presto_apas import GluePrestoApasOperator
args = {
'owner': 'airflow',
'depends_on_past': False,
'start_date': airflow.utils.dates.days_ago(2),
'email_on_failure': False,
'email_on_retry': False,
'retries': 1,
'retry_delay': timedelta(minutes=5),
}
dag = DAG(
dag_id='example-dag',
schedule_interval='0 0 * * *',
default_args=args,
)
GluePrestoApasOperator(task_id='example-task-1',
db='example_db',
table='example_table',
sql='example.sql',
partition_kv={
'table_schema': 'example_db',
'table_name': 'example_table'
},
catalog_region_name='ap-northeast-1',
dag=dag,
)
GlueAddPartitionOperator(task_id='example-task-2',
db='example_db',
table='example_table',
partition_kv={
'table_schema': 'example_db',
'table_name': 'example_table'
},
catalog_region_name='ap-northeast-1',
dag=dag,
)
if __name__ == "__main__":
dag.cli()
Configuration
glue_presto_apas.GluePrestoApasOperator
- db: database name for parititioning (string, required)
- table: table name for parititioning (string, required)
- sql: sql file name for selecting data (string, required)
- fmt: data format when storing data (string, default =
parquet
) - additional_properties: additional properties for creating table. (dict[string, string], optional)
- location: location for the data (string, default = auto generated by hive repairable way)
- partition_kv: key values for partitioning (dict[string, string], required)
- save_mode: mode when storing data (string, default =
overwrite
, available values areskip_if_exists
,error_if_exists
,ignore
,overwrite
) - catalog_id: glue data catalog id if you use a catalog different from account/region default catalog. (string, optional)
- catalog_region_name: glue data catalog region if you use a catalog different from account/region default catalog. (string, us-east-1 )
- presto_conn_id: connection id for presto (string, default = 'presto_default')
- aws_conn_id: connection id for aws (string, default = 'aws_default')
Templates can be used in the options[db, table, sql, location, partition_kv].
glue_add_partition.GlueAddPartitionOperator
- db: database name for parititioning (string, required)
- table: table name for parititioning (string, required)
- location: location for the data (string, default = auto generated by hive repairable way)
- partition_kv: key values for partitioning (dict[string, string], required)
- mode: mode when storing data (string, default =
overwrite
, available values areskip_if_exists
,error_if_exists
,overwrite
) - follow_location: Skip to add a partition and drop the partition if the location does not exist. (boolean, default =
True
) - catalog_id: glue data catalog id if you use a catalog different from account/region default catalog. (string, optional)
- catalog_region_name: glue data catalog region if you use a catalog different from account/region default catalog. (string, us-east-1 )
- aws_conn_id: connection id for aws (string, default = 'aws_default')
Templates can be used in the options[db, table, location, partition_kv].
Development
Run Example
PRESTO_HOST=${YOUR PRESTO HOST} PRESTO_PORT=${YOUR PRESTO PORT} ./run-example.sh
Release
poetry publish --build
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file airflow-plugin-glue_presto_apas-0.0.11.tar.gz
.
File metadata
- Download URL: airflow-plugin-glue_presto_apas-0.0.11.tar.gz
- Upload date:
- Size: 12.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/0.12.11 CPython/3.7.2 Darwin/18.5.0
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 8846a69bf63f76d4368f7de3aae12557bd6e27bb977a914464f07c4951c5e5a7 |
|
MD5 | 569ebebeaa15d8e6e05f2d017c51cd6b |
|
BLAKE2b-256 | 73056f689e4c1e9e21b5563b6c6a976491e75a0404afb5080db3e3da8298ce58 |
File details
Details for the file airflow_plugin_glue_presto_apas-0.0.11-py3-none-any.whl
.
File metadata
- Download URL: airflow_plugin_glue_presto_apas-0.0.11-py3-none-any.whl
- Upload date:
- Size: 45.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/0.12.11 CPython/3.7.2 Darwin/18.5.0
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | a398a67d1d973178a3cdd9ea23e057eaf68e2bf7608b95faf79f0f65b696ac9c |
|
MD5 | d9aefa8ff14ae8046016a5d826b3a7fa |
|
BLAKE2b-256 | 07a5b544b0c80b229247a433e3cc68786fd5a9972c51f8a2af98112040f96b4d |