

airflow-plugin-glue_presto_apas


An Airflow Plugin to Add a Partition As Select(APAS) on Presto that uses Glue Data Catalog as a Hive metastore.

Usage

from datetime import timedelta

import airflow
from airflow.models import DAG

# The operators are exposed through Airflow's plugin mechanism under airflow.operators
from airflow.operators.glue_add_partition import GlueAddPartitionOperator
from airflow.operators.glue_presto_apas import GluePrestoApasOperator

args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': airflow.utils.dates.days_ago(2),
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}


dag = DAG(
    dag_id='example-dag',
    schedule_interval='0 0 * * *',
    default_args=args,
)

# Run example.sql on Presto and register the result as a partition (APAS)
GluePrestoApasOperator(task_id='example-task-1',
                       db='example_db',
                       table='example_table',
                       sql='example.sql',
                       partition_kv={
                           'table_schema': 'example_db',
                           'table_name': 'example_table'
                       },
                       catalog_region_name='ap-northeast-1',
                       dag=dag,
                       )

# Only register the partition in the Glue Data Catalog (no data written)
GlueAddPartitionOperator(task_id='example-task-2',
                         db='example_db',
                         table='example_table',
                         partition_kv={
                             'table_schema': 'example_db',
                             'table_name': 'example_table'
                         },
                         catalog_region_name='ap-northeast-1',
                         dag=dag,
                         )

if __name__ == "__main__":
    dag.cli()

Configuration

glue_presto_apas.GluePrestoApasOperator

  • db: database name for partitioning (string, required)
  • table: table name for partitioning (string, required)
  • sql: SQL file name for selecting data (string, required)
  • fmt: data format used when storing data (string, default = parquet)
  • additional_properties: additional properties for creating the table (dict[string, string], optional)
  • location: location for the data (string, default = auto-generated in a Hive-repairable way)
  • partition_kv: key-value pairs for partitioning (dict[string, string], required)
  • save_mode: mode used when storing data (string, default = overwrite; available values are skip_if_exists, error_if_exists, ignore, overwrite)
  • catalog_id: Glue Data Catalog ID if you use a catalog different from the account/region default catalog (string, optional)
  • catalog_region_name: Glue Data Catalog region if you use a catalog different from the account/region default catalog (string, default = us-east-1)
  • presto_conn_id: connection id for Presto (string, default = 'presto_default')
  • aws_conn_id: connection id for AWS (string, default = 'aws_default')

Templates can be used in the following options: db, table, sql, location, partition_kv.
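As a rough sketch of the save_mode semantics listed above (illustrative only: plan_write is a hypothetical helper, not part of the plugin's API, and the plugin's internals may differ):

```python
def plan_write(save_mode: str, partition_exists: bool) -> str:
    """Map a save_mode to an action, per the documented semantics (sketch)."""
    if not partition_exists:
        return 'write'  # no conflict: always write the new partition
    if save_mode == 'skip_if_exists':
        return 'skip'  # leave the existing partition untouched
    if save_mode == 'error_if_exists':
        raise RuntimeError('partition already exists')
    if save_mode == 'ignore':
        return 'write'  # assumed here: proceed without dropping existing data
    if save_mode == 'overwrite':
        return 'replace'  # drop the existing partition, then write
    raise ValueError(f'unknown save_mode: {save_mode}')

print(plan_write('overwrite', True))       # replace
print(plan_write('skip_if_exists', True))  # skip
```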

glue_add_partition.GlueAddPartitionOperator

  • db: database name for partitioning (string, required)
  • table: table name for partitioning (string, required)
  • location: location for the data (string, default = auto-generated in a Hive-repairable way)
  • partition_kv: key-value pairs for partitioning (dict[string, string], required)
  • mode: mode used when storing data (string, default = overwrite; available values are skip_if_exists, error_if_exists, overwrite)
  • follow_location: if the location does not exist, skip adding the partition and drop the partition if it already exists (boolean, default = True)
  • catalog_id: Glue Data Catalog ID if you use a catalog different from the account/region default catalog (string, optional)
  • catalog_region_name: Glue Data Catalog region if you use a catalog different from the account/region default catalog (string, default = us-east-1)
  • aws_conn_id: connection id for AWS (string, default = 'aws_default')

Templates can be used in the following options: db, table, location, partition_kv.
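The default location ("auto-generated in a Hive-repairable way") presumably follows Hive's key=value path convention, which MSCK REPAIR TABLE and Glue crawlers can discover. A minimal sketch of that convention (hive_partition_location is a hypothetical helper, not the plugin's code; the plugin's actual path generation may differ in ordering or encoding):

```python
from urllib.parse import quote

def hive_partition_location(table_location: str, partition_kv: dict) -> str:
    # Build the "key=value/key=value/" suffix of a Hive-style partition path.
    suffix = '/'.join(f"{key}={quote(str(value), safe='')}"
                      for key, value in partition_kv.items())
    return f"{table_location.rstrip('/')}/{suffix}/"

print(hive_partition_location('s3://bucket/example_table', {'dt': '2024-01-01'}))
# s3://bucket/example_table/dt=2024-01-01/
```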

Development

Run Example

PRESTO_HOST=${YOUR PRESTO HOST} PRESTO_PORT=${YOUR PRESTO PORT} ./run-example.sh

Release

poetry publish --build
