Databricks AWS Utils

Project description

Databricks AWS Utils is a library that abstracts Databricks integrations with AWS services.

Features

  • Convert Delta tables for consumption by AWS Athena, with support for schema evolution
  • Run queries against AWS RDS, retrieving connection properties from AWS Secrets Manager and returning the results as a Spark DataFrame
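The RDS feature relies on a common pattern: read the connection properties from an AWS Secrets Manager secret, then hand them to Spark's JDBC reader. The sketch below illustrates that mechanism, not this library's actual API; the secret keys shown (`engine`, `host`, `port`, `dbname`) are the ones console-created RDS secrets typically contain.

```python
def build_jdbc_url(secret: dict) -> str:
    """Build a JDBC URL from an RDS-style Secrets Manager secret.

    Illustrative helper only; RDS secrets created by the AWS console
    typically contain 'engine', 'host', 'port', and 'dbname' keys.
    """
    # RDS reports PostgreSQL as 'postgres', but the JDBC scheme is 'postgresql'
    engine = "postgresql" if secret["engine"] == "postgres" else secret["engine"]
    return f"jdbc:{engine}://{secret['host']}:{secret['port']}/{secret['dbname']}"

# In a Databricks notebook the secret would come from boto3, e.g.:
#   secret = json.loads(boto3.client("secretsmanager")
#                       .get_secret_value(SecretId="my-rds-secret")["SecretString"])
#   df = spark.read.jdbc(build_jdbc_url(secret), "my_table",
#                        properties={"user": secret["username"],
#                                    "password": secret["password"]})

secret = {"engine": "postgres", "host": "db.example.com", "port": 5432, "dbname": "sales"}
print(build_jdbc_url(secret))  # jdbc:postgresql://db.example.com:5432/sales
```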

Install

pip install databricks-aws-utils

Delta Table to AWS Athena

Motivation

Delta tables are currently only compatible with AWS Athena engine v3. Even with that compatibility, there are limitations around schema evolution: the schema is not always fully or correctly synchronized with the AWS Glue catalog, which causes problems when querying the table.

To solve this problem, this library converts the Delta table's columns to Glue-compatible types and updates the table metadata, so the table can be queried correctly by AWS Athena.

Usage

from databricks_aws_utils.delta_table import DeltaTableUtils

...

DeltaTableUtils(spark, 'my_schema.my_table_name').to_athena_v3()

The to_athena_v3 function uses the Spark session to capture the current Delta schema and updates the Glue table to match it.
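The core of such a conversion is mapping Spark/Delta type names to the names Glue and Athena expect (for example, Spark's long is Glue's bigint). The sketch below shows that mapping for a subset of simple types; the library's actual mapping may cover more cases and differ in details.

```python
def to_glue_type(spark_type: str) -> str:
    """Map a Spark/Delta simple type name to its AWS Glue/Athena equivalent.

    Illustrative subset only; complex types (struct, array, map) and the
    library's real mapping are not covered here.
    """
    mapping = {
        "integer": "int",
        "long": "bigint",
        "short": "smallint",
        "byte": "tinyint",
        "float": "float",
        "double": "double",
        "string": "string",
        "boolean": "boolean",
        "timestamp": "timestamp",
        "date": "date",
    }
    # Types that already match (e.g. decimal(10,2)) pass through unchanged
    return mapping.get(spark_type, spark_type)

# Applied to each column of the Delta schema, this yields the column list
# that would be pushed to the catalog via glue.update_table(...).
print(to_glue_type("long"))  # bigint
```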

NOTE: This feature is only compatible with AWS Athena engine v3, and the Databricks cluster must have access to the AWS Glue catalog.

NOTE: This feature does not support Databricks Unity Catalog, since Unity Catalog tables cannot be queried from AWS Athena.

Custom IAM Role

If you need to use a custom IAM Role to update the AWS Glue table, you can pass the role name as a parameter to the DeltaTableUtils class.

from databricks_aws_utils.delta_table import DeltaTableUtils

...

DeltaTableUtils(
    spark,
    'my_schema.my_table_name',
    iam_role='my_custom_iam_role'
).to_athena_v3()

NOTE: The Databricks cluster must have permission to assume the custom IAM Role.
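For context, assuming a role means exchanging the cluster's credentials for temporary ones scoped to that role via AWS STS. The sketch below illustrates the general mechanism with boto3; the role name and account ID are hypothetical, and this is not necessarily how the library performs the call.

```python
def role_arn(account_id: str, role_name: str) -> str:
    """Build the ARN for an IAM role in the given account (illustrative helper)."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"

# With boto3, assuming the role and using it for Glue looks roughly like:
#   creds = boto3.client("sts").assume_role(
#       RoleArn=role_arn("123456789012", "my_custom_iam_role"),
#       RoleSessionName="databricks-aws-utils",
#   )["Credentials"]
#   glue = boto3.client("glue",
#       aws_access_key_id=creds["AccessKeyId"],
#       aws_secret_access_key=creds["SecretAccessKey"],
#       aws_session_token=creds["SessionToken"])

print(role_arn("123456789012", "my_custom_iam_role"))
# arn:aws:iam::123456789012:role/my_custom_iam_role
```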

Athena Engine v2

AWS Athena engine v2 doesn't support Delta tables. To query a Delta table with engine v2, you must generate a Hive symlink manifest from the Delta table and register it as a separate table.

from databricks_aws_utils.delta_table import DeltaTableUtils

...

DeltaTableUtils(spark, 'my_schema.my_table_name').to_athena('my_schema', 'my_symlink_table_name')

NOTE: The schema name provided to to_athena doesn't need to match the Delta table's schema.
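The symlink approach is a standard Delta Lake technique: generate a _symlink_format_manifest under the table's location, then register an external table whose input format is SymlinkTextInputFormat so Athena reads only the Parquet files listed in the manifest. The sketch below builds such a DDL statement; it illustrates the general technique and is not necessarily the exact DDL this library emits.

```python
def symlink_table_ddl(schema: str, table: str, columns: dict, location: str) -> str:
    """Build Athena DDL for an external table over a Delta symlink manifest.

    Illustrative of the general technique; 'columns' maps column names to
    Athena types, 'location' is the Delta table's root path.
    """
    cols = ", ".join(f"`{name}` {typ}" for name, typ in columns.items())
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} ({cols}) "
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' "
        "STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat' "
        "OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' "
        f"LOCATION '{location}/_symlink_format_manifest'"
    )

# The manifest itself is produced on the Databricks side with:
#   spark.sql("GENERATE symlink_format_manifest FOR TABLE my_schema.my_table_name")
ddl = symlink_table_ddl("my_schema", "my_symlink_table_name",
                        {"id": "bigint"}, "s3://bucket/path")
print("SymlinkTextInputFormat" in ddl)  # True
```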

Contributing

Change Log

Project details


Download files

Download the file for your platform.

Source Distribution

databricks_aws_utils-1.6.0.tar.gz (8.4 kB)

Uploaded Source

Built Distribution


databricks_aws_utils-1.6.0-py3-none-any.whl (10.4 kB)

Uploaded Python 3

File details

Details for the file databricks_aws_utils-1.6.0.tar.gz.

File metadata

  • Download URL: databricks_aws_utils-1.6.0.tar.gz
  • Upload date:
  • Size: 8.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for databricks_aws_utils-1.6.0.tar.gz

  • SHA256: 0581eb9b0fcd58e725d905d8716765491fd6d47afce69feff22dcbbcdf403ecb
  • MD5: c6dab6b787aeb3c8e2cf7b8977a04187
  • BLAKE2b-256: 384a22f7c1d084f661f626f8fd9778b63b43878a24cb9758826ed96dd9bbf47d


File details

Details for the file databricks_aws_utils-1.6.0-py3-none-any.whl.

File metadata

File hashes

Hashes for databricks_aws_utils-1.6.0-py3-none-any.whl

  • SHA256: 03bb36c239055abc1316bb9b0717effa1cea1ae21cd87dec1a406f06c3e41e3c
  • MD5: 23a9a5e7644dcc7a26630c04773eded9
  • BLAKE2b-256: f66b2592992f445673323c5762aa754b5a998a121eef462836e5beb68d4471e2

