
An SDK for syncing Databricks to Snowflake using Unity Catalog and UniForm


Databricks to Snowflake Table Mirroring


This repository provides a utility to synchronize (mirror) Iceberg table metadata from Databricks Unity Catalog to Snowflake Horizon.

It automates the creation of:

  • Snowflake Catalog Integrations
  • External Iceberg Tables

Note: This library uses credential vending to access cloud storage. Snowflake External Volumes are not required.


Table of Contents

  1. Overview
  2. Snowflake Setup
  3. Databricks Setup
  4. How to Use
  5. Configuration
  6. Parameter Reference
  7. Example Usage
  8. Limitations

Overview

This utility automates the following tasks:

  • Retrieves Iceberg metadata from Unity Catalog
  • Generates Delta-based metadata tables in Databricks
  • Creates Catalog Integrations in Snowflake
  • Creates External Iceberg Tables in Snowflake

Snowflake Setup

This utility supports two usage patterns:

  • Manual: Generate DDLs for execution in Snowflake
  • Automated: Create Snowflake assets directly from Databricks

Required Snowflake permissions:

  • Create Catalog Integrations
  • Create External Iceberg Tables

Databricks Setup

Install the library:

pip install databricks_uniform_sync

Initialize the class:

from databricks_uniform_sync import DatabricksToSnowflakeMirror

d2s = DatabricksToSnowflakeMirror(
    spark_session=spark,
    dbx_workspace_url="https://dbcxyz.databricks.cloud.net",
    dbx_workspace_pat="dapi...",
    metadata_catalog="dbx_sf_mirror_catalog",
    metadata_schema="dbx_sf_mirror_schema"
)
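
Hardcoding a PAT in notebook source is best avoided. A minimal sketch of loading the same settings from environment variables instead (the variable names and defaults below are illustrative, not part of the library):

```python
import os

# Illustrative sketch: read workspace settings from the environment so the
# PAT never appears in notebook source. Variable names are assumptions.
def load_mirror_config() -> dict:
    return {
        "dbx_workspace_url": os.environ.get(
            "DBX_WORKSPACE_URL", "https://dbcxyz.databricks.cloud.net"
        ),
        "dbx_workspace_pat": os.environ.get("DBX_WORKSPACE_PAT", ""),
        "metadata_catalog": os.environ.get(
            "DBX_METADATA_CATALOG", "dbx_sf_mirror_catalog"
        ),
        "metadata_schema": os.environ.get(
            "DBX_METADATA_SCHEMA", "dbx_sf_mirror_schema"
        ),
    }

config = load_mirror_config()
# In a Databricks notebook, with the library installed:
# d2s = DatabricksToSnowflakeMirror(spark_session=spark, **config)
```

On Databricks, `dbutils.secrets.get()` is another common way to supply the PAT without exposing it.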

How to Use

1. Create or Refresh Metadata Tables

d2s.create_metadata_tables()
d2s.refresh_metadata_tables(catalog="your_catalog")

These methods are idempotent and safe to rerun.
If metadata tables do not exist, refresh_metadata_tables() will create them.


2. Add Unity Catalog Discovery Tags

d2s.refresh_uc_metadata_tags()

These tags are used to determine sync eligibility. Do not remove them.
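
To verify which tags were applied, you can query Unity Catalog's `information_schema`. A hedged sketch (the catalog name is a placeholder, and the library's specific tag names are not documented here, so this simply lists all table tags):

```python
# Illustrative sketch: list all Unity Catalog table tags in a catalog so
# you can confirm the discovery tags exist. Catalog name is a placeholder.
def list_table_tags(spark, catalog: str):
    return spark.sql(
        f"SELECT catalog_name, schema_name, table_name, tag_name, tag_value "
        f"FROM {catalog}.information_schema.table_tags"
    )

# In a Databricks notebook:
# list_table_tags(spark, "your_catalog").show(truncate=False)
```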


3. Create Snowflake Catalog Integrations

Dry run (SQL only):

d2s.generate_create_sf_catalog_integrations_sql(
    oauth_client_id="client-id",
    oauth_client_secret="client-secret"
)

Execute directly:

d2s.create_sf_catalog_integrations(
    sf_account_id="xyz-123",
    sf_user="svc_name",
    sf_private_key_file="rsa/rsa_key.p8",
    sf_private_key_file_pwd="your-password",
    oauth_client_id="client-id",
    oauth_client_secret="client-secret"
)

4. Create Iceberg Tables in Snowflake

Dry run:

d2s.generate_create_sf_iceberg_tables_sql()

Execute directly:

d2s.create_sf_iceberg_tables_sql(
    sf_account_id="xyz-123",
    sf_user="svc_name",
    sf_private_key_file="rsa/rsa_key.p8",
    sf_private_key_file_pwd="your-password"
)

Configuration

Custom Metadata Table Name

d2s = DatabricksToSnowflakeMirror(
    spark_session,
    dbx_workspace_url,
    dbx_workspace_pat,
    metadata_catalog,
    metadata_schema,
    metadata_table_name="custom_table_name"
)

A corresponding view will also be created with a _vw suffix.


Custom Refresh Interval

d2s.create_sf_catalog_integrations(
    ...,
    refresh_interval_seconds=120
)

Disable Auto-Refresh on Iceberg Tables

d2s.create_sf_iceberg_tables_sql(
    ...,
    auto_refresh=False
)
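
With auto-refresh disabled, Snowflake will not poll for new Iceberg metadata, so tables must be refreshed on your own schedule. A minimal sketch, assuming the standard snowflake-connector-python cursor API (the helper name is illustrative):

```python
# Illustrative sketch: manually refresh one Snowflake Iceberg table.
# Snowflake's ALTER ICEBERG TABLE ... REFRESH pulls the latest metadata.
def refresh_iceberg_table(cursor, fq_table_name: str) -> None:
    cursor.execute(f"ALTER ICEBERG TABLE {fq_table_name} REFRESH")

# e.g. with a snowflake.connector connection:
# refresh_iceberg_table(conn.cursor(), "my_db.my_schema.my_table")
```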

Parameter Reference

Databricks Parameters

  Parameter                   Description
  spark_session               Active SparkSession in Databricks
  dbx_workspace_url           URL of your Databricks workspace
  dbx_workspace_pat           Personal Access Token for authentication
  metadata_catalog            Unity Catalog catalog to store metadata
  metadata_schema             Unity Catalog schema to store metadata
  metadata_table_name         (optional) Custom name for metadata table

Snowflake Parameters

  Parameter                   Description
  sf_account_id               Snowflake account identifier
  sf_user                     Snowflake user/service account
  sf_private_key_file         Path to RSA private key
  sf_private_key_file_pwd     Password to decrypt RSA key
  oauth_client_id             Databricks OAuth client ID
  oauth_client_secret         Databricks OAuth client secret
  refresh_interval_seconds    (optional) Catalog Integration refresh interval
  auto_refresh                (optional) Enable/disable automatic refresh on tables

Example Usage

Coming soon.
A demo notebook or script will be added to show end-to-end execution.
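
In the meantime, the steps documented above can be sketched end to end. This assumes `d2s` is an initialized DatabricksToSnowflakeMirror; the method names match the sections above, but the wrapper function and its argument handling are illustrative, not part of the library's API:

```python
# Illustrative sketch of the documented workflow, in order.
def mirror_end_to_end(d2s, catalog: str, oauth_kwargs: dict, sf_kwargs: dict) -> None:
    # 1. Create/refresh the Delta-based metadata tables (idempotent).
    d2s.create_metadata_tables()
    d2s.refresh_metadata_tables(catalog=catalog)
    # 2. Apply the Unity Catalog discovery tags used for sync eligibility.
    d2s.refresh_uc_metadata_tags()
    # 3. Create the Snowflake Catalog Integrations.
    d2s.create_sf_catalog_integrations(**sf_kwargs, **oauth_kwargs)
    # 4. Create the external Iceberg tables in Snowflake.
    d2s.create_sf_iceberg_tables_sql(**sf_kwargs)
```

`sf_kwargs` would carry the Snowflake connection parameters (`sf_account_id`, `sf_user`, key file, and password) and `oauth_kwargs` the Databricks OAuth client credentials, as shown in the sections above.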


Limitations

  • Only supports Iceberg tables on S3
  • Deleting tables in Unity Catalog does not remove them in Snowflake
  • Only supports RSA key pair authentication (Snowflake MFA compliance)
