
An SDK for syncing Databricks tables to Snowflake using Unity Catalog and Uniform

Project description

Databricks to Snowflake Table Mirroring


This repository provides a utility to synchronize (mirror) Iceberg table metadata from Databricks Unity Catalog to Snowflake Horizon.

It automates the creation of:

  • Snowflake Catalog Integrations
  • External Iceberg Tables

Note: This library uses credential vending to access cloud storage. Snowflake External Volumes are not required.


Table of Contents

  1. Overview
  2. Snowflake Setup
  3. Databricks Setup
  4. How to Use
  5. Configuration
  6. Parameter Reference
  7. Example Usage
  8. Limitations

Overview

This utility automates the following tasks:

  • Retrieves Iceberg metadata from Unity Catalog
  • Generates Delta-based metadata tables in Databricks
  • Creates Catalog Integrations in Snowflake
  • Creates External Iceberg Tables in Snowflake

Snowflake Setup

This utility supports two usage patterns:

  • Manual: Generate DDLs for execution in Snowflake
  • Automated: Create Snowflake assets directly from Databricks

Required Snowflake permissions:

  • Create Catalog Integrations
  • Create External Iceberg Tables

Databricks Setup

Install the library:

pip install databricks_uniform_sync

Initialize the class:

from databricks_uniform_sync import DatabricksToSnowflakeMirror

d2s = DatabricksToSnowflakeMirror(
    spark_session=spark,
    dbx_workspace_url="https://dbcxyz.databricks.cloud.net",
    dbx_workspace_pat="dapi...",
    metadata_catalog="dbx_sf_mirror_catalog",
    metadata_schema="dbx_sf_mirror_schema"
)

How to Use

1. Create or Refresh Metadata Tables

d2s.create_metadata_tables()
d2s.refresh_metadata_tables(catalog="your_catalog")

These methods are idempotent and safe to rerun.
If metadata tables do not exist, refresh_metadata_tables() will create them.
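The idempotency here follows the standard create-if-absent pattern. As a minimal illustration, a rerunnable CREATE statement might be rendered like this (the column set below is a hypothetical stand-in; the library's actual metadata schema is internal to it):

```python
def metadata_table_ddl(catalog: str, schema: str,
                       table: str = "dbx_sf_uniform_metadata") -> str:
    """Render a rerunnable CREATE TABLE statement for a metadata table.

    The IF NOT EXISTS guard makes the statement a no-op when the table
    already exists, which is what makes repeated runs safe. The columns
    shown are illustrative placeholders, not the library's real schema.
    """
    return f"""CREATE TABLE IF NOT EXISTS {catalog}.{schema}.{table} (
  uc_catalog STRING,
  uc_schema STRING,
  uc_table STRING,
  metadata_location STRING
) USING DELTA"""
```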


2. Add Unity Catalog Discovery Tags

d2s.refresh_uc_metadata_tags()

These tags are used to determine sync eligibility. Do not remove them.
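Conceptually, the sync step filters Unity Catalog tables by these tags. A minimal sketch of tag-based eligibility filtering (the tag key and table shape below are hypothetical, not the library's actual discovery tags):

```python
# Hypothetical tag key -- the library's real discovery-tag names differ.
SYNC_TAG_KEY = "snowflake_sync_enabled"

def eligible_tables(tables: list[dict]) -> list[dict]:
    """Return only the tables whose tags mark them as eligible for mirroring."""
    return [
        t for t in tables
        if t.get("tags", {}).get(SYNC_TAG_KEY, "").lower() == "true"
    ]

tables = [
    {"name": "sales.orders", "tags": {SYNC_TAG_KEY: "true"}},
    {"name": "sales.tmp_scratch", "tags": {}},
]
print([t["name"] for t in eligible_tables(tables)])  # -> ['sales.orders']
```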


3. Create Snowflake Catalog Integrations

Dry run (SQL only):

d2s.generate_create_sf_catalog_integrations_sql(
    oauth_client_id="client-id",
    oauth_client_secret="client-secret"
)

Execute directly:

d2s.create_sf_catalog_integrations(
    sf_account_id="xyz-123",
    sf_user="svc_name",
    sf_private_key_file="rsa/rsa_key.p8",
    sf_private_key_file_pwd="your-password",
    oauth_client_id="client-id",
    oauth_client_secret="client-secret"
)

4. Create Iceberg Tables in Snowflake

Dry run:

d2s.generate_create_sf_iceberg_tables_sql()

Execute directly:

d2s.create_sf_iceberg_tables_sql(
    sf_account_id="xyz-123",
    sf_user="svc_name",
    sf_private_key_file="rsa/rsa_key.p8",
    sf_private_key_file_pwd="your-password"
)
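On the Snowflake side, each mirrored table maps to a catalog-linked `CREATE ICEBERG TABLE` statement referencing the integration from the previous step. A sketch of how that DDL could be rendered (helper and clauses are illustrative, not the library's internals):

```python
def render_iceberg_table_ddl(sf_database: str, sf_schema: str,
                             table_name: str, catalog_integration: str,
                             uc_table_name: str,
                             auto_refresh: bool = True) -> str:
    """Render an illustrative Snowflake CREATE ICEBERG TABLE statement
    for a table whose metadata lives in an external (Unity) catalog."""
    return (
        f"CREATE ICEBERG TABLE IF NOT EXISTS "
        f"{sf_database}.{sf_schema}.{table_name}\n"
        f"  CATALOG = '{catalog_integration}'\n"
        f"  CATALOG_TABLE_NAME = '{uc_table_name}'\n"
        f"  AUTO_REFRESH = {'TRUE' if auto_refresh else 'FALSE'};"
    )
```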

Configuration

Custom Metadata Table Name

d2s = DatabricksToSnowflakeMirror(
    spark_session,
    dbx_workspace_url,
    dbx_workspace_pat,
    metadata_catalog,
    metadata_schema,
    metadata_table_name="custom_table_name"
)

A corresponding view will also be created with a _vw suffix.


Custom Refresh Interval

d2s.create_sf_catalog_integrations(
    ...,
    refresh_interval_seconds=120
)

Disable Auto-Refresh on Iceberg Tables

d2s.create_sf_iceberg_tables_sql(
    ...,
    auto_refresh=False
)

Parameter Reference

Databricks Parameters

| Parameter | Description |
| --- | --- |
| `spark_session` | Active SparkSession in Databricks |
| `dbx_workspace_url` | URL of your Databricks workspace |
| `dbx_workspace_pat` | Personal Access Token for authentication |
| `metadata_catalog` | Unity Catalog catalog in which to store metadata |
| `metadata_schema` | Unity Catalog schema in which to store metadata |
| `metadata_table_name` | (Optional) Custom name for the metadata table |

Snowflake Parameters

| Parameter | Description |
| --- | --- |
| `sf_account_id` | Snowflake account identifier |
| `sf_user` | Snowflake user/service account |
| `sf_private_key_file` | Path to the RSA private key file |
| `sf_private_key_file_pwd` | Password to decrypt the RSA key |
| `oauth_client_id` | Databricks OAuth client ID |
| `oauth_client_secret` | Databricks OAuth client secret |
| `refresh_interval_seconds` | (Optional) Catalog Integration refresh interval in seconds |
| `auto_refresh` | (Optional) Enable/disable automatic refresh on tables |

Example Usage

Coming soon.
A demo notebook or script will be added to show end-to-end execution.
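In the meantime, the documented calls above can be composed into a single pipeline. This is a sketch, not the promised demo notebook; all parameter values are placeholders to replace with your own, and in Databricks `spark` is the notebook's active SparkSession:

```python
def mirror_all(spark):
    """End-to-end sketch composing the documented API calls in order.
    Placeholder URLs, account IDs, and secrets must be replaced."""
    from databricks_uniform_sync import DatabricksToSnowflakeMirror

    d2s = DatabricksToSnowflakeMirror(
        spark_session=spark,
        dbx_workspace_url="https://dbcxyz.databricks.cloud.net",
        dbx_workspace_pat="dapi...",
        metadata_catalog="dbx_sf_mirror_catalog",
        metadata_schema="dbx_sf_mirror_schema",
    )

    # 1. Create or refresh the Delta metadata tables
    d2s.create_metadata_tables()
    d2s.refresh_metadata_tables(catalog="your_catalog")

    # 2. Apply Unity Catalog discovery tags
    d2s.refresh_uc_metadata_tags()

    # 3. Create the Snowflake Catalog Integrations
    d2s.create_sf_catalog_integrations(
        sf_account_id="xyz-123",
        sf_user="svc_name",
        sf_private_key_file="rsa/rsa_key.p8",
        sf_private_key_file_pwd="your-password",
        oauth_client_id="client-id",
        oauth_client_secret="client-secret",
    )

    # 4. Create the mirrored Iceberg tables
    d2s.create_sf_iceberg_tables_sql(
        sf_account_id="xyz-123",
        sf_user="svc_name",
        sf_private_key_file="rsa/rsa_key.p8",
        sf_private_key_file_pwd="your-password",
    )
```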


Limitations

  • Only supports Iceberg tables on S3
  • Deleting tables in Unity Catalog does not remove them in Snowflake
  • Only supports RSA key pair authentication (Snowflake MFA compliance)
