Helper SDK for Databricks UDTF registration and Unity Catalog integration

Project description

cognite-databricks

A helper SDK for Databricks that provides Unity Catalog SQL UDTF registration utilities, Secret Manager integration, and Databricks-specific tooling for scalar UDTFs.

Latest Release:

  • Version 0.2.3: Aligns Unity Catalog view registration with cognite-pygen-spark 0.2.3+ for reserved-word-safe UDTFField naming.
  • Dependencies: Now target cognite-pygen-spark 0.3.0+ for:
    • CDF audit headers in generated UDTFs.
    • TypeConverter-based UDTF field typing.
    • Timestamp normalization.
  • Version 0.2.1: Added SQL-native time series UDTF support with predicate pushdown hints, plus a SQL query analyzer that produces those hints.

Full release notes are published on GitHub Releases.

Note: This package provides Databricks-specific utilities for Unity Catalog UDTF registration and Secret Manager integration.

Overview

cognite-databricks is a Databricks-specific helper SDK that extends pygen-spark with Unity Catalog SQL registration, Secret Manager integration, and Databricks-specific utilities. It focuses on serverless-compatible scalar UDTF execution for SQL Warehouses.

Package Purpose:

  • Databricks-Specific Features: Unity Catalog SQL registration, Secret Manager integration, and Databricks-specific utilities
  • Uses pygen-spark for Code Generation: All UDTF code generation (both Data Model and Time Series UDTFs) is done by pygen-spark using template-based generation
  • Generic Components: Generic utilities (TypeConverter, CDFConnectionConfig, to_udtf_function_name) are provided by pygen-spark and re-exported from cognite.databricks for backward compatibility
  • Notebook-Friendly API: Aligned with cognite.pygen's notebook workflow

It provides high-level APIs for:

  • UDTF Registration: Register persistent UDTFs in Unity Catalog via SQL
  • Secret Manager Integration: Manage OAuth2 credentials securely
  • SQL Usage: Use UDTFs directly in SQL after registration

Features

  • Unity Catalog SQL Registration: Serverless-compatible UDTFs registered via CREATE FUNCTION statements
  • One-Line Registration: Generate and register UDTFs in a single call
  • Secret Manager Integration: Automatic credential management from TOML files
  • Scalar-Only Execution: Compatible with SQL Warehouses and serverless execution
  • Type Safety: Full type hints and IDE support
  • Generic Components: Template-generated UDTFs and generic utilities (TypeConverter, CDFConnectionConfig, to_udtf_function_name) come from cognite-pygen-spark, so they work on any Spark cluster. They are re-exported from cognite.databricks for backward compatibility, but the source module is cognite.pygen_spark.
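
The examples later in this README suggest that to_udtf_function_name maps a CamelCase view name to a snake_case function name with a _udtf suffix ("SmallBoat" becomes "small_boat_udtf"). A hypothetical sketch of that convention, for illustration only; the actual implementation in cognite.pygen_spark may differ:

```python
import re

def to_udtf_function_name_sketch(view_name: str) -> str:
    """Hypothetical sketch: CamelCase view name -> snake_case UDTF name.

    Mirrors the mapping seen in this README ("SmallBoat" -> "small_boat_udtf");
    the real to_udtf_function_name in cognite.pygen_spark may differ.
    """
    # Insert an underscore before each upper-case letter that follows a
    # lower-case letter or digit, then lower-case the whole string.
    snake = re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", view_name).lower()
    return f"{snake}_udtf"

print(to_udtf_function_name_sketch("SmallBoat"))    # small_boat_udtf
print(to_udtf_function_name_sketch("WindTurbine"))  # wind_turbine_udtf
```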

Installation

pip install cognite-databricks

Start here (recommended)

Use the catalog-based quickstart as the main path for getting productive: it works for all customers (Unity Catalog + Secret Manager + your CDF data model).

Resource                          What it is
docs/catalog_based/quickstart.md  Step-by-step guide with explained code blocks
quickstart.ipynb                  Notebook version: same flow, with markdown introductions and inline comments in every code cell

Prerequisites: docs/catalog_based/prerequisites.md (Unity Catalog, Secret Manager, CDF TOML).

Full documentation index: docs/index.md.

The quickstart walks through: install → CDF + Databricks clients → pick a SQL warehouse → generate_udtf_notebook → write CDF secrets to Secret Manager → register_udtfs → register_views.

Quick start (short reference)

Catalog-based registration (Unity Catalog) in a few lines — for the full narrative, use the quickstart links above.

from cognite.client.data_classes.data_modeling.ids import DataModelId
from cognite.databricks import generate_udtf_notebook
from cognite.pygen import load_cognite_client_from_toml
from databricks.sdk import WorkspaceClient

client = load_cognite_client_from_toml("/Workspace/Users/<you>/config.toml")
workspace_client = WorkspaceClient()
warehouses = list(workspace_client.warehouses.list())
warehouse = warehouses[0]  # pick the warehouse you use for SQL

data_model_id = DataModelId(space="my_space", external_id="MyModel", version="v1")
generator = generate_udtf_notebook(
    data_model_id,
    client,
    workspace_client=workspace_client,
    output_dir="/Workspace/Users/<you>/udtf_generated",
    catalog="my_catalog",
    schema="my_schema",
    warehouse_id=warehouse.id,
)

# Then: secret scope + set_cdf_credentials + register_udtfs + register_views
# (see quickstart — do not skip warehouse selection or Secret Manager)

Low-Level API

from cognite.client.data_classes.data_modeling.ids import DataModelId
from cognite.databricks import UDTFGenerator, SecretManagerHelper
from cognite.pygen import load_cognite_client_from_toml

# Load client from TOML file
client = load_cognite_client_from_toml("config.toml")

# Create generator
generator = UDTFGenerator(
    cognite_client=client,
    catalog="main",
    schema="cdf_models",
)

# Set up Secret Manager (one-time setup per data model)
data_model_id = DataModelId(space="sp_pygen_power", external_id="WindTurbine", version="1")
secret_scope = f"cdf_{data_model_id.space}_{data_model_id.external_id.lower()}"

generator.secret_helper.set_cdf_credentials(
    scope_name=secret_scope,
    project="my-project",  # from config.toml
    cdf_cluster="api.cognitedata.com",  # from config.toml
    client_id="...",  # from config.toml
    client_secret="...",  # from config.toml
    tenant_id="...",  # from config.toml
)

# Register UDTFs for catalog-based use (scalar-only)
registered = generator.register_session_scoped_udtfs()

Unity Catalog UDTF Registration

Session-scoped registration is the primary mode for scalar-only UDTFs. Register functions in the current Spark session before running SQL queries.

Register All UDTFs (Recommended)

from cognite.databricks import generate_udtf_notebook
from cognite.pygen import load_cognite_client_from_toml
from cognite.client.data_classes.data_modeling.ids import DataModelId

# Load client and generate UDTFs
client = load_cognite_client_from_toml("config.toml")
data_model_id = DataModelId(space="sailboat", external_id="sailboat", version="v1")
generator = generate_udtf_notebook(
    data_model_id,
    client,
    output_dir="/Workspace/Users/user@example.com/udtf",
)

# Install dependencies (run in separate cell first)
# %pip install cognite-sdk
# (Restart kernel after installation)

# Register all UDTFs for catalog-based use (includes time series UDTFs automatically)
registered = generator.register_session_scoped_udtfs()
# Returns: {"SmallBoat": "small_boat_udtf", "LargeBoat": "large_boat_udtf", 
#           "time_series_datapoints": "time_series_datapoints_udtf", ...}

# Use in SQL (always use SECRET() for credentials)
# SELECT * FROM small_boat_udtf(
#     client_id => SECRET('cdf_sailboat_sailboat', 'client_id'),
#     client_secret => SECRET('cdf_sailboat_sailboat', 'client_secret'),
#     tenant_id => SECRET('cdf_sailboat_sailboat', 'tenant_id'),
#     cdf_cluster => SECRET('cdf_sailboat_sailboat', 'cdf_cluster'),
#     project => SECRET('cdf_sailboat_sailboat', 'project'),
#     name => 'MyBoat',
#     description => NULL
# ) LIMIT 10;
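
The SECRET()-per-parameter pattern in the SQL above is mechanical, so it can be assembled programmatically. A small hypothetical helper (not part of cognite-databricks) that builds the query string, using the credential parameter names from this README's SQL example:

```python
def build_udtf_query(function_name: str, scope: str, limit: int = 10, **params: str) -> str:
    """Build a SELECT over a registered UDTF, pulling credentials via SECRET().

    Hypothetical helper for illustration; the five credential parameters
    mirror the SQL example in this README. Extra keyword arguments are
    passed through as literal SQL expressions (quote strings yourself).
    """
    cred_keys = ["client_id", "client_secret", "tenant_id", "cdf_cluster", "project"]
    args = [f"{k} => SECRET('{scope}', '{k}')" for k in cred_keys]
    args += [f"{k} => {v}" for k, v in params.items()]
    return (
        f"SELECT * FROM {function_name}(\n    "
        + ",\n    ".join(args)
        + f"\n) LIMIT {limit};"
    )

sql = build_udtf_query(
    "small_boat_udtf", "cdf_sailboat_sailboat", name="'MyBoat'", description="NULL"
)
print(sql)
```

The resulting string can be passed to spark.sql() or pasted into a SQL cell.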

Register Single UDTF

from cognite.databricks import generate_udtf_notebook, register_udtf_from_file

# Generate UDTFs
generator = generate_udtf_notebook(data_model_id, client, ...)

# Register a single UDTF from generated file
register_udtf_from_file(
    "/Workspace/Users/user@example.com/udtf/sailboat_sailboat_v1/SmallBoat_udtf.py",
    function_name="small_boat_udtf"
)

Session Scope Notes

Session-scoped registration is the supported mode for scalar-only UDTFs. Functions are temporary and must be registered at the start of each notebook/job before running SQL queries.

Requirements

Note: This document uses PyPI package names for references:

  • PyPI: cognite-pygen (repository: pygen; import: cognite.pygen)
  • PyPI: cognite-pygen-spark (repository: pygen-spark; import: cognite.pygen_spark)

Requirements:

  • Python 3.9+
  • cognite-pygen-spark (import: cognite.pygen_spark)
  • cognite-sdk-python (dependency)
  • databricks-sdk (dependency)

Package Structure

cognite-databricks/
├── cognite/
│   └── databricks/
│       ├── __init__.py        # Exports generate_udtf_notebook, UDTFGenerator, etc.
│       ├── udtf_registry.py   # UDTF registration helpers
│       ├── secret_manager.py  # Secret Manager helpers
│       ├── view_generator.py  # View generation and registration
│       ├── generator.py       # generate_udtf_notebook helper function
│       └── utils.py           # Utility functions
├── pyproject.toml
└── README.md

Core Components

generate_udtf_notebook

High-level function for notebook workflows, aligned with pygen.generate_sdk_notebook:

from cognite.databricks import generate_udtf_notebook

generator = generate_udtf_notebook(
    data_model_id,
    client,
    catalog="main",
    schema="cdf_models",
)

UDTFGenerator

Main class for orchestrating UDTF generation and registration:

from cognite.databricks import UDTFGenerator

generator = UDTFGenerator(
    workspace_client=workspace_client,
    cognite_client=client,
    catalog="main",
    schema="cdf_models",
)

Key Methods:

  • register_session_scoped_udtfs(): Register UDTFs for catalog-based use (scalar-only)
  • register_udtf_from_file(): Register a single generated UDTF file in the current session

register_udtf_from_file

Standalone function for registering a single UDTF from a generated Python file for catalog-based use. Useful when you only need to register one UDTF or want more control over the registration process:

from cognite.databricks import register_udtf_from_file

register_udtf_from_file(
    "/path/to/SmallBoat_udtf.py",
    function_name="small_boat_udtf"
)


SecretManagerHelper

Helper for managing OAuth2 credentials in Databricks Secret Manager:

from cognite.databricks import SecretManagerHelper

secret_helper = SecretManagerHelper(workspace_client)
secret_helper.set_cdf_credentials(
    scope_name="cdf_sp_pygen_power_windturbine",
    project="my-project",
    cdf_cluster="api.cognitedata.com",
    client_id="...",
    client_secret="...",
    tenant_id="...",
)
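
The secret scope names used throughout this README follow the pattern from the low-level example: cdf_<space>_<external_id lowercased>. A one-line helper for that convention (hypothetical, not part of the package; the convention itself is inferred from the examples):

```python
def cdf_secret_scope(space: str, external_id: str) -> str:
    """Derive the Secret Manager scope name used in this README's examples.

    Assumed convention: cdf_<space>_<external_id lowercased>.
    """
    return f"cdf_{space}_{external_id.lower()}"

print(cdf_secret_scope("sp_pygen_power", "WindTurbine"))  # cdf_sp_pygen_power_windturbine
print(cdf_secret_scope("sailboat", "sailboat"))           # cdf_sailboat_sailboat
```

Using one deterministic helper keeps the scope name passed to set_cdf_credentials consistent with the scope referenced later in SQL SECRET() calls.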

Development

Setup

git clone <repository-url>
cd cognite-databricks
pip install -e ".[dev]"

Running Tests

pytest tests/

Related Packages

  • cognite-pygen-spark (PyPI: cognite-pygen-spark): Generic Spark UDTF code generation library that works with any Spark cluster. Provides template-based UDTF generation, type conversion utilities (TypeConverter), connection configuration (CDFConnectionConfig), and utility functions. cognite-databricks uses pygen-spark for all code generation.
  • cognite-pygen (PyPI: cognite-pygen): Base code generation library for CDF Data Models
  • cognite-sdk-python: Python SDK for CDF APIs

Import Paths for Generic Components

Generic components (TypeConverter, CDFConnectionConfig, to_udtf_function_name) are provided by pygen-spark and re-exported from cognite-databricks for backward compatibility:

# Preferred: Import directly from pygen-spark (source)
from cognite.pygen_spark import TypeConverter, CDFConnectionConfig, to_udtf_function_name

# Backward compatible: Still works (re-exported from pygen-spark)
from cognite.databricks import TypeConverter, CDFConnectionConfig, to_udtf_function_name

Note: These components are generic Spark utilities and work with any Spark cluster, not just Databricks. They were moved from cognite-databricks to pygen-spark to make them available for standalone Spark clusters.

Documentation

Full documentation index: docs/index.md; quickstart at docs/catalog_based/quickstart.md.

License

[License information]

Contributing

[Contributing guidelines]

Project details


Download files

Download the file for your platform.

Source Distribution

cognite_databricks-0.3.0.tar.gz (51.0 kB)

Uploaded Source

Built Distribution

cognite_databricks-0.3.0-py3-none-any.whl (53.6 kB)

Uploaded Python 3

File details

Details for the file cognite_databricks-0.3.0.tar.gz.

File metadata

  • Download URL: cognite_databricks-0.3.0.tar.gz
  • Upload date:
  • Size: 51.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for cognite_databricks-0.3.0.tar.gz

  • SHA256: eb525cfd7b2d6b16233d00dd1dc1a80df7bae636020fec27302a53efe5f2b058
  • MD5: 70807718b1c6fcbcb097c31f0c2a8ebc
  • BLAKE2b-256: 72b6732271140e8d30d13702b608ff2dbfc24865ada2204786eba4799024b0b9

File details

Details for the file cognite_databricks-0.3.0-py3-none-any.whl.

File metadata

File hashes

Hashes for cognite_databricks-0.3.0-py3-none-any.whl

  • SHA256: 9972e82f5972063986370327412403a5452bdaf0960422eb013c46117e16478e
  • MD5: 2f02e02e0afa0e72210f901078b315e8
  • BLAKE2b-256: da02b22507e26c8a5a02dcb909a51196729d2ea038bebbb0b723e90dce8b5709
