Helper SDK for Databricks UDTF registration and Unity Catalog integration

cognite-databricks

A helper SDK for Databricks that provides UDTF registration utilities, Secret Manager integration, and Unity Catalog View generation for CDF Data Models.

Note: This is the initial release (0.1.0) of cognite-databricks.

Overview

cognite-databricks is a Databricks-specific helper SDK that extends pygen-spark with Unity Catalog registration, Secret Manager integration, and Databricks-specific utilities. It simplifies the process of registering CDF Data Models as discoverable Unity Catalog Views in Databricks.

Package Purpose:

  • Databricks-Specific Features: Unity Catalog registration, Secret Manager integration, and Databricks-specific utilities
  • Uses pygen-spark for Code Generation: All UDTF code generation (both Data Model and Time Series UDTFs) is done by pygen-spark using template-based generation
  • Generic Components: Generic utilities (TypeConverter, CDFConnectionConfig, to_udtf_function_name) are provided by pygen-spark and re-exported from cognite.databricks for backward compatibility
  • Notebook-Friendly API: Aligned with cognite.pygen's notebook workflow

It provides high-level APIs for:

  • UDTF Registration: Register Python UDTFs in Unity Catalog
  • Secret Manager Integration: Manage OAuth2 credentials securely
  • View Generation: Create Unity Catalog Views with Secret injection

Features

  • Two Registration Modes:
    • Unity Catalog: Permanent, discoverable UDTFs with governance (production)
    • Session-Scoped: Quick testing and development without Unity Catalog (development)
  • One-Line Registration: Generate and register UDTFs and Views with a single function call
  • Secret Manager Integration: Automatic credential management from TOML files
  • Unity Catalog Integration: Native support for Unity Catalog function and view registration
  • DBR 18.1+ Support: Custom dependency support for UDTFs
  • Type Safety: Full type hints and IDE support
  • Generic Components: Uses template-generated UDTFs and generic utilities (TypeConverter, CDFConnectionConfig, to_udtf_function_name) from cognite-pygen-spark for generic Spark compatibility. These components are re-exported from cognite.databricks for backward compatibility, but the source is cognite.pygen_spark.

Installation

pip install cognite-databricks

Quick Start

Notebook-Style (Recommended)

from cognite.client.data_classes.data_modeling.ids import DataModelId
from cognite.databricks import generate_udtf_notebook
from cognite.pygen import load_cognite_client_from_toml

# Load client from TOML file (same pattern as pygen)
client = load_cognite_client_from_toml("config.toml")

# Generate UDTFs for a Data Model
data_model_id = DataModelId(space="sp_pygen_power", external_id="WindTurbine", version="1")
generator = generate_udtf_notebook(
    data_model_id,
    client,
)

# Register UDTFs and Views in Unity Catalog
# Includes both data model UDTFs and time series UDTFs (all template-generated)
# Scope name is auto-generated from the data model: cdf_{space}_{external_id}, lowercased
# (for example, cdf_sp_pygen_power_windturbine)
result = generator.register_udtfs_and_views(
    secret_scope=None,  # Auto-generated if None
    dependencies=["cognite-sdk>=6.0.0"],
)
print(f"Registered {result.total_count} UDTF(s) including time series UDTFs")

Low-Level API

from cognite.client.data_classes.data_modeling.ids import DataModelId
from cognite.databricks import UDTFGenerator, SecretManagerHelper
from cognite.pygen import load_cognite_client_from_toml
from databricks.sdk import WorkspaceClient

# Load client from TOML file
client = load_cognite_client_from_toml("config.toml")
workspace_client = WorkspaceClient()

# Create generator
generator = UDTFGenerator(
    workspace_client=workspace_client,
    cognite_client=client,
    catalog="main",
    schema="cdf_models",
)

# Set up Secret Manager (one-time setup per data model)
data_model_id = DataModelId(space="sp_pygen_power", external_id="WindTurbine", version="1")
secret_scope = f"cdf_{data_model_id.space}_{data_model_id.external_id.lower()}"

generator.secret_helper.set_cdf_credentials(
    scope_name=secret_scope,
    project="my-project",  # from config.toml
    cdf_cluster="api.cognitedata.com",  # from config.toml
    client_id="...",  # from config.toml
    client_secret="...",  # from config.toml
    tenant_id="...",  # from config.toml
)

# Register UDTFs and Views
# For DBR 18.1+: dependencies are bundled with UDTF
# For pre-DBR 18.1: set dependencies=None and pre-install packages on cluster
registered = generator.register_udtfs_and_views(
    data_model=data_model_id,
    secret_scope=secret_scope,
    dependencies=["cognite-sdk>=6.0.0"],  # DBR 18.1+ only
)

Session-Scoped UDTF Registration

For development, testing, or DBR < 18.1 environments, you can register UDTFs for the current session only, without registering them in Unity Catalog. Dependencies are installed in the notebook with %pip install cognite-sdk, so UDTFs can be tested quickly.

Register All UDTFs (Recommended)

from databricks.sdk import WorkspaceClient
from cognite.databricks import generate_udtf_notebook
from cognite.pygen import load_cognite_client_from_toml
from cognite.client.data_classes.data_modeling.ids import DataModelId

# Create WorkspaceClient (auto-detects credentials in Databricks)
workspace_client = WorkspaceClient()

# Load client and generate UDTFs
client = load_cognite_client_from_toml("config.toml")
data_model_id = DataModelId(space="sailboat", external_id="sailboat", version="v1")
generator = generate_udtf_notebook(
    data_model_id,
    client,
    workspace_client=workspace_client,  # Include this for full functionality
    output_dir="/Workspace/Users/user@example.com/udtf",
    # Note: catalog and schema parameters are only used for Unity Catalog registration,
    # not for session-scoped UDTFs. They can be omitted for session-scoped use.
)

# Install dependencies (run in separate cell first)
# %pip install cognite-sdk
# (Restart kernel after installation)

# Register all UDTFs for session-scoped use (includes time series UDTFs automatically)
registered = generator.register_session_scoped_udtfs()
# Returns: {"SmallBoat": "small_boat_udtf", "LargeBoat": "large_boat_udtf", 
#           "time_series_datapoints": "time_series_datapoints_udtf", ...}

# Use in SQL (always use SECRET() for credentials)
# SELECT * FROM small_boat_udtf(
#     client_id => SECRET('cdf_sailboat_sailboat', 'client_id'),
#     client_secret => SECRET('cdf_sailboat_sailboat', 'client_secret'),
#     tenant_id => SECRET('cdf_sailboat_sailboat', 'tenant_id'),
#     cdf_cluster => SECRET('cdf_sailboat_sailboat', 'cdf_cluster'),
#     project => SECRET('cdf_sailboat_sailboat', 'project'),
#     name => 'MyBoat',
#     description => NULL
# ) LIMIT 10;
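
The SQL pattern in the comment above repeats the same five SECRET() arguments for every UDTF, so it can be generated instead of typed. The following is a minimal sketch; build_udtf_query is a hypothetical helper, not part of cognite-databricks, and it only handles string filter values and NULL.

```python
# The five credential arguments shown in the SQL example above.
CREDENTIAL_KEYS = ("client_id", "client_secret", "tenant_id", "cdf_cluster", "project")


def build_udtf_query(function_name, secret_scope, filters=None, limit=10):
    """Build a SELECT statement that passes credentials via SECRET() calls."""
    args = [f"    {key} => SECRET('{secret_scope}', '{key}')" for key in CREDENTIAL_KEYS]
    for name, value in (filters or {}).items():
        # None becomes SQL NULL; strings are single-quoted with quotes doubled.
        literal = "NULL" if value is None else "'" + str(value).replace("'", "''") + "'"
        args.append(f"    {name} => {literal}")
    return f"SELECT * FROM {function_name}(\n" + ",\n".join(args) + f"\n) LIMIT {limit};"


query = build_udtf_query(
    "small_boat_udtf",
    "cdf_sailboat_sailboat",
    filters={"name": "MyBoat", "description": None},
)
print(query)
```

The generated text keeps credentials out of the query itself, since only the scope and key names appear in it.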

Register Single UDTF

from cognite.databricks import generate_udtf_notebook, register_udtf_from_file

# Generate UDTFs
generator = generate_udtf_notebook(data_model_id, client, ...)

# Register a single UDTF from generated file
register_udtf_from_file(
    "/Workspace/Users/user@example.com/udtf/sailboat_sailboat_v1/SmallBoat_udtf.py",
    function_name="small_boat_udtf"
)
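
The file path in the example above appears to follow the pattern output_dir/{space}_{external_id}_{version}/{ViewName}_udtf.py. The following is a minimal sketch of that pattern, inferred only from this single example path; generated_udtf_path is a hypothetical helper, and the actual layout written by the generator may differ.

```python
from pathlib import PurePosixPath


def generated_udtf_path(output_dir, space, external_id, version, view_name):
    """Compose the assumed location of a generated UDTF file."""
    # Assumption: one folder per data model, one file per view.
    folder = f"{space}_{external_id}_{version}"
    return PurePosixPath(output_dir) / folder / f"{view_name}_udtf.py"


path = generated_udtf_path(
    "/Workspace/Users/user@example.com/udtf",
    space="sailboat",
    external_id="sailboat",
    version="v1",
    view_name="SmallBoat",
)
print(path)
```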

When to Use Session-Scoped vs Unity Catalog

| Feature      | Session-Scoped                    | Unity Catalog                        |
|--------------|-----------------------------------|--------------------------------------|
| Registration | Per-session, temporary            | Permanent, catalog-wide              |
| Dependencies | Install via %pip                  | Bundled (DBR 18.1+) or pre-installed |
| Use Case     | Development, testing, prototyping | Production, governance, discovery    |
| DBR Version  | All versions                      | All versions (with limitations)      |
| Searchable   | No                                | Yes (via Databricks Search)          |
| Permissions  | Session-level                     | Unity Catalog permissions            |

Recommendation: Use session-scoped registration for development and testing, then register in Unity Catalog for production use.
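
The choice between the two modes can be written down as a small decision function. The following is a minimal sketch, assuming the runtime version is available as a string such as "18.1"; plan_registration, RegistrationPlan, and the mode labels are hypothetical names for illustration and not part of cognite-databricks. It treats DBR 18.1 as the threshold for Unity Catalog registration with bundled dependencies.

```python
import re
from dataclasses import dataclass
from typing import Optional

MIN_BUNDLING_RUNTIME = (18, 1)  # DBR version that supports bundled UDTF dependencies


@dataclass(frozen=True)
class RegistrationPlan:
    mode: str  # "unity_catalog" or "session_scoped"
    dependencies: Optional[list]  # None means: install packages via %pip instead


def parse_dbr_version(version: str) -> tuple:
    """Extract (major, minor) from strings such as '18.1' or '17.3.x-scala2.13'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized runtime version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def plan_registration(dbr_version: str, production: bool, dependencies: list) -> RegistrationPlan:
    """Pick Unity Catalog for production on DBR 18.1+, session-scoped otherwise."""
    if production and parse_dbr_version(dbr_version) >= MIN_BUNDLING_RUNTIME:
        return RegistrationPlan("unity_catalog", list(dependencies))
    return RegistrationPlan("session_scoped", None)


print(plan_registration("18.1", True, ["cognite-sdk>=6.0.0"]))
print(plan_registration("17.3.x-scala2.13", True, ["cognite-sdk>=6.0.0"]))
```

A plan with mode "unity_catalog" corresponds to calling register_udtfs_and_views(), and "session_scoped" to register_session_scoped_udtfs().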

Requirements

Note: This document refers to packages by their PyPI names:

  • cognite-pygen (repository: pygen; import: cognite.pygen)
  • cognite-pygen-spark (repository: pygen-spark; import: cognite.pygen_spark)

Required:

  • Python 3.9+
  • cognite-pygen-spark (dependency)
  • cognite-sdk (dependency; repository: cognite-sdk-python)
  • databricks-sdk (dependency)
  • Databricks Runtime 18.1+ (required for register_udtfs_and_views(); register_session_scoped_udtfs() works on all DBR versions)

Package Structure

cognite-databricks/
├── cognite/
│   └── databricks/
│       ├── __init__.py            # Exports generate_udtf_notebook, UDTFGenerator, etc.
│       ├── udtf_registry.py        # UDTF registration in Unity Catalog
│       ├── secret_manager.py      # Secret Manager helpers
│       ├── view_generator.py       # View generation and registration
│       ├── generator.py            # generate_udtf_notebook helper function
│       └── utils.py                # Utility functions
├── pyproject.toml
└── README.md

Core Components

generate_udtf_notebook

High-level function for notebook workflows, aligned with pygen.generate_sdk_notebook:

from cognite.databricks import generate_udtf_notebook

generator = generate_udtf_notebook(
    data_model_id,
    client,
    catalog="main",
    schema="cdf_models",
)

UDTFGenerator

Main class for orchestrating UDTF generation and registration:

from cognite.databricks import UDTFGenerator

generator = UDTFGenerator(
    workspace_client=workspace_client,
    cognite_client=client,
    catalog="main",
    schema="cdf_models",
)

Key Methods:

  • register_udtfs_and_views(): Register all UDTFs and Views in Unity Catalog (production)
  • register_session_scoped_udtfs(): Register UDTFs for session-scoped use (development/testing)

UDTFRegistry

Utility for registering Python UDTFs in Unity Catalog:

from cognite.databricks import UDTFRegistry

registry = UDTFRegistry(workspace_client)
function_info = registry.register_udtf(
    catalog="main",
    schema="cdf",
    function_name="pump_view_udtf",
    udtf_code=udtf_code,
    input_params=[...],
    return_type="TABLE(...)",
    dependencies=["cognite-sdk>=6.0.0"],  # DBR 18.1+
)

register_udtf_from_file

Standalone function for registering a single UDTF from a generated Python file for session-scoped use. Useful when you only need to register one UDTF or want more control over the registration process.

from cognite.databricks import register_udtf_from_file

register_udtf_from_file(
    "/path/to/SmallBoat_udtf.py",
    function_name="small_boat_udtf"
)

SecretManagerHelper

Helper for managing OAuth2 credentials in Databricks Secret Manager:

from cognite.databricks import SecretManagerHelper

secret_helper = SecretManagerHelper(workspace_client)
secret_helper.set_cdf_credentials(
    scope_name="cdf_sp_pygen_power_windturbine",
    project="my-project",
    cdf_cluster="api.cognitedata.com",
    client_id="...",
    client_secret="...",
    tenant_id="...",
)
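
The scope name used above follows the cdf_{space}_{external_id} convention with the external ID lowercased, as in the Low-Level API example. The following is a minimal sketch of that convention; default_secret_scope is a hypothetical helper name, and the library's own auto-generation may apply further normalization.

```python
def default_secret_scope(space: str, external_id: str) -> str:
    """Derive a secret scope name from a data model's space and external ID."""
    # Mirrors the f-string in the Low-Level API example: only external_id is lowercased.
    return f"cdf_{space}_{external_id.lower()}"


print(default_secret_scope("sp_pygen_power", "WindTurbine"))
print(default_secret_scope("sailboat", "sailboat"))
```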

Development

Setup

git clone <repository-url>
cd cognite-databricks
pip install -e ".[dev]"

Running Tests

pytest tests/

Pre-DBR 18.1 Usage

For Databricks Runtime versions prior to 18.1, register_udtfs_and_views() is not supported. Use register_session_scoped_udtfs() instead, which works on all DBR versions:

Step 1: Install Packages on Cluster

# In a Databricks notebook cell
# Quote version specifiers so that ">" is not treated as a shell redirect
%pip install "cognite-sdk>=6.0.0" "cognite-pygen-spark>=0.1.0"

Or configure cluster libraries via the Databricks UI.

Step 2: Register UDTFs for Session-Scoped Use

from cognite.client.data_classes.data_modeling.ids import DataModelId
from cognite.databricks import generate_udtf_notebook
from cognite.pygen import load_cognite_client_from_toml

client = load_cognite_client_from_toml("config.toml")
data_model_id = DataModelId(space="sp_pygen_power", external_id="WindTurbine", version="1")
generator = generate_udtf_notebook(data_model_id, client)

# Use register_session_scoped_udtfs() for pre-DBR 18.1
registered = generator.register_session_scoped_udtfs()
# Returns: {"WindTurbine": "wind_turbine_udtf", ...}

Note: Session-scoped UDTFs are temporary and only available in the current notebook session. For production use with Unity Catalog views, upgrade to DBR 18.1+ and use register_udtfs_and_views().
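
The returned mapping pairs each view name with a snake_case function name ending in _udtf, for example "WindTurbine" to "wind_turbine_udtf". The following is a minimal sketch of that naming convention as it appears in the examples; sketch_udtf_function_name is an illustrative stand-in, not the library's to_udtf_function_name, whose handling of acronyms and other edge cases may differ.

```python
import re


def sketch_udtf_function_name(view_name: str) -> str:
    """Convert a PascalCase view name to snake_case and append '_udtf'."""
    # Insert "_" before every uppercase letter except the first character.
    snake = re.sub(r"(?<!^)(?=[A-Z])", "_", view_name).lower()
    return f"{snake}_udtf"


for name in ("WindTurbine", "SmallBoat", "time_series_datapoints"):
    print(name, "->", sketch_udtf_function_name(name))
```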

Related Packages

  • cognite-pygen-spark (PyPI: cognite-pygen-spark): Generic Spark UDTF code generation library that works with any Spark cluster. Provides template-based UDTF generation, type conversion utilities (TypeConverter), connection configuration (CDFConnectionConfig), and utility functions. cognite-databricks uses pygen-spark for all code generation.
  • cognite-pygen (PyPI: cognite-pygen): Base code generation library for CDF Data Models
  • cognite-sdk-python (PyPI: cognite-sdk): Python SDK for CDF APIs

Import Paths for Generic Components

Generic components (TypeConverter, CDFConnectionConfig, to_udtf_function_name) are provided by pygen-spark and re-exported from cognite-databricks for backward compatibility:

# Preferred: Import directly from pygen-spark (source)
from cognite.pygen_spark import TypeConverter, CDFConnectionConfig, to_udtf_function_name

# Backward compatible: Still works (re-exported from pygen-spark)
from cognite.databricks import TypeConverter, CDFConnectionConfig, to_udtf_function_name

Note: These components are generic Spark utilities and work with any Spark cluster, not just Databricks. They were moved from cognite-databricks to pygen-spark to make them available for standalone Spark clusters.

Documentation

For detailed documentation, see:

License

[License information]

Contributing

[Contributing guidelines]
