Helper SDK for Databricks UDTF registration and Unity Catalog integration
cognite-databricks
A helper SDK for Databricks that provides Unity Catalog SQL UDTF registration utilities, Secret Manager integration, and Databricks-specific tooling for scalar UDTFs.
Latest Release: Version 0.2.1 adds SQL-native time series UDTF support with predicate pushdown hints, plus a SQL query analyzer that extracts pushdown hints from SQL queries.
Note: This package provides Databricks-specific utilities for Unity Catalog UDTF registration and Secret Manager integration.
Overview
cognite-databricks is a Databricks-specific helper SDK that extends pygen-spark with Unity Catalog SQL registration, Secret Manager integration, and Databricks-specific utilities. It focuses on serverless-compatible scalar UDTF execution for SQL Warehouses.
Package Purpose:
- Databricks-Specific Features: Unity Catalog SQL registration, Secret Manager integration, and Databricks-specific utilities
- Uses pygen-spark for Code Generation: All UDTF code generation (both Data Model and Time Series UDTFs) is done by pygen-spark using template-based generation
- Generic Components: Generic utilities (TypeConverter, CDFConnectionConfig, to_udtf_function_name) are provided by pygen-spark and re-exported from cognite.databricks for backward compatibility
- Notebook-Friendly API: Aligned with cognite.pygen's notebook workflow
It provides high-level APIs for:
- UDTF Registration: Register persistent UDTFs in Unity Catalog via SQL
- Secret Manager Integration: Manage OAuth2 credentials securely
- SQL Usage: Use UDTFs directly in SQL after registration
- Notebook-Friendly API: Aligned with cognite.pygen's notebook workflow
Features
- Unity Catalog SQL Registration: Serverless-compatible UDTFs registered via CREATE FUNCTION statements
- One-Line Registration: Generate and register UDTFs in a single call
- Secret Manager Integration: Automatic credential management from TOML files
- Scalar-Only Execution: Compatible with SQL Warehouses and serverless execution
- Type Safety: Full type hints and IDE support
- Generic Components: Uses template-generated UDTFs and generic utilities (TypeConverter, CDFConnectionConfig, to_udtf_function_name) from cognite-pygen-spark for generic Spark compatibility. These components are re-exported from cognite.databricks for backward compatibility, but the source is cognite.pygen_spark.
Installation
pip install cognite-databricks
Quick Start
Notebook-Style (Recommended)
from cognite.client.data_classes.data_modeling.ids import DataModelId
from cognite.databricks import generate_udtf_notebook
from cognite.pygen import load_cognite_client_from_toml
# Load client from TOML file (same pattern as pygen)
client = load_cognite_client_from_toml("config.toml")
# Generate UDTFs for a Data Model
data_model_id = DataModelId(space="sp_pygen_power", external_id="WindTurbine", version="1")
generator = generate_udtf_notebook(
    data_model_id,
    client,
)
# Register UDTFs in Unity Catalog (SQL registration)
udtf_result = generator.register_udtfs(
    secret_scope="cdf_sp_pygen_power_windturbine",
    if_exists="replace",
)
print(f"Registered {udtf_result.total_count} UDTF(s)")
Low-Level API
from cognite.client.data_classes.data_modeling.ids import DataModelId
from cognite.databricks import UDTFGenerator, SecretManagerHelper
from cognite.pygen import load_cognite_client_from_toml
# Load client from TOML file
client = load_cognite_client_from_toml("config.toml")
# Create generator
generator = UDTFGenerator(
    cognite_client=client,
    catalog="main",
    schema="cdf_models",
)
# Set up Secret Manager (one-time setup per data model)
data_model_id = DataModelId(space="sp_pygen_power", external_id="WindTurbine", version="1")
secret_scope = f"cdf_{data_model_id.space}_{data_model_id.external_id.lower()}"
generator.secret_helper.set_cdf_credentials(
    scope_name=secret_scope,
    project="my-project",  # from config.toml
    cdf_cluster="api.cognitedata.com",  # from config.toml
    client_id="...",  # from config.toml
    client_secret="...",  # from config.toml
    tenant_id="...",  # from config.toml
)
# Register UDTFs for catalog-based use (scalar-only)
registered = generator.register_session_scoped_udtfs()
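The secret scope name in the example above follows a simple pattern: a cdf_ prefix, the data model space, and the lowercased external ID. A minimal sketch of that convention as a standalone function follows; build_secret_scope is a hypothetical helper name used for illustration, not something the package is known to export.

```python
def build_secret_scope(space: str, external_id: str) -> str:
    """Derive a secret scope name as cdf_<space>_<external_id lowercased>.

    Hypothetical helper mirroring the f-string used in the Low-Level API example.
    """
    return f"cdf_{space}_{external_id.lower()}"


# The two data models used in this README
print(build_secret_scope("sp_pygen_power", "WindTurbine"))
print(build_secret_scope("sailboat", "sailboat"))
```

Deriving the scope in one place keeps the name passed to set_cdf_credentials consistent with the name used later in SECRET() calls.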
Unity Catalog UDTF Registration
Session-scoped registration is the primary mode for scalar-only UDTFs. Register functions in the current Spark session before running SQL queries.
Register All UDTFs (Recommended)
from cognite.databricks import generate_udtf_notebook
from cognite.pygen import load_cognite_client_from_toml
from cognite.client.data_classes.data_modeling.ids import DataModelId
# Load client and generate UDTFs
client = load_cognite_client_from_toml("config.toml")
data_model_id = DataModelId(space="sailboat", external_id="sailboat", version="v1")
generator = generate_udtf_notebook(
    data_model_id,
    client,
    output_dir="/Workspace/Users/user@example.com/udtf",
)
# Install dependencies (run in separate cell first)
# %pip install cognite-sdk
# (Restart kernel after installation)
# Register all UDTFs for catalog-based use (includes time series UDTFs automatically)
registered = generator.register_session_scoped_udtfs()
# Returns: {"SmallBoat": "small_boat_udtf", "LargeBoat": "large_boat_udtf",
#           "time_series_datapoints": "time_series_datapoints_udtf", ...}
# Use in SQL (always use SECRET() for credentials)
# SELECT * FROM small_boat_udtf(
#     client_id => SECRET('cdf_sailboat_sailboat', 'client_id'),
#     client_secret => SECRET('cdf_sailboat_sailboat', 'client_secret'),
#     tenant_id => SECRET('cdf_sailboat_sailboat', 'tenant_id'),
#     cdf_cluster => SECRET('cdf_sailboat_sailboat', 'cdf_cluster'),
#     project => SECRET('cdf_sailboat_sailboat', 'project'),
#     name => 'MyBoat',
#     description => NULL
# ) LIMIT 10;
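Because every call repeats the same five SECRET() arguments, it can help to render the query text programmatically. The sketch below builds the SQL shown in the comment above as a plain string; build_udtf_query and CREDENTIAL_KEYS are hypothetical names introduced here, and the function only handles string or NULL filter values. It interpolates the function name and scope directly, so it is only suitable for trusted identifiers.

```python
# Credential parameters, in the order used by the SQL example above
CREDENTIAL_KEYS = ("client_id", "client_secret", "tenant_id", "cdf_cluster", "project")


def build_udtf_query(function_name, secret_scope, filters=None, limit=10):
    """Render a SELECT over a UDTF, passing credentials via SECRET()."""
    args = [f"{key} => SECRET('{secret_scope}', '{key}')" for key in CREDENTIAL_KEYS]
    for name, value in (filters or {}).items():
        if value is None:
            literal = "NULL"
        else:
            # Double any single quotes so the value stays a valid SQL string literal
            escaped = str(value).replace("'", "''")
            literal = f"'{escaped}'"
        args.append(f"{name} => {literal}")
    body = ",\n    ".join(args)
    return f"SELECT * FROM {function_name}(\n    {body}\n) LIMIT {limit};"


query = build_udtf_query(
    "small_boat_udtf",
    "cdf_sailboat_sailboat",
    filters={"name": "MyBoat", "description": None},
)
print(query)
```

In a notebook, the resulting string could be passed to spark.sql(...) after the UDTFs have been registered.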
Register Single UDTF
from cognite.databricks import generate_udtf_notebook, register_udtf_from_file
# Generate UDTFs
generator = generate_udtf_notebook(data_model_id, client, ...)
# Register a single UDTF from generated file
register_udtf_from_file(
    "/Workspace/Users/user@example.com/udtf/sailboat_sailboat_v1/SmallBoat_udtf.py",
    function_name="small_boat_udtf",
)
Session Scope Notes
Session-scoped registration is the supported mode for scalar-only UDTFs. Functions are temporary and must be registered at the start of each notebook/job before running SQL queries.
Requirements
Note: This document uses PyPI package names for references:
  - PyPI: cognite-pygen (repository: pygen; import: cognite.pygen)
  - PyPI: cognite-pygen-spark (repository: pygen-spark; import: cognite.pygen_spark)
- Python 3.9+
- cognite-pygen-spark (PyPI package name; import: cognite.pygen_spark)
- cognite-sdk (PyPI package name; repository: cognite-sdk-python; dependency)
- databricks-sdk (dependency)
Package Structure
cognite-databricks/
├── cognite/
│   └── databricks/
│       ├── __init__.py          # Exports generate_udtf_notebook, UDTFGenerator, etc.
│       ├── udtf_registry.py     # UDTF registration helpers
│       ├── secret_manager.py    # Secret Manager helpers
│       ├── view_generator.py    # View generation and registration
│       ├── generator.py         # generate_udtf_notebook helper function
│       └── utils.py             # Utility functions
├── pyproject.toml
└── README.md
Core Components
generate_udtf_notebook
High-level function for notebook workflows, aligned with pygen.generate_sdk_notebook:
from cognite.databricks import generate_udtf_notebook
generator = generate_udtf_notebook(
    data_model_id,
    client,
    catalog="main",
    schema="cdf_models",
)
UDTFGenerator
Main class for orchestrating UDTF generation and registration:
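from databricks.sdk import WorkspaceClient
from cognite.databricks import UDTFGenerator
# workspace_client is assumed to be a databricks-sdk WorkspaceClient;
# WorkspaceClient() picks up Databricks authentication from the environment
workspace_client = WorkspaceClient()
generator = UDTFGenerator(
    workspace_client=workspace_client,
    cognite_client=client,  # CogniteClient, e.g. from load_cognite_client_from_toml
    catalog="main",
    schema="cdf_models",
)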
Key Methods:
- register_session_scoped_udtfs(): Register UDTFs for catalog-based use (scalar-only)
- register_udtf_from_file(): Register a single generated UDTF file in the current session
register_udtf_from_file
Standalone function for registering a single UDTF from a generated Python file for catalog-based use. Useful when you only need to register one UDTF or want more control over the registration process.
from cognite.databricks import register_udtf_from_file
register_udtf_from_file(
    "/path/to/SmallBoat_udtf.py",
    function_name="small_boat_udtf",
)
SecretManagerHelper
Helper for managing OAuth2 credentials in Databricks Secret Manager:
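from databricks.sdk import WorkspaceClient
from cognite.databricks import SecretManagerHelper
# workspace_client is assumed to be a databricks-sdk WorkspaceClient
workspace_client = WorkspaceClient()
secret_helper = SecretManagerHelper(workspace_client)
secret_helper.set_cdf_credentials(
    scope_name="cdf_sp_pygen_power_windturbine",
    project="my-project",
    cdf_cluster="api.cognitedata.com",
    client_id="...",
    client_secret="...",
    tenant_id="...",
)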
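The credential values passed to set_cdf_credentials normally come from config.toml rather than being typed inline. The sketch below reads them with the standard library TOML parser. The [cognite] table name and its keys are an assumption about the file layout, and credentials_from_toml is a hypothetical helper; adjust both to match your actual config.toml.

```python
try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:  # older interpreters need the tomli backport
    import tomli as tomllib

# Assumed layout of config.toml; the table and key names are illustrative.
EXAMPLE_TOML = """
[cognite]
project = "my-project"
cdf_cluster = "api.cognitedata.com"
tenant_id = "my-tenant-id"
client_id = "my-client-id"
client_secret = "my-client-secret"
"""

# Keyword names accepted by set_cdf_credentials in the example above
REQUIRED_KEYS = ("project", "cdf_cluster", "client_id", "client_secret", "tenant_id")


def credentials_from_toml(text, table="cognite"):
    """Extract the five credential fields from TOML text, failing on missing keys."""
    section = tomllib.loads(text).get(table, {})
    missing = [key for key in REQUIRED_KEYS if key not in section]
    if missing:
        raise KeyError(f"missing keys in [{table}]: {missing}")
    return {key: section[key] for key in REQUIRED_KEYS}


credentials = credentials_from_toml(EXAMPLE_TOML)
# With a real helper, the dict can be unpacked into the call:
# secret_helper.set_cdf_credentials(scope_name="cdf_sp_pygen_power_windturbine", **credentials)
print(sorted(credentials))
```

Validating the keys before writing to the secret scope avoids storing a partial credential set that would only fail later at query time.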
Development
Setup
git clone <repository-url>
cd cognite-databricks
pip install -e ".[dev]"
Running Tests
pytest tests/
Related Packages
- cognite-pygen-spark (PyPI: cognite-pygen-spark): Generic Spark UDTF code generation library that works with any Spark cluster. Provides template-based UDTF generation, type conversion utilities (TypeConverter), connection configuration (CDFConnectionConfig), and utility functions. cognite-databricks uses pygen-spark for all code generation.
- cognite-pygen (PyPI: cognite-pygen): Base code generation library for CDF Data Models
- cognite-sdk-python: Python SDK for CDF APIs
Import Paths for Generic Components
Generic components (TypeConverter, CDFConnectionConfig, to_udtf_function_name) are provided by pygen-spark and re-exported from cognite-databricks for backward compatibility:
# Preferred: Import directly from pygen-spark (source)
from cognite.pygen_spark import TypeConverter, CDFConnectionConfig, to_udtf_function_name
# Backward compatible: Still works (re-exported from pygen-spark)
from cognite.databricks import TypeConverter, CDFConnectionConfig, to_udtf_function_name
Note: These components are generic Spark utilities and work with any Spark cluster, not just Databricks. They were moved from cognite-databricks to pygen-spark to make them available for standalone Spark clusters.
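To give a feel for what to_udtf_function_name produces, the registration output earlier in this README maps SmallBoat to small_boat_udtf and time_series_datapoints to time_series_datapoints_udtf. The sketch below reproduces that mapping with a simple CamelCase to snake_case conversion. It is an illustration inferred from those examples, not the library's implementation, and the real function may treat acronyms or other edge cases differently; sketch_udtf_function_name is a hypothetical name.

```python
import re


def sketch_udtf_function_name(view_name: str) -> str:
    """Illustrative only: CamelCase -> snake_case with a _udtf suffix."""
    # Insert an underscore wherever a lowercase letter or digit is followed by an uppercase letter
    snake = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", view_name).lower()
    return f"{snake}_udtf"


for name in ("SmallBoat", "LargeBoat", "time_series_datapoints"):
    print(name, "->", sketch_udtf_function_name(name))
```

For real registrations, prefer importing to_udtf_function_name from cognite.pygen_spark so generated and registered names stay in sync.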
Documentation
For detailed documentation, see:
- Documentation Index: Complete guide for catalog-based scalar-only UDTF registration
- Unity Catalog UDTF Registration: Session-scoped workflow and SQL usage
- Technical Plan - CDF Databricks Integration (UDTF-Based): Architecture and design details
License
[License information]
Contributing
[Contributing guidelines]
Download files
File details
Details for the file cognite_databricks-0.2.1.tar.gz.
File metadata
- Download URL: cognite_databricks-0.2.1.tar.gz
- Upload date:
- Size: 49.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ae38a539c151c2667b0828027dd9d4045c92cd6b2e3823830322f15c35abd570 |
| MD5 | 4f8315f7d9e9465f1035ce37259c8685 |
| BLAKE2b-256 | 47107b0a5ec8e9d1117b16167e1f5adc85e1f8533d056aa2136aa745f568319b |
File details
Details for the file cognite_databricks-0.2.1-py3-none-any.whl.
File metadata
- Download URL: cognite_databricks-0.2.1-py3-none-any.whl
- Upload date:
- Size: 52.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f6efeb49793d3fd901da8fdb8487b58690d86197dfe70710cf9d05a5714aca9d |
| MD5 | 1d7cef2409110f996637742a9bd5d65d |
| BLAKE2b-256 | 0a61a1def3fcc08d22cd98303f1c15824e3783c69773e1392c95cbddcdb0bfcf |
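The SHA256 digests listed above can be checked against a downloaded file before installing it. A minimal sketch using only the standard library follows; sha256_of and matches_published_digest are hypothetical helper names, and the file path in the usage comment is a placeholder for wherever the download was saved.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 with a published digest, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage (placeholder path):
# matches_published_digest("cognite_databricks-0.2.1.tar.gz", "<SHA256 from the table above>")
```

pip can perform the same check automatically when a requirements file pins hashes with --hash and is installed with --require-hashes.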