
Decorator to compile Python functions into Databricks UDF SQL statements, inlining all of their dependencies

Project description

uc-functions

The purpose of this project is to help you manage Unity Catalog Python functions as traditional Python code, and to easily unit test, integration test, and deploy them to Databricks. As part of a compilation step, this package converts the Python AST into Unity Catalog functions. It also handles concerns such as secrets by adding a layer of indirection through SQL-based UDFs.
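To make the compilation step concrete, here is a minimal sketch of turning an annotated Python function into a CREATE FUNCTION statement with the standard ast module (Python 3.9+ for ast.unparse). It is not the package's implementation: the compile_to_sql helper and the PY_TO_SQL type mapping are hypothetical, and the sketch ignores dependency inlining and secrets.

```python
import ast
import textwrap

# Hypothetical mapping from Python annotations to SQL types (illustrative, not exhaustive).
PY_TO_SQL = {"str": "STRING", "int": "BIGINT", "float": "DOUBLE", "bool": "BOOLEAN"}


def compile_to_sql(source: str, catalog: str, schema: str) -> str:
    """Turn the source of one annotated Python function into a UC SQL statement."""
    tree = ast.parse(textwrap.dedent(source))
    func = next(n for n in tree.body if isinstance(n, ast.FunctionDef))

    # Every parameter and the return value must carry an annotation found in PY_TO_SQL.
    params = ", ".join(
        f"{arg.arg} {PY_TO_SQL[ast.unparse(arg.annotation)]}" for arg in func.args.args
    )
    returns = PY_TO_SQL[ast.unparse(func.returns)]

    # The UDF body is the function body itself; the def line and decorators are dropped.
    body = "\n".join(ast.unparse(stmt) for stmt in func.body)

    name = f"{catalog}.{schema}.{func.name}"
    return (
        f"CREATE OR REPLACE FUNCTION {name}({params})\n"
        f"RETURNS {returns}\n"
        "LANGUAGE PYTHON\n"
        f"AS $$\n{body}\n$$;"
    )


SRC = '''
def shout(text: str) -> str:
    return text.upper()
'''
print(compile_to_sql(SRC, "main", "default"))
```

Because ast.unparse regenerates the body from the syntax tree, comments and original formatting are not preserved, which does not matter for a UDF body.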

Installation

pip install uc-functions

Goals

Convert decorated Python functions into SQL functions that can be deployed to Databricks. This makes it practical to manage a large number of functions that share reusable code, and it provides an easy way to test and debug them.

In the following example, this project converts the Python function into a SQL function. It also scans for unresolved names (variables, functions, and so on) and tries to inline them into the SQL function as much as possible.

import json
from pathlib import Path
from utils.keys import MY_SENSITIVE_KEYS

from uc_functions import FunctionDeployment

root_dir = str(Path(__file__).parent)
uc = FunctionDeployment("main",
                        "default",
                        root_dir,
                        globals_dict=globals())


@uc.register
def redact(maybe_json: str) -> str:
    try:
        value = json.loads(maybe_json)
        for key in MY_SENSITIVE_KEYS:
            if key in value:
                value[key] = "REDACTED"
        return json.dumps(value)
    except json.JSONDecodeError:
        return maybe_json

This is converted to:

DROP FUNCTION IF EXISTS main.default.redact;

CREATE OR REPLACE FUNCTION main.default.redact(maybe_json STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$
import json

MY_SENSITIVE_KEYS = ["email", "phone"]
try:
    value = json.loads(maybe_json)
    for key in MY_SENSITIVE_KEYS:
        if key in value:
            value[key] = "REDACTED"
    return json.dumps(value)
except json.JSONDecodeError:
    return maybe_json

$$;
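The name-scanning step described earlier can be approximated with the standard ast module: collect every name the function body reads, then subtract its parameters, the names it binds, and Python builtins. The find_unresolved_names helper below is a hypothetical, flow-insensitive sketch rather than the package's code; for example, it does not treat nested function scopes separately.

```python
import ast
import builtins
import textwrap
from typing import Set


def find_unresolved_names(source: str) -> Set[str]:
    """Names the function body reads but never binds itself: candidates to inline."""
    tree = ast.parse(textwrap.dedent(source))
    func = next(n for n in tree.body if isinstance(n, ast.FunctionDef))

    bound = {arg.arg for arg in func.args.args}  # parameters are local
    loaded = set()
    # Walk only the body, so decorators (e.g. uc.register) and annotations are skipped.
    for stmt in func.body:
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name):
                if isinstance(node.ctx, ast.Load):
                    loaded.add(node.id)
                else:  # Store or Del: the function binds this name itself
                    bound.add(node.id)
            elif isinstance(node, ast.ExceptHandler) and node.name:
                bound.add(node.name)  # "except E as name"
            elif isinstance(node, (ast.Import, ast.ImportFrom)):
                # A local "import a.b" binds "a"; "import x as y" binds "y".
                for alias in node.names:
                    bound.add((alias.asname or alias.name).split(".")[0])
    return loaded - bound - set(dir(builtins))


REDACT_SRC = '''
@uc.register
def redact(maybe_json: str) -> str:
    try:
        value = json.loads(maybe_json)
        for key in MY_SENSITIVE_KEYS:
            if key in value:
                value[key] = "REDACTED"
        return json.dumps(value)
    except json.JSONDecodeError:
        return maybe_json
'''
print(sorted(find_unresolved_names(REDACT_SRC)))  # ['MY_SENSITIVE_KEYS', 'json']
```

The two names it reports are exactly the ones that appear as "import json" and the MY_SENSITIVE_KEYS assignment in the compiled output above.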

Features

  • Convert python functions to SQL functions
  • Handle secrets
  • Inline function references
  • Handle imports
  • Debug unidentified names
  • Easy unit testing and integration testing
  • Dynamic sys.path using Python files in volumes (coming soon)
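Inlining references and imports could work roughly like this: given the unresolved names and the caller's globals (the globals_dict passed to FunctionDeployment), emit an import line for each module and an assignment for each literal constant. build_preamble is a hypothetical sketch that only covers modules and simple literals; inlining function references, which the package also does, is not attempted here.

```python
import json
import types
from typing import Any, Dict, Iterable


def build_preamble(names: Iterable[str], globals_dict: Dict[str, Any]) -> str:
    """Turn unresolved names into source lines that recreate them inside a UDF body."""
    imports, assignments = [], []
    for name in sorted(names):
        value = globals_dict[name]
        if isinstance(value, types.ModuleType):
            # Modules become import statements, preserving any alias.
            if value.__name__ == name:
                imports.append(f"import {name}")
            else:
                imports.append(f"import {value.__name__} as {name}")
        elif isinstance(value, (str, int, float, bool, list, tuple, dict, type(None))):
            # repr() yields valid Python source for plain literal values like these.
            assignments.append(f"{name} = {value!r}")
        else:
            raise TypeError(f"cannot inline {name!r} of type {type(value).__name__}")
    # Imports first, then constants, mirroring the compiled output's layout.
    return "\n".join(imports + assignments)


MY_SENSITIVE_KEYS = ["email", "phone"]
preamble = build_preamble({"json", "MY_SENSITIVE_KEYS"}, globals())
print(preamble)
```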

Unit testing

@uc.register is a decorator that only sets attributes on the function; it does not change the function's inputs or outputs. This makes the functions easy to unit test.
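The pattern behind this is a decorator that records metadata on the function and returns the very same object instead of a wrapper. A minimal sketch, using a hypothetical Registry class rather than the real FunctionDeployment API:

```python
from typing import Callable, List


class Registry:
    """Hypothetical stand-in for a deployment object (not the uc_functions API)."""

    def __init__(self, catalog: str, schema: str) -> None:
        self.catalog = catalog
        self.schema = schema
        self.functions: List[Callable] = []

    def register(self, func: Callable) -> Callable:
        # Attach metadata to the function object itself...
        setattr(func, "uc_name", f"{self.catalog}.{self.schema}.{func.__name__}")
        self.functions.append(func)
        # ...and return the same object, so local calls are unchanged.
        return func


registry = Registry("main", "default")


@registry.register
def double(x: int) -> int:
    return x * 2


print(double(21))      # 42: runs locally, nothing is deployed
print(double.uc_name)  # main.default.double
```

Since the decorator hands back the original function, a plain unit test exercises exactly the code that gets compiled.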

Example function

@uc.register
def redact(maybe_json: str) -> str:
    try:
        value = json.loads(maybe_json)
        for key in MY_SENSITIVE_KEYS:
            if key in value:
                value[key] = "REDACTED"
        return json.dumps(value)
    except json.JSONDecodeError:
        return maybe_json

Example unit test

def test_redact():
    assert redact('{"email": "foo", "phone": "bar"}') == '{"email": "REDACTED", "phone": "REDACTED"}'

Integration testing

Integration testing is done by deploying the functions and then calling them through the remote attribute that @uc.register adds to each function.

Register the function:

@uc.register
def redact(maybe_json: str) -> str:
    try:
        value = json.loads(maybe_json)
        for key in MY_SENSITIVE_KEYS:
            if key in value:
                value[key] = "REDACTED"
        return json.dumps(value)
    except json.JSONDecodeError:
        return maybe_json

Once it is deployed, run:

# executes the code on a remote databricks warehouse
redact.remote(
    '{"email": "foo", "phone": "bar"}',
    # workspace_client=workspace_client,  # pass a workspace client or set the Databricks environment variables
    # warehouse_id=warehouse_id  # optional; defaults to the first serverless warehouse
)
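Conceptually, a remote call like this amounts to running a SELECT against the deployed function on a SQL warehouse. The sketch below only builds the statement and its named parameters; build_remote_call is hypothetical, and actually executing the statement (for example through the Databricks SDK's statement execution API) is left out so the example runs without a workspace.

```python
import inspect
from typing import Any, Callable, Dict, List, Tuple


def build_remote_call(
    func: Callable, full_name: str, *args: Any
) -> Tuple[str, List[Dict[str, str]]]:
    """Build a parameterized SELECT that calls a deployed UDF (hypothetical helper)."""
    names = list(inspect.signature(func).parameters)
    if len(args) != len(names):
        raise TypeError(f"expected {len(names)} arguments, got {len(args)}")
    # Named parameter markers (:name) keep argument values out of the SQL text.
    placeholders = ", ".join(f":{name}" for name in names)
    statement = f"SELECT {full_name}({placeholders}) AS result"
    # Values are stringified here for simplicity (an assumption about the transport format).
    parameters = [{"name": n, "value": str(v)} for n, v in zip(names, args)]
    return statement, parameters


def redact(maybe_json: str) -> str:  # local stand-in with the same signature
    return maybe_json


statement, parameters = build_remote_call(
    redact, "main.default.redact", '{"email": "foo", "phone": "bar"}'
)
print(statement)  # SELECT main.default.redact(:maybe_json) AS result
```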

Usage

See the examples directory for usage examples and for what the compiled output looks like.

Disclaimer

The uc-functions package is not developed, endorsed, or supported by Databricks. It is provided as-is, without warranty of any kind. For more details, please refer to the license.



Download files


Source Distribution

uc_functions-0.1.0.tar.gz (26.9 kB)

Uploaded Source

Built Distribution

uc_functions-0.1.0-py3-none-any.whl (16.8 kB)

Uploaded Python 3

File details

Details for the file uc_functions-0.1.0.tar.gz.

File metadata

  • Download URL: uc_functions-0.1.0.tar.gz
  • Upload date:
  • Size: 26.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.8

File hashes

Hashes for uc_functions-0.1.0.tar.gz
Algorithm Hash digest
SHA256 b0e29e13668385fb4d66cca0caf5ce2d63c8bc17fc2e09df8cd3d93ef04a636f
MD5 267d46d9027ec563db4ce352589c37ad
BLAKE2b-256 903263b0cd2e7511c30325ff0138aa18405d81de6b956aa27d9f760e5a7873a7


File details

Details for the file uc_functions-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: uc_functions-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 16.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.8

File hashes

Hashes for uc_functions-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d995a6211f1da104e61b562d924a09387f1f5cca7e4c08212760e873d6e58c24
MD5 1fe60facf75ed7dfb45be2ec365bb067
BLAKE2b-256 1f92c4da5e76460012b151712dcddd0c61cf165c335f0ac47326d1ebb3b55a28

