Decorator that compiles Python functions into Databricks UDF SQL statements and inlines all of their dependencies

uc-functions

The purpose of this project is to let you manage Unity Catalog Python functions as ordinary Python code that you can easily unit test, integration test, and deploy to Databricks. As part of a compilation step, this package converts the Python AST into Unity Catalog functions. It also handles concerns such as secrets by adding a layer of indirection through SQL-based UDFs.

Other solutions may use packages like pickle or cloudpickle to serialize the functions. This is not recommended in practice because it can lead to environment discrepancies: cloudpickle works best when the same Python version and the same cloudpickle version are used on both sides, which is currently hard to guarantee with serverless environments. The result is also not readable; you end up with a giant base64-encoded string in your code. The goal of uc-functions is to properly transpile the Python code to SQL code and handle the majority of edge cases by inlining all references in the function.

Using cloudpickle for long-term object storage is not supported and strongly discouraged.

Reference: https://github.com/cloudpipe/cloudpickle

Installation

pip install uc-functions

Goals

Convert decorated Python functions to SQL functions that can be deployed to Databricks. This is useful for managing a large number of functions that share reusable code, and it provides an easy way to test and debug them.

In the following example, this project converts the Python function to a SQL function. It also scans for all unidentified names, functions, etc. and tries to inline them as much as possible in the SQL function.

import json
from pathlib import Path
from utils.keys import MY_SENSITIVE_KEYS

from uc_functions import FunctionDeployment

root_dir = str(Path(__file__).parent)
uc = FunctionDeployment("main",
                        "default",
                        root_dir,
                        globals_dict=globals())


@uc.register
def redact(maybe_json: str) -> str:
    try:
        value = json.loads(maybe_json)
        for key in MY_SENSITIVE_KEYS:
            if key in value:
                value[key] = "REDACTED"
        return json.dumps(value)
    except json.JSONDecodeError:
        return maybe_json

Will get converted to:

DROP FUNCTION IF EXISTS main.default.redact;

CREATE OR REPLACE FUNCTION main.default.redact(maybe_json STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$
import json

MY_SENSITIVE_KEYS = ["email", "phone"]
try:
    value = json.loads(maybe_json)
    for key in MY_SENSITIVE_KEYS:
        if key in value:
            value[key] = "REDACTED"
    return json.dumps(value)
except json.JSONDecodeError:
    return maybe_json

$$;
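The wrapping step shown above can be approximated with the standard `ast` module. The following is only a minimal sketch, not the package's actual implementation: the helper `to_create_function_sql`, the `SQL_TYPES` mapping, and the `prelude` parameter are hypothetical names introduced for illustration, and the sketch assumes every parameter and the return value carry a simple type annotation.

```python
import ast
import textwrap

# Hypothetical mapping from Python annotation names to SQL types (an assumption).
SQL_TYPES = {"str": "STRING", "int": "BIGINT", "float": "DOUBLE", "bool": "BOOLEAN"}


def to_create_function_sql(source: str, catalog: str, schema: str, prelude: str = "") -> str:
    """Render a CREATE OR REPLACE FUNCTION statement from a function's source text."""
    tree = ast.parse(textwrap.dedent(source))
    func = tree.body[0]
    if not isinstance(func, ast.FunctionDef):
        raise ValueError("expected the source to start with a function definition")

    # Translate the annotated signature into a SQL parameter list.
    params = ", ".join(
        f"{arg.arg} {SQL_TYPES[ast.unparse(arg.annotation)]}" for arg in func.args.args
    )
    returns = SQL_TYPES[ast.unparse(func.returns)]

    # Only the function body is emitted; the decorator and `def` line are dropped.
    body = "\n".join(ast.unparse(stmt) for stmt in func.body)
    python_code = f"{prelude}\n{body}" if prelude else body

    return (
        f"CREATE OR REPLACE FUNCTION {catalog}.{schema}.{func.name}({params})\n"
        f"RETURNS {returns}\n"
        "LANGUAGE PYTHON\n"
        f"AS $$\n{python_code}\n$$;"
    )


SOURCE = '''
@uc.register
def redact(maybe_json: str) -> str:
    try:
        value = json.loads(maybe_json)
    except json.JSONDecodeError:
        return maybe_json
    return json.dumps(value)
'''

sql = to_create_function_sql(SOURCE, "main", "default", prelude="import json")
print(sql)
```

Working from the parsed source rather than a pickled function object keeps the generated statement readable, which is the motivation given earlier for avoiding cloudpickle.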

Features

  • Convert python functions to SQL functions
  • Handle secrets
  • Inline function references
  • Handle imports
  • Debug unidentified names
  • Easy unit testing and integration testing
  • Dynamic sys.path using python files in volumes (soon TBD)
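To illustrate the "inline function references" and "debug unidentified names" features, here is a rough sketch of how names that a function uses but does not define could be detected with the `ast` module. The helper `unresolved_names` is a hypothetical name, and the approach is an assumption about the general technique, not a description of how uc-functions does it internally.

```python
import ast
import builtins


def unresolved_names(source: str) -> set[str]:
    """Return names read in a function body that are neither parameters,
    locally assigned variables, nor Python builtins."""
    func = ast.parse(source).body[0]
    if not isinstance(func, ast.FunctionDef):
        raise ValueError("expected the source to start with a function definition")

    loaded: set[str] = set()
    bound = {arg.arg for arg in func.args.args}

    # Walk only the body so decorators and annotations are ignored.
    for stmt in func.body:
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name):
                if isinstance(node.ctx, ast.Load):
                    loaded.add(node.id)
                else:
                    bound.add(node.id)

    return {name for name in loaded - bound if not hasattr(builtins, name)}


SOURCE = '''
@uc.register
def redact(maybe_json: str) -> str:
    try:
        value = json.loads(maybe_json)
        for key in MY_SENSITIVE_KEYS:
            if key in value:
                value[key] = "REDACTED"
        return json.dumps(value)
    except json.JSONDecodeError:
        return maybe_json
'''

print(unresolved_names(SOURCE))
```

For the `redact` example, the names left over are `json` and `MY_SENSITIVE_KEYS`, which correspond to the import and the constant that appear at the top of the compiled SQL body.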

Unit testing

@uc.register is a decorator that only modifies attributes of the function. It does not modify the function inputs and outputs themselves. This makes it easy to unit test the functions.
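A decorator with this property can be sketched as follows. This is a simplified stand-in, not the real `uc.register`: `make_register` and the `qualified_name` attribute are hypothetical names used only to show the pattern of tagging a function and returning it unchanged.

```python
def make_register(catalog: str, schema: str):
    """Build a decorator that tags functions without wrapping them (illustrative only)."""

    def register(func):
        # Attach metadata as an attribute; the callable itself is returned as-is,
        # so its inputs and outputs behave exactly as before.
        func.qualified_name = f"{catalog}.{schema}.{func.__name__}"
        return func

    return register


register = make_register("main", "default")


@register
def shout(text: str) -> str:
    return text.upper()


print(shout("hi"), shout.qualified_name)
```

Because no wrapper function is introduced, a unit test can call the decorated function directly with plain Python values.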

Example function

@uc.register
def redact(maybe_json: str) -> str:
    try:
        value = json.loads(maybe_json)
        for key in MY_SENSITIVE_KEYS:
            if key in value:
                value[key] = "REDACTED"
        return json.dumps(value)
    except json.JSONDecodeError:
        return maybe_json

Example unit test

def test_redact():
    assert redact('{"email": "foo", "phone": "bar"}') == '{"email": "REDACTED", "phone": "REDACTED"}'
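A couple of additional cases are worth covering in the same style, for instance the fallback path for input that is not JSON. The sketch below redefines `redact` without the decorator and assumes `MY_SENSITIVE_KEYS` is `["email", "phone"]`, as in the compiled output shown earlier, so that it runs on its own.

```python
import json

# Assumed value, matching the constant inlined in the compiled SQL example.
MY_SENSITIVE_KEYS = ["email", "phone"]


def redact(maybe_json: str) -> str:
    try:
        value = json.loads(maybe_json)
        for key in MY_SENSITIVE_KEYS:
            if key in value:
                value[key] = "REDACTED"
        return json.dumps(value)
    except json.JSONDecodeError:
        return maybe_json


def test_redact_leaves_other_keys_untouched():
    assert redact('{"name": "foo", "email": "bar"}') == '{"name": "foo", "email": "REDACTED"}'


def test_redact_returns_non_json_input_unchanged():
    assert redact("not json") == "not json"
```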

Integration testing

Integration testing is done by deploying the functions and then calling them through the remote attribute that the decorator adds to each function.

Register Function:

@uc.register
def redact(maybe_json: str) -> str:
    try:
        value = json.loads(maybe_json)
        for key in MY_SENSITIVE_KEYS:
            if key in value:
                value[key] = "REDACTED"
        return json.dumps(value)
    except json.JSONDecodeError:
        return maybe_json

Once deployed, run this:

# executes the code on a remote databricks warehouse
redact.remote(
    '{"email": "foo", "phone": "bar"}',
    # workspace_client=workspace_client, # make sure you pass the workspace client or provide environment variables
    # warehouse_id=warehouse_id # optional otherwise it will pick first serverless warehouse
)
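How `remote` builds its query is not described here. One plausible shape, shown purely as an assumption, is a parameterized `SELECT` against the deployed function. The helper `build_remote_call` below is hypothetical and only constructs the statement text and a list of named parameters; sending them to a warehouse (for example through the Databricks SDK's statement execution API) is left out.

```python
def build_remote_call(qualified_name: str, *args):
    """Return (statement, parameters) for a single call to a deployed function.

    Hypothetical helper: uses named parameter markers (:arg0, :arg1, ...)
    so argument values are never spliced into the SQL text.
    """
    placeholders = ", ".join(f":arg{i}" for i in range(len(args)))
    statement = f"SELECT {qualified_name}({placeholders}) AS result"
    parameters = [{"name": f"arg{i}", "value": value} for i, value in enumerate(args)]
    return statement, parameters


statement, parameters = build_remote_call("main.default.redact", '{"email": "foo"}')
print(statement)
print(parameters)
```

Passing values as parameters rather than formatting them into the statement avoids quoting problems with JSON strings such as the one used in the example call.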

Usage

See the examples directory for usage examples and for what the compiled output looks like.

Disclaimer

The uc-functions package is not developed, endorsed, or supported by Databricks. It is provided as-is, with no warranty of any kind. For more details, please refer to the license.

