
Decorator that compiles Python functions into SQL statements for Databricks UDFs and inlines all of their dependencies


uc-functions


Note: This project is in early development and may not cover all your edge cases.

The purpose of this project is to help you manage Unity Catalog Python functions as ordinary Python code, so that you can easily unit test, integration test, and deploy them to Databricks. As a compilation step, this package converts the Python AST into Unity Catalog functions. It also handles things like secrets by adding a layer of indirection through SQL-based UDFs.

Other solutions may use packages like pickle or cloudpickle to serialize the functions. This is not recommended in practice because it can lead to environment discrepancies: cloudpickle works best when the same Python version and the same cloudpickle version are used on both sides, which is hard to guarantee at the moment with serverless environments. The result is also not readable; you will see a giant base64-encoded string in your code. The goal of uc-functions is to properly transpile the Python code to SQL and to handle the majority of edge cases by inlining all references in the function.

Using cloudpickle for long-term object storage is not supported and strongly discouraged.

Reference: https://github.com/cloudpipe/cloudpickle
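
To make the readability and version-coupling points concrete, here is a minimal sketch of a serialization-style approach. It uses the standard library's marshal and base64 modules as a stand-in; cloudpickle itself works differently and is not used here. The names add_one, serialize, and deserialize are hypothetical and are not part of uc-functions.

```python
import base64
import marshal
import types


def add_one(x: int) -> int:
    return x + 1


def serialize(func) -> str:
    # Dump the compiled bytecode and wrap it in base64 so it can be pasted
    # into a SQL string. The marshal format is specific to the Python
    # version that produced it, which is the environment coupling at issue.
    return base64.b64encode(marshal.dumps(func.__code__)).decode("ascii")


def deserialize(blob: str):
    # Only reliable when the reader runs the same Python version as the writer.
    code = marshal.loads(base64.b64decode(blob))
    return types.FunctionType(code, {})


blob = serialize(add_one)      # an opaque string, not reviewable source code
restored = deserialize(blob)   # works here because both sides share one interpreter
```

The blob round-trips in a single interpreter, but nobody can review it in a pull request, and it may fail to load under a different Python version. Emitting plain source text avoids both problems.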

Installation

pip install uc-functions

Goals

Convert decorated Python functions to SQL functions that can be deployed to Databricks. This is useful for managing a large number of functions that share reusable code, and it gives you an easy way to test and debug them.

In the following example, this project converts the Python function to a SQL function. It also scans for all unidentified names, functions, etc. and tries to inline as many of them as possible in the SQL function.

import json
from pathlib import Path
from utils.keys import MY_SENSITIVE_KEYS

from uc_functions import FunctionDeployment

root_dir = str(Path(__file__).parent)
uc = FunctionDeployment("main",
                        "default",
                        root_dir,
                        globals_dict=globals())


@uc.register
def redact(maybe_json: str) -> str:
    try:
        value = json.loads(maybe_json)
        for key in MY_SENSITIVE_KEYS:
            if key in value:
                value[key] = "REDACTED"
        return json.dumps(value)
    except json.JSONDecodeError:
        return maybe_json

Will get converted to:

DROP FUNCTION IF EXISTS main.default.redact;

CREATE OR REPLACE FUNCTION main.default.redact(maybe_json STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$
import json

MY_SENSITIVE_KEYS = ["email", "phone"]
try:
    value = json.loads(maybe_json)
    for key in MY_SENSITIVE_KEYS:
        if key in value:
            value[key] = "REDACTED"
    return json.dumps(value)
except json.JSONDecodeError:
    return maybe_json

$$;
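
The name-scanning step described above can be sketched with the standard library's ast module. This is an illustrative approximation, not the package's actual implementation: unresolved_names and inline_prelude are hypothetical helpers, and the sketch only handles positional parameters, plain constants, and un-aliased module imports.

```python
import ast
import builtins
import json
import textwrap
import types

SOURCE = textwrap.dedent('''
    def redact(maybe_json: str) -> str:
        try:
            value = json.loads(maybe_json)
            for key in MY_SENSITIVE_KEYS:
                if key in value:
                    value[key] = "REDACTED"
            return json.dumps(value)
        except json.JSONDecodeError:
            return maybe_json
''')


def unresolved_names(source: str) -> set:
    """Return names the function reads but does not define itself."""
    func = ast.parse(source).body[0]
    params = {arg.arg for arg in func.args.args}
    loaded, stored = set(), set()
    for node in ast.walk(func):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
            else:
                stored.add(node.id)
    # Drop locals, parameters, and builtins such as `str`.
    return loaded - stored - params - set(dir(builtins))


def inline_prelude(names, namespace) -> str:
    """Render each unresolved name as an import or a literal assignment."""
    lines = []
    for name in sorted(names):
        obj = namespace[name]
        if isinstance(obj, types.ModuleType):
            lines.append(f"import {obj.__name__}")
        else:
            lines.append(f"{name} = {obj!r}")
    return "\n".join(lines)


names = unresolved_names(SOURCE)
prelude = inline_prelude(names, {"json": json, "MY_SENSITIVE_KEYS": ["email", "phone"]})
```

Here names contains json and MY_SENSITIVE_KEYS, and prelude holds the lines that would be placed ahead of the function body inside the `$$ ... $$` block.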

Features

  • Convert python functions to SQL functions
  • Handle secrets
  • Inline function references
  • Handle imports
  • Debug unidentified names
  • Easy unit testing and integration testing
  • Dynamic sys.path using Python files in volumes (planned; timing TBD)

Unit testing

@uc.register is a decorator that only modifies attributes of the function. It does not modify the function inputs and outputs themselves. This makes it easy to unit test the functions.

Example function

@uc.register
def redact(maybe_json: str) -> str:
    try:
        value = json.loads(maybe_json)
        for key in MY_SENSITIVE_KEYS:
            if key in value:
                value[key] = "REDACTED"
        return json.dumps(value)
    except json.JSONDecodeError:
        return maybe_json

Example unit test

def test_redact():
    assert redact('{"email": "foo", "phone": "bar"}') == '{"email": "REDACTED", "phone": "REDACTED"}'
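
Why this works can be sketched with a decorator that records the function and attaches an attribute, then returns the very same function object. ToyDeployment below is a hypothetical stand-in written for illustration; it is not the real FunctionDeployment class and its remote attribute is only a placeholder.

```python
from typing import Callable, Dict


class ToyDeployment:
    """Hypothetical stand-in for FunctionDeployment, for illustration only."""

    def __init__(self) -> None:
        self.registered: Dict[str, Callable] = {}

    def register(self, func: Callable) -> Callable:
        # Remember the function so it could be compiled and deployed later.
        self.registered[func.__name__] = func

        def remote(*args, **kwargs):
            # Placeholder: a real deployment would execute on a warehouse.
            raise NotImplementedError("remote execution is not part of this sketch")

        # Attach an attribute instead of wrapping the function.
        func.remote = remote
        return func  # the same object, so local calls behave exactly as before


toy = ToyDeployment()


@toy.register
def shout(text: str) -> str:
    return text.upper()
```

Because register returns the original function rather than a wrapper, shout("hi") runs locally with no Databricks connection, which is what keeps plain unit tests possible.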

Integration testing

Integration testing is done by deploying the functions and then calling them through the remote attribute that the decorator adds to each function.

Register Function:

@uc.register
def redact(maybe_json: str) -> str:
    try:
        value = json.loads(maybe_json)
        for key in MY_SENSITIVE_KEYS:
            if key in value:
                value[key] = "REDACTED"
        return json.dumps(value)
    except json.JSONDecodeError:
        return maybe_json

Once deployed run this:

# executes the code on a remote databricks warehouse
redact.remote(
    '{"email": "foo", "phone": "bar"}',
    # workspace_client=workspace_client, # make sure you pass the workspace client or provide environment variables
    # warehouse_id=warehouse_id # optional otherwise it will pick first serverless warehouse
)

Usage

Look in the examples directory to see how to use the package and what the compiled output looks like.

Disclaimer

The uc-functions package is not developed, endorsed, or supported by Databricks. It is provided as-is; no warranty is derived from using this package. For more details, please refer to the license.

