A decorator that compiles Python functions into Databricks UDF SQL statements and inlines all of their dependencies.
uc-functions
Note: This project is in early development and may not cover all your edge cases.
The purpose of this project is to help you manage Unity Catalog Python functions as ordinary Python code, so that you can easily unit test, integration test, and deploy them to Databricks. As part of a compilation step, this package converts the Python AST into Unity Catalog functions. It also handles things like secrets by adding a layer of indirection through SQL-based UDFs.
Other solutions may use packages like pickle or cloudpickle to serialize the functions. This is not recommended in
practice because it can lead to environment discrepancies. Cloudpickle works best when the serializing and deserializing sides use the same Python version and
the same version of cloudpickle, which is hard to guarantee at the moment in serverless environments. The result is also not readable:
you will see a giant base64-encoded string in your code. The goal of uc-functions
is to properly transpile the Python code to
SQL code and to handle the majority of edge cases by inlining all references in the function.
The cloudpickle README itself states: "Using cloudpickle for long-term object storage is not supported and strongly discouraged."
Reference: https://github.com/cloudpipe/cloudpickle
Installation
pip install uc-functions
Goals
Convert decorated Python functions to SQL functions that can be deployed to Databricks. This is useful for managing a large number of functions that share reusable code, and it gives you an easy way to test and debug them.
In the following example, this project converts the Python function to a SQL function. It also scans for all unidentified names, functions, etc. and tries to inline as many of them as possible into the SQL function.
import json
from pathlib import Path

from utils.keys import MY_SENSITIVE_KEYS
from uc_functions import FunctionDeployment

root_dir = str(Path(__file__).parent)
uc = FunctionDeployment("main",
                        "default",
                        root_dir,
                        globals_dict=globals())


@uc.register
def redact(maybe_json: str) -> str:
    try:
        value = json.loads(maybe_json)
        for key in MY_SENSITIVE_KEYS:
            if key in value:
                value[key] = "REDACTED"
        return json.dumps(value)
    except json.JSONDecodeError:
        return maybe_json
Will get converted to:
DROP FUNCTION IF EXISTS main.default.redact;
CREATE OR REPLACE FUNCTION main.default.redact(maybe_json STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$
import json

MY_SENSITIVE_KEYS = ["email", "phone"]
try:
    value = json.loads(maybe_json)
    for key in MY_SENSITIVE_KEYS:
        if key in value:
            value[key] = "REDACTED"
    return json.dumps(value)
except json.JSONDecodeError:
    return maybe_json
$$;
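To make the compilation step more concrete, here is a minimal sketch of how a function definition could be turned into a CREATE FUNCTION statement with the standard `ast` module. It is not the implementation used by uc-functions: the `to_sql` helper, the `SQL_TYPES` mapping, and the `inlined` argument are all hypothetical names chosen for illustration, and the sketch only handles simple type annotations and constants. It requires Python 3.9+ for `ast.unparse`.

```python
import ast

# Hypothetical mapping from Python annotations to Databricks SQL types.
SQL_TYPES = {"str": "STRING", "int": "BIGINT", "float": "DOUBLE", "bool": "BOOLEAN"}


def to_sql(source: str, catalog: str, schema: str, inlined: dict) -> str:
    """Render a single Python function definition as a CREATE FUNCTION statement."""
    func = ast.parse(source).body[0]
    if not isinstance(func, ast.FunctionDef):
        raise TypeError("expected a single function definition")

    # Translate each annotated parameter into "name SQL_TYPE".
    args = ", ".join(
        f"{a.arg} {SQL_TYPES[ast.unparse(a.annotation)]}" for a in func.args.args
    )
    returns = SQL_TYPES[ast.unparse(func.returns)]

    # Inline module-level constants as plain assignments ahead of the body.
    prelude = [f"{name} = {value!r}" for name, value in inlined.items()]
    # The UDF body is the function body without the "def" line.
    body = [ast.unparse(stmt) for stmt in func.body]

    lines = [
        f"CREATE OR REPLACE FUNCTION {catalog}.{schema}.{func.name}({args})",
        f"RETURNS {returns}",
        "LANGUAGE PYTHON",
        "AS $$",
        *prelude,
        *body,
        "$$;",
    ]
    return "\n".join(lines)


SOURCE = '''
def shout(text: str) -> str:
    return text.upper() + SUFFIX
'''

sql = to_sql(SOURCE, "main", "default", {"SUFFIX": "!"})
print(sql)
```

Working on the AST rather than on a pickled object keeps the generated statement readable and reviewable, which is the motivation stated earlier in this README.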
Features
- Convert python functions to SQL functions
- Handle secrets
- Inline function references
- Handle imports
- Debug unidentified names
- Easy unit testing and integration testing
- Dynamic sys.path using python files in volumes (soon TBD)
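As an illustration of what "inline function references" and "debug unidentified names" involve, the sketch below walks a function's AST and reports names that are read but never defined inside the function. These are the names a compiler would have to inline or import. The `unresolved_names` helper is hypothetical and deliberately simplified (for example, it ignores names bound by `except ... as` or by nested functions); it is not the package's own logic.

```python
import ast
import builtins


def unresolved_names(source: str) -> set:
    """Return names a function reads but does not define locally."""
    func = ast.parse(source).body[0]
    local = {a.arg for a in func.args.args}  # parameters are local
    loaded = set()
    for node in ast.walk(func):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                local.add(node.id)  # assigned inside the function
            else:
                loaded.add(node.id)  # read somewhere in the function
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds "a"; "import a as x" binds "x".
                local.add((alias.asname or alias.name).split(".")[0])
    # Builtins such as str or len need no inlining.
    return {name for name in loaded - local if not hasattr(builtins, name)}


SOURCE = '''
def redact(maybe_json: str) -> str:
    value = json.loads(maybe_json)
    for key in MY_SENSITIVE_KEYS:
        if key in value:
            value[key] = "REDACTED"
    return json.dumps(value)
'''

missing = unresolved_names(SOURCE)
print(sorted(missing))
```

For the `redact` example, the reported names are the `json` module and the `MY_SENSITIVE_KEYS` constant, which matches what appears inlined at the top of the compiled SQL function.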
Unit testing
@uc.register is a decorator that only modifies attributes of the function. It does not modify the function's
inputs or outputs, which makes the functions easy to unit test.
Example function
@uc.register
def redact(maybe_json: str) -> str:
    try:
        value = json.loads(maybe_json)
        for key in MY_SENSITIVE_KEYS:
            if key in value:
                value[key] = "REDACTED"
        return json.dumps(value)
    except json.JSONDecodeError:
        return maybe_json
Example unit test
def test_redact():
    assert redact('{"email": "foo", "phone": "bar"}') == '{"email": "REDACTED", "phone": "REDACTED"}'
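The reason plain unit tests work is that the decorator hands back the original function object. The sketch below shows that pattern with a hypothetical `register` stand-in; it is not the real `uc.register`, and the `sql_name` attribute and the hard-coded `main.default` prefix are assumptions made only for this example.

```python
from typing import Callable


def register(func: Callable) -> Callable:
    """Hypothetical stand-in for uc.register: attach attributes, return func unchanged."""

    def remote(*args, **kwargs):
        # A real implementation would call the deployed function on Databricks.
        raise NotImplementedError("remote execution needs a deployed function")

    func.remote = remote  # extra attribute used for integration tests
    func.sql_name = f"main.default.{func.__name__}"  # assumed naming scheme
    return func  # same object, so local calls behave exactly as before


@register
def double(x: int) -> int:
    return x * 2


print(double(21), double.sql_name)
```

Because no wrapper function sits between the caller and the original code, local calls keep the original signature, return values, and tracebacks.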
Integration testing
Integration testing works by deploying the functions and then calling the remote attribute that the decorator adds to each function, which executes the deployed function on Databricks.
Register Function:
@uc.register
def redact(maybe_json: str) -> str:
    try:
        value = json.loads(maybe_json)
        for key in MY_SENSITIVE_KEYS:
            if key in value:
                value[key] = "REDACTED"
        return json.dumps(value)
    except json.JSONDecodeError:
        return maybe_json

Once deployed, run this:

# executes the code on a remote databricks warehouse
redact.remote(
    '{"email": "foo", "phone": "bar"}',
    # workspace_client=workspace_client,  # make sure you pass the workspace client or provide environment variables
    # warehouse_id=warehouse_id  # optional otherwise it will pick first serverless warehouse
)
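For intuition about what a remote call could send to a warehouse, the sketch below builds a request body in the shape accepted by the Databricks SQL Statement Execution API (`statement`, `warehouse_id`, and named `parameters`). This is an assumption about one possible approach, not a description of how `redact.remote` is implemented; the `build_statement_request` helper and the placeholder warehouse id are hypothetical, and the sketch only handles string arguments and does not send anything.

```python
import json


def build_statement_request(function_name: str, args: list, warehouse_id: str) -> dict:
    """Build a request body that calls a deployed function with named parameters."""
    markers = [f"arg{i}" for i in range(len(args))]
    placeholders = ", ".join(f":{m}" for m in markers)
    return {
        "warehouse_id": warehouse_id,
        # Named parameter markers avoid hand-escaping quotes in the JSON input.
        "statement": f"SELECT {function_name}({placeholders}) AS result",
        "parameters": [
            {"name": m, "value": v, "type": "STRING"} for m, v in zip(markers, args)
        ],
    }


request = build_statement_request(
    "main.default.redact",
    ['{"email": "foo", "phone": "bar"}'],
    "my-warehouse-id",  # placeholder, not a real warehouse id
)
print(json.dumps(request, indent=2))
```

Passing the argument as a parameter rather than splicing it into the SQL text keeps inputs that contain quotes, such as JSON documents, from breaking the statement.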
Usage
See the examples directory for how to use the package and what the compiled output looks like.
- Example code: examples/my_functions.py
- Compile Script: examples/compile.py
- Compiled SQL Stmts: examples/compile
- Deploy script: examples/deploy.py
Disclaimer
The uc-functions package is not developed, endorsed, or supported by Databricks. It is provided as-is; no warranty is derived from using this package. For more details, please refer to the license.