Official Python library for Unity Catalog AI support
Unity Catalog AI Core library
The Unity Catalog AI Core library provides convenient APIs to interact with Unity Catalog functions, including the creation, retrieval and execution of functions. The library includes clients for interacting with both the Open-Source Unity Catalog server and the Databricks-managed Unity Catalog service, in support of UC functions as tools in agents.
Installation
# install for OSS Unity Catalog
pip install unitycatalog-ai[oss]
# install for Databricks Unity Catalog
pip install unitycatalog-ai[databricks]
Get started
Unity Catalog Open Source
The Open-Source Unity Catalog (OSS UC) client is a core component of the Unity Catalog AI Core library for working with the open-source version of Unity Catalog. It lets you manage and execute UC functions through both asynchronous and synchronous interfaces, whether you are integrating UC functions into GenAI workflows or managing them directly.
Caveats
When using the UnitycatalogFunctionClient for OSS UC, be mindful of the following considerations:
- Asynchronous API Usage:
- The UnitycatalogFunctionClient is built on top of the asynchronous unitycatalog-client SDK, which utilizes aiohttp.
- The function client for OSS Unity Catalog offers both asynchronous and synchronous methods. The synchronous methods are wrappers around the asynchronous counterparts, ensuring compatibility with environments that may not support asynchronous operations.
- Important: Avoid creating additional event loops in environments that already have a running loop (e.g., Jupyter Notebooks) to prevent conflicts and potential runtime errors.
- Security Considerations:
- WARNING Function execution occurs locally within the environment where your application is running.
- Caution: Executing GenAI-generated Python code can pose security risks, especially if the code includes operations like file system access or network requests.
- Recommendation: Run your application in an isolated and secure environment with restricted permissions to mitigate potential security threats.
- External Dependencies:
- Ensure that any external libraries required by your UC functions are pre-installed in the execution environment.
- Best Practice: Import external dependencies within the function body to guarantee their availability during execution.
- Function Overwriting:
- The create_function and create_function_async methods allow overwriting existing functions by setting the replace parameter to True.
- Warning: Overwriting functions can disrupt workflows that depend on existing function definitions. Use this feature judiciously and ensure that overwriting is intentional.
- Type Validation and Compatibility:
- The client performs strict type validation based on the defined schemas. Ensure that your function parameters and return types adhere to the expected types to prevent execution errors.
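The event-loop caveat above can be illustrated with the standard library alone. The sketch below is not part of the Unity Catalog client: `loop_is_running`, `fetch_value`, and `run_blocking` are hypothetical names used only to show how code can detect an already running loop (as in a Jupyter notebook) before trying to start another one.

```python
import asyncio


def loop_is_running() -> bool:
    """Return True when called from inside a running asyncio event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # asyncio raises RuntimeError when no loop is running in this thread
        return False
    return True


async def fetch_value() -> int:
    """Hypothetical stand-in for an asynchronous client call."""
    await asyncio.sleep(0)
    return 42


def run_blocking(coro_factory):
    """Run a coroutine to completion, refusing to nest event loops."""
    if loop_is_running():
        raise RuntimeError(
            "An event loop is already running; await the coroutine instead."
        )
    return asyncio.run(coro_factory())


if __name__ == "__main__":
    print(run_blocking(fetch_value))
```

In a plain script `run_blocking` starts a loop and returns the result. Inside a notebook cell, where a loop is already running, it raises instead of creating a second loop, and awaiting the coroutine directly is the safer choice.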
Key Features
- Asynchronous and Synchronous Operations: Flexibly choose between async and sync methods based on your application's concurrency requirements.
- Comprehensive Function Management: Easily create, retrieve, list, execute, and delete UC functions.
- Integration with GenAI: Seamlessly integrate UC functions as tools within Generative AI agents, enhancing intelligent automation workflows.
- Type Safety and Caching: Enforce strict type validation and utilize caching mechanisms to optimize performance and reduce redundant executions.
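As a rough illustration of what strict type validation can mean in practice, the sketch below checks a parameter dictionary against a function's type hints before calling it. This is an assumption-laden stand-in, not the client's actual validation logic: `validate_parameters` is a hypothetical helper, and it only handles plain classes such as `float` or `str`.

```python
import inspect
from typing import Any, Callable, get_type_hints


def validate_parameters(func: Callable[..., Any], parameters: dict[str, Any]) -> None:
    """Raise if `parameters` does not match the type hints declared on `func`."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    unknown = sorted(set(parameters) - set(signature.parameters))
    if unknown:
        raise ValueError(f"Unexpected parameters: {unknown}")

    for name, param in signature.parameters.items():
        if name not in parameters:
            if param.default is inspect.Parameter.empty:
                raise ValueError(f"Missing required parameter: {name}")
            continue
        expected = hints.get(name)
        value = parameters[name]
        # Only plain classes are checked; generic hints are skipped in this sketch.
        # The exact-type comparison means an int is rejected where a float is declared.
        if type(expected) is type and type(value) is not expected:
            raise TypeError(
                f"Parameter {name!r} expects {expected.__name__}, "
                f"got {type(value).__name__}"
            )


def add_numbers(a: float, b: float) -> float:
    return a + b


if __name__ == "__main__":
    validate_parameters(add_numbers, {"a": 10.5, "b": 5.5})  # passes silently
```

The exact-type comparison is deliberately stricter than `isinstance`; whether the real client accepts an `int` for a `float` parameter is not stated in this document.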
Prerequisites
Before using the OSS UC client, ensure that your environment meets the following requirements:
- Python Version: Python 3.10 or higher is recommended to leverage all functionalities, including function creation and execution.
- Dependencies: Install the necessary packages using pip:
pip install unitycatalog-client unitycatalog-ai[oss]
- Unity Catalog Server: Ensure that you have access to a running instance of the open-source Unity Catalog server. Follow the Unity Catalog OSS Installation Guide to set up your server if you haven't already.
Client Initialization
To interact with OSS UC functions, initialize the UnitycatalogFunctionClient as shown below:
import asyncio
from unitycatalog.ai.core.oss import UnitycatalogFunctionClient
from unitycatalog.client import ApiClient, Configuration
# Configure the Unity Catalog API client
config = Configuration(
    host="http://localhost:8080/api/2.1/unity-catalog"  # Replace with your UC server URL
)
# Initialize the asynchronous ApiClient
api_client = ApiClient(configuration=config)
# Instantiate the UnitycatalogFunctionClient
uc_client = UnitycatalogFunctionClient(api_client=api_client)
# Example catalog and schema names
CATALOG = "my_catalog"
SCHEMA = "my_schema"
Creating a UC Function
You can create a UC function either by providing a Python callable or by submitting a FunctionInfo object. The recommended approach, shown below, is the create_python_function API, which accepts a Python callable (function) as input.
To create a UC function from a Python function, define your function with appropriate type hints and a Google-style docstring:
def add_numbers(a: float, b: float) -> float:
    """
    Adds two numbers and returns the result.

    Args:
        a (float): First number.
        b (float): Second number.

    Returns:
        float: The sum of the two numbers.
    """
    return a + b

# Create the function within the Unity Catalog catalog and schema specified
function_info = uc_client.create_python_function(
    func=add_numbers,
    catalog=CATALOG,
    schema=SCHEMA,
    replace=False,  # Set to True to overwrite if the function already exists
)
print(function_info)
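The caveats recommend importing dependencies inside the function body. A minimal sketch of a function written that way follows; `hypotenuse` is a hypothetical example, and the standard-library `math` module stands in for whatever external package your function needs.

```python
def hypotenuse(a: float, b: float) -> float:
    """
    Computes the length of the hypotenuse of a right triangle.

    Args:
        a (float): Length of the first leg.
        b (float): Length of the second leg.

    Returns:
        float: The length of the hypotenuse.
    """
    # Imported inside the body so the function carries its own dependency
    # and does not rely on module-level imports of the defining script.
    import math

    return math.hypot(a, b)


if __name__ == "__main__":
    print(hypotenuse(3.0, 4.0))
```

A function shaped like this can be passed as `func` to `create_python_function` in the same way as `add_numbers`. The package it imports still has to be installed in the environment where the function is executed.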
Retrieving a UC Function
To retrieve details of a specific UC function, use the get_function method with the full function name in the format <catalog>.<schema>.<function_name>:
full_func_name = f"{CATALOG}.{SCHEMA}.add_numbers"
# Retrieve the function information and metadata
function_info = uc_client.get_function(full_func_name)
print(function_info)
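Because the full name must have exactly three dot-separated parts, it can be worth checking it before making a request. The helper below is a hypothetical sketch, not something the library provides. It assumes that catalog, schema, and function names contain no dots themselves.

```python
from typing import NamedTuple


class FunctionName(NamedTuple):
    catalog: str
    schema: str
    function: str


def parse_full_function_name(full_name: str) -> FunctionName:
    """Split '<catalog>.<schema>.<function_name>' into its three parts."""
    parts = full_name.split(".")
    # Reject names with the wrong number of parts or with empty parts
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"Expected '<catalog>.<schema>.<function_name>', got {full_name!r}"
        )
    return FunctionName(*parts)


if __name__ == "__main__":
    print(parse_full_function_name("my_catalog.my_schema.add_numbers"))
```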
Listing Functions
# List all created functions within a given schema
# Paginated results will contain a continuation token that can be
# submitted with additional requests
functions = uc_client.list_functions(
    catalog=CATALOG,
    schema=SCHEMA,
    max_results=10,
)
for func in functions.items:
    print(func)
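The continuation-token pattern mentioned in the comment above can be sketched generically. This document does not show how the client exposes or accepts the token, so the example uses a hypothetical `fetch_page` callable backed by an in-memory list rather than guessing at the real parameter names.

```python
from typing import Callable, Iterator, Optional

# A page is a list of items plus the token for the next page (None when done)
Page = tuple[list[str], Optional[str]]


def iterate_pages(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[str]:
    """Yield every item by following continuation tokens until none is returned."""
    token: Optional[str] = None
    while True:
        items, token = fetch_page(token)
        yield from items
        if not token:
            break


def make_fake_fetcher(names: list[str], page_size: int) -> Callable[[Optional[str]], Page]:
    """Build an in-memory stand-in for a paginated listing endpoint."""

    def fetch_page(token: Optional[str]) -> Page:
        start = int(token) if token else 0
        end = start + page_size
        next_token = str(end) if end < len(names) else None
        return names[start:end], next_token

    return fetch_page


if __name__ == "__main__":
    fetcher = make_fake_fetcher(["f1", "f2", "f3", "f4", "f5"], page_size=2)
    for name in iterate_pages(fetcher):
        print(name)
```

To adapt this to the real client, `fetch_page` would wrap `list_functions` and return the response items together with whatever continuation token the response carries.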
Executing a Function
Note that function execution occurs in the main process of the application that calls this API. Read the security considerations above about executing unknown code before calling this API.
full_func_name = f"{CATALOG}.{SCHEMA}.add_numbers"
parameters = {"a": 10.5, "b": 5.5}
# Execute the function synchronously
result = uc_client.execute_function(full_func_name, parameters)
print(result.value) # Outputs: 16.0
Deleting a Function
To delete a function that you have write access to, use the following API:
full_func_name = f"{CATALOG}.{SCHEMA}.add_numbers"
uc_client.delete_function(full_func_name)
Databricks-managed UC
To use Databricks-managed Unity Catalog with this package, follow the instructions to authenticate to your workspace and ensure that your access token has workspace-level privilege for managing UC functions.
Prerequisites
- [Highly recommended] Use python>=3.10 for accessing all functionalities including function creation and function execution.
- Install databricks-sdk package with pip install databricks-sdk.
- For creating UC functions with SQL body, only serverless compute is supported. Install databricks-connect package with pip install databricks-connect==15.1.0. python>=3.10 is a requirement to install this version.
- For executing the UC functions in Databricks, use either SQL warehouse or Databricks Connect with serverless:
- SQL warehouse: create a SQL warehouse following this instruction, and use the warehouse id when initializing the client. NOTE: only the serverless SQL warehouse type is supported because of performance concerns.
- Databricks Connect with serverless: Install databricks-connect package with pip install databricks-connect==15.1.0. No config needs to be passed when initializing the client.
Client initialization
This example uses serverless compute.
from unitycatalog.ai.core.databricks import DatabricksFunctionClient
client = DatabricksFunctionClient()
Create a UC function
A UC function created from a SQL string should follow this syntax:
# make sure you have privilege in the corresponding catalog and schema for function creation
CATALOG = "..."
SCHEMA = "..."
func_name = "test"
sql_body = f"""CREATE FUNCTION {CATALOG}.{SCHEMA}.{func_name}(s string)
RETURNS STRING
LANGUAGE PYTHON
AS $$
return s
$$
"""
function_info = client.create_function(sql_function_body=sql_body)
Retrieve a UC function
The client also provides an API to get the details of a UC function. Note that the function name passed in must be the full name in the format <catalog>.<schema>.<function_name>.
full_func_name = f"{CATALOG}.{SCHEMA}.{func_name}"
client.get_function(full_func_name)
List UC functions
To get a list of functions stored in a catalog and schema, you can use the list API with wildcards.
client.list_functions(catalog=CATALOG, schema=SCHEMA, max_results=5)
Execute a UC function
Parameters passed into execute_function must be a dictionary that maps to the input params defined by the UC function.
result = client.execute_function(full_func_name, {"s": "some_string"})
assert result.value == "some_string"
Function execution arguments configuration
The following environment variables control function execution behavior for the Databricks client under different configurations:
| Configuration Type | Environment Variable | Description | Default Value |
|---|---|---|---|
| Warehouse Execution | UCAI_DATABRICKS_WAREHOUSE_EXECUTE_FUNCTION_WAIT_TIMEOUT | Time in seconds the call will wait for the function to execute. Set as Ns where N can be 0 or between 5 and 50. | 30s |
| | UCAI_DATABRICKS_WAREHOUSE_EXECUTE_FUNCTION_ROW_LIMIT | Maximum number of rows in the function execution result. Also sets the truncated field in the response to indicate if the result was trimmed due to the limit. | 100 |
| | UCAI_DATABRICKS_WAREHOUSE_EXECUTE_FUNCTION_BYTE_LIMIT | Maximum byte size of the function execution result. If truncated due to this limit, the truncated field in the response is set to true. | 1048576 |
| | UCAI_DATABRICKS_WAREHOUSE_RETRY_TIMEOUT | Client-side retry timeout for function execution. If execution doesn't complete within UCAI_DATABRICKS_WAREHOUSE_EXECUTE_FUNCTION_WAIT_TIMEOUT, client retries with exponential wait times until this timeout is reached. | 120 |
| Serverless Compute Execution | UCAI_DATABRICKS_SERVERLESS_EXECUTION_RESULT_ROW_LIMIT | Maximum number of rows in the function execution result. | 100 |
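One way to apply these settings is to set the environment variables from Python before creating the client. The sketch below uses only the variable names from the table. `is_valid_wait_timeout` and `configure_warehouse_execution` are hypothetical helpers, and the sketch assumes the range of 5 to 50 is inclusive. It is also an assumption that setting the variables before client creation is sufficient, since this document does not say when the library reads them.

```python
import os
import re

WAIT_TIMEOUT_VAR = "UCAI_DATABRICKS_WAREHOUSE_EXECUTE_FUNCTION_WAIT_TIMEOUT"
ROW_LIMIT_VAR = "UCAI_DATABRICKS_WAREHOUSE_EXECUTE_FUNCTION_ROW_LIMIT"


def is_valid_wait_timeout(value: str) -> bool:
    """Check the 'Ns' format, where N is 0 or between 5 and 50 (assumed inclusive)."""
    match = re.fullmatch(r"(\d+)s", value)
    if match is None:
        return False
    seconds = int(match.group(1))
    return seconds == 0 or 5 <= seconds <= 50


def configure_warehouse_execution(wait_timeout: str = "30s", row_limit: int = 100) -> None:
    """Export warehouse execution settings as environment variables."""
    if not is_valid_wait_timeout(wait_timeout):
        raise ValueError(f"Invalid wait timeout: {wait_timeout!r}")
    os.environ[WAIT_TIMEOUT_VAR] = wait_timeout
    os.environ[ROW_LIMIT_VAR] = str(row_limit)


if __name__ == "__main__":
    configure_warehouse_execution(wait_timeout="50s", row_limit=20)
    print(os.environ[WAIT_TIMEOUT_VAR], os.environ[ROW_LIMIT_VAR])
```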
Reminders
- If the function contains a DECIMAL type parameter, it is converted to python float for execution, and this conversion may lose precision.
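The precision loss mentioned above is a property of Python floats in general and can be checked with the standard library. The helper `survives_float_conversion` below is hypothetical. It tests whether a `Decimal` value comes back unchanged after a round trip through `float`.

```python
from decimal import Decimal


def survives_float_conversion(value: Decimal) -> bool:
    """Return True if converting to float and back yields the same Decimal."""
    # repr() of a float gives the shortest string that round-trips to that float
    return Decimal(repr(float(value))) == value


if __name__ == "__main__":
    small = Decimal("16.0")
    large = Decimal("1234567890.12345678901234567890")
    print(survives_float_conversion(small))
    print(survives_float_conversion(large))
```

Values with more significant digits than a 64-bit float can hold (roughly 15 to 17) will not survive, so a function that needs exact decimal arithmetic may be better off taking such inputs as strings.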