Official Python library for Unity Catalog AI support

Unity Catalog AI Core library

The Unity Catalog AI Core library provides convenient APIs to interact with Unity Catalog functions, including the creation, retrieval and execution of functions. The library includes clients for interacting with both the Open-Source Unity Catalog server and the Databricks-managed Unity Catalog service, in support of UC functions as tools in agents.

Installation

# install for OSS Unity Catalog
pip install unitycatalog-ai[oss]

# install for Databricks Unity Catalog
pip install unitycatalog-ai[databricks]

Get started

Unity Catalog Open Source

The Open-Source Unity Catalog (OSS UC) client is a core component of the Unity Catalog AI Core library. It lets you manage and execute UC functions against the open-source version of Unity Catalog, and it exposes both asynchronous and synchronous interfaces. You can use it to integrate UC functions into GenAI workflows or to manage them directly.

Caveats

When using the UnitycatalogFunctionClient for OSS UC, be mindful of the following considerations:

  • Asynchronous API Usage:
    • The UnitycatalogFunctionClient is built on top of the asynchronous unitycatalog-client SDK, which utilizes aiohttp.
    • The function client for OSS Unity Catalog offers both asynchronous and synchronous methods. The synchronous methods are wrappers around the asynchronous counterparts, ensuring compatibility with environments that may not support asynchronous operations.
    • Important: Avoid creating additional event loops in environments that already have a running loop (e.g., Jupyter Notebooks) to prevent conflicts and potential runtime errors.
  • Security Considerations:
    • WARNING Function execution occurs locally within the environment where your application is running.
    • Caution: Executing GenAI-generated Python code can pose security risks, especially if the code includes operations like file system access or network requests.
    • Recommendation: Run your application in an isolated and secure environment with restricted permissions to mitigate potential security threats.
  • External Dependencies:
    • Ensure that any external libraries required by your UC functions are pre-installed in the execution environment.
    • Best Practice: Import external dependencies within the function body to guarantee their availability during execution.
  • Function Overwriting:
    • The create_function and create_function_async methods allow overwriting existing functions by setting the replace parameter to True.
    • Warning: Overwriting functions can disrupt workflows that depend on existing function definitions. Use this feature judiciously and ensure that overwriting is intentional.
  • Type Validation and Compatibility:
    • The client performs strict type validation based on the defined schemas. Ensure that your function parameters and return types adhere to the expected types to prevent execution errors.
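
The event-loop caveat above can be illustrated with the standard library alone. The sketch below is not the client's implementation; run_sync, fetch_value, and caller_with_running_loop are hypothetical names. It shows one common way for synchronous code to run a coroutine: call asyncio.run when no loop is running, and otherwise hand the coroutine to a worker thread that owns its own loop rather than nesting loops.

```python
import asyncio
import concurrent.futures


async def fetch_value() -> int:
    """Stand-in for an asynchronous client call (hypothetical)."""
    await asyncio.sleep(0)
    return 42


def run_sync(coro):
    """Run a coroutine to completion from synchronous code.

    Assumption: this shows the general sync-over-async pattern, not the
    actual UnitycatalogFunctionClient internals.
    """
    try:
        asyncio.get_running_loop()
        loop_running = True
    except RuntimeError:
        loop_running = False

    if not loop_running:
        # No loop is running in this thread, so asyncio.run is safe.
        return asyncio.run(coro)

    # A loop is already running (for example in a notebook). Run the coroutine
    # in a worker thread with its own loop. Note that this blocks the calling
    # thread until the result is available.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()


# Called from plain synchronous code.
sync_result = run_sync(fetch_value())


async def caller_with_running_loop() -> int:
    # Called while a loop is running; the worker-thread branch is used.
    return run_sync(fetch_value())


nested_result = asyncio.run(caller_with_running_loop())
print(sync_result, nested_result)
```

Calling asyncio.run directly inside caller_with_running_loop would raise a RuntimeError, which is the conflict the caveat warns about.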

Key Features

  • Asynchronous and Synchronous Operations: Flexibly choose between async and sync methods based on your application's concurrency requirements.
  • Comprehensive Function Management: Easily create, retrieve, list, execute, and delete UC functions.
  • Integration with GenAI: Seamlessly integrate UC functions as tools within Generative AI agents, enhancing intelligent automation workflows.
  • Type Safety and Caching: Enforce strict type validation and utilize caching mechanisms to optimize performance and reduce redundant executions.

Prerequisites

Before using the OSS UC client, ensure that your environment meets the following requirements:

  • Python Version: Python 3.10 or higher is recommended to leverage all functionalities, including function creation and execution.

  • Dependencies: Install the necessary packages using pip:

    pip install unitycatalog-client unitycatalog-ai[oss]
    
  • Unity Catalog Server: Ensure that you have access to a running instance of the open-source Unity Catalog server. Follow the Unity Catalog OSS Installation Guide to set up your server if you haven't already.

Client Initialization

To interact with OSS UC functions, initialize the UnitycatalogFunctionClient as shown below:

import asyncio
from unitycatalog.ai.core.oss import UnitycatalogFunctionClient
from unitycatalog.client import ApiClient, Configuration

# Configure the Unity Catalog API client
config = Configuration(
    host="http://localhost:8080/api/2.1/unity-catalog"  # Replace with your UC server URL
)

# Initialize the asynchronous ApiClient
api_client = ApiClient(configuration=config)

# Instantiate the UnitycatalogFunctionClient
uc_client = UnitycatalogFunctionClient(api_client=api_client)

# Example catalog and schema names
CATALOG = "my_catalog"
SCHEMA = "my_schema"

Creating a UC Function

You can create a UC function either by providing a Python callable or by submitting a FunctionInfo object. The recommended approach, shown below, is the create_python_function API, which accepts a Python callable (function) as input.

To create a UC function from a Python function, define your function with appropriate type hints and a Google-style docstring:

def add_numbers(a: float, b: float) -> float:
    """
    Adds two numbers and returns the result.

    Args:
        a (float): First number.
        b (float): Second number.

    Returns:
        float: The sum of the two numbers.
    """
    return a + b

# Create the function within the Unity Catalog catalog and schema specified
function_info = uc_client.create_python_function(
    func=add_numbers,
    catalog=CATALOG,
    schema=SCHEMA,
    replace=False,  # Set to True to overwrite if the function already exists
)

print(function_info)
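
As a rough illustration of why type hints and a Google-style docstring matter, the sketch below uses only the standard library to read the metadata that a Python callable exposes. It is not the client's own parsing logic; describe_callable and circle_area are hypothetical names. circle_area also imports its dependency inside the function body, as recommended in the caveats.

```python
import inspect
from typing import get_type_hints


def circle_area(radius: float) -> float:
    """
    Computes the area of a circle.

    Args:
        radius (float): Radius of the circle.

    Returns:
        float: The area of the circle.
    """
    # Import inside the body so the dependency travels with the function.
    import math

    return math.pi * radius**2


def describe_callable(func) -> dict:
    """Collect the name, parameter types, return type, and summary line.

    Hypothetical helper: it assumes every annotation is a plain class.
    """
    hints = get_type_hints(func)
    return_type = hints.pop("return", None)
    doc = inspect.getdoc(func) or ""
    return {
        "name": func.__name__,
        "parameters": {name: tp.__name__ for name, tp in hints.items()},
        "returns": return_type.__name__ if return_type else None,
        "summary": doc.splitlines()[0] if doc else "",
    }


metadata = describe_callable(circle_area)
print(metadata)
```

A function without type hints would yield an empty parameters mapping here, which hints at why a fully annotated signature is required for function creation.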

Retrieving a UC Function

To retrieve details of a specific UC function, use the get_function method with the full function name in the format <catalog>.<schema>.<function_name>:

full_func_name = f"{CATALOG}.{SCHEMA}.add_numbers"

# Retrieve the function information and metadata
function_info = uc_client.get_function(full_func_name)

print(function_info)
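
Because the full name must have exactly three dot-separated parts, it can help to validate it before making a request. The following is a minimal sketch using only the standard library; FunctionName and parse_full_name are hypothetical helpers, not part of the library.

```python
from typing import NamedTuple


class FunctionName(NamedTuple):
    """Three-part function name (hypothetical helper, not a library type)."""

    catalog: str
    schema: str
    function: str

    @property
    def full_name(self) -> str:
        return f"{self.catalog}.{self.schema}.{self.function}"


def parse_full_name(full_name: str) -> FunctionName:
    """Split '<catalog>.<schema>.<function_name>' and reject malformed input."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"Expected '<catalog>.<schema>.<function_name>', got {full_name!r}"
        )
    return FunctionName(*parts)


name = parse_full_name("my_catalog.my_schema.add_numbers")
print(name.function)   # add_numbers
print(name.full_name)  # my_catalog.my_schema.add_numbers
```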

Listing Functions

# List all created functions within a given schema
functions = uc_client.list_functions(
    catalog=CATALOG,
    schema=SCHEMA,
    max_results=10  # Paginated results will contain a continuation token that can be submitted with additional requests
)

for func in functions.items:
    print(func)
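
The continuation-token pattern mentioned in the comment above can be sketched generically. The example below does not call the Unity Catalog server: Page, fake_list_page, and iterate_all are hypothetical names, and the next_token attribute is an assumption about the response shape rather than the documented field name. It only shows the loop that resubmits the token until no more pages remain.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Optional


@dataclass
class Page:
    """Stand-in for a paginated response (hypothetical shape)."""

    items: list
    next_token: Optional[str] = None


def fake_list_page(token: Optional[str], max_results: int) -> Page:
    """In-memory replacement for a server call; the token is a list offset."""
    data = [f"func_{i}" for i in range(7)]
    start = int(token) if token else 0
    end = start + max_results
    next_token = str(end) if end < len(data) else None
    return Page(items=data[start:end], next_token=next_token)


def iterate_all(
    fetch_page: Callable[[Optional[str], int], Page], max_results: int
) -> Iterator[str]:
    """Yield every item, resubmitting the token until none is returned."""
    token = None
    while True:
        page = fetch_page(token, max_results)
        yield from page.items
        token = page.next_token
        if not token:
            break


all_functions = list(iterate_all(fake_list_page, max_results=3))
print(all_functions)
```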

Executing a Function

Note that the function executes in the main process from which you call this API. Read the security considerations above about executing unknown code before calling this API.

full_func_name = f"{CATALOG}.{SCHEMA}.add_numbers"
parameters = {"a": 10.5, "b": 5.5}

# Execute the function synchronously
result = uc_client.execute_function(full_func_name, parameters)

print(result.value)  # Outputs: 16.0
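
To give a feel for what strict type validation of the parameters dictionary involves, the sketch below checks a dictionary against a callable's type hints using only the standard library. It is an illustration under simplifying assumptions, not the client's validation code; validate_parameters is a hypothetical helper that handles only plain classes such as float, int, and str.

```python
import inspect
from typing import get_type_hints


def add_numbers(a: float, b: float) -> float:
    return a + b


def validate_parameters(func, parameters: dict) -> None:
    """Raise if a parameter is missing, unexpected, or of the wrong type."""
    hints = get_type_hints(func)
    hints.pop("return", None)
    expected = set(inspect.signature(func).parameters)
    provided = set(parameters)

    missing = expected - provided
    unexpected = provided - expected
    if missing or unexpected:
        raise ValueError(
            f"missing={sorted(missing)}, unexpected={sorted(unexpected)}"
        )

    for name, value in parameters.items():
        expected_type = hints.get(name)
        if expected_type is not None and not isinstance(value, expected_type):
            raise TypeError(
                f"{name!r} expects {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )


validate_parameters(add_numbers, {"a": 10.5, "b": 5.5})  # passes silently

try:
    validate_parameters(add_numbers, {"a": "10.5", "b": 5.5})
except TypeError as exc:
    print(f"Rejected: {exc}")
```

This check is deliberately strict: an int passed for a float parameter is rejected too, so callers would need to convert values explicitly.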

Deleting a Function

To delete a function that you have write permission on, use the following API:

full_func_name = f"{CATALOG}.{SCHEMA}.add_numbers"

uc_client.delete_function(full_func_name)

Databricks-managed UC

To use Databricks-managed Unity Catalog with this package, follow the instructions to authenticate to your workspace and ensure that your access token has workspace-level privilege for managing UC functions.

Prerequisites

  • [Highly recommended] Use Python 3.10 or higher to access all functionality, including function creation and function execution.
  • Install the databricks-sdk package with pip install databricks-sdk.
  • Creating UC functions with a SQL body is supported only on serverless compute. Install the databricks-connect package with pip install databricks-connect==15.1.0; this version requires Python 3.10 or higher.
  • To execute UC functions in Databricks, use either a SQL warehouse or Databricks Connect with serverless compute:
    • SQL warehouse: create a SQL warehouse following the Databricks instructions, and use the warehouse ID when initializing the client. NOTE: only the serverless SQL warehouse type is supported because of performance concerns.
    • Databricks Connect with serverless: install the databricks-connect package with pip install databricks-connect==15.1.0. No configuration needs to be passed when initializing the client.

Client initialization

This example uses serverless compute.

from unitycatalog.ai.core.databricks import DatabricksFunctionClient

client = DatabricksFunctionClient()

Create a UC function

To create a UC function from a SQL string, follow this syntax:

# make sure you have privilege in the corresponding catalog and schema for function creation
CATALOG = "..."
SCHEMA = "..."
func_name = "test"
sql_body = f"""CREATE FUNCTION {CATALOG}.{SCHEMA}.{func_name}(s string)
RETURNS STRING
LANGUAGE PYTHON
AS $$
  return s
$$
"""

function_info = client.create_function(sql_function_body=sql_body)
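
When several functions need to be created, the SQL body can be assembled programmatically. The sketch below builds a statement in the same shape as the example above; build_python_function_sql is a hypothetical helper, not part of the library, and its identifier check (letters, digits, and underscores only) is an assumption chosen to keep the example safe, not a statement of what Unity Catalog accepts.

```python
import re
import textwrap

# Assumption: restrict identifiers to a conservative character set.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_python_function_sql(
    catalog: str,
    schema: str,
    name: str,
    params: dict,
    returns: str,
    body: str,
) -> str:
    """Assemble a CREATE FUNCTION statement with a Python body."""
    for identifier in (catalog, schema, name, *params):
        if not _IDENTIFIER.match(identifier):
            raise ValueError(f"Invalid identifier: {identifier!r}")

    signature = ", ".join(f"{p} {sql_type}" for p, sql_type in params.items())
    indented_body = textwrap.indent(textwrap.dedent(body).strip(), "  ")
    return (
        f"CREATE FUNCTION {catalog}.{schema}.{name}({signature})\n"
        f"RETURNS {returns}\n"
        "LANGUAGE PYTHON\n"
        "AS $$\n"
        f"{indented_body}\n"
        "$$\n"
    )


generated_sql = build_python_function_sql(
    catalog="main",  # placeholder catalog name
    schema="default",  # placeholder schema name
    name="test",
    params={"s": "string"},
    returns="STRING",
    body="return s",
)
print(generated_sql)
```

The resulting string could then be passed as sql_function_body to create_function, as in the example above.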

Retrieve a UC function

The client also provides an API to retrieve the details of a UC function. Note that the function name passed in must be the full name in the format <catalog>.<schema>.<function_name>.

full_func_name = f"{CATALOG}.{SCHEMA}.{func_name}"
client.get_function(full_func_name)

List UC functions

To list the functions stored in a catalog and schema, use the list API with wildcards.

client.list_functions(catalog=CATALOG, schema=SCHEMA, max_results=5)

Execute a UC function

Parameters passed into execute_function must be a dictionary that maps to the input params defined by the UC function.

result = client.execute_function(full_func_name, {"s": "some_string"})
assert result.value == "some_string"

Function execution arguments configuration

To control function execution behavior in the Databricks client under different configurations, the following environment variables are available:

Configuration Type | Environment Variable | Description | Default Value
--- | --- | --- | ---
Warehouse Execution | UCAI_DATABRICKS_WAREHOUSE_EXECUTE_FUNCTION_WAIT_TIMEOUT | Time in seconds the call will wait for the function to execute. Set as Ns where N can be 0 or between 5 and 50. | 30s
Warehouse Execution | UCAI_DATABRICKS_WAREHOUSE_EXECUTE_FUNCTION_ROW_LIMIT | Maximum number of rows in the function execution result. Also sets the truncated field in the response to indicate if the result was trimmed due to the limit. | 100
Warehouse Execution | UCAI_DATABRICKS_WAREHOUSE_EXECUTE_FUNCTION_BYTE_LIMIT | Maximum byte size of the function execution result. If truncated due to this limit, the truncated field in the response is set to true. | 1048576
Warehouse Execution | UCAI_DATABRICKS_WAREHOUSE_RETRY_TIMEOUT | Client-side retry timeout for function execution. If execution doesn't complete within UCAI_DATABRICKS_WAREHOUSE_EXECUTE_FUNCTION_WAIT_TIMEOUT, the client retries with exponential wait times until this timeout is reached. | 120
Serverless Compute Execution | UCAI_DATABRICKS_SERVERLESS_EXECUTION_RESULT_ROW_LIMIT | Maximum number of rows in the function execution result. | 100
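
As a minimal sketch, these variables can be set from Python with os.environ before functions are executed. The point at which the client reads them is not stated here, so setting them before creating the client is the cautious assumption. validate_wait_timeout is a hypothetical helper that enforces the documented Ns format (N is 0 or between 5 and 50); the values chosen are examples only.

```python
import os
import re


def validate_wait_timeout(value: str) -> str:
    """Check the documented 'Ns' format, where N is 0 or between 5 and 50."""
    match = re.fullmatch(r"(\d+)s", value)
    if not match:
        raise ValueError(f"Expected a value like '30s', got {value!r}")
    seconds = int(match.group(1))
    if seconds != 0 and not 5 <= seconds <= 50:
        raise ValueError(f"N must be 0 or between 5 and 50, got {seconds}")
    return value


# Example values only; adjust them to your workload.
os.environ["UCAI_DATABRICKS_WAREHOUSE_EXECUTE_FUNCTION_WAIT_TIMEOUT"] = (
    validate_wait_timeout("50s")
)
os.environ["UCAI_DATABRICKS_WAREHOUSE_EXECUTE_FUNCTION_ROW_LIMIT"] = "500"
os.environ["UCAI_DATABRICKS_WAREHOUSE_RETRY_TIMEOUT"] = "300"

print(os.environ["UCAI_DATABRICKS_WAREHOUSE_EXECUTE_FUNCTION_WAIT_TIMEOUT"])
```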

Reminders

  • If the function contains a DECIMAL type parameter, it is converted to a Python float for execution, and this conversion may lose precision.
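
The precision loss can be reproduced with the standard library alone. The sketch below does not involve the client; it only shows what happens when a high-precision decimal value is converted to a Python float and back. The variable names are illustrative.

```python
from decimal import Decimal

# A value with more significant digits than a 64-bit float can hold.
exact = Decimal("0.1234567890123456789012345")

as_float = float(exact)
round_trip = Decimal(repr(as_float))

print(exact)
print(as_float)
print(round_trip == exact)  # False: digits beyond float precision are lost
```

If exact decimal results matter, consider passing such values as strings and converting them inside the function body.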
