Official Python library for Unity Catalog AI support

Unity Catalog AI Core library

The Unity Catalog AI Core library provides convenient APIs to interact with Unity Catalog functions, including the creation, retrieval and execution of functions. The library includes clients for interacting with both Unity Catalog servers and Databricks-managed Unity Catalog services, in support of UC functions as tools in agents.

Installation

pip install unitycatalog-ai

If you are using the Databricks-managed version of Unity Catalog, you can install the optional additional Databricks dependencies by providing the option:

pip install unitycatalog-ai[databricks]

Get started

Unity Catalog Function Client

The Unity Catalog (UC) function client is a core component of the Unity Catalog AI Core Library, enabling seamless interaction with a Unity Catalog server. This client allows you to manage and execute UC functions, providing both asynchronous and synchronous interfaces to cater to various application needs. Whether you're integrating UC functions into GenAI workflows or managing them directly, the UC client offers robust and flexible APIs to facilitate your development process.

Key Features

  • Asynchronous and Synchronous Operations: Flexibly choose between async and sync methods based on your application's concurrency requirements.
  • Comprehensive Function Management: Easily create, retrieve, list, execute, and delete UC functions.
  • Wrapped Function Support: In addition to standard single-function creation, you can create wrapped functions that in-line additional helper functions within a function's definition to simplify code reuse and modularity.
  • Integration with GenAI: Seamlessly integrate UC functions as tools within Generative AI agents, enhancing intelligent automation workflows.
  • Type Safety and Caching: Enforce strict type validation and utilize caching mechanisms to optimize performance and reduce redundant executions.

Caveats

When using the UnitycatalogFunctionClient for UC, be mindful of the following considerations:

  • Asynchronous API Usage:
    • The UnitycatalogFunctionClient is built on top of the asynchronous unitycatalog-client SDK, which utilizes aiohttp for REST communication with the UC server.
    • The function client for Unity Catalog offers both asynchronous and synchronous methods. The synchronous methods are wrappers around the asynchronous counterparts, ensuring compatibility with environments that may not support asynchronous operations.
    • Important: Avoid creating additional event loops in environments that already have a running loop (e.g., Jupyter Notebooks) to prevent conflicts and potential runtime errors.
  • Security Considerations:
    • Warning: Function execution occurs locally within the environment where your application is running.
    • Caution: Executing GenAI-generated Python code can pose security risks, especially if the code includes operations like file system access or network requests.
    • Recommendation: Run your application in an isolated and secure environment with restricted permissions to mitigate potential security threats.
  • External Dependencies:
    • Ensure that any external libraries required by your UC functions are pre-installed in the execution environment.
    • Best Practice: Import external dependencies within the function body to guarantee their availability during execution.
  • Function Overwriting:
    • The create_function, create_function_async, create_wrapped_function and create_wrapped_function_async methods allow overwriting existing functions by setting the replace parameter to True.
    • Warning: Overwriting functions can disrupt workflows that depend on existing function definitions. Use this feature judiciously and ensure that overwriting is intentional.
  • Type Validation and Compatibility:
    • The client performs strict type validation based on the defined schemas. Ensure that your function parameters and return types adhere to the expected types to prevent execution errors.
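
The event loop caveat above can be illustrated with a generic sketch using only the standard library. It is not the client's actual implementation: fetch_value, has_running_loop and fetch_value_sync are hypothetical names standing in for an asynchronous call and a synchronous wrapper around it. The wrapper refuses to start a second event loop when one is already running.

```python
import asyncio


async def fetch_value() -> int:
    """Hypothetical stand-in for an asynchronous client call."""
    await asyncio.sleep(0)
    return 42


def has_running_loop() -> bool:
    """Return True if the current thread already has a running event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


def fetch_value_sync() -> int:
    """Synchronous wrapper that avoids creating a second event loop."""
    if has_running_loop():
        # In a notebook or async framework, await the coroutine directly instead.
        raise RuntimeError("An event loop is already running; await fetch_value() instead.")
    return asyncio.run(fetch_value())


result = fetch_value_sync()
```

In an environment such as a Jupyter Notebook, where a loop is already running, the same check suggests awaiting the asynchronous method directly rather than starting a new loop.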

Prerequisites

Before using the UC functions client, ensure that your environment meets the following requirements:

  • Python Version: Python 3.10 or higher is recommended to leverage all functionalities, including function creation and execution.

  • Dependencies: Install the necessary packages using pip:

    pip install unitycatalog-client unitycatalog-ai
    
  • Unity Catalog Server: Ensure that you have access to a running instance of the open-source Unity Catalog server. Follow the Unity Catalog Installation Guide to set up your server if you haven't already.

Client Initialization

To interact with UC functions, initialize the UnitycatalogFunctionClient as shown below:

import asyncio
from unitycatalog.ai.core.client import UnitycatalogFunctionClient
from unitycatalog.client import ApiClient, Configuration

# Configure the Unity Catalog API client
config = Configuration(
    host="http://localhost:8080/api/2.1/unity-catalog"  # Replace with your UC server URL
)

# Initialize the asynchronous ApiClient
api_client = ApiClient(configuration=config)

# Instantiate the UnitycatalogFunctionClient
uc_client = UnitycatalogFunctionClient(api_client=api_client)

# Example catalog and schema names
CATALOG = "my_catalog"
SCHEMA = "my_schema"

Creating a UC Function

You can create a UC function either by providing a Python callable or by submitting a FunctionInfo object. The recommended approach, shown below, uses the create_python_function API, which accepts a Python callable (function) as input.

To create a UC function from a Python function, define your function with appropriate type hints and a Google-style docstring:

def add_numbers(a: float, b: float) -> float:
    """
    Adds two numbers and returns the result.

    Args:
        a (float): First number.
        b (float): Second number.

    Returns:
        float: The sum of the two numbers.
    """
    return a + b

# Create the function within the Unity Catalog catalog and schema specified
function_info = uc_client.create_python_function(
    func=add_numbers,
    catalog=CATALOG,
    schema=SCHEMA,
    replace=False,  # Set to True to overwrite if the function already exists
)

print(function_info)

Creating a Wrapped UC Function

In addition to standard function creation, you can create wrapped functions. A wrapped function uses a primary function as the interface while in-lining additional helper functions (wrapped functions) into the primary function’s definition. This feature is useful when you want to keep helper logic bundled together with the main function without needing to replicate existing common utilities within your function definitions.

For example, consider the following helper functions and the primary wrapper function that has direct dependencies on the helper functions:

def a(x: int) -> int:
    return x + 1

def b(y: int) -> int:
    return y + 2

def wrapper(x: int, y: int) -> int:
    """
    Wrapper function that in-lines helper functions a and b.

    Args:
        x (int): The first argument.
        y (int): The second argument.

    Returns:
        int: The combined result of a(x) and b(y).
    """
    return a(x) + b(y)

To register this wrapped function as a single UC function, you can call the create_wrapped_function API:

function_info = uc_client.create_wrapped_function(
    primary_func=wrapper,
    functions=[a, b],
    catalog=CATALOG,
    schema=SCHEMA,
    replace=False,  # Set to True to overwrite if the function already exists
)
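
Conceptually, in-lining means that the helper definitions travel inside the primary function's body. The sketch below shows what such a combined definition could look like if written by hand. This is an assumption made for illustration; the source that the client actually generates may differ.

```python
def wrapper(x: int, y: int) -> int:
    """
    Wrapper function with helper functions a and b defined inside its body.

    Args:
        x (int): The first argument.
        y (int): The second argument.

    Returns:
        int: The combined result of a(x) and b(y).
    """

    # Helpers are nested so the function is self-contained when executed elsewhere.
    def a(x: int) -> int:
        return x + 1

    def b(y: int) -> int:
        return y + 2

    return a(x) + b(y)


combined = wrapper(1, 2)
```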

Retrieving a UC Function

To retrieve details of a specific UC function, use the get_function method with the full function name in the format <catalog>.<schema>.<function_name>:

full_func_name = f"{CATALOG}.{SCHEMA}.add_numbers"

# Retrieve the function information and metadata
function_info = uc_client.get_function(full_func_name)

print(function_info)
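
Because several APIs expect the full three-part name, it can help to build and check it in one place. The helpers below, build_full_name and split_full_name, are hypothetical and not part of the library. They are a minimal sketch that assumes none of the three parts contains a dot.

```python
def build_full_name(catalog: str, schema: str, function_name: str) -> str:
    """Join the three parts into <catalog>.<schema>.<function_name>."""
    parts = (catalog, schema, function_name)
    for part in parts:
        if not part or "." in part:
            raise ValueError(f"Invalid name part: {part!r}")
    return ".".join(parts)


def split_full_name(full_name: str) -> tuple[str, str, str]:
    """Split a full function name into (catalog, schema, function_name)."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"Expected <catalog>.<schema>.<function_name>, got {full_name!r}"
        )
    return parts[0], parts[1], parts[2]


full_name = build_full_name("my_catalog", "my_schema", "add_numbers")
name_parts = split_full_name(full_name)
```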

Listing Functions

# List all created functions within a given schema
functions = uc_client.list_functions(
    catalog=CATALOG,
    schema=SCHEMA,
    max_results=10  # Paginated results will contain a continuation token that can be submitted with additional requests
)

for func in functions.items:
    print(func)

Executing a Function

Note that function execution occurs in the main process of the application that calls this API. Review the security considerations above regarding the execution of untrusted code before calling this API.

full_func_name = f"{CATALOG}.{SCHEMA}.add_numbers"
parameters = {"a": 10.5, "b": 5.5}

# Execute the function synchronously
result = uc_client.execute_function(full_func_name, parameters)

print(result.value)  # Outputs: 16.0

Function Parameter Defaults

Defining and executing functions with parameter defaults behaves similarly to standard Python function argument defaults. If a parameter that has a default value is omitted when the function is called via the execute_function API, the default value is used for the function invocation.

If you use defaults in your function signatures, ensure that the parameter descriptions are accurate and state the default value, so that agents calling your function use it correctly.
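
As a local illustration of this behavior, the sketch below defines a function whose docstring states its default and resolves a partial parameters dictionary with the standard library's inspect module. The greet function and the resolve_arguments helper are hypothetical examples, and the helper only mirrors standard Python semantics. It is not the client's internal logic.

```python
import inspect


def greet(name: str, greeting: str = "Hello") -> str:
    """
    Builds a greeting for the given name.

    Args:
        name (str): The name of the person to greet.
        greeting (str): The greeting word to use. Defaults to "Hello".

    Returns:
        str: The formatted greeting.
    """
    return f"{greeting}, {name}!"


def resolve_arguments(func, parameters: dict) -> dict:
    """Fill in omitted parameters with the defaults declared in the signature."""
    bound = inspect.signature(func).bind(**parameters)
    bound.apply_defaults()
    return dict(bound.arguments)


# Only "name" is supplied; "greeting" falls back to its declared default.
resolved = resolve_arguments(greet, {"name": "Ada"})
message = greet(**resolved)
```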

Deleting a Function

To delete a function that you have write permission on, use the delete_function API:

full_func_name = f"{CATALOG}.{SCHEMA}.add_numbers"

uc_client.delete_function(full_func_name)

Databricks-managed UC

To use Databricks-managed Unity Catalog with this package, follow the instructions to authenticate to your workspace and ensure that your access token has workspace-level privilege for managing UC functions.

Prerequisites

  • [Highly recommended] Use python>=3.10 for accessing all functionalities including function creation and function execution.
  • For creating UC functions with a SQL body definition, only serverless compute is supported. Install the databricks-connect package with pip install databricks-connect==15.1.0 to access serverless compute. python>=3.10 is required to install this version of the package.
  • To execute UC functions within Databricks, use either a SQL warehouse or Databricks Connect with serverless compute:
    • SQL warehouse: create a SQL warehouse following these instructions, and pass the warehouse ID when initializing the client. NOTE: only the serverless SQL warehouse type is supported because of performance concerns.
    • Databricks Connect with serverless: Install the databricks-connect package with pip install databricks-connect==15.1.0. No configuration needs to be passed when initializing the client.

Client initialization

This example uses serverless compute.

from unitycatalog.ai.core.databricks import DatabricksFunctionClient

client = DatabricksFunctionClient()

Create a UC function

When creating a UC function from a SQL string, use the following syntax.

# make sure you have privilege in the corresponding catalog and schema for function creation
CATALOG = "..."
SCHEMA = "..."
func_name = "test"
sql_body = f"""CREATE FUNCTION {CATALOG}.{SCHEMA}.{func_name}(s string)
RETURNS STRING
LANGUAGE PYTHON
AS $$
  return s
$$
"""

function_info = client.create_function(sql_function_body=sql_body)

Dependencies and Environments

Databricks Runtime version 17 and higher supports specifying dependencies for a function's execution environment. Earlier runtime versions do not support this feature and will raise an error if the dependencies or environment arguments are passed to a create_python_function or create_wrapped_function call.

The minimal example below shows how to specify PyPI dependencies to include in your execution environment:

# Define a function that requires an external PyPI dependency

def dep_check(x: str) -> str:
    """
    A function to test the dependency support for UC

    Args:
        x: An input string
    
    Returns:
        A string that reports the dependency support for UC
    """

    import scrapy  # NOTE that you must still import the library to use within the function.

    return scrapy.__version__

# Create the function and supply the dependency in standard PyPI format
client.create_python_function(func=dep_check, catalog=CATALOG, schema=SCHEMA, replace=True, dependencies=["scrapy==2.10.1"])

Retrieve a UC function

The client also provides an API to retrieve the details of a UC function. The function name passed in must be the full name in the format <catalog>.<schema>.<function_name>.

full_func_name = f"{CATALOG}.{SCHEMA}.{func_name}"
client.get_function(full_func_name)

List UC functions

To get a list of the functions stored in a catalog and schema, use the list_functions API:

client.list_functions(catalog=CATALOG, schema=SCHEMA, max_results=5)

Execute a UC function

Parameters passed into execute_function must be a dictionary that maps to the input params defined by the UC function.

result = client.execute_function(full_func_name, {"s": "some_string"})
assert result.value == "some_string"

Function execution arguments configuration

To manage the function execution behavior using Databricks client under different configurations, we offer the following environment variables:

| Environment Variable | Description | Default Value |
| UCAI_DATABRICKS_SESSION_RETRY_MAX_ATTEMPTS | Maximum number of attempts to retry refreshing the session client in case of token expiry. | 5 |
| UCAI_DATABRICKS_SERVERLESS_EXECUTION_RESULT_ROW_LIMIT | Maximum number of rows when executing functions using serverless compute with databricks-connect. | 100 |
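
A minimal sketch of overriding these settings from Python is shown below. The values 10 and 500 are arbitrary examples, and the sketch assumes the variables are set before the client is created so that they are in place when the library reads them.

```python
import os

# Override the defaults (5 retry attempts, 100 result rows) for this process.
# Environment variable values are always strings.
os.environ["UCAI_DATABRICKS_SESSION_RETRY_MAX_ATTEMPTS"] = "10"
os.environ["UCAI_DATABRICKS_SERVERLESS_EXECUTION_RESULT_ROW_LIMIT"] = "500"

# Read the values back to confirm what the process will see.
retry_attempts = int(os.environ["UCAI_DATABRICKS_SESSION_RETRY_MAX_ATTEMPTS"])
row_limit = int(os.environ["UCAI_DATABRICKS_SERVERLESS_EXECUTION_RESULT_ROW_LIMIT"])
```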

Reminders

  • If the function contains a DECIMAL type parameter, it is converted to python float for execution, and this conversion may lose precision.
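
The precision loss mentioned above can be reproduced locally with the standard library. The sketch below uses an arbitrary example value with more significant digits than a Python float can hold, and compares it with the result of a round trip through float.

```python
from decimal import Decimal

# An example value with more significant digits than a float can represent.
price = Decimal("1234567890.123456789012345")

# Converting to float and back exposes the digits that were lost.
round_tripped = Decimal(float(price))
precision_lost = price != round_tripped

# The classic case: exact in Decimal arithmetic, inexact with floats.
exact_sum_matches = Decimal("0.1") + Decimal("0.2") == Decimal("0.3")
float_sum_matches = float(Decimal("0.1")) + float(Decimal("0.2")) == 0.3
```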
