
Teradata API Client Python package


tdapiclient - Teradata Third Party Analytics Integration Python Library

The tdapiclient Python library integrates the Python libraries from AWS SageMaker, Azure ML, and Google Vertex AI with Teradata. Users can train and score their models using a teradataml DataFrame. tdapiclient transparently converts the teradataml DataFrame to an S3 address, Azure ML Dataset or Blob, or Vertex AI dataset for training, and users can then provide another teradataml DataFrame as input for inference.

Users of tdapiclient can also deploy models trained in Azure ML, AWS SageMaker, or Vertex AI to a Teradata Vantage system for in-database scoring using BYOM functionality.
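As a rough, hedged sketch of that deployment flow, a model trained with the SageMaker estimator shown later on this page might be pushed into Vantage through the estimator's deploy method. The "vantage" platform string and the model_type keyword below are illustrative assumptions, not the documented signature; see the API Integration Guide for Cloud Machine Learning for the exact call.

>>> # Hedged sketch only: deploy a cloud-trained model into Vantage for in-database (BYOM) scoring.
>>> # The platform string and keyword argument are assumptions, not confirmed by this page.
>>> vantage_predictor = sklearn_estimator.deploy("vantage", model_type="pmml")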

This library also provides the API_Request method, which calls the API_Request UDF to obtain OpenAI and Azure OpenAI text embeddings from large language models. The method can also be used to score models hosted in AWS, Azure, or Google Cloud Platform, equivalent to predicting in UDF mode through the tdapiclient predict method.
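A minimal, hedged sketch of the embeddings use case follows. The keyword names api_type and authorization, the "openai-embeddings" identifier, and reviews_df are illustrative assumptions rather than the documented signature; check the tdapiclient reference for the exact parameters.

>>> # Hedged sketch only: request OpenAI text embeddings for a teradataml DataFrame of text.
>>> # Parameter names and values below are assumptions, not the documented signature.
>>> from tdapiclient import TDApiClient
>>> embeddings_df = TDApiClient.API_Request(
        reviews_df,                                   # hypothetical teradataml DataFrame holding the input text
        api_type="openai-embeddings",                 # assumed identifier for the OpenAI embeddings service
        authorization='{"Key": "<OPENAI_API_KEY>"}')  # assumed credential payload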

For community support, please visit the Teradata Community. For Teradata customer support, please visit Teradata Support.

Copyright 2022, Teradata. All Rights Reserved.

Release Notes

tdapiclient 1.4.0.1

This release fixes an issue in the SageMaker fit method related to the WriteNOS function when it is called on CSV data. When exporting CSV data through WriteNOS, Teradata automatically converts floats to a string representation that popular data manipulation libraries such as pandas cannot parse. The SageMaker fit method now exports CSV data in a format these libraries can read.

tdapiclient 1.4.0.0

  • tdapiclient 1.4.0.0 is the fourth release version. This release adds support for Google Vertex AI integration with Teradata Vantage. The static method TDApiClient.API_Request now supports OpenAI and Azure OpenAI for obtaining text embeddings from large language models. Please refer to the API Integration Guide for Cloud Machine Learning for a list of Limitations and Usage Considerations.

tdapiclient 1.2.1.0

  • tdapiclient 1.2.1.0 is the third release version. This release adds BYOM deployment support for SageMaker and optimizes the fit method for the CSV data format. Please refer to the API Integration Guide for Cloud Machine Learning for a list of Limitations and Usage Considerations.

tdapiclient 1.1.1.0

  • tdapiclient 1.1.1.0 is the second release version. This release adds support for Azure ML integration with Teradata Vantage. Please refer to the API Integration Guide for Cloud Machine Learning for a list of Limitations and Usage Considerations.

tdapiclient 1.0.0.0

  • tdapiclient 1.0.0.0 is the first release version. Please refer to the API Integration Guide for Cloud Machine Learning for a list of Limitations and Usage Considerations.

Installation and Requirements

Package Requirements

  • Python 3.6 or later

Note: 32-bit Python is not supported.

Minimum System Requirements

  • Windows 7 (64-bit) or later
  • macOS 10.9 (64-bit) or later
  • Red Hat 7 or later versions
  • Ubuntu 16.04 or later versions
  • CentOS 7 or later versions
  • SLES 12 or later versions
  • Teradata Vantage Advanced SQL Engine:
    • Advanced SQL Engine 17.05 Feature Update 1 or later

Installation

Use pip to install tdapiclient, the Teradata Third Party Analytics Integration Python Library:

Platform        Command
macOS/Linux     pip install tdapiclient
Windows         py -3 -m pip install tdapiclient
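
To verify the installation, run pip show tdapiclient; it prints the installed version and location.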

Using the tdapiclient Python Package with SageMaker

Your Python script must import the tdapiclient package in order to use the tdapiclient Python Library.

>>> import os, getpass
>>> import pandas as pd
>>> from teradatasqlalchemy.types import FLOAT
>>> from tdapiclient import create_tdapi_context, TDApiClient
>>> from teradataml import create_context, DataFrame, copy_to_sql

>>> # Create connection to Teradata Vantage System
>>> host = input("Host: ")
>>> username = input("Username: ")
>>> password = getpass.getpass("Password: ")
>>> td_context = create_context(host=host, username=username, password=password)

# Create AWS Context to be used in TDApiClient
>>> s3_bucket = input("S3 bucket (bucket name only): ")
>>> access_id = input("Access ID: ")
>>> access_key = getpass.getpass("Access Key: ")
>>> region = input("AWS Region: ")

>>>   os.environ["AWS_ACCESS_KEY_ID"] = access_id
>>>   os.environ["AWS_SECRET_ACCESS_KEY"] = access_key
>>>   os.environ["AWS_REGION"] = region

>>> aws_context = create_tdapi_context("aws", bucket_name=s3_bucket)
# Create TDApiClient Instance
>>> td_apiclient = TDApiClient(aws_context)

# Load data in teradata tables
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.datasets import fetch_california_housing

>>> data = fetch_california_housing()
>>> X_train, X_test, y_train, y_test = train_test_split(
     data.data, data.target, test_size=0.25, random_state=42)

>>> trainX = pd.DataFrame(X_train, columns=data.feature_names)
>>> trainX["target"] = y_train

>>> testX = pd.DataFrame(X_test, columns=data.feature_names)
>>> testX["target"] = y_test

>>> train_table = "housing_data_train"
>>> test_table = "housing_data_test"

>>> column_types = {"MedInc": FLOAT, "HouseAge": FLOAT,
                "AveRooms": FLOAT, "AveBedrms": FLOAT, "Population": FLOAT,
                "AveOccup": FLOAT, "Latitude": FLOAT, "Longitude": FLOAT,
                "target" : FLOAT}

>>> copy_to_sql(df=trainX, table_name=train_table, if_exists="replace", types=column_types)
>>> copy_to_sql(df=testX, table_name=test_table, if_exists="replace", types=column_types)

# Create teradataml DataFrame for input tables

>>> test_df = DataFrame(table_name=test_table)
>>> train_df = DataFrame(table_name=train_table)

>>> exec_role_arn = "arn:aws:iam::XX:role/service-role/AmazonSageMaker-ExecutionRole-20210112T215668"
>>> FRAMEWORK_VERSION = "0.23-1"
# Create an estimator object based on sklearn sagemaker class
>>> sklearn_estimator = td_apiclient.SKLearn(
    entry_point="sklearn-script.py",
    role=exec_role_arn,
    instance_count=1,
    instance_type="ml.m5.large",
    framework_version=FRAMEWORK_VERSION,
    base_job_name="rf-scikit",
    metric_definitions=[{"Name": "median-AE", "Regex": "AE-at-50th-percentile: ([0-9.]+).*$"}],
    hyperparameters={
        "n-estimators": 100,
        "min-samples-leaf": 3,
        "features": "MedInc HouseAge AveRooms AveBedrms Population AveOccup Latitude Longitude",
        "target": "target",
    },
)
>>> # Start training using DataFrame objects
>>> sklearn_estimator.fit({"train": train_df, "test": test_df}, content_type="csv", wait=True)

>>> from sagemaker.serializers import CSVSerializer
>>> from sagemaker.deserializers import CSVDeserializer
>>> csv_ser = CSVSerializer()
>>> csv_dser = CSVDeserializer()
>>> sg_kw = {
        "instance_type": "ml.m5.large",
        "initial_instance_count": 1,
        "serializer": csv_ser,
        "deserializer": csv_dser
    }
>>> predictor = sklearn_estimator.deploy("aws-endpoint", sagemaker_kw_args=sg_kw)

>>> # Now let's try prediction with UDF and client options.
>>> input_df = DataFrame(table_name='housing_data_test')
>>> column_list = ["MedInc", "HouseAge", "AveRooms", "AveBedrms", "Population", "AveOccup", "Latitude", "Longitude"]
>>> input_df = input_df.sample(n=5).select(column_list)

>>> output = predictor.predict(input_df, mode="UDF", content_type='csv')
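
The same predictor can also score on the client side, where rows are pulled to the client and sent directly to the SageMaker endpoint, mirroring the mode="client" call shown in the Vertex AI example below:

>>> client_output = predictor.predict(input_df, mode="client", content_type='csv')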

Using the tdapiclient Python Package with Azure ML

Your Python script must import the tdapiclient package in order to use the tdapiclient Python Library.

>>> import os
>>> import getpass
>>> from teradataml import create_context, DataFrame, read_csv
>>> import pandas as pd
>>> from teradatasqlalchemy.types import  *
>>> from tdapiclient import create_tdapi_context,TDApiClient, remove_tdapi_context
>>> # Create the connection.
>>> host = input("Host: ")
>>> username = input("Username: ")
>>> password = getpass.getpass("Password: ")
>>> td_context = create_context(host=host, username=username, password=password)
>>> # Create Azure Context and TDApiClient object.

>>> datastore_path = input(
>>>    "DataStore path : Please give path within data store of azure-ml workspace.")
>>> tenant_id = input("Azure Tenant ID: ")
>>> client_id = input("Azure Client ID: ")
>>> client_secret = getpass.getpass("Azure Client Secret: ")
>>>
>>> azure_sub = input("Azure Subscription id: ")
>>> azure_rg = input("Azure resource group: ")
>>> azureml_ws = input("Azure-ML workspace: ")
>>> azure_region = input("Azure region: ")

>>> os.environ["AZURE_TENANT_ID"] = tenant_id
>>> os.environ["AZURE_CLIENT_ID"] = client_id
>>> os.environ["AZURE_CLIENT_SECRET"] = client_secret
>>>
>>> os.environ["AZURE_SUB_ID"] = azure_sub
>>> os.environ["AZURE_RG"] = azure_rg
>>> os.environ["AZURE_WS"] = azureml_ws
>>> os.environ["AZURE_REGION"] = azure_region

>>> tdapi_context = create_tdapi_context("azure", datastore_path=datastore_path)
>>> td_apiclient = TDApiClient(tdapi_context)
>>> from collections import OrderedDict
>>>
>>> types = OrderedDict(bustout=INTEGER, rec_id=INTEGER, acct_no=INTEGER, as_of_dt_day=DATE, avg_pmt_05_mth=FLOAT,
>>>     days_since_lstcash=INTEGER, max_utilization_05_mth=INTEGER, maxamt_epmt_v7day=INTEGER, times_nsf=INTEGER,
>>>     totcash_to_line_v7day=INTEGER, totpmt_to_line_v7day=INTEGER, totpur_to_line_v7day=INTEGER, totpurcash_to_line_v7day=INTEGER,
>>>     credit_util_cur_mth=FLOAT, credit_util_prior_5_mth=FLOAT, credit_util_cur_to_prior_ratio=FLOAT,
>>>     days_since_lst_pymnt=INTEGER, num_pymnt_lst_7_days=INTEGER, num_pymnt_lst_60_days=INTEGER,
>>>     pct_line_paid_lst_7_days=INTEGER, pct_line_paid_lst_30_days=INTEGER, num_pur_lst_7_days=INTEGER, num_pur_lst_60_days=INTEGER,
>>>     pct_line_pur_lst_7_days=INTEGER, pct_line_pur_lst_30_days=INTEGER, tot_pymnt_chnl=INTEGER, tot_pymnt=INTEGER, tot_pymnt_am=INTEGER,
>>>     pay_by_phone=CHAR, elec_pymnt=CHAR, pay_in_bank=CHAR, pay_by_check=CHAR, pay_by_othr=CHAR, last_12m_trans_ct=INTEGER, Sample_ID=INTEGER)

>>> # Check this csv file in Teradata Vantage Documentation site under azureml-usercases.zip
>>> df:DataFrame = read_csv(r'financial_data.csv', table_name="financial_data", types=types, use_fastload=False)
>>> # training dataframe.
>>> selected_df = df.select(["bustout", "rec_id", "avg_pmt_05_mth", "max_utilization_05_mth", "times_nsf",
>>>     "credit_util_cur_mth", "credit_util_prior_5_mth", "num_pur_lst_7_days", "num_pur_lst_60_days", "tot_pymnt_chnl", "last_12m_trans_ct"])
>>> # Setup compute target for Azure ML.
>>> from azureml.core.compute import AmlCompute, ComputeTarget
>>> from azureml.core.authentication import ServicePrincipalAuthentication
>>> from azureml.core import Workspace, Environment

>>> credential = ServicePrincipalAuthentication(
>>>         tenant_id=tenant_id,
>>>         service_principal_id=client_id, service_principal_password=client_secret)

>>> ws = Workspace(subscription_id=azure_sub, resource_group=azure_rg, workspace_name=azureml_ws, auth=credential)

>>> vm_size = "Standard_DS3_v2"
>>> min_node = 1
>>> max_node = 1
>>> cluster_name = "test-td-cluster-new"
>>> provisioning_config = AmlCompute.provisioning_configuration(
>>>         vm_size=vm_size, min_nodes=min_node,
>>>         max_nodes=max_node)

>>> # Creating Compute cluster in Azure ML.
>>> compute_target = ComputeTarget.create(
>>>         ws, cluster_name, provisioning_config)
>>> compute_target.wait_for_completion(show_output=True)

>>> compute_target = ws.compute_targets["test-td-cluster-new"]
>>> from azureml.automl.core.featurization import FeaturizationConfig
>>> import logging

>>> # Selecting the target column.
>>> target_column_name = "bustout"

>>> forecast_horizon=14

>>> featurization_config = FeaturizationConfig()
>>> # Force the target column, to be integer type.
>>> featurization_config.add_prediction_transform_type("Integer")

>>> automl_config = td_apiclient.AutoMLConfig(
>>>     task="classification",
>>>     primary_metric="accuracy",
>>>     featurization=featurization_config,
>>>     blocked_models=["ExtremeRandomTrees"],
>>>     experiment_timeout_hours=0.3,
>>>     training_data=selected_df,
>>>     label_column_name=target_column_name,
>>>     compute_target=compute_target,
>>>     enable_early_stopping=True,
>>>     n_cross_validations=3,
>>>     max_concurrent_iterations=4,
>>>     max_cores_per_iteration=-1,
>>>     verbosity=logging.INFO
>>> )

>>> # Execute Azure ML training API with teradataml DataFrame as input which returns Azure ML Run Object.
>>> run = automl_config.fit(mount=False)
>>> # Get the best run after Auto ML job has completed.
>>> run_best = run.get_best_child()
>>> from azureml.core.environment import Environment
>>>
>>> # Creating an Azure ML Environment from a Dockerfile and requirements.txt.
>>> # myenv = Environment.from_dockerfile(name="new_project_env_7", dockerfile="./Dockerfile", pip_requirements="./requirements.txt")
>>> myenv = Environment.from_dockerfile(name="new_project_env_18",
>>>     dockerfile=r'C:\Projects\AzureML-jupyter-notebooks\test-ignite-azureml-api-demo\Dockerfile',
>>>     pip_requirements=r'C:\Projects\AzureML-jupyter-notebooks\test-ignite-azureml-api-demo\requirements.txt')
>>> myenv_b = myenv.build(workspace=ws)
>>> myenv_b.wait_for_completion(show_output=True)
>>> # curated_env_name = "AzureML-sklearn-0.24.1-ubuntu18.04-py37-cpu-inference"
>>> # myenv = Environment.get(workspace=ws, name=curated_env_name)
>>> myenv = Environment.get(workspace=ws, name="new_project_env_18")
>>> # Register an Azure ML model from the best run.
>>> from azureml.core import Model
>>> model: Model = run_best.register_model(model_name='voting_ensemble_model_1', model_path='outputs/model.pkl',
>>>     model_framework=Model.Framework.SCIKITLEARN)
>>> from azureml.core.model import Model
>>> model = Model(workspace=ws, name="voting_ensemble_model_1", version=1)

>>> from azureml.core.model import InferenceConfig, Model
>>> from azureml.core.webservice import AciWebservice, Webservice
>>> from azureml.core.environment import Environment
>>> print(myenv)
>>> # Combine scoring script & environment in Inference configuration
>>> # inference_config = InferenceConfig(entry_script="scoring.py",
>>> #                                    environment=myenv)
>>> myenv.inferencing_stack_version = 'latest'
>>> inference_config = InferenceConfig(entry_script=r'C:\Projects\test-tdapiclient\tdapiclient\notebooks\azureml-az-webservice\scoring.py',
                                   environment=myenv)
>>> # Set deployment configuration
>>> deployment_config = AciWebservice.deploy_configuration(cpu_cores = 2,
>>>                                                        memory_gb = 4, auth_enabled=True)

>>> # Creating azmodel_deploy_kwargs dictionary to pass as a keyword argument for deploy method.
>>> azmodel_deploy_kwargs = {}
>>> azmodel_deploy_kwargs["name"] = "tdapiclient-endpoint-29"
>>> azmodel_deploy_kwargs["models"] = [model]
>>> azmodel_deploy_kwargs["workspace"] = ws
>>> azmodel_deploy_kwargs["inference_config"] = inference_config
>>> azmodel_deploy_kwargs["deployment_config"] = deployment_config
>>> azmodel_deploy_kwargs["overwrite"] = True

>>> # Deploying the model to Azure ML Compute cluster if the platform is az-webservice.
>>> webservice = automl_config.deploy(platform="az-webservice", model=model, model_type="",
>>>                         model_deploy_kwargs=azmodel_deploy_kwargs)
>>> webservice.wait_for_deployment(show_output=True)
>>> # Creating an options dictionary to pass the content_format for scoring.
>>> options = {}
>>> content_format = {}
>>> content_format["Inputs"] = [["%row"]]
>>> options["content_format"] = content_format
>>> test_df = selected_df.drop(["bustout"], axis=1)  # scoring input: the feature columns without the label
>>> print(webservice.predict(test_df, **options, mode="udf", content_type='json'))
......
......

Using the tdapiclient Python Package with Vertex AI

Your Python script must import the tdapiclient package in order to use the tdapiclient Python Library.

>>> import os, getpass
>>> from teradataml import create_context, DataFrame, remove_context, load_example_data
>>> from tdapiclient import create_tdapi_context, TDApiClient, remove_tdapi_context

# Create connection to Teradata Vantage system
>>> host = input("Host: ")
>>> username = input("Username: ")
>>> password = getpass.getpass("Password: ")
>>> td_context = create_context(host=host, username=username, password=password)

# Create Google Cloud Platform (GCP) context to be used in TDApiClient
>>> bucket_name = input("GCS bucket name: ")
>>> bucket_path = input("GCS bucket path (without bucket name): ")
>>> td_auth_obj = getpass.getpass("GCP Teradata auth object name: ")
>>> project_id = input("GCP project ID: ")
>>> region = input("GCP Region: ")
>>> google_app_cred = input("Local path to Google credentials JSON file: ")

>>> os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = google_app_cred
>>> os.environ["GCP_REGION"] = region
>>> os.environ["GCP_PROJECT_ID"] = project_id
>>> os.environ["GCP_TD_AUTH_OBJ"] = td_auth_obj

>>> gcp_context = create_tdapi_context("gcp", gcp_bucket_name=bucket_name, gcp_bucket_path=bucket_path)
# Create TDApiClient instance
>>> td_apiclient = TDApiClient(gcp_context)

# Load data in Teradata tables
# (training data is the same as test data for the purposes of this demo)
>>> load_example_data("naivebayes", "nb_iris_input_train")
>>> df = DataFrame("nb_iris_input_train")

# Create Vertex AI training job
>>> TRAINING_IMAGE = "us-docker.pkg.dev/vertex-ai/training/scikit-learn-cpu.0-23:latest"
>>> PREDICTION_IMAGE = "us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest"
>>> job = td_apiclient.CustomTrainingJob(
        display_name="tdapiclient-custom-demo",
        script_path="train.py",
        container_uri=TRAINING_IMAGE,
        requirements=["gcsfs", "nyoka"],
        model_serving_container_image_uri=PREDICTION_IMAGE
        )

# Obtain trained model
>>> model = job.fit(
        df,
        replica_count=1,
        model_display_name="tdapiclient-custom-demo"
        )

# Deploy model to a Vertex AI online endpoint
>>> predictor = job.deploy(
        model,
        "vx-endpoint",
        vertex_kwargs={"machine_type": "n1-standard-4"}
        )

# Predict with UDF and client options
>>> df_test = df.drop(["id", "species"], axis=1)
>>> vertex_prediction_obj = predictor.predict(df_test, mode="client")
>>> td_output = predictor.predict(df_test, mode="udf", content_type="json")

Documentation

General product information, including installation instructions, is available on the Teradata Documentation website.

License

Use of the tdapiclient Python package is governed by the TERADATA API LICENSE AGREEMENT. After installation, the LICENSE and LICENSE-3RD-PARTY files will be located in the tdapiclient directory of the Python installation directory.
