Teradata Consulting Python Client Extensions

Teradata ML Extensions

Extensions to the core teradataml library by Teradata Consulting to aid in field development work around BYOM, STO, RTO and AnalyticOps solutions.

Installation

You can install via pip.

pip install tdextensions

Usage

Use the same Python version on the client as the one installed in Teradata (3.6+ at the time of writing). Serialization differs between Python versions (for example, between 3.5 and 3.6), so functions serialized on a mismatched client may fail to load on the server.
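A minimal sketch of a client-side guard for this requirement. The server version `(3, 6)` is taken from the note above and should be adjusted to your installation; the helper name `check_python_version` is hypothetical and is not part of tdextensions.

```python
import sys

# Assumed server-side interpreter version (major, minor); adjust to your Teradata installation.
SERVER_PYTHON = (3, 6)

def check_python_version(server=SERVER_PYTHON, client=None):
    """Return True when the client major.minor version matches the server's."""
    if client is None:
        client = sys.version_info[:2]
    return tuple(client) == tuple(server)

if not check_python_version():
    # Warn rather than raise, so the check can be dropped into an existing script.
    print("Warning: client Python %d.%d differs from server Python %d.%d; "
          "serialized functions may fail to load." % (sys.version_info[:2] + SERVER_PYTHON))
```

Running this before `create_context` surfaces a mismatch early instead of as a deserialization error inside the database.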

from teradataml.dataframe.dataframe import DataFrame
from tdextensions.distributed import DistDataFrame, DistMode
from teradataml import create_context
import pandas as pd
import numpy as np

pd.options.display.max_colwidth = 250

engine = create_context(host="localhost", username="ivsm_user", password="ivsm_user")

A simple map row example where we multiply the values of two columns on a row-by-row basis

def my_fun(row):
    return np.array([row.idx, row.sepal_length * row.sepal_width])

df = DistDataFrame("iris_train", dist_mode=DistMode.STO, sto_id="my_dumb_map")
df = df.map(lambda row: my_fun(row),
            returns=[["idx", "INTEGER"], ["my_derived_col", "FLOAT"]])

df.head()
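Before shipping a row function to the database, it can help to exercise the same logic locally. The sketch below is an illustration only: `Row` is a hypothetical stand-in that models just the three columns used above, and `my_fun_local` returns a plain list instead of a NumPy array so that the snippet needs nothing beyond the standard library.

```python
from collections import namedtuple

# Hypothetical stand-in for a row of iris_train; only the columns used here are modelled.
Row = namedtuple("Row", ["idx", "sepal_length", "sepal_width"])

def my_fun_local(row):
    # Same arithmetic as my_fun, but returning a plain list rather than a NumPy array.
    return [row.idx, row.sepal_length * row.sepal_width]

# Two made-up rows, purely for illustration.
rows = [Row(1, 5.0, 3.0), Row(2, 4.0, 2.5)]
results = [my_fun_local(r) for r in rows]
print(results)
```

Each result pairs the row's `idx` with the product of its two sepal measurements, which is the shape declared in the `returns` argument.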

A more advanced example where we train a different model for each partition of a dataset

from sklearn.ensemble import RandomForestClassifier
import base64
import dill

def train(partition):
    X = partition[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']]
    y = partition[['species']]

    clf = RandomForestClassifier()
    clf.fit(X, y.values.ravel())

    return np.array([[partition.species.iloc[0], "my_model_id", base64.b64encode(dill.dumps(clf))]])

df = DistDataFrame("iris_train", dist_mode=DistMode.STO, sto_id="my_model_train")
df = df.map_partition(lambda partition: train(partition), 
                      partition_by="species", 
                      returns=[["partition_id", "VARCHAR(255)"], 
                               ["model_id", "VARCHAR(255)"],
                               ["model_artefact", "CLOB"]])
df.to_pandas().head()
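The `model_artefact` column holds a base64-encoded serialized model, so reading a model back means reversing the two steps. The sketch below shows the round trip under stated assumptions: `pickle` stands in for `dill` so the snippet runs with the standard library alone (`dill.dumps` and `dill.loads` take the same arguments), a small dictionary stands in for the trained classifier, and the helper names are hypothetical. How the value is represented once fetched from the CLOB is not confirmed by this README and may need adjusting.

```python
import base64
import pickle  # stand-in for dill in this sketch

def encode_artefact(obj):
    # Serialize, then base64-encode to ASCII text suitable for a character column.
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")

def decode_artefact(text):
    # Reverse the steps: base64-decode, then deserialize.
    return pickle.loads(base64.b64decode(text))

# Placeholder object standing in for a fitted RandomForestClassifier.
model_stub = {"n_estimators": 10, "classes": ["setosa", "versicolor"]}

encoded = encode_artefact(model_stub)
restored = decode_artefact(encoded)
```

Only deserialize artefacts from tables you trust, since loading a pickled or dill-serialized object can execute arbitrary code.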

Permissions

SET SESSION SEARCHUIFDBPATH = ivsm;
GRANT EXECUTE PROCEDURE ON <db> TO <user>;
GRANT EXECUTE PROCEDURE ON SYSUIF TO <user>;
GRANT CREATE EXTERNAL PROCEDURE ON <db> TO <user>;
GRANT EXECUTE FUNCTION ON TD_SYSFNLIB.SCRIPT TO <user>;
GRANT EXECUTE ON SYSUIF.DEFAULT_AUTH TO <user>;
