Teradata Consulting Python Client Extensions
Teradata ML Extensions
Extensions to the core teradataml library by Teradata Consulting to aid in field development work around BYOM, STO, RTO and AnalyticOps solutions.
Installation
You can install via pip.
pip install tdextensions
Usage
Use the same Python version on the client as the one running inside Teradata (3.6+ at the time of writing). Python's serialization differs between versions (e.g. between 3.5 and 3.6), so a function serialized by one version may fail to load under another.
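A quick guard on the client can catch a version mismatch before any function is shipped to the database. The sketch below is only an illustration: `check_python_version` is a hypothetical helper, not part of tdextensions, and it assumes that matching on major.minor is enough and that you already know which version Teradata runs.

```python
import sys


def check_python_version(server_version, client_version=None):
    """Raise if the client's major.minor differs from the server's.

    Hypothetical helper: server_version is whatever your Teradata
    installation reports, supplied by you, e.g. (3, 6).
    """
    if client_version is None:
        client_version = sys.version_info[:3]
    client = tuple(client_version[:2])
    server = tuple(server_version[:2])
    if client != server:
        raise RuntimeError(
            "Client Python %d.%d does not match server Python %d.%d"
            % (client + server)
        )
    return client
```

Calling `check_python_version((3, 6))` at the top of a script raises early instead of failing later during deserialization on the database side.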
from teradataml.dataframe.dataframe import DataFrame
from tdextensions.distributed import DistDataFrame, DistMode
from teradataml import create_context
import pandas as pd
import numpy as np
pd.options.display.max_colwidth = 250
engine = create_context(host="localhost", username="ivsm_user", password="ivsm_user")
A simple row-wise map example where we multiply the values of two columns on a row-by-row basis
def my_fun(row):
    return np.array([row.idx, row.sepal_length * row.sepal_width])

df = DistDataFrame("iris_train", dist_mode=DistMode.STO, sto_id="my_dumb_map")
df = df.map(lambda row: my_fun(row),
            returns=[["idx", "INTEGER"], ["my_derived_col", "INTEGER"]])
df.head()
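To see what the mapped function computes without a database connection, the same per-row logic can be run locally. This is a plain-Python analogue rather than how tdextensions executes the function: the `Row` namedtuple and the sample values are made up for illustration, and a tuple stands in for the NumPy array.

```python
from collections import namedtuple

# Hypothetical stand-in for one row of iris_train (sample values are made up).
Row = namedtuple("Row", ["idx", "sepal_length", "sepal_width"])


def my_fun(row):
    # Same logic as the mapped function: keep idx, multiply the two columns.
    return (row.idx, row.sepal_length * row.sepal_width)


rows = [Row(1, 5.0, 3.5), Row(2, 4.5, 3.0)]

# Apply the function once per row, mirroring a row-by-row map.
result = [my_fun(row) for row in rows]
```

Each input row yields exactly one output row whose fields line up, in order, with the columns named in `returns`.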
A more advanced example where we train a different model for each partition of a dataset
from sklearn.ensemble import RandomForestClassifier
import base64
import dill
def train(partition):
    X = partition[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']]
    y = partition[['species']]
    clf = RandomForestClassifier()
    clf.fit(X, y.values.ravel())
    return np.array([[partition.species.iloc[0], "my_model_id", base64.b64encode(dill.dumps(clf))]])

df = DistDataFrame("iris_train", dist_mode=DistMode.STO, sto_id="my_model_train")
df = df.map_partition(lambda partition: train(partition),
                      partition_by="species",
                      returns=[["partition_id", "VARCHAR(255)"],
                               ["model_id", "VARCHAR(255)"],
                               ["model_artefact", "CLOB"]])
df.to_pandas().head()
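The `model_artefact` column holds the model as base64-encoded serialized bytes, so reading a model back means reversing those two steps. The sketch below shows the round trip under some assumptions: `encode_artefact` and `decode_artefact` are hypothetical helper names, the standard-library `pickle` stands in for `dill` so the example runs without extra packages, and a small dict stands in for the trained classifier. How the CLOB value arrives on the client (as `str` or `bytes`) is also an assumption, so both are accepted.

```python
import base64
import pickle


def encode_artefact(obj, dumps=pickle.dumps):
    """Serialize then base64-encode, mirroring what train() stores."""
    return base64.b64encode(dumps(obj))


def decode_artefact(artefact, loads=pickle.loads):
    """Reverse encode_artefact: base64-decode, then deserialize."""
    if isinstance(artefact, str):
        # Text read back from a CLOB is converted to bytes first.
        artefact = artefact.encode("ascii")
    return loads(base64.b64decode(artefact))


# A plain dict stands in for the fitted RandomForestClassifier.
model = {"n_estimators": 100, "classes": ["setosa", "versicolor"]}
restored = decode_artefact(encode_artefact(model))
```

For a real model, pass `dumps=dill.dumps` and `loads=dill.loads`, which take the same arguments. Only deserialize artefacts from a source you trust, since loading pickled data can execute arbitrary code.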
Permissions
SET SESSION SEARCHUIFDBPATH = <database>;
GRANT EXECUTE PROCEDURE ON <db> TO <user>;
GRANT EXECUTE PROCEDURE ON SYSUIF TO <user>;
GRANT CREATE EXTERNAL PROCEDURE ON <db> TO <user>;
GRANT EXECUTE FUNCTION ON TD_SYSFNLIB.SCRIPT TO <user>;
GRANT EXECUTE ON SYSUIF.DEFAULT_AUTH TO <user>;
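When several users need the same grants, filling in the placeholders by hand is error-prone. The following sketch renders the GRANT statements listed above for a given database and user. `render_grants` is a hypothetical helper, not part of tdextensions, and the identifier check is a deliberately conservative assumption rather than Teradata's full naming rules. The `SET SESSION` statement is left out because it applies to a session, not to a user.

```python
import re

# Conservative identifier check (an assumption, stricter than Teradata allows).
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$#]*$")

# The GRANT statements from the Permissions section, with placeholders.
GRANT_TEMPLATES = [
    "GRANT EXECUTE PROCEDURE ON {db} TO {user};",
    "GRANT EXECUTE PROCEDURE ON SYSUIF TO {user};",
    "GRANT CREATE EXTERNAL PROCEDURE ON {db} TO {user};",
    "GRANT EXECUTE FUNCTION ON TD_SYSFNLIB.SCRIPT TO {user};",
    "GRANT EXECUTE ON SYSUIF.DEFAULT_AUTH TO {user};",
]


def render_grants(db, user):
    """Return the GRANT statements with db and user filled in."""
    for name in (db, user):
        if not _IDENTIFIER.match(name):
            raise ValueError("Unsafe identifier: %r" % (name,))
    return [template.format(db=db, user=user) for template in GRANT_TEMPLATES]


statements = render_grants("my_db", "ivsm_user")
```

The returned strings can be reviewed and then run by an administrator in whichever SQL client is in use.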
File details
Details for the file tdextensions-1.0.0rc1.tar.gz.
File metadata
- Download URL: tdextensions-1.0.0rc1.tar.gz
- Upload date:
- Size: 6.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/47.1.1 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.6.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | a0f1143501f491522a587f11e86880e9fa6219cac5c58bfa37d3bedf1daffd7e
MD5 | f0936fc9f852283077dd94da6bc13797
BLAKE2b-256 | a7a836c9fc4532ca1c29e9f0d25f73f7b5d8fa3b7715d2864eee4093b8044515
File details
Details for the file tdextensions-1.0.0rc1-py3-none-any.whl.
File metadata
- Download URL: tdextensions-1.0.0rc1-py3-none-any.whl
- Upload date:
- Size: 12.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/47.1.1 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.6.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | ff541707fc959de7920ef550152449b6d3fba99b9848b2403756676ab612301d
MD5 | 0e551be38da1b66077825415cc8a93d0
BLAKE2b-256 | e5bc373457bbbd9f3756e2d1aef8c3a9dc754df34bd33f2ce9b224e3dadac18f