Module for converting scikit-learn models to Teradata Vantage models

Project description

sklearn2vantage is a Python module for converting a scikit-learn model to a Teradata Vantage model table.

The module has two features: converting a scikit-learn model to a Teradata Vantage model, and uploading a pandas DataFrame to Teradata.

Installation

Dependencies

sklearn2vantage requires:

  • Python

  • NumPy

  • pandas

  • SQLAlchemy

  • scikit-learn

  • paramiko

  • scp

  • teradata

  • sqlalchemy-teradata

  • teradatasql

  • teradatasqlalchemy
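Before installing, it can help to see which of the dependencies listed above are already present in the current environment. The following is a minimal sketch using only the standard library. The import names are assumptions derived from the package names (for example `sklearn` for scikit-learn and `sqlalchemy_teradata` for sqlalchemy-teradata), and `missing_modules` is a hypothetical helper, not part of sklearn2vantage.

```python
from importlib.util import find_spec

# Import names assumed for the packages listed above; the PyPI name and the
# import name differ for scikit-learn and sqlalchemy-teradata.
REQUIRED_MODULES = [
    "numpy", "pandas", "sqlalchemy", "sklearn", "paramiko", "scp",
    "teradata", "sqlalchemy_teradata", "teradatasql", "teradatasqlalchemy",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing:", ", ".join(missing) if missing else "none")
```

An empty result only shows that the modules can be located; it does not check their versions.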

Supported models

The following models are supported.

scikit-learn                 Teradata Vantage
RandomForestClassifier       DecisionForestPredict
RandomForestRegressor        DecisionForestPredict
GradientBoostingRegressor    DecisionForestPredict
LinearRegression             GLMPredict
Lasso                        GLMPredict
Ridge                        GLMPredict
Linear                       GLMPredict
LogisticRegression           GLMPredict
GaussianNB                   NaiveBayesPredict
CategoricalNB                NaiveBayesPredict
DecisionTreeClassifier       DecisionTreePredict
DecisionTreeRegressor        DecisionTreePredict

Some models in statsmodels are also supported.

statsmodels                  Teradata Vantage
Logit                        GLMPredict
OLS                          GLMPredict
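The mapping above can also be kept as a small lookup, which is handy when a script has to pick the Vantage prediction function for whatever model it was handed. This is a sketch only: `PREDICT_FUNCTION` and `vantage_predict_function` are hypothetical names, not part of sklearn2vantage, and the keys use the class names as scikit-learn and statsmodels spell them. The `Linear` row is left out because it is not clear which class it refers to.

```python
# Model class name -> Teradata Vantage prediction function, per the tables above.
PREDICT_FUNCTION = {
    "RandomForestClassifier": "DecisionForestPredict",
    "RandomForestRegressor": "DecisionForestPredict",
    "GradientBoostingRegressor": "DecisionForestPredict",
    "LinearRegression": "GLMPredict",
    "Lasso": "GLMPredict",
    "Ridge": "GLMPredict",
    "LogisticRegression": "GLMPredict",
    "GaussianNB": "NaiveBayesPredict",
    "CategoricalNB": "NaiveBayesPredict",
    "DecisionTreeClassifier": "DecisionTreePredict",
    "DecisionTreeRegressor": "DecisionTreePredict",
    # statsmodels
    "Logit": "GLMPredict",
    "OLS": "GLMPredict",
}


def vantage_predict_function(model):
    """Look up the prediction function for a model instance or a class name."""
    name = model if isinstance(model, str) else type(model).__name__
    try:
        return PREDICT_FUNCTION[name]
    except KeyError:
        raise ValueError(f"{name} is not in the supported-model tables") from None
```

The lookup goes by class name alone, so it works without importing scikit-learn or statsmodels, at the cost of not distinguishing two unrelated classes that happen to share a name.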

User installation

pip install sklearn2vantage

or

conda install sklearn2vantage -c temporary-recipes

Example: converting a model

import sklearn2vantage as s2v
import pandas as pd
from sqlalchemy import create_engine
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Connect to Teradata; replace the user, password, host, and database with your own.
engine = create_engine("teradata://dbc:dbc@173.168.56.128:1025/tdwork")

# Pull a sample of the training data into pandas and split it.
df = pd.read_sql_query("select * from some_data sample 50000", engine)
X = df.drop("target", axis=1)
y = df.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

# Fit the model locally with scikit-learn.
rf_clf = RandomForestClassifier()
rf_clf.fit(X_train, y_train)

# Convert the fitted forest to a model table: model, feature names, class labels.
rf_clf_table = \
  s2v.make_model_table_forest(rf_clf, X_train.columns,
                              ['setosa', 'versicolor', 'virginica'])

# Upload the model table, then score in-database with DecisionForestPredict.
s2v.load_model_forest(rf_clf_table, engine, "rf_clf_table")
pd.read_sql_query("""
  select * from DecisionForestPredict (
    on iris partition by any
    on rf_clf_table as ModelTable DIMENSION
    USING
    NumericInputs ('sepal_length', 'sepal_width',
                   'petal_length', 'petal_width')
    IdColumn ('id')
    Accumulate ('species')
    Detailed ('false')
) as dt""", engine)

For further usage, please see HowToUse.ipynb.
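When the feature list is long, typing the column names into the scoring query by hand is error-prone. The sketch below builds the same DecisionForestPredict query from Python lists. `quote_list` and `forest_predict_sql` are hypothetical helpers, not part of sklearn2vantage, and the clause layout simply mirrors the query in the example above.

```python
def quote_list(names):
    """Render column names as a comma-separated list of SQL string literals."""
    return ", ".join("'" + name.replace("'", "''") + "'" for name in names)


def forest_predict_sql(input_table, model_table, numeric_inputs,
                       id_column, accumulate=()):
    """Build a DecisionForestPredict query string.

    Table names and the id column are inserted as-is, so they must come
    from trusted code, not from user input.
    """
    lines = [
        "select * from DecisionForestPredict (",
        f"  on {input_table} partition by any",
        f"  on {model_table} as ModelTable DIMENSION",
        "  USING",
        f"  NumericInputs ({quote_list(numeric_inputs)})",
        f"  IdColumn ('{id_column}')",
    ]
    if accumulate:
        lines.append(f"  Accumulate ({quote_list(accumulate)})")
    lines.append("  Detailed ('false')")
    lines.append(") as dt")
    return "\n".join(lines)


if __name__ == "__main__":
    features = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
    print(forest_predict_sql("iris", "rf_clf_table", features, "id", ["species"]))
```

The returned string can be passed to `pd.read_sql_query` together with the engine, in place of the hand-written query.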

Example: data loading

import pandas as pd
import sklearn2vantage as s2v
from sqlalchemy import create_engine
engine = create_engine("teradata://dbc:dbc@173.168.56.128:1025/tdwork")
df_titanic = pd.read_csv("titanic/train.csv").set_index("PassengerId")
s2v.tdload_df(df_titanic, engine, tablename="titanic_train",
              ifExists="replace", ssh_ip="173.168.56.128",
              ssh_username="root", ssh_password="root")

For further usage, please see HowToUseDataloader.ipynb.
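The examples in this README hard-code the database credentials in the connection URL. Outside a throwaway test machine, it is usually safer to read them from the environment. The sketch below builds the same kind of `teradata://` URL that the examples pass to `create_engine`. The environment variable names (`TD_USER`, `TD_PASSWORD`, `TD_HOST`, `TD_PORT`, `TD_DATABASE`) and both helper functions are assumptions made for illustration, not part of sklearn2vantage.

```python
import os
from urllib.parse import quote_plus


def teradata_url(user, password, host, port=1025, database=""):
    """Build a teradata:// URL, percent-encoding the user name and password."""
    url = f"teradata://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}"
    return f"{url}/{database}" if database else url


def teradata_url_from_env(env=os.environ):
    """Read connection settings from (hypothetical) TD_* environment variables."""
    return teradata_url(
        env["TD_USER"],
        env["TD_PASSWORD"],
        env["TD_HOST"],
        int(env.get("TD_PORT", "1025")),
        env.get("TD_DATABASE", ""),
    )
```

Percent-encoding matters when a password contains characters such as `@` or `/`, which would otherwise be read as part of the URL structure. The result can be passed directly to `create_engine`.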

Download files

Source Distribution

sklearn2vantage-0.1.6.tar.gz (10.6 kB view details)

Uploaded Source

File details

Details for the file sklearn2vantage-0.1.6.tar.gz.

File metadata

  • Download URL: sklearn2vantage-0.1.6.tar.gz
  • Upload date:
  • Size: 10.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2.post20191201 requests-toolbelt/0.9.1 tqdm/4.42.0 CPython/3.8.0

File hashes

Hashes for sklearn2vantage-0.1.6.tar.gz
Algorithm Hash digest
SHA256 7c56ff2314c2ead36705b1e3b7d5c84f1359c243988845a694accb565fe4fe4c
MD5 54f8c3a7a445a5e97ec7b88190ee7327
BLAKE2b-256 f136493c009d174d2fdb64e9d40b4a6bcd17284ba4ab5e303fc83562dfd04ea6
