Module for converting scikit-learn models to Teradata Vantage models
sklearn2vantage is a Python module for converting scikit-learn models to Teradata Vantage model tables.
The module has two features: converting a scikit-learn model to a Teradata Vantage model, and uploading a pandas DataFrame to Teradata.
Installation
Dependencies
sklearn2vantage requires:
- Python
- NumPy
- pandas
- SQLAlchemy
- scikit-learn
- paramiko
- scp
- teradata
- sqlalchemy-teradata
- teradatasql
- teradatasqlalchemy
Supported model
The following models are supported.
scikit-learn | Teradata Vantage
---|---
RandomForestClassifier | DecisionForestPredict
RandomForestRegressor | DecisionForestPredict
GradientBoostingRegressor | DecisionForestPredict
LinearRegression | GLMPredict
Lasso | GLMPredict
Ridge | GLMPredict
Linear | GLMPredict
LogisticRegression | GLMPredict
GaussianNB | NaiveBayesPredict
CategoricalNB | NaiveBayesPredict
DecisionTreeClassifier | DecisionTreePredict
DecisionTreeRegressor | DecisionTreePredict
Some models in statsmodels are also supported.
statsmodels | Teradata Vantage
---|---
Logit | GLMPredict
OLS | GLMPredict
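The tables above can also be written down as a small lookup, which is handy for checking up front whether a fitted model has a Vantage counterpart. The sketch below is not part of sklearn2vantage: `PREDICT_FUNCTION` and `vantage_predict_function` are hypothetical names, and the dictionary only covers rows whose class names are unambiguous, using the class names as spelled in scikit-learn and statsmodels.

```python
# Hypothetical lookup transcribed from the supported-model tables.
# It is an illustration only, not an API of sklearn2vantage.
PREDICT_FUNCTION = {
    # scikit-learn estimators
    "RandomForestClassifier": "DecisionForestPredict",
    "RandomForestRegressor": "DecisionForestPredict",
    "GradientBoostingRegressor": "DecisionForestPredict",
    "LinearRegression": "GLMPredict",
    "Lasso": "GLMPredict",
    "Ridge": "GLMPredict",
    "LogisticRegression": "GLMPredict",
    "GaussianNB": "NaiveBayesPredict",
    "CategoricalNB": "NaiveBayesPredict",
    "DecisionTreeClassifier": "DecisionTreePredict",
    "DecisionTreeRegressor": "DecisionTreePredict",
    # statsmodels models
    "Logit": "GLMPredict",
    "OLS": "GLMPredict",
}


def vantage_predict_function(model):
    """Return the Vantage predict function for a model object or class name."""
    # Accept either a class name string or any object; use its class name.
    name = model if isinstance(model, str) else type(model).__name__
    try:
        return PREDICT_FUNCTION[name]
    except KeyError:
        raise ValueError(f"{name} is not in the supported-model tables") from None
```

Calling `vantage_predict_function(rf_clf)` on a fitted `RandomForestClassifier` would return `"DecisionForestPredict"`, and an unsupported estimator raises `ValueError` before any database work starts.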
User installation
    pip install sklearn2vantage
or
    conda install sklearn2vantage -c temporary-recipes
Example: converting a model
    import pandas as pd
    import sklearn2vantage as s2v
    from sqlalchemy import create_engine
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Connect to Teradata and pull a sample of the training data.
    engine = create_engine("teradata://dbc:dbc@173.168.56.128:1025/tdwork")
    df = pd.read_sql_query("select * from some_data sample 50000", engine)

    # Split features and target, then fit the model locally.
    X = df.drop("target", axis=1)
    y = df.target
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
    rf_clf = RandomForestClassifier()
    rf_clf.fit(X_train, y_train)

    # Convert the fitted model to a model table and load it into Teradata.
    rf_clf_table = s2v.make_model_table_forest(
        rf_clf, X_train.columns, ['setosa', 'versicolor', 'virginica'])
    s2v.load_model_forest(rf_clf_table, engine, "rf_clf_table")

    # Score in-database with the uploaded model table.
    pd.read_sql_query("""
    select * from DecisionForestPredict (
        on iris partition by any
        on rf_clf_table as ModelTable DIMENSION
        USING
        NumericInputs ('sepal_length', 'sepal_width', 'petal_length', 'petal_width')
        IdColumn ('id')
        Accumulate ('species')
        Detailed ('false')
    ) as dt""", engine)
For further usage, please see HowToUse.ipynb.
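The scoring query in the example is plain SQL text, so it can be assembled from the feature names instead of being typed by hand. The following is a minimal sketch under that assumption: `decision_forest_predict_sql` is a hypothetical helper, not a function provided by sklearn2vantage, and it only reproduces the clauses used in the example above. Table and column names are interpolated as-is, so it should only be used with trusted input.

```python
def _quoted_list(names):
    """Render names as a comma-separated list of SQL string literals."""
    return ", ".join("'" + str(n).replace("'", "''") + "'" for n in names)


def decision_forest_predict_sql(input_table, model_table, numeric_inputs,
                                id_column, accumulate=(), detailed=False):
    """Build a DecisionForestPredict query like the one in the example.

    Hypothetical helper for illustration; identifiers are not validated.
    """
    detailed_flag = "true" if detailed else "false"
    lines = [
        "select * from DecisionForestPredict (",
        f"    on {input_table} partition by any",
        f"    on {model_table} as ModelTable DIMENSION",
        "    USING",
        f"    NumericInputs ({_quoted_list(numeric_inputs)})",
        f"    IdColumn ({_quoted_list([id_column])})",
    ]
    # Accumulate is optional, so it is only emitted when columns are given.
    if accumulate:
        lines.append(f"    Accumulate ({_quoted_list(accumulate)})")
    lines.append(f"    Detailed ('{detailed_flag}')")
    lines.append(") as dt")
    return "\n".join(lines)
```

With the training frame from the example, `decision_forest_predict_sql("iris", "rf_clf_table", X_train.columns, "id", ["species"])` produces a query string that can be passed to `pd.read_sql_query` together with the engine.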
Example: data loading
    import pandas as pd
    import sklearn2vantage as s2v
    from sqlalchemy import create_engine

    engine = create_engine("teradata://dbc:dbc@173.168.56.128:1025/tdwork")
    df_titanic = pd.read_csv("titanic/train.csv").set_index("PassengerId")

    # Upload the DataFrame to Teradata, replacing the table if it exists.
    s2v.tdload_df(df_titanic, engine, tablename="titanic_train",
                  ifExists="replace", ssh_ip="173.168.56.128",
                  ssh_username="root", ssh_password="root")
For further usage, please see HowToUseDataloader.ipynb.