
SKT package

This is a highly site-dependent package. Resources are abstracted into the package structure.

Usage

Hive metastore

from skt.ye import get_hms

c = get_hms()
c.get_partition_names("db", "table")
c.close()
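
Hive partition names are conventionally strings such as dt=2020-01-01/hh=00. Assuming get_partition_names returns names in that form (its output is not shown above), a small helper can turn them into dictionaries. This is only a sketch: parse_partition_name and the sample values are hypothetical and not part of skt.

```python
from urllib.parse import unquote


def parse_partition_name(name):
    """Split a Hive-style partition name such as 'dt=2020-01-01/hh=00' into a dict."""
    spec = {}
    for part in name.split("/"):
        key, _, value = part.partition("=")
        # Hive percent-escapes special characters in partition values
        spec[key] = unquote(value)
    return spec


# Hypothetical values standing in for c.get_partition_names("db", "table")
partition_names = ["dt=2020-01-01/hh=00", "dt=2020-01-02/hh=12"]
specs = [parse_partition_name(n) for n in partition_names]

# Pick the most recent partition by comparing (dt, hh) as strings
latest = max(specs, key=lambda s: (s["dt"], s["hh"]))
print(latest)
```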

Hash and unhash

from skt.lake import hash_s
from skt.lake import unhash_s

unhashed_list = ['0000000000']
hashed_list = hash_s(unhashed_list)
unhash_s(hashed_list)

Execute a Hive query without fetching the result

from skt.ye import hive_execute
hive_execute(ddl_or_ctas_query)

Fetch the result set of a Hive query

from skt.ye import hive_get_result
result_set = hive_get_result(select_query)

Get a pandas dataframe from a Hive query result set

from skt.ye import hive_to_pandas
pandas_df = hive_to_pandas(hive_query)

Get pandas dataframe from parquet file in hdfs

from skt.ye import parquet_to_pandas
pandas_df = parquet_to_pandas(hdfs_path)

Save pandas dataframe as parquet in hdfs

from skt.ye import get_spark
from skt.ye import pandas_to_parquet
spark = get_spark()
pandas_to_parquet(pandas_df, hdfs_path, spark) # we need spark for this operation
spark.stop()

Work with spark

from skt.ye import get_spark
spark = get_spark()
# work with the spark session here
spark.stop()
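
The snippets above call spark.stop() by hand, so the session stays open if the work in between raises. One way to guard against that is a small context manager. This is a sketch under assumptions: managed_session is a hypothetical helper, and FakeSession stands in for the object returned by get_spark() so that the example runs without a cluster.

```python
from contextlib import contextmanager


@contextmanager
def managed_session(factory):
    """Create a session with factory() and always call its stop() on exit."""
    session = factory()
    try:
        yield session
    finally:
        session.stop()


class FakeSession:
    """Stand-in for the object returned by get_spark(); only stop() is modeled."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


# With the real package this would read: with managed_session(get_spark) as spark: ...
with managed_session(FakeSession) as spark:
    pass  # work with the session here

print(spark.stopped)
```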

Work with spark-bigquery-connector

# SELECT
from skt.gcp import bq_table_to_pandas 
pandas_df = bq_table_to_pandas("dataset", "table_name", ["col_1", "col_2"], "2020-01-01", "svc_mgmt_num is not null")
# INSERT 
from skt.gcp import pandas_to_bq_table
pandas_to_bq_table(pandas_df, "dataset", "table_name", "2020-03-01")

Send slack message

from skt.ye import slack_send
text = 'Hello'
username = 'airflow'
channel = '#leavemealone'
slack_send(text=text, username=username, channel=channel)
# Send dataframe as text
import pandas as pd
df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
slack_send(text=df, username=username, channel=channel, dataframe=True)

Get bigquery client

from skt.gcp import get_bigquery_client
bq = get_bigquery_client()
bq.query(query)

IPython BigQuery Magic

from skt.gcp import import_bigquery_ipython_magic

import_bigquery_ipython_magic()

query_params = {
    "p_1": "v_1",
    "dataset": "mnoai",
}
%%bq --params $query_params

SELECT c_1 
FROM {dataset}.user_logs
WHERE c_1 = @p_1
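
The query above mixes two kinds of placeholders: {dataset} looks like a Python format field substituted into the SQL text, while @p_1 uses BigQuery's named query parameter syntax. How the magic actually splits query_params is not documented here. The sketch below shows one plausible reading, in which keys that appear as {field} are formatted into the text and the rest are left for parameter binding. split_params is a hypothetical helper, not part of skt.

```python
from string import Formatter


def split_params(sql, params):
    """Format {field} params into the SQL text and return the remaining params."""
    # Collect the names of all {field} placeholders found in the SQL text
    fields = {name for _, name, _, _ in Formatter().parse(sql) if name}
    text_params = {k: v for k, v in params.items() if k in fields}
    bound_params = {k: v for k, v in params.items() if k not in fields}
    return sql.format(**text_params), bound_params


sql = "SELECT c_1 FROM {dataset}.user_logs WHERE c_1 = @p_1"
rendered, remaining = split_params(sql, {"p_1": "v_1", "dataset": "mnoai"})
print(rendered)
print(remaining)
```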

Access MLS

from skt.mls import set_model_name
from skt.mls import get_recent_model_path
from skt.ye import get_pkl_from_hdfs

set_model_name(COMM_DB, params)
path = get_recent_model_path(COMM_DB, model_key)
model = get_pkl_from_hdfs(f'{path}')

MLS Model Registry (Upload model_binary(model.tar.gz) / model_meta(model.json) to AWS S3 from YE)

from skt.mls import save_model

# model object generated by LightGBM or XGBoost
model

# model name
model_name = 'sample_model'
# model version
model_version = 'v1'
# AWS ENV in 'stg / prd / dev' (default is 'stg')
aws_env = 'stg'
# List of features used in ML Model in string type (only for XGBoost model_type)
feature_list = ['feature_1', 'feature_2', 'feature_3']
# Force to overwrite model files on S3 if exists (default is False)
force = False 

save_model(model, model_name, model_version, aws_env, force)

MLS meta_table & meta_table_item related methods

from skt.mls import get_meta_table
from skt.mls import create_meta_table_item
from skt.mls import update_meta_table_item
from skt.mls import get_meta_table_item
from skt.mls import meta_table_to_pandas
from skt.mls import pandas_to_meta_table

# Get a meta_table info
get_meta_table(meta_table_name, aws_env, edd)
# Create a meta_item
create_meta_table_item(meta_table_name, item_name, item_dict, aws_env, edd)
# Update a meta_item
update_meta_table_item(meta_table_name, item_name, item_dict, aws_env, edd)
# Get a meta_item
get_meta_table_item(meta_table_name, item_name, aws_env, edd)
# Get a meta_table as pandas dataframe
meta_table_to_pandas(meta_table_name, aws_env, edd)
# Update pandas dataframe to meta_table
pandas_to_meta_table(method, meta_table_name, dataframe, key, values, aws_env, edd)


# For details, use ?{method} (e.g. ?get_meta_table)
# EDD users must set edd=True

MLS model_meta related methods
(* A user must be specified for the ml_model)

from skt.mls import get_ml_model
from skt.mls import get_ml_model_meta
from skt.mls import update_ml_model_meta

# Get a ml_model
get_ml_model(user, model_name, model_version, aws_env, edd)
# Get a model_meta of ml_model
get_ml_model_meta(user, model_name, model_version, aws_env, edd)
# Update or Create meta_item(s)
update_ml_model_meta(user, model_name, model_version, model_meta_dict, aws_env, edd)

# For details, use ?{method} (e.g. ?get_ml_model)
# EDD users must set edd=True

Use NES CLI

nes input_notebook_url -p k1 v1 -p k2 v2 -p k3 v3
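
The repeated -p key value pattern in the command above can be parsed with argparse by combining nargs=2 with action="append". This illustrates the argument shape only and is not the actual nes implementation. build_parser, the dest name params, and the example URL are hypothetical.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="nes")
    parser.add_argument("input_notebook_url")
    # Each -p consumes two tokens; repeated uses accumulate into a list of pairs
    parser.add_argument(
        "-p",
        dest="params",
        nargs=2,
        action="append",
        default=[],
        metavar=("KEY", "VALUE"),
        help="notebook parameter, may be repeated",
    )
    return parser


args = build_parser().parse_args(
    ["https://example.com/notebook.ipynb", "-p", "k1", "v1", "-p", "k2", "v2"]
)
params = dict(args.params)
print(params)
```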

Use github util

from skt.ye import get_github_util
g = get_github_util()
# query graphql
res = g.query_gql(graph_ql)
# get file in github repository
byte_object = g.download_from_git(github_url_path)

Installation

$ pip install skt --upgrade

If you would like to install submodules for AIR

$ pip install skt[air] --upgrade

Develop

Create an issue first, then follow the GitHub flow: https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/github-flow

AIPS EDA tools

OVERVIEW

  • Common module of functions for use during modeling EDA
  • Modules
      1. EDA (Numeric / Categorical variable)


1) EDA

1. Numeric variable EDA

  • def numeric_eda_plot
    EDA plot function for numeric features

    Args. :
        - df           :   Data to explore, as a pandas DataFrame
        - feature_list :   List of features to explore (columns of df)
        - label_col    :   Label (or hue) column
        - cols         :   Number of grid columns for multi-plots (the number of rows is determined automatically from feature_list)
        - n_samples    :   Number of rows to sample per label (default = -1, which uses the full data)
        - plot_type    :   density or box (default = 'density')
        - stat_yn      :   Whether to print basic statistics (mean / min / max / 1q / 3q) (default : False)
        - figsize      :   Figure size (default : (7,4))

    Returns : 
        matplotlib.pyplot object

    Example : 
        fig = numeric_eda_plot(df, ['age'], 'answer', cols = 1, n_samples = 10000, plot_type='density', stat_yn=True, figsize = (7,4))
        fig

        To save the EDA image:
        fig.savefig('filename')
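
Two of the arguments above imply small computations: the number of grid rows follows from feature_list and cols, and stat_yn reports mean / min / max / 1q / 3q. The sketch below shows how these might be derived. The actual implementation is not shown in this document, so grid_rows, basic_stats, and the choice of quartile method are assumptions.

```python
import math
import statistics


def grid_rows(n_features, cols):
    """Rows needed to place n_features plots in a grid with `cols` columns."""
    return math.ceil(n_features / cols)


def basic_stats(values):
    """Return mean / min / max / 1q / 3q for a list of numbers."""
    # The 'inclusive' method treats the data as the whole population (assumption)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
        "1q": q1,
        "3q": q3,
    }


print(grid_rows(5, cols=2))
print(basic_stats([21, 25, 30, 34, 40]))
```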

2. Categorical variable EDA

  • def categorical_eda_plot
    EDA plot function for categorical features

    Args. :
        - df           :   Data to explore, as a pandas DataFrame
        - feature_list :   List of features to explore (columns of df)
        - label_col    :   Label (or hue) column
        - cols         :   Number of grid columns for multi-plots (the number of rows is determined automatically from feature_list)
        - n_samples    :   Number of rows to sample per label (default = -1, which uses the full data)
        - figsize      :   Figure size (default : (7,4))

    Returns : 
        matplotlib.pyplot object

    Example : 
        fig = categorical_eda_plot(df, ['sex_cd'], 'answer', cols = 1, n_samples = 10000, figsize = (7,4))
        fig

        To save the EDA image:
        fig.savefig('filename')
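
n_samples is described as the number of rows sampled per label, with -1 meaning the full data. A pure-Python sketch of that behavior follows. sample_per_label is a hypothetical helper; whether the real function uses a fixed seed or caps the sample at the group size is not stated, so both are assumptions here.

```python
import random
from collections import defaultdict


def sample_per_label(rows, label_col, n_samples=-1, seed=0):
    """Sample up to n_samples rows per label value; -1 keeps every row."""
    if n_samples == -1:
        return list(rows)
    # Group rows by their label value
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_col]].append(row)
    rng = random.Random(seed)
    sampled = []
    for group in groups.values():
        k = min(n_samples, len(group))  # cap at the group size (assumption)
        sampled.extend(rng.sample(group, k))
    return sampled


# Ten rows, five per label value
rows = [{"sex_cd": "M", "answer": i % 2} for i in range(10)]
subset = sample_per_label(rows, "answer", n_samples=3)
print(len(subset))
```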
