SKT Package

This is a highly site-dependent package. Site resources are abstracted into the package structure.

Usage

Execute a Hive query without fetching the result

from skt.ye import hive_execute
hive_execute(ddl_or_ctas_query)

Fetch the result set of a Hive query

from skt.ye import hive_get_result
result_set = hive_get_result(select_query)

Get a pandas DataFrame from a Hive query result set

from skt.ye import hive_to_pandas
pandas_df = hive_to_pandas(hive_query)

Get a pandas DataFrame from a Parquet file in HDFS

from skt.ye import parquet_to_pandas
pandas_df = parquet_to_pandas(hdfs_path)

Save a pandas DataFrame as Parquet in HDFS

from skt.ye import get_spark
from skt.ye import pandas_to_parquet
spark = get_spark()
pandas_to_parquet(pandas_df, hdfs_path, spark) # this operation requires a Spark session
spark.stop()

Work with Spark

from skt.ye import get_spark
spark = get_spark()
# work with the Spark session here
spark.stop()
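
The Spark snippets above call spark.stop() by hand, so the session is left running if the work in between raises. A minimal sketch of a wrapper that always stops the session, assuming only that the object returned by get_spark has a stop() method as shown above. The name spark_session is a hypothetical helper and is not part of skt.

```python
from contextlib import contextmanager


@contextmanager
def spark_session(factory):
    """Yield a session created by `factory` and always stop it afterwards.

    Hypothetical helper, not part of skt. `factory` is any zero-argument
    callable whose result has a `stop()` method, such as `get_spark`.
    """
    spark = factory()
    try:
        yield spark
    finally:
        # Runs even if the body of the `with` block raises.
        spark.stop()


# Intended usage (requires the skt package and site resources):
#     from skt.ye import get_spark
#     with spark_session(get_spark) as spark:
#         ...  # work with the Spark session here
```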

Send a Slack message

from skt.ye import slack_send
text = 'Hello'
username = 'airflow'
channel = '#leavemealone'
slack_send(text=text, username=username, channel=channel)

Get a BigQuery client

from skt.gcp import get_bigquery_client
bq = get_bigquery_client()
bq.query(query)

Access MLS

from skt.mls import set_model_name
from skt.mls import get_recent_model_path
from skt.ye import get_pkl_from_hdfs

set_model_name(COMM_DB, params)
path = get_recent_model_path(COMM_DB, model_key)
model = get_pkl_from_hdfs(f'{path}')

Use NES CLI

nes input_notebook_url -p k1 v1 -p k2 v2 -p k3 v3
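
When the notebook parameters live in a dictionary, the command line above can be assembled programmatically. A small sketch, assuming only the `-p key value` form shown above; build_nes_command is a hypothetical helper and is not part of skt or the NES CLI.

```python
import shlex


def build_nes_command(notebook_url, params):
    """Return the argv list for `nes <url> -p key value ...`.

    Hypothetical helper; it only mirrors the `-p key value` form shown above.
    """
    argv = ["nes", notebook_url]
    for key, value in params.items():
        argv += ["-p", str(key), str(value)]
    return argv


argv = build_nes_command("input_notebook_url", {"k1": "v1", "k2": "v2", "k3": "v3"})
# shlex.join quotes arguments where needed, so the result is safe to paste into a shell.
print(shlex.join(argv))
```

The argv list can also be passed directly to subprocess.run, which avoids shell quoting altogether.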

Installation

$ pip install skt --upgrade

To install the submodules for AIR:

$ pip install skt[air] --upgrade

Develop

Create an issue first, then follow the GitHub flow: https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/github-flow

AIPS EDA tools

OVERVIEW

  • A common module of functions that are useful for EDA during modeling
  • Modules
      1. EDA (Numeric / Categorical variable)


1) EDA

1. Numeric variable EDA

  • def numeric_eda_plot
    EDA plot function for numeric features

    Args. :
        - df           :   Data to explore, as a pandas DataFrame
        - feature_list :   List of features to explore (columns of df)
        - label_col    :   Label (or hue) column
        - cols         :   Number of grid columns in a multi-plot (the number of rows is determined automatically from feature_list)
        - n_samples    :   Number of rows to sample per label (default = -1, which uses the full data)
        - plot_type    :   'density' or 'box' (default = 'density')
        - stat_yn      :   Whether to print basic statistics (mean / min / max / 1q / 3q) (default : False)
        - figsize      :   Figure size (default : (7,4))

    Returns : 
        matplotlib.pyplot object

    Example : 
        fig = numeric_eda_plot(df, ['age'], 'answer', cols = 1, n_samples = 10000, plot_type='density', stat_yn=True, figsize = (7,4))
        fig

        To save the EDA image:
        fig.savefig('filename')
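
The stat_yn option lists five summary values (mean / min / max / 1q / 3q). A minimal sketch of how such a summary could be computed with the standard library. basic_stats is a hypothetical helper, not the package's code, and the quartile convention used here (linear interpolation, method="inclusive") is an assumption, so the package may report slightly different quartiles.

```python
import statistics


def basic_stats(values):
    """Return mean / min / max / 1q / 3q for a sequence of numbers.

    Hypothetical helper, not the package's implementation. Quartile
    conventions vary; this one interpolates linearly over the data.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "min": min(values),
        "max": max(values),
        "1q": q1,
        "3q": q3,
    }


ages = [21, 25, 30, 35, 39]  # made-up values for illustration
print(basic_stats(ages))
```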

2. Categorical variable EDA

  • def categorical_eda_plot
    EDA plot function for categorical features

    Args. :
        - df           :   Data to explore, as a pandas DataFrame
        - feature_list :   List of features to explore (columns of df)
        - label_col    :   Label (or hue) column
        - cols         :   Number of grid columns in a multi-plot (the number of rows is determined automatically from feature_list)
        - n_samples    :   Number of rows to sample per label (default = -1, which uses the full data)
        - figsize      :   Figure size (default : (7,4))

    Returns : 
        matplotlib.pyplot object


    Example : 
        fig = categorical_eda_plot(df, ['sex_cd'], 'answer', cols = 1, n_samples = 10000, figsize = (7,4))
        fig

        To save the EDA image:
        fig.savefig('filename')
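
Both plot functions take n_samples, a per-label sample size where -1 means the full data. A rough sketch of that behavior on plain Python records. sample_per_label is a hypothetical helper, not the package's code, and keeping a label group whole when it has fewer than n_samples rows is an assumption.

```python
import random
from collections import defaultdict


def sample_per_label(rows, label_col, n_samples=-1, seed=0):
    """Keep at most `n_samples` rows per label; -1 keeps every row.

    Hypothetical helper, not the package's implementation. `rows` is a list
    of dicts standing in for DataFrame records.
    """
    if n_samples == -1:
        return list(rows)
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_col]].append(row)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sampled = []
    for label_rows in groups.values():
        # Assumption: a group smaller than n_samples is kept whole.
        k = min(n_samples, len(label_rows))
        sampled.extend(rng.sample(label_rows, k))
    return sampled


# Made-up records: ten rows, five per label.
records = [{"sex_cd": "M" if i < 5 else "F", "answer": i % 2} for i in range(10)]
subset = sample_per_label(records, "answer", n_samples=3)
print(len(subset))
```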

