SKT Package
This package is highly site-dependent. Resources are abstracted into the package structure.
Usage
Hive metastore
from skt.ye import get_hms
c = get_hms()
c.get_partition_names("db", "table")
c.close()
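A minimal sketch of the same call with guaranteed cleanup; this assumes get_hms() returns a Thrift-style metastore client as above, and uses ordinary try/finally:
from skt.ye import get_hms
c = get_hms()
try:
    # Partition names, e.g. ['dt=2020-01-01', 'dt=2020-01-02', ...]
    for partition in c.get_partition_names("db", "table"):
        print(partition)
finally:
    # Always release the metastore connection.
    c.close()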
Hash and unhash
from skt.lake import hash_s
from skt.lake import unhash_s
unhashed_list = ['0000000000']
hashed_list = hash_s(unhashed_list)
unhash_s(hashed_list)
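The two helpers are presumably inverses, so a round trip should recover the input; a quick sketch under that assumption:
from skt.lake import hash_s
from skt.lake import unhash_s
original = ['0000000000']
recovered = unhash_s(hash_s(original))
print(recovered)  # expected to equal `original` if hash_s/unhash_s are inverses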
Execute hive query without fetching results
from skt.ye import hive_execute
hive_execute(ddl_or_ctas_query)
Fetch resultset from hive query
from skt.ye import hive_get_result
result_set = hive_get_result(select_query)
Get pandas dataframe from hive query resultset
from skt.ye import hive_to_pandas
pandas_df = hive_to_pandas(hive_query)
Get pandas dataframe from parquet file in hdfs
from skt.ye import parquet_to_pandas
pandas_df = parquet_to_pandas(hdfs_path)
Save pandas dataframe as parquet in hdfs
from skt.ye import get_spark
from skt.ye import pandas_to_parquet
spark = get_spark()
pandas_to_parquet(pandas_df, hdfs_path, spark) # we need spark for this operation
spark.stop()
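Putting the two parquet helpers together, a hedged round-trip sketch (the HDFS path is a placeholder):
import pandas as pd
from skt.ye import get_spark
from skt.ye import pandas_to_parquet
from skt.ye import parquet_to_pandas

pandas_df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
hdfs_path = '/user/example/df.parquet'  # placeholder path

spark = get_spark()
try:
    pandas_to_parquet(pandas_df, hdfs_path, spark)  # writing needs Spark
finally:
    spark.stop()

# Per the example above, reading back needs no Spark session.
restored_df = parquet_to_pandas(hdfs_path)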
Work with spark
from skt.ye import get_spark
spark = get_spark()
# do with spark session
spark.stop()
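For example, assuming get_spark() returns a standard SparkSession:
from skt.ye import get_spark
spark = get_spark()
try:
    # Any ordinary SparkSession work, e.g. running SQL:
    spark.sql('SELECT 1 AS one').show()
finally:
    spark.stop()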
Work with spark-bigquery-connector
# SELECT
from skt.gcp import bq_table_to_pandas
pandas_df = bq_table_to_pandas("dataset", "table_name", ["col_1", "col_2"], "2020-01-01", "svc_mgmt_num is not null")
# INSERT
from skt.gcp import pandas_to_bq_table
pandas_to_bq_table(pandas_df, "dataset", "table_name", "2020-03-01")
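Combining the two, a sketch that writes a dataframe and reads it back; the dataset, table, partition date, and filter are placeholders following the examples above:
import pandas as pd
from skt.gcp import bq_table_to_pandas
from skt.gcp import pandas_to_bq_table

pandas_df = pd.DataFrame(data={'col_1': [1, 2], 'col_2': ['a', 'b']})

# Write to a (placeholder) partitioned table.
pandas_to_bq_table(pandas_df, "dataset", "table_name", "2020-03-01")

# Read the same partition back with a trivial filter.
restored_df = bq_table_to_pandas("dataset", "table_name", ["col_1", "col_2"], "2020-03-01", "col_1 is not null")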
Send slack message
import pandas as pd
from skt.ye import slack_send
text = 'Hello'
username = 'airflow'
channel = '#leavemealone'
slack_send(text=text, username=username, channel=channel)
# Send dataframe as text
df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
slack_send(text=df, username=username, channel=channel, dataframe=True)
Get bigquery client
from skt.gcp import get_bigquery_client
bq = get_bigquery_client()
bq.query(query)
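get_bigquery_client() presumably returns a standard google.cloud.bigquery Client, so the usual job API applies:
from skt.gcp import get_bigquery_client
bq = get_bigquery_client()
job = bq.query("SELECT 1 AS one")  # standard google-cloud-bigquery call
for row in job.result():  # result() waits for the job to finish
    print(row.one)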
IPython BigQuery Magic
from skt.gcp import import_bigquery_ipython_magic
import_bigquery_ipython_magic()
query_params = {
"p_1": "v_1",
"dataset": "mnoai",
}
%%bq --params $query_params
SELECT c_1
FROM {dataset}.user_logs
WHERE c_1 = @p_1
Access MLS
from skt.mls import set_model_name
from skt.mls import get_recent_model_path
from skt.ye import get_pkl_from_hdfs
set_model_name(COMM_DB, params)
path = get_recent_model_path(COMM_DB, model_key)
model = get_pkl_from_hdfs(f'{path}')
MLS Model Registry (Upload model_binary(model.tar.gz) / model_meta(model.json) to AWS S3 from YE)
from skt.mls import save_model
# model object generated by LightGBM or XGBoost
model
# model name
model_name = 'sample_model'
# model version
model_version = 'v1'
# AWS ENV in 'stg / prd / dev' (default is 'stg')
aws_env = 'stg'
# List of feature names (strings) used in the ML model (only for XGBoost model_type)
feature_list = ['feature_1', 'feature_2', 'feature_3']
# Force to overwrite model files on S3 if exists (default is False)
force = False
save_model(model, model_name, model_version, aws_env, feature_list=feature_list, force=force)
MLS meta_table & meta_table_item related methods
from skt.mls import get_meta_table
from skt.mls import create_meta_table_item
from skt.mls import update_meta_table_item
from skt.mls import get_meta_table_item
from skt.mls import meta_table_to_pandas
from skt.mls import pandas_to_meta_table
# Get a meta_table info
get_meta_table(meta_table_name, aws_env, edd)
# Create a meta_item
create_meta_table_item(meta_table_name, item_name, item_dict, aws_env, edd)
# Update a meta_item
update_meta_table_item(meta_table_name, item_name, item_dict, aws_env, edd)
# Get a meta_item
get_meta_table_item(meta_table_name, item_name, aws_env, edd)
# Get a meta_table as pandas dataframe
meta_table_to_pandas(meta_table_name, aws_env, edd)
# Update pandas dataframe to meta_table
pandas_to_meta_table(method, meta_table_name, dataframe, key, values, aws_env, edd)
# For details on any method, use ?{method} (e.g. ?get_meta_table)
# EDD users must set edd=True
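A hypothetical end-to-end sketch using the signatures above; the table and item names are placeholders, and edd is left False for non-EDD users:
from skt.mls import create_meta_table_item
from skt.mls import get_meta_table_item

# Placeholder names; item_dict holds the item's fields.
create_meta_table_item('sample_table', 'sample_item', {'key': 'value'}, 'stg', False)
item = get_meta_table_item('sample_table', 'sample_item', 'stg', False)
print(item)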
MLS model_meta related methods
(*The user must be set for the ml_model)
from skt.mls import get_ml_model
from skt.mls import get_ml_model_meta
from skt.mls import update_ml_model_meta
# Get a ml_model
get_ml_model(user, model_name, model_version, aws_env, edd)
# Get a model_meta of ml_model
get_ml_model_meta(user, model_name, model_version, aws_env, edd)
# Update or Create meta_item(s)
update_ml_model_meta(user, model_name, model_version, model_meta_dict, aws_env, edd)
# For details on any method, use ?{method} (e.g. ?get_ml_model)
# EDD users must set edd=True
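For example, with placeholder identifiers and edd left False for non-EDD users:
from skt.mls import get_ml_model_meta
from skt.mls import update_ml_model_meta

# Placeholder user/model identifiers and a sample meta dict.
update_ml_model_meta('user_id', 'sample_model', 'v1', {'auc': 0.87}, 'stg', False)
meta = get_ml_model_meta('user_id', 'sample_model', 'v1', 'stg', False)
print(meta)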
Use NES CLI
nes input_notebook_url -p k1 v1 -p k2 v2 -p k3 v3
Use github util
from skt.ye import get_github_util
g = get_github_util()
# query graphql
res = g.query_gql(graph_ql)
# get file in github repository
byte_object = g.download_from_git(github_url_path)
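For instance, a hedged sketch of a GraphQL call; the query document is a placeholder:
from skt.ye import get_github_util
g = get_github_util()
# Placeholder GraphQL query; any valid query document works here.
graph_ql = '''
query {
  viewer { login }
}
'''
res = g.query_gql(graph_ql)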
Installation
$ pip install skt --upgrade
If you would like to install submodules for AIR
$ pip install skt[air] --upgrade
Develop
Create an issue first and follow the GitHub flow: https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/github-flow
AIPS EDA tools
OVERVIEW
- Common modules for functions that can be used during modeling EDA
- Modules
  - EDA (Numeric / Categorical variables)
1) EDA
1. Numeric variable EDA
- def numeric_eda_plot
EDA plot function for numeric features
Args :
- df : data to explore, as a pandas DataFrame
- feature_list : list of features to explore (columns of df)
- label_col : label (or hue) column
- cols : number of grid columns in a multi-plot (the number of rows is determined automatically from feature_list)
- n_samples : number of samples per label (default = -1, i.e. EDA on the full data)
- plot_type : density or box (default = 'density')
- stat_yn : whether to print basic statistics (mean / min / max / 1q / 3q) (default = False)
- figsize : figure size (default = (7, 4))
Returns :
matplotlib.pyplot object
Example :
fig = numeric_eda_plot(df, ['age'], 'answer', cols = 1, n_samples = 10000, plot_type='density', stat_yn=True, figsize = (7,4))
fig
To save the EDA image:
fig.savefig('filename')
2. Categorical variable EDA
- def categorical_eda_plot
EDA plot function for categorical features
Args :
- df : data to explore, as a pandas DataFrame
- feature_list : list of features to explore (columns of df)
- label_col : label (or hue) column
- cols : number of grid columns in a multi-plot (the number of rows is determined automatically from feature_list)
- n_samples : number of samples per label (default = -1, i.e. EDA on the full data)
- figsize : figure size (default = (7, 4))
Returns :
matplotlib.pyplot object
Example :
fig = categorical_eda_plot(df, ['sex_cd'], 'answer', cols = 1, n_samples = 10000, figsize = (7,4))
fig
To save the EDA image:
fig.savefig('filename')