SKT package
This package is highly site-dependent. Site-specific resources are abstracted into the package structure.
Usage
Execute a Hive query without fetching the result
from skt.ye import hive_execute
hive_execute(ddl_or_ctas_query)
Fetch the result set of a Hive query
from skt.ye import hive_get_result
result_set = hive_get_result(select_query)
Get a pandas DataFrame from a Hive query result set
from skt.ye import hive_to_pandas
pandas_df = hive_to_pandas(hive_query)
Get a pandas DataFrame from a Parquet file in HDFS
from skt.ye import parquet_to_pandas
pandas_df = parquet_to_pandas(hdfs_path)
Save a pandas DataFrame as Parquet in HDFS
from skt.ye import get_spark
from skt.ye import pandas_to_parquet
spark = get_spark()
pandas_to_parquet(pandas_df, hdfs_path, spark) # we need spark for this operation
spark.stop()
Work with Spark
from skt.ye import get_spark
spark = get_spark()
# work with the Spark session here
spark.stop()
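The get/stop pattern above leaves the session running if the work in between raises. A minimal sketch of one way to guarantee the stop call is shown below. `managed_session` and `FakeSession` are hypothetical names introduced only for illustration and are not part of skt. `FakeSession` stands in for a real Spark session so the sketch runs without a cluster.

```python
from contextlib import contextmanager


@contextmanager
def managed_session(factory):
    """Create a session with `factory` and always call its stop() on exit."""
    session = factory()
    try:
        yield session
    finally:
        # Runs on normal exit and when the body raises.
        session.stop()


class FakeSession:
    """Hypothetical stand-in for a Spark session (not part of skt)."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


with managed_session(FakeSession) as session:
    pass  # work with the session here
```

Assuming get_spark can be called with no arguments, as in the usage above, the same wrapper could presumably be used as `with managed_session(get_spark) as spark:`.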
Send a Slack message
from skt.ye import slack_send
text = 'Hello'
username = 'airflow'
channel = '#leavemealone'
slack_send(text=text, username=username, channel=channel)
Installation
$ pip install skt --upgrade
To also install the submodules for AIR:
$ pip install skt[air] --upgrade
AIPS EDA tools
OVERVIEW
- Common module of functions for use in EDA during modeling
- Modules
- EDA (Numeric / Categorical variable)
1) EDA
1. Numeric variable EDA
- def numeric_eda_plot
EDA plot function for numeric features
Args. :
- df : Data to explore, as a pandas DataFrame
- feature_list : List of features to explore (columns of df)
- label_col : Label (or hue) column
- cols : Number of grid columns for a multi-plot (the number of rows is determined automatically from feature_list)
- n_samples : Number of rows to sample per label (default = -1, which uses the full data)
- plot_type : 'density' or 'box' (default = 'density')
- stat_yn : Whether to print basic statistics (mean / min / max / 1q / 3q) (default : False)
- figsize : (default : (7,4))
Returns :
matplotlib.pyplot object
Example :
fig = numeric_eda_plot(df, ['age'], 'answer', cols = 1, n_samples = 10000, plot_type='density', stat_yn=True, figsize = (7,4))
fig
To save the EDA image:
fig.savefig('filename')
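The cols argument fixes the number of grid columns, and the number of rows follows from the length of feature_list. The sketch below shows one plausible way such a grid shape could be derived. `grid_shape` is a hypothetical helper, and ceiling division is an assumption, not the package's confirmed logic.

```python
import math


def grid_shape(n_features, cols):
    """Return (rows, cols) for a grid that holds n_features plots.

    Assumption: rows are chosen by ceiling division so that every
    feature gets a cell; the package may compute this differently.
    """
    if cols < 1:
        raise ValueError("cols must be at least 1")
    rows = math.ceil(n_features / cols)
    return rows, cols


# Five features in a two-column grid need three rows.
shape = grid_shape(5, 2)
```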
2. Categorical variable EDA
- def categorical_eda_plot
EDA plot function for categorical features
Args. :
- df : Data to explore, as a pandas DataFrame
- feature_list : List of features to explore (columns of df)
- label_col : Label (or hue) column
- cols : Number of grid columns for a multi-plot (the number of rows is determined automatically from feature_list)
- n_samples : Number of rows to sample per label (default = -1, which uses the full data)
- figsize : (default : (7,4))
Returns :
matplotlib.pyplot object
Example :
fig = categorical_eda_plot(df, ['sex_cd'], 'answer', cols = 1, n_samples = 10000, figsize = (7,4))
fig
To save the EDA image:
fig.savefig('filename')
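Both plot functions take n_samples, the number of rows sampled per label, where -1 means the full data is used. The sketch below illustrates that idea with the standard library only, on a list of dicts rather than a DataFrame. `sample_per_label`, the `seed` parameter, and the choice to cap the sample at the group size are all assumptions made for illustration. The package's actual sampling may differ.

```python
import random
from collections import defaultdict


def sample_per_label(rows, label_key, n_samples=-1, seed=0):
    """Sample up to n_samples rows from each label group.

    n_samples == -1 returns all rows, mirroring the documented default.
    Assumption: groups smaller than n_samples are kept whole.
    """
    if n_samples == -1:
        return list(rows)
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_key]].append(row)
    rng = random.Random(seed)  # fixed seed for repeatable sampling
    sampled = []
    for label_rows in groups.values():
        k = min(n_samples, len(label_rows))
        sampled.extend(rng.sample(label_rows, k))
    return sampled


# Hypothetical data: five rows with answer=0 and two rows with answer=1.
rows = [{"age": 20 + i, "answer": 0} for i in range(5)]
rows += [{"age": 40 + i, "answer": 1} for i in range(2)]
subset = sample_per_label(rows, "answer", n_samples=3)
```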