SKT package
This is a highly site-dependent package. Site resources are abstracted into the package structure.
Usage
Hive metastore
from skt.ye import get_hms
c = get_hms()
c.get_partition_names("db", "table")
c.close()
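The snippet above closes the client by hand, so an exception raised between `get_hms()` and `c.close()` would skip the cleanup. A minimal sketch of a safer pattern, assuming the client only needs its `close()` method called: `contextlib.closing` from the standard library guarantees the call. `FakeMetastoreClient` is a hypothetical stand-in so the example runs without access to the site's metastore, and its partition names are invented.

```python
from contextlib import closing


class FakeMetastoreClient:
    """Hypothetical stand-in for the client returned by get_hms()."""

    def __init__(self):
        self.closed = False

    def get_partition_names(self, db, table):
        # Invented data, for illustration only.
        return ["dt=2020-01-01", "dt=2020-01-02"]

    def close(self):
        self.closed = True


client = FakeMetastoreClient()
# closing() calls client.close() on exit, even if the body raises.
with closing(client) as c:
    partitions = c.get_partition_names("db", "table")
```

With the real package, the same pattern would presumably read `with closing(get_hms()) as c:`.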
Hash and unhash
from skt.lake import hash_s
from skt.lake import unhash_s
unhashed_list = ['0000000000']
hashed_list = hash_s(unhashed_list)
unhash_s(hashed_list)
Get pandas dataframe from parquet file in hdfs
from skt.ye import parquet_to_pandas
pandas_df = parquet_to_pandas(hdfs_path)
Save pandas dataframe as parquet in hdfs
from skt.ye import get_spark
from skt.ye import pandas_to_parquet
spark = get_spark()
pandas_to_parquet(pandas_df, hdfs_path, spark) # we need spark for this operation
spark.stop()
Work with spark
from skt.ye import get_spark
spark = get_spark()
# do with spark session
spark.stop()
Work with spark-bigquery-connector
# SELECT
from skt.gcp import bq_table_to_pandas
pandas_df = bq_table_to_pandas("dataset", "table_name", ["col_1", "col_2"], "2020-01-01", "svc_mgmt_num is not null")
# INSERT
from skt.gcp import pandas_to_bq_table
pandas_to_bq_table(pandas_df, "dataset", "table_name", "2020-03-01")
Send slack message
from skt.ye import slack_send
text = 'Hello'
username = 'airflow'
channel = '#leavemealone'
slack_send(text=text, username=username, channel=channel)
# Send dataframe as text
import pandas as pd
df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
slack_send(text=df, username=username, channel=channel, dataframe=True)
Get bigquery client
from skt.gcp import get_bigquery_client
bq = get_bigquery_client()
bq.query(query)
IPython BigQuery Magic
from skt.gcp import import_bigquery_ipython_magic
import_bigquery_ipython_magic()
query_params = {
"p_1": "v_1",
"dataset": "mnoai",
}
%%bq --params $query_params
SELECT c_1
FROM {dataset}.user_logs
WHERE c_1 = @p_1
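In the cell above, `query_params` feeds two kinds of placeholders: `{dataset}` is written in curly braces, while `@p_1` uses BigQuery's named-parameter syntax. How the magic separates the two internally is not documented here. The sketch below shows one way such a split could work, under that assumption; `split_params` is a hypothetical helper, not part of `skt.gcp`.

```python
import re


def split_params(query, params):
    """Fill {name} placeholders and collect values for @name parameters.

    Hypothetical helper; the real magic may behave differently.
    """
    # str.format ignores keys that have no {name} placeholder in the query.
    rendered = query.format(**params)
    # Names referenced as @name stay in the SQL and are bound separately.
    referenced = set(re.findall(r"@(\w+)", rendered))
    bound = {k: v for k, v in params.items() if k in referenced}
    return rendered, bound


query = "SELECT c_1\nFROM {dataset}.user_logs\nWHERE c_1 = @p_1"
query_params = {"p_1": "v_1", "dataset": "mnoai"}
rendered, bound = split_params(query, query_params)
print(rendered)
print(bound)
```

Here the dataset name is substituted into the SQL text, and only `p_1` remains to be passed as a query parameter.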
Use NES CLI
nes input_notebook_url -p k1 v1 -p k2 v2 -p k3 v3
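The command above passes notebook parameters as repeated `-p key value` pairs. The NES CLI's own parser is not shown in this document; as a sketch of how an interface of this shape can be parsed, the standard library's `argparse` supports it with `nargs=2` and `action="append"`. The program name `nes-sketch` and the destination name `parameters` are hypothetical.

```python
import argparse

parser = argparse.ArgumentParser(prog="nes-sketch")  # hypothetical program name
parser.add_argument("input_notebook_url")
parser.add_argument(
    "-p",
    dest="parameters",
    nargs=2,  # each -p consumes a key and a value
    action="append",  # repeated -p flags accumulate in a list
    metavar=("KEY", "VALUE"),
    default=[],
)

args = parser.parse_args(
    ["input_notebook_url", "-p", "k1", "v1", "-p", "k2", "v2", "-p", "k3", "v3"]
)
# Each pair is a two-item list, so dict() turns them into a mapping.
params = dict(args.parameters)
```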
Use github util
from skt.ye import get_github_util
g = get_github_util()
# query graphql
res = g.query_gql(graph_ql)
# get file in github repository
byte_object = g.download_from_git(github_url_path)
Installation
$ pip install skt --upgrade
To install the submodules for AIR as well:
$ pip install skt[air] --upgrade
Develop
Create an issue first, then follow the GitHub flow: https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/github-flow
AIPS EDA tools
OVERVIEW
- A common module of functions for use during modeling EDA
Modules
1) EDA (Numeric / Categorical variable)
1) EDA
1. Numeric variable EDA
- def numeric_eda_plot
EDA plot function for numeric features
Args. :
- df : the data to explore, as a Pandas DataFrame
- feature_list : list of features to explore (columns of df)
- label_col : Label (or Hue) column
- cols : number of grid columns for a multi-plot (the number of rows is determined automatically from feature_list)
- n_samples : number of rows to sample per label (default = -1, which runs the EDA on the full data)
- plot_type : density or box (default = 'density')
- stat_yn : whether to print basic statistics (mean / min / max / 1q / 3q) (default : False)
- figsize : (default : (7,4))
Returns :
matplotlib.pyplot object
Example :
fig = numeric_eda_plot(df, ['age'], 'answer', cols=1, n_samples=10000, plot_type='density', stat_yn=True, figsize=(7,4))
fig
To save the EDA image:
fig.savefig('filename')
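The `cols` argument fixes the number of grid columns, and the number of rows follows from the length of `feature_list`. A sketch of that calculation, assuming the usual ceiling division; `grid_shape` is a hypothetical helper and is not taken from the package source.

```python
import math


def grid_shape(n_features, cols):
    """Return (rows, cols) for a grid large enough to hold n_features plots."""
    if cols < 1:
        raise ValueError("cols must be at least 1")
    # Round up so a partially filled last row still gets its own row.
    rows = math.ceil(n_features / cols)
    return rows, cols


shape = grid_shape(5, 2)  # five features laid out in two columns
print(shape)
```

Under this assumption, five features with `cols = 2` need three rows, the last of which holds a single plot.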
2. Categorical variable EDA
- def categorical_eda_plot
EDA plot function for categorical features
Args. :
- df : the data to explore, as a Pandas DataFrame
- feature_list : list of features to explore (columns of df)
- label_col : Label (or Hue) column
- cols : number of grid columns for a multi-plot (the number of rows is determined automatically from feature_list)
- n_samples : number of rows to sample per label (default = -1, which runs the EDA on the full data)
- figsize : (default : (7,4))
Returns :
matplotlib.pyplot object
Example :
fig = categorical_eda_plot(df, ['sex_cd'], 'answer', cols=1, n_samples=10000, figsize=(7,4))
fig
To save the EDA image:
fig.savefig('filename')
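Both plot functions take `n_samples`, the number of rows to sample per label, with `-1` meaning the full data is used. The package's sampling code is not shown here. The sketch below illustrates the idea with the standard library only, under the assumption that labels with fewer rows than `n_samples` are kept whole; `sample_per_label` and its `seed` argument are hypothetical.

```python
import random
from collections import defaultdict


def sample_per_label(rows, label_key, n_samples=-1, seed=0):
    """Keep at most n_samples rows per label; -1 keeps every row."""
    if n_samples == -1:
        return list(rows)
    # Group the rows by their label value.
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_key]].append(row)
    rng = random.Random(seed)
    sampled = []
    for label_rows in groups.values():
        # Never ask for more rows than the label actually has.
        k = min(n_samples, len(label_rows))
        sampled.extend(rng.sample(label_rows, k))
    return sampled


rows = [{"sex_cd": "M", "answer": i % 2} for i in range(10)]
subset = sample_per_label(rows, "answer", n_samples=3)
```

Sampling per label rather than over the whole table keeps a rare label from being crowded out of the plot.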
Download files
File details
Details for the file skt-1.0.11.tar.gz.
File metadata
- Download URL: skt-1.0.11.tar.gz
- Upload date:
- Size: 26.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 70c0cb59945229d0d2dce5c5b3a318a1ff5f28a52dd1187693f7eaef754796f0 |
| MD5 | 261c22570e61db1d1ece3bd29d5ccbd6 |
| BLAKE2b-256 | 9a8ba77fdcc5828b7948ecf4a3f0e4ff3acd67a5813b496e995b41ec9fcf4592 |
File details
Details for the file skt-1.0.11-py3-none-any.whl.
File metadata
- Download URL: skt-1.0.11-py3-none-any.whl
- Upload date:
- Size: 27.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cbe8f370baae15cc48c5f0d91f84f8c37f7c21d8a8db1d9c75ab6bca62aa36fa |
| MD5 | 113a0d9d7b5460d5ea1a77a78143ba5e |
| BLAKE2b-256 | 1600a799f10f595b9bafaf76c58f882eaea628bf5805b1a73690c9b0506a5334 |