
SKT package


SKT Package


This is a highly site-dependent package. Resources are abstracted into the package structure.

Usage

Hive metastore

from skt.ye import get_hms

c = get_hms()
c.get_partition_names("db", "table")
c.close()
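
Because the client has to be closed explicitly, wrapping it in `contextlib.closing` makes sure `close()` runs even if a call in between raises. The sketch below uses a stand-in class (`FakeHMSClient`, a hypothetical name) in place of the real client so that it runs without access to a metastore; with the real package you would pass `get_hms()` to `closing` instead.

```python
from contextlib import closing


class FakeHMSClient:
    """Hypothetical stand-in for the object returned by skt.ye.get_hms()."""

    def __init__(self):
        self.closed = False

    def get_partition_names(self, db, table):
        # The real client queries the Hive metastore; this stub returns fixed data.
        return ["dt=2020-01-01", "dt=2020-01-02"]

    def close(self):
        self.closed = True


client = FakeHMSClient()
# closing() calls client.close() when the block exits, even on an exception.
with closing(client) as c:
    partitions = c.get_partition_names("db", "table")
```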

Hash and unhash

from skt.lake import hash_s
from skt.lake import unhash_s

unhashed_list = ['0000000000']
hashed_list = hash_s(unhashed_list)
unhash_s(hashed_list)

Get pandas dataframe from parquet file in hdfs

from skt.ye import parquet_to_pandas

pandas_df = parquet_to_pandas(hdfs_path)

Save pandas dataframe as parquet in hdfs

from skt.ye import get_spark
from skt.ye import pandas_to_parquet

spark = get_spark()
pandas_to_parquet(pandas_df, hdfs_path, spark)  # we need spark for this operation
spark.stop()

Work with spark

from skt.ye import get_spark

spark = get_spark()
# work with the Spark session here
spark.stop()
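
The get/use/stop pattern can also be wrapped in a small context manager so that the session is always stopped. This is only a sketch and not part of `skt`: `managed_session` and `FakeSpark` are hypothetical names, and the stub stands in for the session that `get_spark()` would return.

```python
from contextlib import contextmanager


@contextmanager
def managed_session(factory):
    """Create a session with factory() and always call stop() on it afterwards."""
    session = factory()
    try:
        yield session
    finally:
        session.stop()


class FakeSpark:
    """Hypothetical stand-in for a Spark session; only tracks whether stop() ran."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


with managed_session(FakeSpark) as spark:
    inside = spark.stopped  # still False while the block is running

# With the real package, the call would be: with managed_session(get_spark) as spark: ...
```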

Work with spark-bigquery-connector

# SELECT
from skt.gcp import bq_table_to_pandas

pandas_df = bq_table_to_pandas("dataset", "table_name", ["col_1", "col_2"], "2020-01-01", "svc_mgmt_num is not null")

# INSERT
from skt.gcp import pandas_to_bq_table

pandas_to_bq_table(pandas_df, "dataset", "table_name", "2020-03-01")

Send slack message

import pandas as pd

from skt.ye import slack_send

text = 'Hello'
username = 'airflow'
channel = '#leavemealone'
slack_send(text=text, username=username, channel=channel)
# Send dataframe as text
df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
slack_send(text=df, username=username, channel=channel, dataframe=True)

Get bigquery client

from skt.gcp import get_bigquery_client

bq = get_bigquery_client()
bq.query(query)

IPython BigQuery Magic

from skt.gcp import import_bigquery_ipython_magic

import_bigquery_ipython_magic()

query_params = {
    "p_1": "v_1",
    "dataset": "mnoai",
}
%%bq --params $query_params

SELECT c_1 
FROM {dataset}.user_logs
WHERE c_1 = @p_1
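
The cell above mixes two kinds of parameters: `{dataset}` is a text placeholder, while `@p_1` is a BigQuery named parameter. The following sketch shows one way such a params dict could be split. It is an assumption about the behaviour, not the magic's actual implementation, and `split_params` is a hypothetical helper.

```python
from string import Formatter


def split_params(query, params):
    """Split params into text substitutions and named query parameters.

    Assumption: keys that appear as {name} in the query are substituted as text,
    and all remaining keys are left to be passed to BigQuery as @name parameters.
    """
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion).
    placeholders = {field for _, field, _, _ in Formatter().parse(query) if field}
    substitutions = {k: v for k, v in params.items() if k in placeholders}
    named_params = {k: v for k, v in params.items() if k not in placeholders}
    return query.format(**substitutions), named_params


query = "SELECT c_1\nFROM {dataset}.user_logs\nWHERE c_1 = @p_1"
query_params = {"p_1": "v_1", "dataset": "mnoai"}

rendered, named = split_params(query, query_params)
print(rendered)
print(named)
```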

Use NES CLI

nes input_notebook_url -p k1 v1 -p k2 v2 -p k3 v3
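
To illustrate how the repeated `-p key value` pairs can be read, here is a minimal `argparse` sketch. It is not the `nes` source: `parse_nes_style_args` is a hypothetical function, and the real CLI may accept further options.

```python
import argparse


def parse_nes_style_args(argv):
    """Parse a notebook URL followed by any number of -p KEY VALUE pairs."""
    parser = argparse.ArgumentParser(prog="nes-sketch")
    parser.add_argument("input_notebook_url")
    # nargs=2 with action="append" collects each "-p k v" as a [k, v] pair.
    parser.add_argument("-p", dest="params", action="append", nargs=2,
                        metavar=("KEY", "VALUE"), default=[])
    args = parser.parse_args(argv)
    return args.input_notebook_url, dict(args.params)


url, params = parse_nes_style_args(
    ["input_notebook_url", "-p", "k1", "v1", "-p", "k2", "v2", "-p", "k3", "v3"]
)
print(url, params)
```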

Use github util

from skt.ye import get_github_util

g = get_github_util()
# query graphql
res = g.query_gql(graph_ql)
# get file in github repository
byte_object = g.download_from_git(github_url_path)

Installation

$ pip install skt --upgrade

To install the submodules for AIR as well:

$ pip install skt[air] --upgrade

Develop

Create an issue first, then follow the GitHub flow: https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/github-flow

AIPS EDA tools

OVERVIEW

  • Common module of functions for use in modeling EDA
  • Modules

      1) EDA (Numeric / Categorical variable)
    

1) EDA

1. Numeric variable EDA

  • def numeric_eda_plot
    EDA plot function for numeric features
    
    Args. :
        - df           :   Data to explore, as a pandas DataFrame
        - feature_list :   List of features to explore (columns of df)
        - label_col    :   Label (or hue) column
        - cols         :   Number of grid columns for multi-plots (the number of rows is determined automatically from feature_list)
        - n_samples    :   Number of rows to sample per label (default = -1, which uses the full data)
        - plot_type    :   density or box (default = 'density')
        - stat_yn      :   Whether to print basic statistics (mean / min / max / 1q / 3q) (default : False)
        - figsize      :   Figure size (default : (7,4))
    
    Returns : 
        matplotlib.pyplot object

    Example : 
        fig = numeric_eda_plot(df, ['age'], 'answer', cols = 1, n_samples = 10000, plot_type='density', stat_yn=True, figsize = (7,4))
        fig
        
        To save the EDA image:
        fig.savefig('filename')
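
The `cols` argument fixes the number of grid columns, and the number of rows follows from the length of `feature_list`. A plausible way to compute the grid is a ceiling division, sketched below. `grid_shape` is a hypothetical helper, and the package's own calculation may differ.

```python
import math


def grid_shape(n_features, cols):
    """Return (rows, cols) for a plot grid with room for n_features panels."""
    if cols < 1:
        raise ValueError("cols must be at least 1")
    # Round up so that a partially filled last row is still allocated.
    rows = math.ceil(n_features / cols)
    return rows, cols


print(grid_shape(5, 2))
print(grid_shape(1, 1))
```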

2. Categorical variable EDA

  • def categorical_eda_plot
    EDA plot function for categorical features
    
    Args. :
        - df           :   Data to explore, as a pandas DataFrame
        - feature_list :   List of features to explore (columns of df)
        - label_col    :   Label (or hue) column
        - cols         :   Number of grid columns for multi-plots (the number of rows is determined automatically from feature_list)
        - n_samples    :   Number of rows to sample per label (default = -1, which uses the full data)
        - figsize      :   Figure size (default : (7,4))
    
    Returns : 
        matplotlib.pyplot object

    Example : 
        fig = categorical_eda_plot(df, ['sex_cd'], 'answer', cols = 1, n_samples = 10000, figsize = (7,4))
        fig
        
        To save the EDA image:
        fig.savefig('filename')
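
Both plot functions take `n_samples`, the number of rows to sample per label, with -1 meaning the full data. The sketch below shows that idea on plain dictionaries using only the standard library. `sample_per_label` is a hypothetical function, and keeping a label group whole when it has fewer than `n_samples` rows is an assumption, not documented behaviour.

```python
import random
from collections import defaultdict


def sample_per_label(rows, label_key, n_samples=-1, seed=0):
    """Return up to n_samples rows for each label value; -1 keeps every row."""
    if n_samples == -1:
        return list(rows)
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_key]].append(row)
    rng = random.Random(seed)
    sampled = []
    for label_rows in groups.values():
        # Assumption: groups smaller than n_samples are kept whole.
        k = min(n_samples, len(label_rows))
        sampled.extend(rng.sample(label_rows, k))
    return sampled


rows = [{"age": i, "answer": i % 2} for i in range(10)]
subset = sample_per_label(rows, "answer", n_samples=3)
full = sample_per_label(rows, "answer")
print(len(subset), len(full))
```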
    


Download files


Source Distribution

skt-1.0.12.tar.gz (26.2 kB view details)

Uploaded Source

Built Distribution


skt-1.0.12-py3-none-any.whl (27.0 kB view details)

Uploaded Python 3

File details

Details for the file skt-1.0.12.tar.gz.

File metadata

  • Download URL: skt-1.0.12.tar.gz
  • Upload date:
  • Size: 26.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.5

File hashes

Hashes for skt-1.0.12.tar.gz
Algorithm Hash digest
SHA256 eb795b56fc69dda0cf011494344c55bfea8ddd376926bcd4f5cb90d3f8077f63
MD5 e74d769f86dcdbf95df0270569cac7ff
BLAKE2b-256 c9ceebc802679d66e16ce2fe389b5da0be4443ba99d7fe1a9cc93b654744fa17


File details

Details for the file skt-1.0.12-py3-none-any.whl.

File metadata

  • Download URL: skt-1.0.12-py3-none-any.whl
  • Upload date:
  • Size: 27.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.5

File hashes

Hashes for skt-1.0.12-py3-none-any.whl
Algorithm Hash digest
SHA256 3cd8c8bcad91297af455690e4db0c8d037e76b2da34cab6b3b66eea563423388
MD5 d40436e3b2e51809b8cccd173f4efe21
BLAKE2b-256 76b0105d23320391d225a496d4ca37b4228bfbc8f9c17f20227900c9f2c58487

