SHINSEGAE DataFabric Python Package
This is a highly site-dependent package. Site resources are abstracted into the package structure.
Usage
Get a pandas DataFrame from a Parquet file in HDFS
from pydatafabric.ye import parquet_to_pandas
pandas_df = parquet_to_pandas(hdfs_path)
Save a pandas DataFrame as Parquet in HDFS
from pydatafabric.ye import get_spark
from pydatafabric.ye import pandas_to_parquet
spark = get_spark()
pandas_to_parquet(pandas_df, hdfs_path, spark)  # this operation requires a Spark session
spark.stop()
Work with Spark
from pydatafabric.ye import get_spark
spark = get_spark()
# do something with the Spark session
spark.stop()
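The snippets above call spark.stop() by hand, which is skipped if the work in between raises. A minimal sketch of one way to guarantee cleanup is a small context manager. The names managed_session and FakeSession are hypothetical and not part of pydatafabric; FakeSession only stands in for a real session so the sketch runs without a cluster.

```python
from contextlib import contextmanager


@contextmanager
def managed_session(factory):
    """Yield a session built by `factory` and always call its stop() on exit."""
    session = factory()
    try:
        yield session
    finally:
        # Runs on normal exit and when the body raises.
        session.stop()


class FakeSession:
    """Hypothetical stand-in for a Spark session (assumption, for illustration)."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


with managed_session(FakeSession) as session:
    pass  # work with the session here

print(session.stopped)
```

Assuming get_spark takes no required arguments, as in the snippets above, the same helper could be used as `with managed_session(get_spark) as spark:`.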
Work with spark-bigquery-connector
# SELECT
from pydatafabric.gcp import bq_table_to_pandas
pandas_df = bq_table_to_pandas("dataset", "table_name", ["col_1", "col_2"], "2020-01-01", "cust_id is not null")
# INSERT
from pydatafabric.gcp import pandas_to_bq_table
pandas_to_bq_table(pandas_df, "dataset", "table_name", "2022-02-22")
Send a Slack message
from pydatafabric.ye import slack_send
text = 'Hello'
username = 'airflow'
channel = '#leavemealone'
slack_send(text=text, username=username, channel=channel)
# Send dataframe as text
import pandas as pd
df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
slack_send(text=df, username=username, channel=channel, dataframe=True)
Get a BigQuery client
from pydatafabric.gcp import get_bigquery_client
bq = get_bigquery_client(project="emart-datafabric")
bq.query(query)
IPython BigQuery Magic
from pydatafabric.gcp import import_bigquery_ipython_magic
import_bigquery_ipython_magic()
query_params = {
"p_1": "v_1",
"dataset": "common_dev",
}
%%bq --params $query_params
SELECT c_1
FROM {dataset}.user_logs
WHERE c_1 = @p_1
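The query above mixes two placeholder styles: {dataset} appears to be filled into the query text, while @p_1 looks like a BigQuery named parameter. How the magic actually splits query_params is not documented here, so the following is only a sketch of that inferred behavior; split_query_params is a hypothetical helper, not a pydatafabric function.

```python
from string import Formatter


def split_query_params(query, params):
    """Fill {name} placeholders in the text; return the rest for @name binding.

    Assumption: any key that appears as {key} in the query is text-substituted,
    and every other key is meant to be bound as a named query parameter.
    """
    # Field names that appear as {name} in the query template.
    template_fields = {
        field for _, field, _, _ in Formatter().parse(query) if field
    }
    text_params = {k: v for k, v in params.items() if k in template_fields}
    bound_params = {k: v for k, v in params.items() if k not in template_fields}
    return query.format(**text_params), bound_params


query = "SELECT c_1 FROM {dataset}.user_logs WHERE c_1 = @p_1"
query_params = {"p_1": "v_1", "dataset": "common_dev"}

sql, bound = split_query_params(query, query_params)
print(sql)    # SELECT c_1 FROM common_dev.user_logs WHERE c_1 = @p_1
print(bound)  # {'p_1': 'v_1'}
```

Keeping values such as "v_1" as bound parameters rather than formatting them into the text avoids quoting problems; table and dataset names cannot be bound that way, which is presumably why the example formats {dataset} into the text.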
Use NES CLI
nes input_notebook_url -p k1 v1 -p k2 v2 -p k3 v3
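Each -p flag in the command above takes a key followed by a value. The nes implementation is not shown in this document, so the following is only a sketch of how such repeated key/value flags could be parsed with argparse; parse_nes_args is a hypothetical name.

```python
import argparse


def parse_nes_args(argv):
    """Parse `input_notebook_url -p KEY VALUE ...` into (url, dict of parameters)."""
    parser = argparse.ArgumentParser(prog="nes")
    parser.add_argument("input_notebook_url")
    # nargs=2 consumes a key and a value; action="append" collects every -p pair.
    parser.add_argument(
        "-p",
        dest="parameters",
        nargs=2,
        action="append",
        metavar=("KEY", "VALUE"),
        default=None,
    )
    args = parser.parse_args(argv)
    return args.input_notebook_url, dict(args.parameters or [])


url, parameters = parse_nes_args(
    ["input_notebook_url", "-p", "k1", "v1", "-p", "k2", "v2", "-p", "k3", "v3"]
)
print(url)         # input_notebook_url
print(parameters)  # {'k1': 'v1', 'k2': 'v2', 'k3': 'v3'}
```

All values arrive as strings in this sketch; any type conversion would have to happen in the notebook or in a later step.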
Use the GitHub util
from pydatafabric.ye import get_github_util
g = get_github_util()
# query graphql
res = g.query_gql(graph_ql)
# get a file from a GitHub repository
byte_object = g.download_from_git(github_url_path)
Installation
$ pip install pydatafabric --upgrade
To install the submodules for Emart Inc.:
$ pip install pydatafabric[emart] --upgrade