
SHINSEGAE DataFabric Python Package



This is a highly site-dependent package. Site resources are abstracted into the package structure.


Get pandas dataframe from parquet file in hdfs

from import parquet_to_pandas

pandas_df = parquet_to_pandas(hdfs_path)

Save pandas dataframe as parquet in hdfs

from import get_spark
from import pandas_to_parquet

spark = get_spark()
pandas_to_parquet(pandas_df, hdfs_path, spark)  # we need spark for this operation

Work with spark

from import get_spark

spark = get_spark()
# use the Spark session here

Work with spark-bigquery-connector

from pydatafabric.gcp import bq_table_to_pandas

pandas_df = bq_table_to_pandas("dataset", "table_name", ["col_1", "col_2"], "2020-01-01", "cust_id is not null")

from pydatafabric.gcp import pandas_to_bq_table

pandas_to_bq_table(pandas_df, "dataset", "table_name", "2022-02-22")

Send slack message

from import slack_send

text = 'Hello'
username = 'airflow'
channel = '#leavemealone'
slack_send(text=text, username=username, channel=channel)
# Send a DataFrame as text
import pandas as pd

df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
slack_send(text=df, username=username, channel=channel, dataframe=True)
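
With dataframe=True the DataFrame is sent as text. How pydatafabric renders it is not described here, so the following is only a minimal sketch of one way to turn tabular data into a fixed-width block that stays aligned in Slack. The helper name table_to_slack_text is hypothetical, and the sketch uses plain Python structures rather than pandas so that it runs with the standard library alone.

```python
def table_to_slack_text(columns):
    """Render a {column_name: [values]} mapping as a fixed-width text table.

    Hypothetical helper for illustration; not part of pydatafabric.
    """
    names = list(columns)
    n_rows = len(next(iter(columns.values()))) if columns else 0
    # Stringify every cell so that column widths can be measured.
    cells = {name: [str(v) for v in columns[name]] for name in names}
    widths = {name: max([len(name)] + [len(c) for c in cells[name]]) for name in names}
    header = "  ".join(name.ljust(widths[name]) for name in names).rstrip()
    rows = [
        "  ".join(cells[name][i].ljust(widths[name]) for name in names).rstrip()
        for i in range(n_rows)
    ]
    fence = "`" * 3  # Slack shows text between triple backticks in monospace
    return "\n".join([fence, header, *rows, fence])


text = table_to_slack_text({"col1": [1, 2], "col2": [3, 4]})
print(text)
```

Wrapping the table in a monospace block matters because Slack's default font is proportional, which would otherwise misalign the columns.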

Get bigquery client

from pydatafabric.gcp import get_bigquery_client

bq = get_bigquery_client(project="emart-datafabric")

IPython BigQuery Magic

from pydatafabric.gcp import import_bigquery_ipython_magic

import_bigquery_ipython_magic()

query_params = {
    "p_1": "v_1",
    "dataset": "common_dev",
}

Then, in a separate notebook cell:

%%bq --params $query_params

SELECT *
FROM {dataset}.user_logs
WHERE c_1 = @p_1
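
The query above mixes two kinds of placeholders. {dataset} sits in a table path, where BigQuery does not accept query parameters, while @p_1 uses BigQuery's named query parameter syntax. The sketch below shows one way a single params dict could serve both, assuming that keys appearing as {key} in the query are filled in by string formatting and the remaining keys are passed on as named parameters. This is an assumption about the behaviour, not pydatafabric's documented implementation, and split_params is a hypothetical helper.

```python
from string import Formatter


def split_params(query, params):
    """Split params into string-format values and named query parameters.

    Hypothetical helper: keys that appear as {key} in the query text are
    substituted into it; all other keys are kept for @key-style parameters.
    """
    # Formatter().parse yields (literal, field_name, format_spec, conversion).
    format_keys = {field for _, field, _, _ in Formatter().parse(query) if field}
    format_values = {k: v for k, v in params.items() if k in format_keys}
    named_params = {k: v for k, v in params.items() if k not in format_keys}
    return query.format(**format_values), named_params


query = "SELECT * FROM {dataset}.user_logs WHERE c_1 = @p_1"
query_params = {"p_1": "v_1", "dataset": "common_dev"}

rendered, named = split_params(query, query_params)
print(rendered)  # SELECT * FROM common_dev.user_logs WHERE c_1 = @p_1
print(named)     # {'p_1': 'v_1'}
```

Values that are formatted into the query text are not escaped, so only trusted values such as dataset names should be passed that way.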


$ nes input_notebook_url -p k1 v1 -p k2 v2 -p k3 v3

Use github util

from import get_github_util

g = get_github_util()
# query graphql
res = g.query_gql(graph_ql)
# get file in github repository
byte_object = g.download_from_git(github_url_path)
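
For reference, a GraphQL query against github.com is an HTTP POST to https://api.github.com/graphql with a JSON body containing a query field and a token in the Authorization header. The sketch below builds such a request with the standard library without sending it. It illustrates what a helper like query_gql has to do and is not pydatafabric's implementation; build_gql_request and the token value are hypothetical, and a GitHub Enterprise installation would use a different URL.

```python
import json
import urllib.request

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"


def build_gql_request(graph_ql, token):
    """Build (but do not send) a POST request carrying a GraphQL query.

    Hypothetical helper for illustration; not part of pydatafabric.
    """
    body = json.dumps({"query": graph_ql}).encode("utf-8")
    return urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=body,
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


graph_ql = "query { viewer { login } }"
request = build_gql_request(graph_ql, token="<personal-access-token>")
print(request.get_method(), request.full_url)
# To actually send it, pass the request to urllib.request.urlopen with a real token.
```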


Installation

$ pip install pydatafabric --upgrade

To install the additional submodules for Emart Inc.:

$ pip install pydatafabric[emart] --upgrade
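
After installing, the installed version can be checked from Python with the standard library's importlib.metadata. A small sketch; installed_version is a hypothetical helper that returns None instead of raising when the package is absent.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version("pydatafabric"))
```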

Download files

Source distribution: pydatafabric-0.4.41.tar.gz (23.2 kB)
Built distribution: pydatafabric-0.4.41-py3-none-any.whl (25.2 kB)
