SHINSEGAE DataFabric Python Package

This is a highly site-dependent package. Site resources are abstracted into the package structure.

Usage

Get pandas dataframe from parquet file in hdfs

from pydatafabric.ye import parquet_to_pandas

pandas_df = parquet_to_pandas(hdfs_path)

Save pandas dataframe as parquet in hdfs

from pydatafabric.ye import get_spark
from pydatafabric.ye import pandas_to_parquet

spark = get_spark()
pandas_to_parquet(pandas_df, hdfs_path, spark)  # we need spark for this operation
spark.stop()

Work with spark

from pydatafabric.ye import get_spark

spark = get_spark()
# use the spark session here
spark.stop()
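
A Spark session holds cluster resources until `spark.stop()` is called, so it can help to guarantee that call even when the work in between raises. The following is a minimal sketch, not part of pydatafabric: `stopping` is a hypothetical helper that assumes only that the session object has a `stop()` method, as in the example above. `FakeSession` stands in for the object returned by `get_spark()` so the sketch runs without a cluster.

```python
from contextlib import contextmanager


@contextmanager
def stopping(session):
    """Yield the session and always call session.stop() afterwards.

    Hypothetical helper: assumes only that `session` has a stop() method.
    """
    try:
        yield session
    finally:
        session.stop()


class FakeSession:
    """Stand-in for a Spark session so the sketch runs without a cluster."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


session = FakeSession()
try:
    with stopping(session) as spark:
        raise RuntimeError("job failed")  # simulate a failing job
except RuntimeError:
    pass

print(session.stopped)  # True: stop() ran even though the job raised
```

With the real package this would presumably read `with stopping(get_spark()) as spark: ...`.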

Work with spark-bigquery-connector

# SELECT
from pydatafabric.gcp import bq_table_to_pandas

pandas_df = bq_table_to_pandas("dataset", "table_name", ["col_1", "col_2"], "2020-01-01", "cust_id is not null")
# INSERT 
from pydatafabric.gcp import pandas_to_bq_table

pandas_to_bq_table(pandas_df, "dataset", "table_name", "2022-02-22")

Send slack message

from pydatafabric.ye import slack_send

text = 'Hello'
username = 'airflow'
channel = '#leavemealone'
slack_send(text=text, username=username, channel=channel)
# Send dataframe as text
import pandas as pd

df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
slack_send(text=df, username=username, channel=channel, dataframe=True)

Get bigquery client

from pydatafabric.gcp import get_bigquery_client

bq = get_bigquery_client(project="emart-datafabric")
bq.query(query)

IPython BigQuery Magic

from pydatafabric.gcp import import_bigquery_ipython_magic

import_bigquery_ipython_magic()

query_params = {
    "p_1": "v_1",
    "dataset": "common_dev",
}
%%bq --params $query_params

SELECT c_1 
FROM {dataset}.user_logs
WHERE c_1 = @p_1
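
The query above mixes two placeholder styles: `{dataset}` looks like Python string formatting, while `@p_1` is BigQuery's named query parameter syntax. The sketch below shows one way a params dictionary could be split between the two. It is an illustration under that assumption, not the magic's actual implementation, and `split_params` is a hypothetical name.

```python
import re
from string import Formatter


def split_params(query, params):
    """Split params into text substitutions and named query parameters.

    Hypothetical helper: keys used as {key} are substituted into the query
    text; keys used as @key are returned for the query engine to bind.
    """
    format_keys = {name for _, name, _, _ in Formatter().parse(query) if name}
    named_keys = set(re.findall(r"@(\w+)", query))

    substitutions = {k: v for k, v in params.items() if k in format_keys}
    named = {k: v for k, v in params.items() if k in named_keys}
    return query.format(**substitutions), named


query = "SELECT c_1\nFROM {dataset}.user_logs\nWHERE c_1 = @p_1"
query_params = {"p_1": "v_1", "dataset": "common_dev"}

rendered, named = split_params(query, query_params)
print(rendered)  # query text with {dataset} filled in, @p_1 left in place
print(named)     # parameters left for the engine to bind
```

Keeping values such as `p_1` as bound parameters rather than formatting them into the text avoids quoting problems, while identifiers such as a dataset name cannot be bound and have to be substituted.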

Use NES CLI

nes input_notebook_url -p k1 v1 -p k2 v2 -p k3 v3
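
Each `-p` flag in the command above is followed by a name and a value. As a sketch of how such repeated pairs can be collected into a dictionary with the standard library's `argparse` (this is not the `nes` source, and treating each `-p` as a key/value pair is an assumption drawn from the example):

```python
import argparse


def parse_args(argv):
    """Parse `input_notebook_url -p KEY VALUE ...` into (url, params dict)."""
    parser = argparse.ArgumentParser(prog="nes")
    parser.add_argument("input_notebook_url")
    parser.add_argument(
        "-p",
        dest="params",
        nargs=2,            # each -p consumes exactly two tokens
        action="append",    # repeated -p flags accumulate in a list
        metavar=("KEY", "VALUE"),
        default=[],
    )
    args = parser.parse_args(argv)
    return args.input_notebook_url, dict(args.params)


url, params = parse_args(["input_notebook_url", "-p", "k1", "v1", "-p", "k2", "v2"])
print(url, params)
```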

Use github util

from pydatafabric.ye import get_github_util

g = get_github_util()
# query graphql
res = g.query_gql(graph_ql)
# get file in github repository
byte_object = g.download_from_git(github_url_path)

Installation

$ pip install pydatafabric --upgrade

To install the submodules for Emart Inc. as well:

$ pip install "pydatafabric[emart]" --upgrade

Download files

Source Distribution

pydatafabric-0.4.1.tar.gz (17.5 kB)

Built Distribution

pydatafabric-0.4.1-py3-none-any.whl (18.0 kB, Python 3)
