Pyflare
A PySpark bridge to DataOS.
What it does:
Pyflare is written from the ground up to provide native support for Python 3.x. This SDK makes it easy to integrate your Python application, library, or script with the DataOS data ecosystem.
The SDK abstracts away the complexity of data flow, so you can focus on data transformations and business logic.
Steps to install
- pip install pyflare-0.0.12-py3-none-any.whl
That's it. Enjoy!!
How to use:
Define mandatory fields
sparkConf [optional]:
Define Spark conf entries as per your need.
token:
Provide your DataOS API token.
DATAOS_FQDN:
Provide the DataOS fully qualified domain name.
with_depot():
Provide the DataOS address to be used in the current session, with the correct ACL ("r" for read, "w" for write).
load():
Method to load data from a source.
save():
Method to write data to a sink.
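Datasets are referenced by DataOS addresses of the form `dataos://<depot>:<collection>/<dataset>`, as seen in the sample below. As a rough illustration of how such an address breaks down (this helper is hypothetical and not part of the Pyflare SDK):

```python
def parse_dataos_address(address):
    """Split a DataOS address of the form dataos://<depot>:<collection>/<dataset>.

    Hypothetical helper for illustration only; not part of the Pyflare SDK.
    """
    prefix = "dataos://"
    if not address.startswith(prefix):
        raise ValueError("not a DataOS address: " + address)
    depot, _, rest = address[len(prefix):].partition(":")
    collection, _, dataset = rest.partition("/")
    return depot, collection, dataset

# e.g. parse_dataos_address("dataos://icebase:retail/city")
#      -> ("icebase", "retail", "city")
```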
Samples:
from pyflare.sdk import load, save, session_builder
# Define your spark conf params here
sparkConf = [
    ("spark.app.name", "Dataos Sdk Spark App"),
    ("spark.master", "local[*]"),
    ("spark.executor.memory", "4g"),
    ("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.25.1,"
                            "com.google.cloud.bigdataoss:gcs-connector:hadoop3-2.2.17,"
                            "net.snowflake:spark-snowflake_2.12:2.11.0-spark_3.3")
]
# Provide dataos token here
token = "bWF5YW5rLjkxYzZiNDQ3LWM3ZWYtNGZhYS04YmEwLWMzNjk3MzQ1MTQyNw=="
# provide dataos fully qualified domain name
DATAOS_FQDN = "sunny-prawn.dataos.app"
# initialize pyflare session
spark = session_builder.SparkSessionBuilder() \
.with_spark_conf(sparkConf) \
.with_user_apikey(token) \
.with_dataos_fqdn(DATAOS_FQDN) \
.with_depot("dataos://icebase:retail/city", "r") \
.with_depot("dataos://sanitysnowflake:public/customer", "w") \
.build_session()
# load() method will read dataset city from the source and return a governed dataframe
df_city = load(name="dataos://icebase:retail/city", format="iceberg")
# perform required transformations as per business logic
df_city = df_city.drop("__metadata")
# save() will write transformed dataset to the sink
save(name="dataos://sanitysnowflake:public/customer", mode="overwrite", dataframe=df_city, format="snowflake")
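The transformation step in the sample drops the internal `__metadata` column before writing. A small generic helper along the same lines (hypothetical, not provided by the SDK) could collect every internal column from a schema:

```python
def internal_columns(columns):
    """Return column names that look internal, i.e. start with a double underscore.

    Hypothetical helper for illustration; the sample above simply calls
    df.drop("__metadata") directly.
    """
    return [c for c in columns if c.startswith("__")]

# A DataFrame's internal columns could then be dropped in one go:
# df_city = df_city.drop(*internal_columns(df_city.columns))
```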
Download files
Source Distribution: dataos-pyflare-0.0.1.tar.gz (20.3 kB)
Built Distribution: dataos_pyflare-0.0.1-py3-none-any.whl
Hashes for dataos_pyflare-0.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | ea9c87b24ae52a9e5625602e3b89dfd93e77a986fa80bb7a6793f56f63c14103
MD5 | f9a8d19e34b2b22ec7f1bda498e5ac32
BLAKE2b-256 | 6e0d3a8df36804a8c04d05342c9b1b8fdb2b54209c998d347aa00c820279c821