Generic ETL Pipeline Framework for Apache Spark
Project description
Deployers
HDFSDeployer
Deploys an application build to HDFS via a bridge host.
To create a deployer, here is the sample code:
- bridge is an SSH hostname where you can run the hdfs dfs ... command
- stage_dir is a temporary directory on the bridge machine for storing temporary files
deployer = HDFSDeployer({
"bridge" : "spnode1",
"stage_dir": "/root/.stage_dir",
})
To deploy an application, here is the sample code:
- The first parameter tells where the application build is. You need to build into this directory first.
- The second parameter tells the destination to deploy the application to.
deployer.deploy(
"/mnt/DATA_DISK/projects/spark_etl/examples/myapp/build",
"/apps/myjob"
)
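Under the hood, a deploy like this presumably copies the build to the bridge's stage_dir over ssh/scp and then pushes it into HDFS with hdfs dfs. The sketch below is an assumption about that flow, not the library's actual code; the helper name deploy_commands and the exact command sequence are made up for illustration.

```python
def deploy_commands(bridge, stage_dir, build_dir, dest):
    """Return the shell commands such a deploy would likely run, in order."""
    return [
        # make sure the staging area exists on the bridge host
        f"ssh {bridge} mkdir -p {stage_dir}",
        # copy the local build to the bridge's staging directory
        f"scp -r {build_dir} {bridge}:{stage_dir}/build",
        # create the destination directory in HDFS
        f"ssh {bridge} hdfs dfs -mkdir -p {dest}",
        # push the staged build into HDFS, overwriting any previous copy
        f"ssh {bridge} hdfs dfs -put -f {stage_dir}/build {dest}",
    ]

for cmd in deploy_commands("spnode1", "/root/.stage_dir",
                           "/mnt/DATA_DISK/projects/spark_etl/examples/myapp/build",
                           "/apps/myjob"):
    print(cmd)
```

This is why the deployer needs both a bridge (a host with HDFS client access) and a stage_dir (a scratch area to hold the build before the hdfs dfs -put).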
Job Submitters
LivyJobSubmitter
To create a job submitter, here is the sample code:
- service_url points to the Livy endpoint
- username and password are your Livy username and password
- bridge is an SSH hostname where you can run yarn logs -applicationId to get the application log
Here is an example:
job_submitter = LivyJobSubmitter({
"service_url": "http://10.0.0.11:60008/",
"username": "root",
"password": "foo",
"bridge": "spnode1"
})
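The bridge host is likely used to retrieve driver logs after a run via the yarn logs command. The sketch below is an assumption about how that command would be assembled (the function name and the sample application ID are made up), not the library's actual code.

```python
import shlex

def yarn_logs_command(bridge, application_id):
    """Shell command to pull a YARN application's logs through the bridge host."""
    # quote the application id defensively before splicing it into a shell command
    return f"ssh {bridge} yarn logs -applicationId {shlex.quote(application_id)}"

print(yarn_logs_command("spnode1", "application_1600000000000_0001"))
# → ssh spnode1 yarn logs -applicationId application_1600000000000_0001
```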
To run the application, here is the sample:
- The first parameter is the deployment location; the deployer is responsible for deploying the application there. In this example, /apps/myjob/build/1.0.0.1 resides in HDFS.
job_submitter.run(
"/apps/myjob/build/1.0.0.1"
)
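A Livy-based submitter like this presumably issues a POST to Livy's standard /batches REST endpoint under the service_url. The sketch below shows how such a request could be built; the helper name livy_batch_request, the main.py entry-point filename, and the exact payload fields spark-etl sends are assumptions for illustration, not the library's actual behavior.

```python
import json

def livy_batch_request(service_url, deployment_location, args=None):
    """Build the URL and JSON body for a Livy batch submission (sketch)."""
    # Livy's batch-submission endpoint lives at <service_url>/batches
    url = service_url.rstrip("/") + "/batches"
    body = {
        # main application file inside the deployed build (placeholder name)
        "file": deployment_location + "/main.py",
        "args": args or [],
    }
    return url, json.dumps(body)

url, body = livy_batch_request("http://10.0.0.11:60008/", "/apps/myjob/build/1.0.0.1")
print(url)
# → http://10.0.0.11:60008/batches
```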
Hashes for spark_etl-0.0.4-py2.py3-none-any.whl

Algorithm | Hash digest
---|---
SHA256 | 9933a3fef78489d77322ee2621324b35f48e8d14d228ea9a3470d94cfd4f4237
MD5 | 13baead30f24c9662cb4d695f3696525
BLAKE2b-256 | 69a20dac13f54f785a4ef22ddcaf3d567201476b2f9c290fac230d9c9d2459db