Generic ETL Pipeline Framework for Apache Spark
Project description
Deployers
HDFSDeployer
Deploy application build to HDFS via a bridge host
To create a deployer, here is the sample code:

- `bridge` is an SSH hostname where you can run the `hdfs dfs ...` command.
- `stage_dir` is a temporary directory on the `bridge` machine, used for storing temporary files.
```python
deployer = HDFSDeployer({
    "bridge": "spnode1",
    "stage_dir": "/root/.stage_dir",
})
```
To deploy an application, here is the sample code:

- The first parameter is the location of the application build. You need to build into this directory first.
- The second parameter is the destination where the application is deployed.
```python
deployer.deploy(
    "/mnt/DATA_DISK/projects/spark_etl/examples/myapp/build",
    "/apps/myjob"
)
```
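Conceptually, a bridge-host deploy copies the local build to the stage directory on the bridge over SSH, then pushes it into HDFS with `hdfs dfs`. A minimal sketch of that command sequence, assuming this flow (the helper `plan_deploy_commands` is hypothetical, not part of spark-etl):

```python
# Sketch of the command sequence a bridge-host HDFS deploy might run.
# plan_deploy_commands is a hypothetical helper, not part of spark-etl.

def plan_deploy_commands(bridge, stage_dir, build_dir, dest_dir):
    """Return the shell commands for deploying build_dir to dest_dir in HDFS."""
    return [
        # copy the local build to the staging area on the bridge host
        ["scp", "-r", build_dir, f"{bridge}:{stage_dir}/build"],
        # recreate the destination directory in HDFS
        ["ssh", bridge, f"hdfs dfs -rm -r -f {dest_dir}"],
        ["ssh", bridge, f"hdfs dfs -mkdir -p {dest_dir}"],
        # push the staged build into HDFS
        ["ssh", bridge, f"hdfs dfs -put {stage_dir}/build/* {dest_dir}"],
    ]

commands = plan_deploy_commands(
    "spnode1", "/root/.stage_dir",
    "/mnt/DATA_DISK/projects/spark_etl/examples/myapp/build",
    "/apps/myjob",
)
for cmd in commands:
    print(" ".join(cmd))
```

The staging hop exists because the machine holding the build typically cannot reach HDFS directly; only the bridge host can run `hdfs dfs`.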
Job Submitters
LivyJobSubmitter
To create a job submitter, here is the sample code:

- `service_url` points to the Livy endpoint.
- `username` and `password` are your Livy username and password.
- `bridge` is an SSH hostname where you can run `yarn logs -applicationId` to get the application log.
Here is an example:
```python
job_submitter = LivyJobSubmitter({
    "service_url": "http://10.0.0.11:60008/",
    "username": "root",
    "password": "foo",
    "bridge": "spnode1"
})
```
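Under the hood, Livy accepts jobs through its REST API: a batch request is POSTed to `<service_url>/batches` with HTTP Basic auth. Here is a sketch of building such a request; the payload fields follow the Livy batch API, but the exact fields spark-etl sends, and the entry file `myapp.py`, are assumptions for illustration:

```python
import base64
import json

def build_livy_batch_request(config, app_file, args=None):
    """Build the URL, headers, and JSON body for a Livy batch submission."""
    url = config["service_url"].rstrip("/") + "/batches"
    # HTTP Basic auth header built from the configured username/password
    token = base64.b64encode(
        f"{config['username']}:{config['password']}".encode()
    ).decode()
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Basic {token}",
    }
    # "file" is the application to run, per the Livy batch API
    body = json.dumps({"file": app_file, "args": args or []})
    return url, headers, body

url, headers, body = build_livy_batch_request(
    {"service_url": "http://10.0.0.11:60008/",
     "username": "root", "password": "foo", "bridge": "spnode1"},
    "/apps/myjob/build/1.0.0.1/myapp.py",  # hypothetical entry file
)
print(url)
```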
To run the application, here is the sample:

- The first parameter is the deployment location; the deployer is responsible for deploying the build there. In this example, `/apps/myjob/build/1.0.0.1` resides in HDFS.
```python
job_submitter.run(
    "/apps/myjob/build/1.0.0.1"
)
```
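This is where the `bridge` setting comes in: once the job has a YARN application ID, the log is pulled by running `yarn logs -applicationId` over SSH on the bridge host. A sketch of assembling that command (the helper and the sample application ID are hypothetical):

```python
def yarn_log_command(bridge, application_id):
    """Return the ssh command that fetches a YARN application's log via the bridge."""
    return ["ssh", bridge, f"yarn logs -applicationId {application_id}"]

# application_1605123456789_0042 is a made-up ID for illustration
cmd = yarn_log_command("spnode1", "application_1605123456789_0042")
print(" ".join(cmd))
```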
Download files
Source Distribution
spark-etl-0.0.3.tar.gz (8.4 kB)
Built Distribution
Hashes for spark_etl-0.0.3-py2.py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | b00702193051bc189016800bc69b195cad336b734c73b12b0ac22268e0cb105a |
| MD5 | 17853d0f7b414a061f1e769ce8c2a4a9 |
| BLAKE2b-256 | 8d6162bb4d2c75502e90a827e7b4dc411a19523e64fb796bbcc377ac1d1aa0a4 |