Generic ETL Pipeline Framework for Apache Spark
Deployers
HDFSDeployer
Deploys an application build to HDFS via a bridge host.
To create a deployer, here is the sample code:
- `bridge` is an ssh hostname where you can run the `hdfs dfs ...` command
- `stage_dir` is a temporary directory on the `bridge` machine, for storing temporary files
deployer = HDFSDeployer({
"bridge" : "spnode1",
"stage_dir": "/root/.stage_dir",
})
To deploy an application, here is the sample code:
- The first parameter is the local directory containing the application build. You need to build the application into this directory first.
- The second parameter is the destination to deploy the application to.
deployer.deploy(
"/mnt/DATA_DISK/projects/spark_etl/examples/myapp/build",
"/apps/myjob"
)
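Conceptually, a deployment like this is two hops: copy the build to the bridge host, then push it from there into HDFS with `hdfs dfs`. The sketch below illustrates that idea only; the helper name and exact command layout are made up, not the library's actual implementation:

```python
def hdfs_deploy_commands(bridge, stage_dir, build_dir, dest_dir):
    """Return shell commands for a two-hop deployment to HDFS.

    Hypothetical helper: hop 1 copies the local build to a staging
    directory on the bridge host, hop 2 pushes it into HDFS.
    """
    staged = f"{stage_dir}/build"
    return [
        # hop 1: local machine -> bridge host
        f"scp -r {build_dir} {bridge}:{staged}",
        # hop 2: bridge host -> HDFS, replacing any previous deployment
        f"ssh {bridge} 'hdfs dfs -rm -r -f {dest_dir}'",
        f"ssh {bridge} 'hdfs dfs -mkdir -p {dest_dir}'",
        f"ssh {bridge} 'hdfs dfs -put {staged}/* {dest_dir}'",
    ]

for cmd in hdfs_deploy_commands(
    "spnode1", "/root/.stage_dir",
    "/mnt/DATA_DISK/projects/spark_etl/examples/myapp/build",
    "/apps/myjob",
):
    print(cmd)
```

This also shows why `stage_dir` is part of the deployer's config: the build has to land somewhere on the bridge before it can enter HDFS.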
Job Submitters
LivyJobSubmitter
To create a job submitter, here is the sample code:
- `service_url` points to the Livy endpoint
- `username` and `password` are your Livy username and password
- `bridge` is an ssh hostname where you can run `yarn logs -applicationId` to get the application log
Here is an example:
job_submitter = LivyJobSubmitter({
"service_url": "http://10.0.0.11:60008/",
"username": "root",
"password": "foo",
"bridge": "spnode1"
})
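The `bridge` setting exists purely for log retrieval: the `yarn` CLI lives on the bridge host, so fetching a job's logs amounts to running `yarn logs` over ssh. A minimal sketch (the helper name is made up for illustration):

```python
def yarn_logs_command(bridge, application_id):
    """Command to fetch aggregated YARN logs through the bridge host.

    Hypothetical helper: since the yarn CLI is available on the
    bridge, the log fetch is just ssh + `yarn logs -applicationId`.
    """
    return ["ssh", bridge, "yarn", "logs", "-applicationId", application_id]

print(" ".join(yarn_logs_command("spnode1", "application_1600000000000_0001")))
```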
To run the application, here is the sample code:
- The first parameter is the deployment location; the deployer is responsible for deploying the build there. In this example, /apps/myjob/build/1.0.0.1 resides in HDFS.
job_submitter.run(
"/apps/myjob/build/1.0.0.1"
)
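For context on what a `run` call like this involves: Livy's batch REST API accepts a POST to `/batches` with a JSON body whose `file` field names the application file, and HTTP basic auth carries the credentials. The sketch below builds such a request under the assumption that the submitter uses that API; the helper name and the `main.py` entry-file name are invented, since the real layout of a deployed build is not documented here:

```python
import base64
from urllib.parse import urljoin

def livy_batch_request(config, deploy_path, entry_file="main.py"):
    """Build URL, headers, and body for a Livy batch submission.

    Sketch only: assumes Livy's batch REST API (POST /batches) and
    a hypothetical entry_file inside the deployed build directory.
    """
    url = urljoin(config["service_url"], "batches")
    token = base64.b64encode(
        f"{config['username']}:{config['password']}".encode()
    ).decode()
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Basic {token}",
    }
    body = {"file": f"{deploy_path}/{entry_file}"}
    return url, headers, body

url, headers, body = livy_batch_request(
    {"service_url": "http://10.0.0.11:60008/",
     "username": "root", "password": "foo", "bridge": "spnode1"},
    "/apps/myjob/build/1.0.0.1",
)
print(url)   # http://10.0.0.11:60008/batches
print(body)
```

After submission, Livy reports progress via `GET /batches/{id}/state`, which is how a submitter can poll until the job finishes.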