Generic ETL Pipeline Framework for Apache Spark
Project description
Deployers
HDFSDeployer
Deploys an application build to HDFS via a bridge host.
To create a deployer, here is the sample code:
- bridge is an SSH hostname where you can run the hdfs dfs ... command
- stage_dir is a temporary directory on the bridge machine for storing temporary files
deployer = HDFSDeployer({
"bridge" : "spnode1",
"stage_dir": "/root/.stage_dir",
})
To deploy an application, here is the sample code:
- The first parameter tells where the application build is. You need to build into this directory first.
- The second parameter tells the destination to deploy the application to.
deployer.deploy(
"/mnt/DATA_DISK/projects/spark_etl/examples/myapp/build",
"/apps/myjob"
)
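Under the hood, a deploy like this presumably copies the build to the bridge's stage_dir over ssh/scp and then pushes it into HDFS with hdfs dfs. The sketch below is an assumption about that flow, not the library's actual code; the helper name deploy_commands and the exact command sequence are made up for illustration.

```python
def deploy_commands(bridge, stage_dir, build_dir, dest):
    """Return the shell commands such a deploy would likely run, in order."""
    return [
        # make sure the staging area exists on the bridge host
        f"ssh {bridge} mkdir -p {stage_dir}",
        # copy the local build to the bridge's staging directory
        f"scp -r {build_dir} {bridge}:{stage_dir}/build",
        # create the destination directory in HDFS
        f"ssh {bridge} hdfs dfs -mkdir -p {dest}",
        # push the staged build into HDFS, overwriting any previous copy
        f"ssh {bridge} hdfs dfs -put -f {stage_dir}/build {dest}",
    ]

for cmd in deploy_commands("spnode1", "/root/.stage_dir",
                           "/mnt/DATA_DISK/projects/spark_etl/examples/myapp/build",
                           "/apps/myjob"):
    print(cmd)
```

This is why the deployer needs both a bridge (a host with HDFS client access) and a stage_dir (a scratch area to hold the build before the hdfs dfs -put).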
Job Submitters
LivyJobSubmitter
To create a job submitter, here is the sample code:
- service_url points to the Livy endpoint
- username and password are your Livy username and password
- bridge is an SSH hostname where you can run yarn logs -applicationId to get the application log
Here is an example:
job_submitter = LivyJobSubmitter({
"service_url": "http://10.0.0.11:60008/",
"username": "root",
"password": "foo",
"bridge": "spnode1"
})
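The bridge host is likely used to retrieve driver logs after a run via the yarn logs command. The sketch below is an assumption about how that command would be assembled (the function name and the sample application ID are made up), not the library's actual code.

```python
import shlex

def yarn_logs_command(bridge, application_id):
    """Shell command to pull a YARN application's logs through the bridge host."""
    # quote the application id defensively before splicing it into a shell command
    return f"ssh {bridge} yarn logs -applicationId {shlex.quote(application_id)}"

print(yarn_logs_command("spnode1", "application_1600000000000_0001"))
# → ssh spnode1 yarn logs -applicationId application_1600000000000_0001
```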
To run the application, here is the sample:
- The first parameter is the deployment location; the deployer is responsible for deploying the application there. In this example, /apps/myjob/build/1.0.0.1 resides in HDFS.
job_submitter.run(
"/apps/myjob/build/1.0.0.1"
)
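A Livy-based submitter like this presumably issues a POST to Livy's standard /batches REST endpoint under the service_url. The sketch below shows how such a request could be built; the helper name livy_batch_request, the main.py entry-point filename, and the exact payload fields spark-etl sends are assumptions for illustration, not the library's actual behavior.

```python
import json

def livy_batch_request(service_url, deployment_location, args=None):
    """Build the URL and JSON body for a Livy batch submission (sketch)."""
    # Livy's batch-submission endpoint lives at <service_url>/batches
    url = service_url.rstrip("/") + "/batches"
    body = {
        # main application file inside the deployed build (placeholder name)
        "file": deployment_location + "/main.py",
        "args": args or [],
    }
    return url, json.dumps(body)

url, body = livy_batch_request("http://10.0.0.11:60008/", "/apps/myjob/build/1.0.0.1")
print(url)
# → http://10.0.0.11:60008/batches
```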
Hashes for spark_etl-0.0.4-py2.py3-none-any.whl

Algorithm | Hash digest
---|---
SHA256 | 9933a3fef78489d77322ee2621324b35f48e8d14d228ea9a3470d94cfd4f4237
MD5 | 13baead30f24c9662cb4d695f3696525
BLAKE2b-256 | 69a20dac13f54f785a4ef22ddcaf3d567201476b2f9c290fac230d9c9d2459db