A library to handle Spark job submission to a YARN cluster across different environments
Project description
A Python library that submits Spark jobs to a YARN cluster using the REST API.
Getting Started:
Use the library
# Import the SparkJobHandler
import logging

from spark_job_handler import SparkJobHandler
...
logging.basicConfig()
logger = logging.getLogger('TestLocalJobSubmit')
# Create a Spark job
# job_name: name of the Spark job
# jar: location of the jar (local/HDFS)
# run_class: entry class of the application
# hadoop_rm: Hadoop ResourceManager host IP
# hadoop_web_hdfs: Hadoop WebHDFS IP
# hadoop_nn: Hadoop NameNode IP (normally the same as hadoop_web_hdfs)
# env_type: environment type, CDH or HDP
# local_jar: flag marking the jar as local (a local jar gets uploaded to HDFS)
# spark_properties: custom Spark properties that need to be set
sparkJob = SparkJobHandler(logger=logger, job_name="test_local_job_submit",
                           jar="./simple-project/target/scala-2.10/simple-project_2.10-1.0.jar",
                           run_class="IrisApp", hadoop_rm='rma', hadoop_web_hdfs='nn', hadoop_nn='nn',
                           env_type="CDH", local_jar=True, spark_properties=None)
trackingUrl = sparkJob.run()
print("Job Tracking URL: %s" % trackingUrl)
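Under the hood, a REST-based YARN submission like this goes through the ResourceManager's Cluster Applications API: one POST to reserve an application id, then one POST with the submission context. The helpers below are a minimal sketch of those two calls, assuming the stock YARN REST endpoints on the default web port 8088; they are illustrative and not SparkJobHandler's actual implementation.

```python
# Sketch of the two ResourceManager REST calls behind a YARN submission
# (assumption: standard YARN "Cluster Applications" API, default port 8088;
# SparkJobHandler's real requests may differ).
import json

RM_PORT = 8088  # default ResourceManager web port (assumption)

def new_application_url(rm_host):
    # Step 1: POST here with an empty body to reserve an application id
    # and learn the cluster's maximum resource capabilities.
    return "http://%s:%d/ws/v1/cluster/apps/new-application" % (rm_host, RM_PORT)

def submit_url(rm_host):
    # Step 2: POST the application-submission-context here.
    return "http://%s:%d/ws/v1/cluster/apps" % (rm_host, RM_PORT)

def submission_body(app_id, job_name, am_command):
    # Bare-bones submission context; a real Spark submission also carries
    # local resources (the application jar), environment variables, and
    # memory/vcore requests for the application-master container.
    return json.dumps({
        "application-id": app_id,
        "application-name": job_name,
        "application-type": "SPARK",
        "am-container-spec": {"commands": {"command": am_command}},
    })
```

The tracking URL returned by `run()` corresponds to the application page the ResourceManager exposes for the submitted application id.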
Build the simple-project
$ cd simple-project
$ sbt package; cd ..
The above step creates the target jar at ./simple-project/target/scala-2.10/simple-project_2.10-1.0.jar
Update the node IPs in the test:
Load the data and make it available to HDFS:
$ wget https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data
Upload the data to HDFS:
$ python upload_to_hdfs.py <name_node_ip> iris.data /tmp/iris.data
Run the test cases:
Make the simple-project jar available in HDFS to test the remote-jar case:
$ python upload_to_hdfs.py <name_node_ip> simple-project/target/scala-2.10/simple-project_2.10-1.0.jar /tmp/test_data/simple-project_2.10-1.0.jar
Run the test:
$ python test_spark_job_handler.py
Utility:
upload_to_hdfs.py: uploads a local file to HDFS
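The internals of upload_to_hdfs.py are not shown here, but a WebHDFS-based upload typically follows the standard two-step CREATE flow: PUT to the NameNode, catch its 307 redirect naming a DataNode, then PUT the file bytes to that DataNode. The sketch below illustrates that flow under the assumption of the default WebHDFS port 50070; the real script may differ.

```python
# Hedged sketch of a two-step WebHDFS upload, as a script like
# upload_to_hdfs.py might do it (assumption: WebHDFS on the NameNode's
# default port 50070; not the script's actual implementation).
import urllib.request
import urllib.error

WEBHDFS_PORT = 50070  # default NameNode web port (assumption)

def create_url(name_node_ip, hdfs_path):
    # Step 1 target: the NameNode answers this PUT with a 307 redirect
    # naming the DataNode that will accept the file bytes.
    return ("http://%s:%d/webhdfs/v1%s?op=CREATE&overwrite=true"
            % (name_node_ip, WEBHDFS_PORT, hdfs_path))

def upload_to_hdfs(name_node_ip, local_path, hdfs_path):
    # urllib will not auto-follow a 307 for a PUT, so the redirect
    # surfaces as an HTTPError whose Location header we read.
    try:
        urllib.request.urlopen(
            urllib.request.Request(create_url(name_node_ip, hdfs_path),
                                   method="PUT"))
        raise RuntimeError("expected a 307 redirect from the NameNode")
    except urllib.error.HTTPError as err:
        if err.code != 307:
            raise
        datanode_url = err.headers["Location"]
    # Step 2: send the actual file contents to the DataNode.
    with open(local_path, "rb") as fh:
        urllib.request.urlopen(
            urllib.request.Request(datanode_url, data=fh.read(),
                                   method="PUT"))
```

Usage would mirror the test commands above, e.g. `upload_to_hdfs("nn", "iris.data", "/tmp/iris.data")`.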
Built Distribution
File details
Details for the file spark_yarn_submit-1.0.0-py2.py3-none-any.whl.
File metadata
- Download URL: spark_yarn_submit-1.0.0-py2.py3-none-any.whl
- Size: 12.3 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b757b2d7b3a47997dc803609eb40df3ae93026618c83febdf3e66ada3bd15fcd |
| MD5 | fa51e6c96cfa71c5657a3a521438cd8c |
| BLAKE2b-256 | d25df8b9747498ebbcf36c82e6d17f0c8e3964e2a5eb238588c077360e54c8f9 |