
A library to handle Spark job submission to a YARN cluster across different environments

Project description

A Python library that submits Spark jobs to a YARN cluster using the REST API.

Note: it currently supports CDH (5.6.1) and HDP.
The library is inspired by:

Getting Started:

Use the library

# Import the SparkJobHandler
import logging

from spark_job_handler import SparkJobHandler


logger = logging.getLogger('TestLocalJobSubmit')
# Create a Spark job
# job_name:          name of the Spark job
# jar:               location of the jar (local/HDFS)
# run_class:         entry class of the application
# hadoop_rm:         Hadoop ResourceManager host/IP
# hadoop_web_hdfs:   Hadoop WebHDFS IP
# hadoop_nn:         Hadoop NameNode IP (normally the same as web_hdfs)
# env_type:          environment type, CDH or HDP
# local_jar:         flag marking the jar as local (a local jar gets uploaded to HDFS)
# spark_properties:  custom properties that need to be set
sparkJob = SparkJobHandler(logger=logger, job_name="test_local_job_submit",
            jar="simple-project/target/scala-2.10/simple-project_2.10-1.0.jar",
            run_class="IrisApp", hadoop_rm='rma', hadoop_web_hdfs='nn', hadoop_nn='nn',
            env_type="CDH", local_jar=True, spark_properties=None)
trackingUrl =
print("Job Tracking URL: %s" % trackingUrl)
The above code starts a Spark application using the local jar (simple-project/target/scala-2.10/simple-project_2.10-1.0.jar).
For more examples, see the
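Under the hood, a REST-based submitter like this talks to the YARN ResourceManager's application-submission endpoints. The sketch below shows the two-step flow; the endpoint paths come from the Hadoop YARN REST API, but the host/port and payload fields are illustrative assumptions, not this library's actual internals:

```python
# Sketch of the two-step YARN ResourceManager REST submission flow that a
# REST-based submitter wraps. Endpoint paths follow the Hadoop YARN REST
# API; the host/port and payload fields are illustrative assumptions.
import json


def new_application_url(rm_base):
    # Step 1: POST here to obtain a fresh application id from the RM.
    return rm_base + "/ws/v1/cluster/apps/new-application"


def submit_url(rm_base):
    # Step 2: POST the application submission context here.
    return rm_base + "/ws/v1/cluster/apps"


def submission_context(app_id, job_name):
    # Minimal illustrative subset of the submission body; a real submission
    # also carries an am-container-spec pointing at the Spark assembly jar
    # and the application jar in HDFS.
    return json.dumps({
        "application-id": app_id,
        "application-name": job_name,
        "application-type": "SPARK",
    })


rm = "http://rma:8088"  # assumed ResourceManager web address
print(new_application_url(rm))
print(submit_url(rm))
```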

Build the simple-project

$ cd simple-project
$ sbt package
$ cd ..

The above steps create the target jar at ./simple-project/target/scala-2.10/simple-project_2.10-1.0.jar

Update the node IPs in the tests:

Add the node IPs for the Hadoop ResourceManager and NameNode in the test cases:
* rm: ResourceManager
* nn: NameNode
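For example, the addresses might be held as simple module-level constants; the variable names below are illustrative assumptions, not the test suite's actual names:

```python
# Illustrative placeholders for the cluster addresses used by the tests;
# the names and values are assumptions -- substitute your own cluster's IPs.
RM_HOST = "10.0.0.10"   # rm: ResourceManager host/IP
NN_HOST = "10.0.0.11"   # nn: NameNode host/IP (also serves WebHDFS here)
```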

Load the data and make it available on HDFS:

$ wget

Upload the data to HDFS:

$ python <name_node_ip> /tmp/
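The upload step relies on the WebHDFS REST API's two-step CREATE flow. A minimal sketch of that flow is below; the NameNode HTTP port (50070) and the user name are assumptions, and this is not the bundled upload script:

```python
# Minimal sketch of a WebHDFS file upload (the op=CREATE two-step flow).
# The NameNode HTTP port (50070) and the user name are assumptions.
import urllib.error
import urllib.request


def webhdfs_create_url(name_node, hdfs_path, user="spark"):
    # Step 1 URL: the NameNode answers a PUT here with a 307 redirect
    # whose Location header names the DataNode to write to.
    return ("http://%s:50070/webhdfs/v1%s?op=CREATE&overwrite=true&user.name=%s"
            % (name_node, hdfs_path, user))


def upload(local_file, name_node, hdfs_path):
    req = urllib.request.Request(webhdfs_create_url(name_node, hdfs_path),
                                 method="PUT")
    try:
        resp = urllib.request.urlopen(req)
        datanode_url = resp.geturl()
    except urllib.error.HTTPError as err:
        # urllib does not follow a 307 for PUT, so read the redirect target.
        datanode_url = err.headers["Location"]
    # Step 2: send the file bytes to the DataNode address.
    with open(local_file, "rb") as fh:
        put = urllib.request.Request(datanode_url, data=fh.read(), method="PUT")
        urllib.request.urlopen(put)
```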

Run the test cases:

Make the simple-project jar available on HDFS to test remote jar submission:

$ python <name_node_ip> simple-project/target/scala-2.10/simple-project_2.10-1.0.jar /tmp/test_data/simple-project_2.10-1.0.jar

Run the test:

$ python


* Upload a local file to the HDFS file system


The library is still at an early stage and needs testing, bug fixing, and documentation.
Before running, follow the steps below:
* Update the ResourceManager, NameNode, and WebHDFS ports if required in
* Make the Spark assembly jar available in HDFS as: hdfs:/user/spark/share/lib/spark-assembly.jar
For contributions, please create an issue and a corresponding PR.

Files for spark-yarn-submit, version 1.0.0:
* spark_yarn_submit-1.0.0-py2.py3-none-any.whl (12.3 kB, Wheel, py2.py3)
