When working with Apache Spark, there are times when you need to trigger a Spark job on demand from outside the cluster. There are two ways to submit an Apache Spark job to a cluster:
- Spark Submit from within the Spark cluster
To submit a Spark job from within the Spark cluster, we use spark-submit. Below is a sample shell script that submits a Spark job; most of the arguments are self-explanatory.
#!/bin/bash
# Submit the batch application to the standalone master in cluster mode.
# --supervise restarts the driver automatically on failure.
# The last two paths are application arguments: the input and output directories.
$SPARK_HOME/bin/spark-submit \
--class com.nitendragautam.sparkbatchapp.main.Boot \
--master spark://192.168.133.128:7077 \
--deploy-mode cluster \
--supervise \
--executor-memory 4G \
--driver-memory 4G \
--total-executor-cores 2 \
/home/hduser/sparkbatchapp.jar \
/home/hduser/NDSBatchApp/input \
/home/hduser/NDSBatchApp/output/
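For drivers submitted this way in standalone cluster mode, spark-submit also accepts --status and --kill flags (see spark-submit --help). A minimal sketch; the submission id here is illustrative, and depending on your Spark version the master URL may need the REST port (6066) rather than 7077:
#!/bin/bash
# Query the state of a previously submitted driver.
$SPARK_HOME/bin/spark-submit \
--master spark://192.168.133.128:6066 \
--status driver-20180429125849-0001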
- REST API from outside the Spark cluster
In this post, I will explain how to trigger a Spark job through the REST submission endpoint exposed by the standalone Master (port 6066 by default). Make sure the Spark cluster is running before submitting the job; note that newer Spark releases ship with this endpoint disabled, in which case it must be enabled with spark.master.rest.enabled=true on the master.
Figure: Apache Spark Master
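Before submitting, it helps to verify that the Master is reachable. A quick check, assuming the standalone Master web UI is on its default port 8080 (the REST submission endpoint used below listens on 6066):
#!/bin/bash
# Prints 200 if the standalone Master web UI is up and reachable.
curl -s -o /dev/null -w "%{http_code}\n" http://192.168.133.128:8080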
Trigger a Spark Batch Job Using a Shell Script
Create a shell script named submit_spark_job.sh with the contents below, and give it execute permission (shown after the listing).
#!/bin/bash
curl -X POST http://192.168.133.128:6066/v1/submissions/create \
  --header "Content-Type:application/json;charset=UTF-8" \
  --data '{
  "appResource": "/home/hduser/sparkbatchapp.jar",
  "sparkProperties": {
    "spark.executor.memory": "4g",
    "spark.master": "spark://192.168.133.128:7077",
    "spark.driver.memory": "4g",
    "spark.driver.cores": "2",
    "spark.eventLog.enabled": "false",
    "spark.app.name": "Spark REST API201804291717022",
    "spark.submit.deployMode": "cluster",
    "spark.jars": "/home/hduser/sparkbatchapp.jar",
    "spark.driver.supervise": "true"
  },
  "clientSparkVersion": "2.0.1",
  "mainClass": "com.nitendragautam.sparkbatchapp.main.Boot",
  "environmentVariables": {
    "SPARK_ENV_LOADED": "1"
  },
  "action": "CreateSubmissionRequest",
  "appArgs": [
    "/home/hduser/NDSBatchApp/input",
    "/home/hduser/NDSBatchApp/output/"
  ]
}'
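Assuming the script was saved in the current directory, make it executable and run it (alternatively, run it with sh as shown below):
chmod +x submit_spark_job.sh
./submit_spark_job.sh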
Once the request is accepted, we get a response like the one below. Note that "success" : true means the driver was submitted successfully, not that the job has finished.
nitendragautam@Nemo: sh submit_spark_job.sh
{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20180429125849-0001",
  "serverSparkVersion" : "2.0.1",
  "submissionId" : "driver-20180429125849-0001",
  "success" : true
}
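Since the status check in the next section needs the submissionId, it can also be captured at submission time. A minimal sketch, assuming jq is installed and the JSON payload above is saved in a file named payload.json (a name chosen here for illustration):
#!/bin/bash
# Submit the job and extract the submissionId from the JSON response (requires jq).
SUBMISSION_ID=$(curl -s -X POST http://192.168.133.128:6066/v1/submissions/create \
  --header "Content-Type:application/json;charset=UTF-8" \
  --data @payload.json | jq -r '.submissionId')
echo "Submitted driver: ${SUBMISSION_ID}"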
Check the Status of a Spark Job Using the REST API
To check the status of your Spark job, pass the submissionId returned above to the status endpoint:
curl http://192.168.133.128:6066/v1/submissions/status/driver-20180429125849-0001
{
  "action" : "SubmissionStatusResponse",
  "driverState" : "FINISHED",
  "serverSparkVersion" : "2.0.1",
  "submissionId" : "driver-20180429125849-0001",
  "success" : true,
  "workerHostPort" : "192.168.133.128:38451",
  "workerId" : "worker-20180429124356-192.168.133.128-38451"
}