In the previous post, we learned how to set up Spark Job Server and run Spark jobs on it. So far, we have used Scala programs. Now we'll see how to write Spark jobs in Java to run on the job server.
In Scala, a job must implement the SparkJob trait, which defines two methods:
- The runJob method contains the implementation of the job. The SparkContext is managed by the Job Server and is provided to the job through this method. This relieves the developer of the boilerplate configuration management that comes with creating a Spark job, and allows the Job Server to manage and re-use contexts.
- The validate method allows for an initial validation of the context and any provided configuration. If the context and configuration are OK for the job to run, returning spark.jobserver.SparkJobValid lets the job execute; otherwise, returning spark.jobserver.SparkJobInvalid(reason) prevents the job from running and provides a means to convey the reason for failure. In that case, the call immediately returns an HTTP/1.1 400 Bad Request status code.

validate helps prevent running jobs that would eventually fail due to missing or wrong configuration, saving both time and resources.
In Java, we need to extend the JavaSparkJob class instead. It has the following methods, which can be overridden in the program:
- runJob(jsc: JavaSparkContext, jobConfig: Config)
- validate(sc: SparkContext, config: Config)
- invalidate(jsc: JavaSparkContext, config: Config)
The JavaSparkJob class is available in the job-server-api package. Build the job-server-api source code and add the resulting jar to your project, then add Spark and the other required dependencies to your pom.xml.
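To make the contract concrete, the base class can be sketched roughly as follows. This is a simplified view based on the job-server-api source, not the exact code, and the class name here is illustrative:

```java
import com.typesafe.config.Config;
import org.apache.spark.api.java.JavaSparkContext;
import spark.jobserver.SparkJobInvalid;

// Simplified sketch of what JavaSparkJob asks you to override.
// The Job Server validates first; only if validation passes is runJob called.
public abstract class JavaSparkJobSketch {

    // The job logic goes here; the Job Server supplies the JavaSparkContext
    // and the job configuration, and the return value becomes the job result.
    public abstract Object runJob(JavaSparkContext jsc, Config jobConfig);

    // Return a SparkJobInvalid describing the problem, or null if the
    // configuration is fine; null is treated as SparkJobValid.
    public SparkJobInvalid invalidate(JavaSparkContext jsc, Config config) {
        return null;
    }
}
```

Note that on the Java side you typically override invalidate rather than validate: the base class's validate wraps the plain SparkContext in a JavaSparkContext, calls your invalidate, and translates a null result into SparkJobValid.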
Let’s start with the basic WordCount example:
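A minimal sketch of such a job, assuming the spark.jobserver API described above and a config key input.string carrying the text to count (the class name and the config key are placeholders, not fixed by the API):

```java
import com.typesafe.config.Config;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;
import spark.jobserver.JavaSparkJob;
import spark.jobserver.SparkJobInvalid;

import java.util.Arrays;

public class JavaWordCountJob extends JavaSparkJob {

    @Override
    public Object runJob(JavaSparkContext jsc, Config jobConfig) {
        // Split the input string into words and count the occurrences of each;
        // the returned map is serialized back to the caller as the job result.
        return jsc.parallelize(
                       Arrays.asList(jobConfig.getString("input.string").split("\\s+")))
                  .mapToPair(word -> new Tuple2<>(word, 1))
                  .reduceByKey((a, b) -> a + b)
                  .collectAsMap();
    }

    @Override
    public SparkJobInvalid invalidate(JavaSparkContext jsc, Config config) {
        // Returning null signals that the configuration is valid; otherwise
        // the job is rejected up front with a 400 Bad Request.
        return config.hasPath("input.string")
                ? null
                : new SparkJobInvalid("No input.string config param");
    }
}
```

The validation check mirrors what runJob actually needs: if input.string is missing, the job is rejected before a single Spark task is scheduled.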
The next step is to compile the code, build the jar, and upload it to the Job Server.
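Sketched with curl against the Job Server's REST API, assuming the server listens on localhost:8090 and a jar named word-count.jar containing a job class JavaWordCountJob (all names and paths here are placeholders):

```shell
# Upload the jar under an app name of your choice (here "wordcount")
curl --data-binary @target/word-count.jar localhost:8090/jars/wordcount

# Start the job, passing job configuration in the request body
curl -d "input.string = a b c a b" \
     'localhost:8090/jobs?appName=wordcount&classPath=JavaWordCountJob'
```

The second call returns a JSON response with the job's status (or its result, if run synchronously), just as with the Scala jobs from the previous post.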
That's it: your Spark job is ready to run on the Job Server!