Wednesday, 15 April 2015

How to run Apache Spark on Windows 7 in standalone mode

So far, we might have set up Spark with Hadoop, EC2 or Mesos on a Linux machine. But what if we don’t want Hadoop or EC2 and just want to run Spark in standalone mode on Windows?
Here we’ll see how we can run Spark on a Windows machine.

Prerequisites:
  • Java 6+
  • Scala 2.10.x
  • Python 2.6+
  • Spark 1.2.x
  • sbt (in case you are building Spark from source)
  • Git (if you use the sbt tool)

Now we’ll see the installation steps:
  • Install Java 7 or later. Set JAVA_HOME and add Java’s bin directory to the PATH variable in the environment variables.
  • Download Scala 2.10 or Scala 2.11 and install it. Set SCALA_HOME and add %SCALA_HOME%\bin to the PATH variable in the environment variables. To test whether Scala is installed or not, run the commands shown after this list.
  • The next thing is Spark. Spark can be installed in two ways:
    •  Building Spark using sbt
    •  Using a prebuilt Spark package
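
For example, both installations can be verified from a new command prompt; the version strings in the output will depend on the releases you installed:

D:\>java -version
D:\>scala -version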

Building Spark with SBT:
  • Download and install sbt. Set SBT_HOME and add %SBT_HOME%\bin to the PATH variable in the environment variables.
  • Download the Spark source code from the Spark website (the Hadoop version is chosen at build time, not at download time).
  • Run the sbt assembly command to build the Spark package.
  • You also need to set the Hadoop version while building, as follows:
       sbt -Pyarn -Phadoop-2.3 assembly
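
As a rough sketch, assuming the source was extracted to D:\spark-1.2.1 (a placeholder path) and that sbt and Git are on the PATH, a build for Hadoop 2.3 would look like the following; the profile and hadoop.version values follow the Spark 1.2 build documentation, so adjust them to the Hadoop release you want:

D:\>cd D:\spark-1.2.1
D:\spark-1.2.1>sbt -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 assembly

The build takes a while; when it completes, the assembled Spark jar should appear under assembly\target\scala-2.10 inside the source tree.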

Using Spark Prebuilt Package:
  • Choose a Spark prebuilt package for Hadoop, e.g. Prebuilt for Hadoop 2.3/2.4 or later. Download and extract it to any drive, e.g. D:\spark-1.2.1-bin-hadoop2.3.
  • Set SPARK_HOME and add %SPARK_HOME%\bin to the PATH variable in the environment variables (see the sketch after this list).
  • Run the spark-shell command on the command line.
  • You’ll get an error for winutils.exe:
      Though we aren’t using Hadoop with Spark, Spark still checks for the HADOOP_HOME variable in its configuration. To overcome this error, download winutils.exe and place it in any location (e.g. D:\winutils\bin\winutils.exe).
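
As a minimal sketch, assuming the package was extracted to D:\spark-1.2.1-bin-hadoop2.3 as in the first step (setting the variables permanently through the System Properties dialog has the same effect), a session-only run from the command prompt looks like this:

D:\>set SPARK_HOME=D:\spark-1.2.1-bin-hadoop2.3
D:\>set PATH=%PATH%;%SPARK_HOME%\bin
D:\>spark-shell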

P.S. The required winutils.exe may vary with the operating system version, so if it doesn't work on your OS, find another build and use that. You can refer to this Problems running Hadoop on Windows link for winutils.exe.

  • Set HADOOP_HOME = D:\winutils in the environment variables.
  • Now re-run the “spark-shell” command; you’ll see the Scala shell. For the latest Spark releases, you may get a permission error for the /tmp/hive directory, as given below:
  • The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw- 
     You need to run the following command:
D:\spark>D:\winutils\bin\winutils.exe chmod 777 D:\tmp\hive
  • For the Spark UI, open http://localhost:4040/ in a browser.
  • To test that the setup is successful, you can run a small example in the shell (see the snippet below).
  • It will execute the program and return the result.
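
For instance, here is a minimal sanity check you can type at the spark-shell prompt; sc is the SparkContext that spark-shell creates for you, and the README.md path simply points at the extracted package from earlier, so substitute any local text file:

scala> val lines = sc.textFile("D:/spark-1.2.1-bin-hadoop2.3/README.md")
scala> lines.count()
scala> lines.filter(line => line.contains("Spark")).count()

The first count returns the total number of lines in the file and the second returns the number of lines mentioning Spark, which confirms that jobs are actually being executed.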
Now your cluster is successfully launched, so start writing your Java/Python/Scala programs…!!!