
How To Setup Spark Scala SBT in Eclipse

This post explains how to set up Spark with Scala and SBT in Eclipse. Normally, running any Spark application, especially in Scala, is a bit lengthy: you need to code, compile, build a jar, and finally deploy or execute it.

Alternatively, we can use Eclipse. Eclipse is a free IDE widely used for writing Java or Scala code. If we integrate the Spark environment into Eclipse, it is a big help: we can quickly run, debug, and unit test code changes without the pain of the lengthy build-and-deploy cycle. I will explain a step-by-step process to do so.

Part 1 - Set Up the Environment:


1. Create and Verify The Folders

Create the folders below in the C drive. You can also use any other drive, but for this post I am using the C drive for the set-up.

1.1. For Spark - C:\Spark
1.2. For Hadoop - C:\Hadoop\bin
1.3. For Java - Check where your Java JDK is installed. If Java is not already installed, install it from the Oracle website. Java version 8 has worked without issues so far, so try that. Assuming Java is installed, note down the Java JDK path. Typically it is something like C:\Program Files\Java\jdk1.8.0_191; it might differ based on the folder you chose, but whatever it is, note the path down.

We will need all three of the above folder names in the next steps.


2. Download Spark & winutils

> Download Spark from - Extract the files and place them in C:\Spark. For example, if you have downloaded Spark version 2.2.1 and extracted it, it will look something like C:\Spark\spark-2.2.1-bin-hadoop2.7

> Download winutils.exe from -

Copy the winutils.exe file into C:\Hadoop\bin


3. Environment Variable Set-up

Let's set up the environment variables now. Open the Environment Variables window, and create new entries (or edit them if already present). Based on the folders chosen above, I need to add the following environment variables:

SPARK_HOME - C:\Spark\spark-2.2.1-bin-hadoop2.7
HADOOP_HOME - C:\Hadoop
JAVA_HOME - C:\Program Files\Java\jdk1.8.0_191

These values are as per my folder structure. Please try to keep the same folder structure. Also add the Java and Spark bin directory locations to your Windows PATH variable. In my case, these are:

C:\Program Files\Java\jdk1.8.0_191\bin
C:\Spark\spark-2.2.1-bin-hadoop2.7\bin
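Once the variables are saved (open a fresh command prompt so they take effect), you can also verify them from Scala code. A minimal sketch; EnvCheck is just an illustrative name, not part of any library:

```scala
object EnvCheck {
  // The three variables this guide sets up (names from the section above)
  val required = Seq("SPARK_HOME", "HADOOP_HOME", "JAVA_HOME")

  // Reports the value of an environment variable, or a warning if missing
  def check(name: String): String =
    sys.env.get(name).map(v => s"$name = $v").getOrElse(s"$name is NOT set")

  def main(args: Array[String]): Unit = required.foreach(n => println(check(n)))
}
```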

4. Eclipse

I believe Eclipse is probably already set up on your system. If not, you can install Eclipse from - Setting up Eclipse is quite straightforward, so I will safely skip this part.

5. Maven

Add the Maven plugin to your Eclipse using the "Help" --> "Install New Software" option. Use the link below -

6. Scala

This is important: the version of the Scala software you install needs to be EXACTLY the same as the Scala version mentioned by Spark. To verify, type spark-shell in the Windows command line and you will notice the compatible Scala version mentioned. A snapshot of the message from my terminal:

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_191)
Type in expressions to have them evaluated.
Type :help for more information.

So I know my compatible Scala version is 2.11.8. Find your version, then download and install it. Verify the version with:

scala -version
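If you prefer to check from code, the Scala standard library reports its own version via scala.util.Properties; a small sketch (VersionCheck is an illustrative wrapper name):

```scala
object VersionCheck {
  // versionNumberString returns just the number, e.g. "2.11.8"
  def scalaLibraryVersion: String = scala.util.Properties.versionNumberString

  def main(args: Array[String]): Unit =
    println(s"Scala library version: $scalaLibraryVersion")
}
```

The first two components (e.g. 2.11) are the binary version that must match what spark-shell reports.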

7. SBT - Scala Build Tool

Download and install from -  

8. Eclipse Plugin for Scala

Download and install the Scala plugin for Eclipse. Download from -

Part 2 - Project-Specific Set-up:


1. Folder

# 1. Make a project folder
mkdir SimpleApp

# Create the folder structure within the "SimpleApp" dir
mkdir lib, project, target, src
mkdir src/main, src/test
mkdir src/main/java, src/main/resources, src/main/scala
mkdir src/test/java, src/test/resources, src/test/scala
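The mkdir calls above can also be scripted from Scala with java.nio.file, if you want the layout reproducible (MakeLayout is just an illustrative name; "SimpleApp" is the example project name from above):

```scala
import java.nio.file.{Files, Paths}

object MakeLayout {
  // Standard sbt directory layout, mirroring the mkdir calls above
  val dirs = Seq(
    "lib", "project", "target",
    "src/main/java", "src/main/resources", "src/main/scala",
    "src/test/java", "src/test/resources", "src/test/scala"
  )

  def main(args: Array[String]): Unit = {
    val root = Paths.get("SimpleApp")
    // createDirectories also creates missing parents and is a no-op if present
    dirs.foreach(d => Files.createDirectories(root.resolve(d)))
    println(s"Created layout under ${root.toAbsolutePath}")
  }
}
```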


2. Build.sbt

Ensure the Scala version and Spark version used below match exactly what you see while running the "spark-shell" command.

// 2. Create build.sbt file in /SimpleApp dir with below content. 
// Use correct Scala & spark version
name := "Simple Project"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.1"
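The %% operator in the dependency line is why the Scala versions must match: sbt appends the Scala binary version to the artifact name before resolving it. A rough sketch of what that expansion does (crossArtifact is an illustrative helper, not an sbt API):

```scala
object CrossVersion {
  // Approximates sbt's %%: "spark-sql" with Scala 2.11.8 resolves to
  // the artifact "spark-sql_2.11" (binary version = major.minor)
  def crossArtifact(artifact: String, scalaVersion: String): String = {
    val binaryVersion = scalaVersion.split('.').take(2).mkString(".")
    s"${artifact}_$binaryVersion"
  }

  def main(args: Array[String]): Unit =
    println(crossArtifact("spark-sql", "2.11.8"))  // spark-sql_2.11
}
```

So a Scala 2.12 installation against this build.sbt would look for spark-sql_2.12, which does not exist for Spark 2.2.1, and the build would fail to resolve.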


3. SBT Project Building

// ALL "sbt" COMMANDS NEED TO BE RUN FROM THE /SimpleApp dir level
// 3. Go to the SimpleApp dir & run below -
> sbt


4. Eclipse Plugin


  • Create a plugins.sbt file in /SimpleApp/project/ with the content below. Also refer -

addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "5.2.4")

  • Reload so the necessary files for Eclipse are downloaded, then generate the Eclipse project:

> sbt
> reload
> eclipse

5. Eclipse

  • Open Eclipse
  • Open Scala Perspective
  • Import the Project i.e. /SimpleApp dir
  • Create a New Scala Object in src/main/scala in Eclipse
  • Copy the below code

import org.apache.spark.sql.SparkSession

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "D:\\WorkDirectory\\" // Should be some file on your system
    val spark = SparkSession.builder.appName("Simple Application").master("local").getOrCreate()
    val logData = spark.read.textFile(logFile).cache()
    val numAs = logData.filter(line => line.contains("spark")).count()
    val numBs = logData.filter(line => line.contains("pyspark")).count()
    println(s"Lines with word=spark : $numAs")
    println(s"Lines with word=pyspark : $numBs")
    spark.stop()
  }
}

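The counting logic in the sample app can be exercised on a plain collection too, which is handy for a quick unit test without a SparkSession. Note one subtlety: contains("spark") also matches "pyspark", so the first count includes pyspark lines as well. A minimal sketch with made-up sample lines:

```scala
object CountWords {
  // Same predicate logic as the Spark app, on an ordinary Seq[String].
  // contains("spark") also matches "pyspark", so numAs >= numBs always.
  def counts(lines: Seq[String]): (Long, Long) = {
    val numAs = lines.count(_.contains("spark")).toLong
    val numBs = lines.count(_.contains("pyspark")).toLong
    (numAs, numBs)
  }

  def main(args: Array[String]): Unit = {
    val sample = Seq("spark rocks", "pyspark too", "plain text", "spark and pyspark")
    val (numAs, numBs) = counts(sample)
    println(s"Lines with word=spark : $numAs")   // 3 - "pyspark" contains "spark"
    println(s"Lines with word=pyspark : $numBs") // 2
  }
}
```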

  • Go To Eclipse-->Project-->Properties--> Scala Compiler --> Scala Installation (Select Scala version as in Build.sbt)
  • Run
  • You can see the Spark output in the Eclipse console.

This marks the end of the objective of this post. Do read other posts from this blog. Additional Read -