How To Set Up Spark Scala SBT in Eclipse
This post explains how to set up Spark, Scala and SBT in Eclipse. Normally, running any Spark application, especially in Scala, is a bit lengthy, since you need to code, compile, build the jar and finally deploy or execute it.
Alternatively, we can use Eclipse. Eclipse is a FREE IDE that is widely used for writing Java or Scala code. So if we can integrate the Spark environment into Eclipse, it is a big help - we can quickly run, debug and unit test code changes without the pain of that lengthy process. I will explain a step-by-step process to do so -
Part 1 - Set Up The Environment:
1. Create and Verify The Folders
Create the folders below in the C drive. You can also use any other drive, but for this post I am considering the C drive for the set-up.
1.1. For Spark -
C:\Spark
1.2. For Hadoop -
C:\Hadoop\bin
1.3. For Java - Check where your Java JDK is installed. If Java is not already installed, install it from the Oracle website (https://java.com/en/download/help/windows_manual_download.xml). Ideally Java version 8 works fine without any issues so far, so try that. Let's assume Java is installed. Note down the Java JDK path. Typically it is something like C:\Program Files\Java\jdk1.8.0_191; it might be different based on what folder you chose, but whatever it is, note the path down. We will need all 3 of the above folder paths in our next steps.
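If you prefer doing this from the command line, below is a quick sketch of the folder creation and the Java check. It assumes the C drive layout above; your JDK folder name may differ.
:: create the Spark and Hadoop folders
mkdir C:\Spark
mkdir C:\Hadoop\bin
:: confirm the JDK is installed and note down its folder
java -version
dir "C:\Program Files\Java"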
2. Downloads
Download the following -
> Download Spark from https://spark.apache.org/downloads.html. Extract the files and place them in C:\Spark. e.g. if you have downloaded the Spark 2.2.1 version and extracted it, it will look something like -
C:\Spark\spark-2.2.1-bin-hadoop2.7
> Download winutils.exe from https://github.com/steveloughran/winutils/blob/master/hadoop-2.7.1/bin/winutils.exe and copy the winutils.exe file into C:\Hadoop\bin
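A quick sanity check that the files landed in the right place (the folder names assume the Spark 2.2.1 build used in this post):
:: Spark scripts such as spark-shell should be listed here
dir C:\Spark\spark-2.2.1-bin-hadoop2.7\bin
:: winutils.exe should be listed here
dir C:\Hadoop\bin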
3. Environment Variable Set-up
Let's set up the environment variables now. Open the Environment Variables window and create new variables, or edit them if they already exist. Based on what I have chosen, I will need to add the following environment variables -
SPARK_HOME - C:\Spark\spark-2.2.1-bin-hadoop2.7
HADOOP_HOME - C:\Hadoop
JAVA_HOME - C:\Program Files\Java\jdk1.8.0_191
These values are as per my folder structure. Please try to keep the same folder structure. For my case, it looks like below once I set up the environment variables -
Also add the Java and Spark bin dir locations to your Windows Path variable. For my case, the bin dir locations are below; I have added them to my Windows PATH variable.
C:\Program Files\Java\jdk1.8.0_191\bin
C:\Spark\spark-2.2.1-bin-hadoop2.7\bin
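If you prefer setting these from the command line, below is a sketch using setx. The values assume my folder structure; open a NEW command prompt afterwards for the changes to take effect.
:: set the variables for the current user
setx SPARK_HOME "C:\Spark\spark-2.2.1-bin-hadoop2.7"
setx HADOOP_HOME "C:\Hadoop"
setx JAVA_HOME "C:\Program Files\Java\jdk1.8.0_191"
:: verify from a new command prompt
echo %SPARK_HOME%
echo %HADOOP_HOME%
echo %JAVA_HOME%
where java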
4. Eclipse
I believe Eclipse might already be set up in your system. If not, you can install Eclipse from -
https://www.eclipse.org/downloads/packages/ Setting up Eclipse is quite straightforward, so I am going to safely skip this part.
5. Maven
Add the Maven plugin to your Eclipse using the "Help" --> "Install New Software" option. Use the link below -
http://download.eclipse.org/technology/m2e/releases/
6. Scala
This is important - the version of Scala to be installed needs to be EXACTLY the SAME as the Scala version mentioned by Spark. To verify, type spark-shell in the Windows command line and you will notice the compatible version of Scala mentioned. Snapshot of the message from my terminal -
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_191)
Type in expressions to have them evaluated.
Type :help for more information.
So I know my compatible Scala version is 2.11.8. Find your version at https://www.scala-lang.org/download/all.html, then download and install it. Verify the version -
scala -version
7. SBT
Download and install SBT from - https://www.scala-sbt.org/download.html
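To confirm SBT is installed and on the PATH, you can run the checks below from a command prompt. Note that the second command starts sbt itself, so it may create project and target folders in whatever directory you run it from.
:: confirm the sbt launcher is on the PATH
where sbt
:: prints the sbt version and the Scala version sbt runs on
sbt about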
8. Eclipse Plugin for Scala
Download and install the Scala plugin for Eclipse. Download from - https://marketplace.eclipse.org/content/scalastyle
Part 2 - Project-Specific Set-Up:
1. Folder
// 1. Make a project folder
mkdir SimpleApp
// Create the folder structure within the "SimpleApp" dir
cd SimpleApp
mkdir lib project target src
mkdir src\main src\test
mkdir src\main\java src\main\resources src\main\scala
mkdir src\test\java src\test\resources src\test\scala
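You can confirm the skeleton with the Windows tree command, run from inside the SimpleApp dir.
tree .
:: you should see lib, project, target and src, with main and test under src,
:: and java, resources and scala under each of those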
2. Build.sbt
Ensure the Scala version & Spark version used below match exactly what you see while running the "spark-shell" command.
// 2. Create a build.sbt file in the /SimpleApp dir with the content below.
// Use the correct Scala & Spark versions
name := "Simple Project"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.1"
3. SBT Project Building
// ALL "sbt" COMMANDS NEED TO BE RUN FROM /SimpleApp dir level
// 3 Go to SimpleApp dir & Run below -
> sbt
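Once sbt starts cleanly, a couple of commands are worth running from the /SimpleApp dir; the first run will download the Spark dependencies, which can take a while.
:: compile the sources against the dependencies declared in build.sbt
sbt compile
:: build the application jar under target\scala-2.11\
sbt package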
4. Eclipse Plugin
- Create a plugins.sbt file in /SimpleApp/project/ with the content below. Also refer - https://github.com/sbt/sbteclipse
addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "5.2.4")
// Reload so the necessary files for Eclipse are downloaded and generated
> sbt
> reload
> eclipse
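If the plugin ran correctly, sbteclipse generates the Eclipse project descriptors inside the /SimpleApp dir; a quick way to check is below.
:: run from the /SimpleApp dir after the eclipse command finishes
dir /b .project .classpath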
5. Eclipse
- Open Eclipse
- Open Scala Perspective
- Import the Project i.e. /SimpleApp dir
- Create a New Scala Object in src/main/scala in Eclipse
- Copy the below code
import org.apache.spark.sql.SparkSession

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "D:\\WorkDirectory\\README.md" // Should be some file on your system
    val spark = SparkSession.builder.appName("Simple Application").master("local").getOrCreate()
    val logData = spark.read.textFile(logFile).cache()
    val numAs = logData.filter(line => line.contains("spark")).count()
    val numBs = logData.filter(line => line.contains("pyspark")).count()
    //println(s"Lines with word=spark: $numAs, Lines with word=pyspark: $numBs")
    println(s"Lines with word=spark : $numAs")
    println(s"Lines with word=pyspark : $numBs")
    println("===============")
    spark.stop()
  }
}
- Go to Eclipse --> Project --> Properties --> Scala Compiler --> Scala Installation (select the Scala version as in build.sbt)
- Run
- You can see the Spark Output in the Eclipse Console. It will look something like below -
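As a side note, once the project builds in sbt, you can also run the same app outside Eclipse with spark-submit. Below is a sketch, assuming the jar name that sbt derives from the build.sbt above; check your target\scala-2.11 folder for the exact file name.
:: run from the /SimpleApp dir
sbt package
spark-submit --class SimpleApp --master local target\scala-2.11\simple-project_2.11-1.0.jar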
This marks the end of the objective of this post. Do read the other posts from this blog.
Additional Read -