How to Build & Run Spark Cassandra Application



This post explains how to build and run a Spark Cassandra application. You can get the Spark Cassandra sample code from my post Sample Code for Spark Cassandra Scala Application (linked below under Additional Useful Reads). Please follow the step-by-step process to create and execute your Spark Cassandra application.

Step 1 - Create Cassandra Table:

First, create a keyspace. SimpleStrategy with a replication_factor of 1 is suitable for a single-node development setup -



CREATE KEYSPACE Key_Space WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 1};


Next, create the table that will hold the word counts -



CREATE TABLE Key_Space.Word_Count_table (
Word TEXT,
Word_Count INT,
Timestamp TIMESTAMP,
PRIMARY KEY(Word)
);
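
Both statements can be run from cqlsh. Note that unquoted identifiers in CQL are case-insensitive and folded to lowercase, so the keyspace and table are actually stored as key_space and word_count_table. Alternatively, since Step 3 pulls in the DataStax Java driver (cassandra-driver-core), you can create the schema from Scala. Below is a minimal sketch, assuming a single Cassandra node on 127.0.0.1 - it is an illustration, not part of the original application:

import com.datastax.driver.core.Cluster

// Hypothetical helper to create the schema programmatically.
// Assumes a single local Cassandra node on 127.0.0.1.
object CreateSchema extends App {
  val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
  val session = cluster.connect()

  // IF NOT EXISTS makes the setup safe to re-run.
  session.execute(
    "CREATE KEYSPACE IF NOT EXISTS key_space " +
      "WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 1}")

  session.execute(
    "CREATE TABLE IF NOT EXISTS key_space.word_count_table (" +
      "word TEXT, word_count INT, timestamp TIMESTAMP, PRIMARY KEY(word))")

  session.close()
  cluster.close()
}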



Step 2 - Prepare Directory Structure:

First we need to set up the standard sbt directory layout for the project (sbt expects Scala sources under src/main/scala and build definitions under project) -



mkdir test-project   # can be any name
cd test-project

mkdir -p src/main/scala/com/sparkcassproject   # -p creates the intermediate directories
mkdir project


Step 3 - Prepare Code Dependencies:

Create a build.sbt file and put it under the ~/test-project directory. The build.sbt file lists the dependencies and versions of the libraries used by the Spark application.



name := "Spark-Cassandra-App"

version := "1.0"

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.0.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "2.0.0" % "provided",
  "org.apache.spark" %% "spark-sql" % "2.0.0",
  "com.datastax.spark" %% "spark-cassandra-connector" % "2.0.0-RC1",
  "com.datastax.cassandra" % "cassandra-driver-core" % "3.0.0",
  ("org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.0.0").
    exclude("org.spark-project.spark", "unused")
)
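
Step 5 builds the jar with sbt assembly, which requires the sbt-assembly plugin. A minimal project/plugins.sbt could look like the following (the plugin version here is an assumption - use whichever release matches your sbt version):

// project/plugins.sbt - enables the "sbt assembly" command used in Step 5
// (the plugin version is an assumption; pick one compatible with your sbt release)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

Depending on your dependency tree, you may also need an assemblyMergeStrategy setting to resolve duplicate files when the fat jar is built.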



For version compatibility, please refer to https://github.com/datastax/spark-cassandra-connector

Step 4 - Sample Code - Spark Cassandra:

You can get the sample code from my post Sample Code for Spark Cassandra Scala Application (linked below). Save the file as AppSparkCassandra.scala under ~/test-project/src/main/scala/com/sparkcassproject.
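
If you want to follow along without opening the other post, here is a minimal word-count sketch of what AppSparkCassandra.scala could look like. This is an assumed implementation for illustration, not the exact code from the linked post - it counts the words in an input file passed as the first argument and saves the counts to the table from Step 1 (the spark.cassandra.connection.host value is an assumption for a local node):

package com.sparkcassproject

import com.datastax.spark.connector._   // adds saveToCassandra to RDDs
import org.apache.spark.{SparkConf, SparkContext}

// Minimal word-count sketch (assumed implementation).
object AppSparkCassandra {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("Spark-Cassandra-App")
      .set("spark.cassandra.connection.host", "127.0.0.1") // assumed local Cassandra node

    val sc = new SparkContext(conf)

    // Count the words in the input file given as the first application argument.
    val counts = sc.textFile(args(0))
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .map { case (word, count) => (word, count, new java.util.Date()) }

    // Column names must match the table from Step 1 (CQL folds them to lowercase).
    counts.saveToCassandra("key_space", "word_count_table",
      SomeColumns("word", "word_count", "timestamp"))

    sc.stop()
  }
}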

Step 5 - Compile the Code:

Next we need to create the jar file of our application.

  • Go to the ~/test-project directory.
  • Run the command **sbt assembly** from the command line to produce a Spark-deployable fat jar.
  • You will find the jar file under ~/test-project/target/scala-2.11 (sbt names the output directory after the Scala binary version).
 

Step 6 - Run the Spark Application:

You can run the Spark application with the command below (you might need to change names or parameters based on what you have chosen).

  • Use the spark-submit command to submit the jar and run your Spark application.


$ SPARK_DIRECTORY/bin/spark-submit \
  --class "com.sparkcassproject.AppSparkCassandra" \
  --master spark://GIVE_THE_SPARK_HOST_NAME_OR_IP:PORT_NO \
  ~/test-project/target/scala-2.11/Spark-Cassandra-App-assembly-1.0.jar
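
If you built the sketch from Step 4, append the input file path as an application argument after the jar. Once the job completes, you can verify the output from cqlsh with SELECT * FROM key_space.word_count_table LIMIT 10;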


Hope this helps.

Additional Useful Reads-

Sample Code for Spark Cassandra Scala Application

Different Parts of a Spark Application Code , Class & Jars

Best Practices for Dependency Problem in Spark

How to Improve Spark Application Performance – Part 1?