How to Build & Run Spark Cassandra Application



This post explains how to build and run a Spark Cassandra application. You can get the Spark Cassandra sample code from my post Sample Code for Spark Cassandra Scala Application (linked below under Additional Useful Reads). Please follow the step-by-step process to create and execute your Spark Cassandra application.

Step 1 - Create Cassandra Table:

First, create a keyspace. SimpleStrategy with a replication_factor of 1 is suitable for a single-node development setup -



CREATE KEYSPACE Key_Space WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 1};


Next, create the table that will hold the word counts -



CREATE TABLE Key_Space.Word_Count_table (
Word TEXT,
Word_Count INT,
Timestamp TIMESTAMP,
PRIMARY KEY(Word)
);
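
Both statements can be run from cqlsh. Note that unquoted identifiers in CQL are case-insensitive and folded to lowercase, so the keyspace and table are actually stored as key_space and word_count_table. Alternatively, since Step 3 pulls in the DataStax Java driver (cassandra-driver-core), you can create the schema from Scala. Below is a minimal sketch, assuming a single Cassandra node on 127.0.0.1 - it is an illustration, not part of the original application:

import com.datastax.driver.core.Cluster

// Hypothetical helper to create the schema programmatically.
// Assumes a single local Cassandra node on 127.0.0.1.
object CreateSchema extends App {
  val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
  val session = cluster.connect()

  // IF NOT EXISTS makes the setup safe to re-run.
  session.execute(
    "CREATE KEYSPACE IF NOT EXISTS key_space " +
      "WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 1}")

  session.execute(
    "CREATE TABLE IF NOT EXISTS key_space.word_count_table (" +
      "word TEXT, word_count INT, timestamp TIMESTAMP, PRIMARY KEY(word))")

  session.close()
  cluster.close()
}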



Step 2 - Prepare Directory Structure:

First we need to set up the standard sbt directory layout for the project (sbt expects Scala sources under src/main/scala and build definitions under project) -



mkdir test-project   # can be any name
cd test-project

mkdir -p src/main/scala/com/sparkcassproject   # -p creates the intermediate directories
mkdir project


Step 3 - Prepare Code Dependencies:

Create a build.sbt file and put it under the ~/test-project directory. The build.sbt file lists the dependencies and versions of the libraries used by the Spark application.



name := "Spark-Cassandra-App"

version := "1.0"

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.0.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "2.0.0" % "provided",
  "org.apache.spark" %% "spark-sql" % "2.0.0",
  "com.datastax.spark" %% "spark-cassandra-connector" % "2.0.0-RC1",
  "com.datastax.cassandra" % "cassandra-driver-core" % "3.0.0",
  ("org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.0.0").
    exclude("org.spark-project.spark", "unused")
)
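
Step 5 builds the jar with sbt assembly, which requires the sbt-assembly plugin. A minimal project/plugins.sbt could look like the following (the plugin version here is an assumption - use whichever release matches your sbt version):

// project/plugins.sbt - enables the "sbt assembly" command used in Step 5
// (the plugin version is an assumption; pick one compatible with your sbt release)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

Depending on your dependency tree, you may also need an assemblyMergeStrategy setting to resolve duplicate files when the fat jar is built.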



For version compatibility, please refer to https://github.com/datastax/spark-cassandra-connector

Step 4 - Sample Code - Spark Cassandra:

You can get the sample code from my post Sample Code for Spark Cassandra Scala Application (linked below). Save the file as AppSparkCassandra.scala under ~/test-project/src/main/scala/com/sparkcassproject.
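
If you want to follow along without opening the other post, here is a minimal word-count sketch of what AppSparkCassandra.scala could look like. This is an assumed implementation for illustration, not the exact code from the linked post - it counts the words in an input file passed as the first argument and saves the counts to the table from Step 1 (the spark.cassandra.connection.host value is an assumption for a local node):

package com.sparkcassproject

import com.datastax.spark.connector._   // adds saveToCassandra to RDDs
import org.apache.spark.{SparkConf, SparkContext}

// Minimal word-count sketch (assumed implementation).
object AppSparkCassandra {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("Spark-Cassandra-App")
      .set("spark.cassandra.connection.host", "127.0.0.1") // assumed local Cassandra node

    val sc = new SparkContext(conf)

    // Count the words in the input file given as the first application argument.
    val counts = sc.textFile(args(0))
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .map { case (word, count) => (word, count, new java.util.Date()) }

    // Column names must match the table from Step 1 (CQL folds them to lowercase).
    counts.saveToCassandra("key_space", "word_count_table",
      SomeColumns("word", "word_count", "timestamp"))

    sc.stop()
  }
}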

Step 5 - Compile the Code:

Next we need to create the jar file of our application.

  • Go to the ~/test-project directory.
  • Run the command **sbt assembly** from the command line to produce a Spark-deployable fat jar.
  • You will find the jar file under ~/test-project/target/scala-2.11 (sbt names the output directory after the Scala binary version).
 

Step 6 - Run the Spark Application:

You can run the Spark application with the command below (you might need to change names or parameters based on what you have chosen).

  • Use the spark-submit command to submit the jar and run your Spark application.


$ SPARK_DIRECTORY/bin/spark-submit \
  --class "com.sparkcassproject.AppSparkCassandra" \
  --master spark://GIVE_THE_SPARK_HOST_NAME_OR_IP:PORT_NO \
  ~/test-project/target/scala-2.11/Spark-Cassandra-App-assembly-1.0.jar
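
If you built the sketch from Step 4, append the input file path as an application argument after the jar. Once the job completes, you can verify the output from cqlsh with SELECT * FROM key_space.word_count_table LIMIT 10;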


Hope this helps.

Additional Useful Reads-

Sample Code for Spark Cassandra Scala Application

Different Parts of a Spark Application Code , Class & Jars

Best Practices for Dependency Problem in Spark

How to Improve Spark Application Performance – Part 1?