
Sample Code - Spark Structured Streaming vs Spark Streaming

This post gives sample code comparing Spark Structured Streaming and Spark Streaming. The major differences between the two are:

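  • Structured Streaming works on DataFrames/Datasets, whereas Spark Streaming works on DStreams, which are sequences of RDDs.
  • Structured Streaming treats a data stream as an unbounded input table: each new row is appended to it, and the result table is updated incrementally. By default it still executes queries in micro-batches, as Spark Streaming does. Since Spark 2.3 it also offers an experimental continuous processing mode with lower latency, so Structured Streaming can be closer to real time in that mode.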
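
The difference in what each word count reports can be sketched in plain Scala, without Spark. A DStream reduceByKey counts words within each batch only, whereas a Structured Streaming aggregation keeps a running result table (assuming the complete output mode). The names BatchVsRunningCounts, countBatch and merge are illustrative and not part of any Spark API.

```scala
object BatchVsRunningCounts {
  // Count the words in one batch of lines, similar in spirit to
  // what reduceByKey produces for a single DStream batch.
  def countBatch(lines: Seq[String]): Map[String, Long] =
    lines
      .flatMap(_.split(" "))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size.toLong }

  // Fold one batch's counts into the running result table.
  def merge(table: Map[String, Long], batch: Map[String, Long]): Map[String, Long] =
    batch.foldLeft(table) { case (acc, (word, n)) =>
      acc.updated(word, acc.getOrElse(word, 0L) + n)
    }

  def main(args: Array[String]): Unit = {
    // Two hypothetical batches arriving one after the other.
    val batches = Seq(Seq("a b a"), Seq("b c"))
    val perBatch = batches.map(countBatch)
    // Running totals after each batch (drop the initial empty table).
    val running = perBatch.scanLeft(Map.empty[String, Long])(merge).tail
    perBatch.zip(running).foreach { case (batch, total) =>
      println(s"per-batch counts: $batch | running result table: $total")
    }
  }
}
```

After the second batch, the per-batch view only knows about the words in that batch, while the running table also remembers the counts from the first one.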
Below is sample code that demonstrates how data is read and processed in both Structured Streaming and Spark Streaming. It shows how to set up a Structured Streaming environment as well as a Spark Streaming environment. This is not complete end-to-end application code; it is only meant to give an easy understanding.

Sample Code - Structured Streaming:

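// Assumes an existing SparkSession named spark. The socket source on
// localhost:9999 is an assumption that mirrors the Spark Streaming sample below.
import spark.implicits._

val lines = spark.readStream.format("socket").option("host", "localhost").option("port", 9999).load()

val data = lines.as[String].flatMap(_.split(" ")) // split each line into words

val countOfWords = data.groupBy("value").count() // running count per word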
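
The snippet above stops at the aggregation, and a streaming query only runs once it is started through writeStream. Below is a minimal complete sketch. It assumes a socket source on localhost:9999 (borrowed from the Spark Streaming sample), a local master, and the console sink. The object name StructuredWordCount and the helper wordCounts are illustrative.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

object StructuredWordCount {
  // Word-count transformation. The same code applies to a static
  // Dataset and to a streaming one.
  def wordCounts(spark: SparkSession, lines: Dataset[String]): DataFrame = {
    import spark.implicits._ // encoders needed by flatMap
    lines
      .flatMap(_.split(" ")) // split each line into words
      .groupBy("value")      // a Dataset[String] has one column named "value"
      .count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("StructuredWordCount")
      .master("local[2]") // assumption: local run for experimentation
      .getOrCreate()
    import spark.implicits._

    // Assumed source: text lines from a socket, e.g. fed by `nc -lk 9999`.
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()
      .as[String]

    // Complete mode prints the whole result table after every trigger.
    val query = wordCounts(spark, lines).writeStream
      .outputMode("complete")
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```

Keeping the transformation in its own method makes it possible to try it on a small static Dataset before pointing it at a stream.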



Sample Code - Spark Streaming:

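import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("WordCount").setMaster("local[2]") // assumed local configuration
val streamContext = new StreamingContext(conf, Seconds(1)) // 1-second micro-batches
val data = streamContext.socketTextStream("localhost", 9999) // each element is one line of text

val wordCounts = data
                  .flatMap(_.split(" ")) // split into words
                  .filter(w => w.length() > 0) // remove empty words
                  .map(w => (w, 1L)).reduceByKey(_ + _) // count by word within each batch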


wordCounts.print() // at least one output operation must be registered before start()
streamContext.start()
streamContext.awaitTermination() // keep the application running until it is stopped