Sample Code - Spark Structured Streaming vs Spark Streaming



This post gives sample code for Spark Structured Streaming and Spark Streaming. The major differences between Spark Structured Streaming and Spark Streaming are:

  • Structured Streaming works on DataFrames/Datasets, whereas Spark Streaming works on RDDs (DStreams).
  • Structured Streaming does not expose micro-batches in its API (the way Spark Streaming does). Instead, each incoming row of the data stream is treated as an append to an unbounded input table, and the result table is updated incrementally. So Structured Streaming is closer to real-time from that aspect.

Below is a sample piece of code that demonstrates how data is read and processed in both Structured Streaming and Spark Streaming. It shows how you set up a Spark Structured Streaming environment as well as a Spark Streaming environment. This is not a complete end-to-end application; it just gives you an easy understanding.

Sample Code - Structured Streaming :


import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
    .appName("StructuredStreamingWordCount")
    .master("local[*]")
    .getOrCreate()

import spark.implicits._

// Read lines from a socket source as an unbounded streaming DataFrame
val lines = spark.readStream
    .format("socket")
    .option("host", "localhost")
    .option("port", 9999)
    .load()

// Split each line into words
val data = lines.as[String].flatMap(_.split(" "))

// Running count of each distinct word
val countOfWords = data.groupBy("value").count()

// Streaming aggregations need "complete" (or "update") output mode
countOfWords.writeStream
    .outputMode("complete")
    .format("console")
    .option("truncate", "false")
    .start()
    .awaitTermination()
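
To try this locally, start a simple socket server with nc -lk 9999 in a terminal and type words into it; Spark will print the continuously updated counts table to the console after each trigger. Note that outputMode("complete") is required here because the query contains an aggregation; the default append mode does not support streaming aggregations without a watermark.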


 

Sample Code - Spark Streaming :


import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Create a streaming context with a 1-second batch interval
val conf = new SparkConf().setAppName("SparkStreamingWordCount").setMaster("local[*]")
val streamContext = new StreamingContext(conf, Seconds(1))

// Read lines from a socket source as a DStream of strings
val data = streamContext.socketTextStream("localhost", 9999)

val wordCounts = data.flatMap(_.split(" "))        // split each line into words
                     .filter(w => w.length() > 0)  // remove empty words
                     .map(w => (w, 1L))
                     .reduceByKey(_ + _)           // count by word within each batch

wordCounts.print()

streamContext.start()
streamContext.awaitTermination()
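
One difference worth noting: reduceByKey above counts words within each one-second batch independently, whereas the Structured Streaming example keeps a running total in its result table. Below is a minimal sketch of how the DStream version could keep a running count across batches using updateStateByKey; the checkpoint path is just an illustrative assumption, and these lines would go before streamContext.start().

// Minimal sketch of a running count across batches.
// updateStateByKey requires checkpointing; the path below is illustrative.
streamContext.checkpoint("/tmp/spark-streaming-checkpoint")

val runningCounts = wordCounts.updateStateByKey[Long] {
    (newCounts: Seq[Long], state: Option[Long]) =>
        Some(newCounts.sum + state.getOrElse(0L))
}

runningCounts.print()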


Reference - https://spark.apache.org/docs/latest/streaming-programming-guide.html

Additional Read -
  • Sample Code for PySpark Cassandra Application
  • Sample Code for Spark Cassandra Scala Application