




Difference Between SparkSession, SparkContext, SQLContext & HiveContext



This post explains the differences between SparkSession, SparkContext, SQLContext and HiveContext. Which entry point you use depends on the Spark version of your application: from Spark 2.0 onwards, SparkSession unifies the older entry points. Below is a pictorial representation of the hierarchy between SparkSession, SparkContext, SQLContext and HiveContext.

Spark Session vs Spark Context vs SQLContext vs HiveContext.

 

SparkContext :

  • Before Spark 2.x, SparkContext was the entry point of any Spark application
  • It is the main channel to access all Spark functionality

SparkSession :

  • From Spark 2.x onwards, SparkSession serves as the entry point for all Spark functionality
  • All functionality available with SparkContext is also available with SparkSession
  • However, anyone who prefers to use SparkContext directly can continue to do so
 

HiveContext:

  • HiveContext is a superset of SQLContext.
  • It can do everything SQLContext can, plus more.
  • Additional features include the more complete HiveQL parser, access to Hive UDFs, and the ability to read data from Hive tables.

SQLContext:

  • Spark SQLContext allows us to connect to different data sources and read or write data from them
  • However, it has a limitation - when the Spark program ends or the Spark shell is closed, all links to the data sources are gone and will not be available in the next session

Code Sample - SparkSession & SparkContext:


################
### Spark Conf
################
from pyspark import SparkConf, SparkContext

spark_conf = (SparkConf()
              .setAppName("Your App Name")
              .set("spark.some.config.option", "some-value"))

sc = SparkContext(conf=spark_conf)


###################
## Spark Session
###################
from pyspark.sql import SparkSession

spark = (SparkSession
         .builder
         .appName("Your App Name")
         .config("spark.some.config.option", "some-value")
         .getOrCreate())

Sample values for the ("spark.some.config.option", "some-value") tuple:

('spark.executor.memory', '4g')
('spark.executor.cores', '4')
('spark.cores.max', '4')
('spark.driver.memory', '4g')


 

View All Config Parameters:

Run either of the following in the shell:

sc._conf.getAll()

spark.sparkContext.getConf().getAll()

Additional Reads:

  • Different Parts of a Spark Application Code, Class & Jars
  • How to Improve Spark Application Performance – Part 1?