Difference between Hadoop, HDFS, MapReduce, Spark, Spark Streaming & Spark SQL



This post explains the terminology and technologies of Hadoop, HDFS, MapReduce, Spark, Spark SQL and Spark Streaming, and the differences between them. For a newcomer who has just started to learn Big Data, these terms can sound quite confusing. So let's explore each of them and see where they all fit in.

1. Hadoop

Hadoop is a collection of open-source software and technologies; it is a type of Big Data ecosystem. The Hadoop project was started to meet the need to process a growing volume of different types of data on a distributed platform, because traditional ways of processing data were becoming inefficient. Since Hadoop is an ecosystem, it comprises different components and technologies. HDFS and MapReduce are two such components. One main thing to note: the Hadoop ecosystem runs on a cluster of computers.

HDFS

HDFS stands for Hadoop Distributed File System. It is a file system and is part of the Hadoop ecosystem. When we say an HDFS file, we mean a file stored in the HDFS file system. You can store any type of file in HDFS, e.g. CSV, Excel, Word, text, XML, JSON or PDF. Another point: when you store a file in HDFS, it internally splits the file into sub-units (called blocks) and stores those blocks across the cluster of computers.
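To make the block idea concrete, here is a small toy simulation in plain Python. It is only a sketch and does not use any real HDFS API: the function names, the node names and the tiny block size are all made up for illustration. Real HDFS uses much larger blocks (128 MB by default in recent Hadoop versions) and also keeps several copies of each block, which this sketch leaves out.

```python
from itertools import cycle


def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks, nodes):
    """Assign block ids to nodes round-robin (a simplification of real placement)."""
    placement = {node: [] for node in nodes}
    for block_id, node in zip(range(len(blocks)), cycle(nodes)):
        placement[node].append(block_id)
    return placement


# Hypothetical 1000-byte "file" and a toy block size of 256 bytes.
file_bytes = b"x" * 1000
BLOCK_SIZE = 256

blocks = split_into_blocks(file_bytes, BLOCK_SIZE)
placement = place_blocks(blocks, ["node1", "node2", "node3"])

print(len(blocks))   # number of blocks the file was cut into
print(placement)     # which block ids ended up on which node
```

Joining the blocks back together in order gives the original file, which is essentially what happens when you read a file back from HDFS.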

MapReduce

MapReduce is the data processing component of the Hadoop ecosystem. It is a framework written in Java. So when you want to process some data (i.e. files in HDFS), you typically write a MapReduce program in Java, compile it into a jar and run that jar to process the data. The output of a MapReduce program is usually written back to HDFS.
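The map-then-reduce idea itself is easy to see with the classic word count example. The sketch below is plain Python running on one machine, not a real Hadoop job (those are usually written in Java, as described above); the function names and sample lines are made up for illustration. It only shows the three logical steps: map each line to (word, 1) pairs, group the pairs by word, then reduce each group to a total.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


# Hypothetical input; in Hadoop these lines would come from files in HDFS.
lines = ["spark and hadoop", "hadoop stores data", "spark processes data"]

mapped = [pair for line in lines for pair in map_phase(line)]
grouped = shuffle_phase(mapped)
word_counts = dict(reduce_phase(k, v) for k, v in grouped.items())

print(word_counts)
```

In a real cluster the map calls run in parallel on the machines that hold the blocks, and the framework handles the grouping step for you.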

2. Spark

Spark is also a parallel data processing framework. It is not part of Hadoop; it is a separate part of the Big Data ecosystem. Spark is an alternative to MapReduce (not to Hadoop as a whole). Spark keeps data in RAM as much as possible while processing it, which generally makes it faster than MapReduce. Hence nowadays a lot of data processing uses Spark rather than MapReduce.

Spark SQL:

Spark SQL is a sub-module of the Spark framework. It enables you to query structured data (e.g. CSV or tab-delimited data) using SQL or the DataFrame API. So you can write SQL or SQL-like queries to process data, similar to querying an RDBMS table.
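Since running Spark needs a Spark installation, the sketch below uses Python's built-in sqlite3 module purely as a stand-in to show the general idea: load structured rows (here from CSV text), then query them with SQL. This is not Spark SQL code, and the table name, column names and sample rows are made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content; in Spark this would typically be a file on HDFS.
csv_text = """name,city,amount
alice,Pune,120
bob,Delhi,80
carol,Pune,200
"""

# Parse the CSV into a list of dicts, one per row.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Load the rows into an in-memory table so they can be queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (name TEXT, city TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (:name, :city, :amount)", rows)

# An ordinary SQL aggregation: total amount per city.
totals = conn.execute(
    "SELECT city, SUM(amount) FROM sales GROUP BY city ORDER BY city"
).fetchall()
conn.close()

print(totals)
```

Spark SQL lets you write the same kind of GROUP BY query, with the difference that the work is spread over the cluster instead of running in a single process.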

Spark Streaming:

Spark Streaming is another sub-module of the Spark family. Traditional Spark is used to process batch data, i.e. static files. But if you have streaming or continuously flowing data, such as Twitter tweets, Facebook posts or sensor data, then you need to use the Spark Streaming module.

Hope this post clarifies the difference between all these technologies. I will cover them in depth in my subsequent posts.
