




Understand Spark Execution Modes - Local, Client & Cluster Modes



Understanding Spark's execution modes (Local, Client and Cluster) is a simple yet crucial part of working with a Big Data system. In this post, I will try to explain them in simple terms.

1. Local Mode:

  • This is like running a program on your own laptop or desktop using a single JVM.
  • The program can be written in any language Spark supports - Java, Scala, Python etc.
  • In the program, you create and use a SparkContext object, import the Spark libraries and process data from files on your local system.
  • This is exactly what Local Mode is: everything runs LOCALLY, no cluster NODE is involved and nothing runs in DISTRIBUTED mode.
  • The driver and the executor are created inside a single JVM process.
e.g. When you launch spark-shell on your laptop without pointing it at a cluster, it runs in Local mode.
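As a rough sketch of what "local" means in practice: the mode is selected by the master URL, and `local`, `local[N]` and `local[*]` all keep the driver and executor in one JVM. The helper name `local_master_url` below is hypothetical and exists only for illustration; the URL forms themselves are the standard Spark master URLs.

```python
def local_master_url(threads=None):
    """Return a Spark master URL for Local mode.

    None -> "local[*]"  (as many worker threads as logical cores)
    1    -> "local"     (a single worker thread)
    N    -> "local[N]"  (N worker threads)
    """
    if threads is None:
        return "local[*]"
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return "local" if threads == 1 else f"local[{threads}]"


# With PySpark installed, the URL would be handed to the session builder:
#   from pyspark.sql import SparkSession
#   spark = (SparkSession.builder
#            .master(local_master_url(2))
#            .appName("local-demo")   # hypothetical application name
#            .getOrCreate())
for n in (None, 1, 4):
    print(local_master_url(n))
```

The PySpark lines are left as comments so the sketch runs without a Spark installation.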

2. Client Mode:

Consider a Spark Cluster with 5 Executors.

  • In Client mode, the Driver is started on the local machine (laptop or desktop) from which the job is submitted, i.e. the Driver is outside the Cluster.
  • But the Executors run inside the Cluster.
  • Hence, in layman's terms, the Driver acts like a Client to the Cluster.
  • Please note that in this case your entire application depends on the local machine, since the Driver resides there. If the local machine has any issue, the Driver goes down, and the entire application goes down with it. Hence this mode is not suitable for Production use cases.
  • However, it is good for Debugging or Testing, since the output is printed on the Driver's terminal, which is on the local machine.
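A Client mode submission for the 5-Executor example might look like the sketch below. It only builds and prints the `spark-submit` command rather than running it. YARN as the cluster manager, the file name `my_app.py` and the helper name `client_mode_command` are assumptions made for illustration, not something this post prescribes.

```python
import shlex


def client_mode_command(app, num_executors=5, master="yarn"):
    """Build a spark-submit argument list that keeps the Driver on this machine."""
    return [
        "spark-submit",
        "--master", master,                      # assumed cluster manager: YARN
        "--deploy-mode", "client",               # Driver runs where spark-submit runs
        "--num-executors", str(num_executors),   # Executors run inside the Cluster
        app,                                     # hypothetical application file
    ]


cmd = client_mode_command("my_app.py")
print(shlex.join(cmd))
```

Because the Driver lives in the `spark-submit` process itself, closing that terminal or losing the machine ends the application, which is the fragility described above.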

3. Cluster Mode:

Consider a Spark Cluster with 5 Executors.

  • In Cluster Mode, both the Driver and the Executors run inside the Cluster.
  • You submit the Spark job from your local machine, or from a gateway machine of the Cluster (such machines are usually called Edge Nodes), and the Driver is then launched on a machine inside the Cluster.
  • You can shut down the local machine once the job has been submitted. It will not impact the application, since the Driver now runs inside the Cluster.
  • This is the approach used in Production use cases.
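For comparison, a Cluster mode submission could be sketched as follows. Again YARN and the file name `my_app.py` are assumptions for illustration. The `spark.yarn.submit.waitAppCompletion=false` setting is YARN-specific: it lets the `spark-submit` process exit right after the application is accepted, instead of waiting for it to finish.

```python
import shlex

# Assumptions: YARN as the cluster manager, my_app.py as a hypothetical application.
cluster_cmd = [
    "spark-submit",
    "--master", "yarn",
    "--deploy-mode", "cluster",   # Driver is launched inside the Cluster
    "--num-executors", "5",
    # YARN only: return as soon as the application is submitted.
    "--conf", "spark.yarn.submit.waitAppCompletion=false",
    "my_app.py",
]

print(shlex.join(cluster_cmd))
```

The only change from a Client mode command that matters here is the `--deploy-mode` value; everything about where the Executors run stays the same.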
  I hope this post helps.