Understand Spark Execution Modes - Local, Client & Cluster Modes
It is important to understand Spark's execution modes: Local, Client and Cluster. This is a simple yet crucial aspect of working with a Big Data system. In this post, I will try to explain them in simple, easy terms.
1. Local Mode:
- This is like running a program on your own laptop or desktop using a single JVM.
- The program can be written in any language Spark supports: Java, Scala, Python etc.
- In the program, you import the Spark libraries, create and use a SparkContext object, and typically process data from files on your local system.
- This is exactly the Local Mode: everything runs LOCALLY, no cluster NODE is involved, and nothing runs in DISTRIBUTED mode.
- The driver and the executor are created inside a single JVM process.
e.g. When you launch spark-shell on your laptop without pointing it at a cluster, that is the Local mode of execution.
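Here is a minimal sketch of Local mode in PySpark, assuming pyspark is installed. The function names local_master and run_local_word_count and the app name are made up for illustration. The master URL forms local[K] and local[*] are Spark's own.

```python
def local_master(threads=None):
    """Return a Spark master URL for Local mode.

    None means "use every core on this machine" (local[*]);
    a number asks for that many worker threads (local[K]).
    """
    return "local[*]" if threads is None else f"local[{threads}]"


def run_local_word_count(lines, threads=None):
    """Count words with the driver and executor inside one local JVM."""
    # Imported here so the helper above works without pyspark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master(local_master(threads))   # local master: no cluster involved
        .appName("local-mode-demo")      # hypothetical app name
        .getOrCreate()
    )
    try:
        counts = (
            spark.sparkContext.parallelize(lines)
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b)
            .collect()
        )
        return dict(counts)
    finally:
        spark.stop()
```

Calling run_local_word_count(["spark runs locally", "locally means one JVM"]) returns a word-to-count dictionary without contacting any cluster.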
2. Client Mode:
Consider a Spark Cluster with 5 Executors.
- In Client mode, the driver is started on the local machine (laptop or desktop) from which the job is submitted, i.e. the driver runs outside the cluster.
- The executors, however, run inside the cluster.
- In layman's terms, the driver acts like a client to the cluster.
- Please note that in this case your entire application depends on the local machine, since the driver resides there. If anything goes wrong on the local machine, the driver goes down, and the entire application goes down with it. Hence this mode is generally not suitable for Production use cases.
- However, it is good for Debugging or Testing, since the output is printed on the driver's terminal, which is on the local machine.
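The points above can be sketched as a spark-submit command line. This is only an illustration: it assumes the cluster is managed by YARN (the post does not say which cluster manager is used), and the function name client_mode_command and the file name my_app.py are hypothetical. The flags --master, --deploy-mode and --num-executors are real spark-submit options, though --num-executors is not honoured by every cluster manager.

```python
import shlex


def client_mode_command(app_path, master="yarn", num_executors=5):
    """Build a spark-submit argument list that keeps the driver
    on the submitting machine while executors run in the cluster."""
    return [
        "spark-submit",
        "--master", master,                     # assumed cluster manager: YARN
        "--deploy-mode", "client",              # driver stays on this machine
        "--num-executors", str(num_executors),  # the 5 executors from the example
        app_path,                               # hypothetical application file
    ]


# Print the command as it would be typed in a terminal.
print(shlex.join(client_mode_command("my_app.py")))
```

Because the driver runs in the terminal where this command is typed, anything the application prints shows up right there, which is what makes this mode handy for debugging.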
3. Cluster Mode:
Consider a Spark Cluster with 5 Executors.
- In Cluster Mode, the driver and the executors both run inside the cluster.
- You submit the Spark job from your local machine, typically through a gateway machine of the cluster (such machines are usually called Edge Nodes), and the driver is then started on a node inside the cluster.
- You can shut down the local machine after submitting. This will not affect the application, since the driver is now running inside the cluster.
- This is the approach generally used in Production use cases.
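One way to see where the driver actually lives is to let the application report it. The sketch below assumes pyspark is installed and a YARN-managed cluster. The names describe_driver and report_driver_location and the app name are made up for illustration. spark.submit.deployMode is the Spark configuration property that records the deploy mode.

```python
import socket


def describe_driver(deploy_mode, host):
    """Format a one-line summary of where the driver is running."""
    if deploy_mode == "cluster":
        where = "inside the cluster"
    else:
        where = "on the submitting machine"
    return f"deploy mode={deploy_mode}: driver on host {host} ({where})"


def report_driver_location():
    """Start a session and print the driver's deploy mode and host."""
    # Imported here so describe_driver works without pyspark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("where-is-my-driver").getOrCreate()
    try:
        conf = spark.sparkContext.getConf()
        # Spark treats an unset deploy mode as "client".
        mode = conf.get("spark.submit.deployMode", "client")
        print(describe_driver(mode, socket.gethostname()))
    finally:
        spark.stop()
```

If a script ending with a call to report_driver_location() is submitted with spark-submit --master yarn --deploy-mode cluster, the host it reports should be a cluster node rather than your laptop. The printed line then lands in the driver's log on the cluster instead of your terminal.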
I hope this post helps.