
How to Create a Multi-Node Hadoop/Spark Cluster in Google Cloud (GCP)?

This post explains how to create a multi-node Hadoop/Spark cluster in Google Cloud (GCP). We will create a 4-node cluster - 1 master node and 3 worker nodes. Let's start step by step.

Step 1 - Create a Google Cloud Account

  • If you already have a Gmail account, you are good to go.
  • Go to the Google Cloud Console and sign in with your Google (Gmail) account.
  • You will need your credit card details. Don't worry - you are not charged anything until your 1-year trial period ends or the trial credit is exhausted, and either way Google will prompt and notify you before that happens. (Please note that after your card details are entered, Google charges $1 and immediately cancels the transaction. This is done to ensure your card is valid and active.)
  • Fill in your details. It is a fairly straightforward process.

Step 2 - Open Google Cloud Platform

  • Once you are done with Step 1 and log in to the Google Cloud Console, you will land on the dashboard. It looks like the screenshot below. Note the free credit information at the top (indicated by the arrow mark).

Step 3 - Open Dataproc

  • Select Dataproc from the dashboard.
  • Then select Clusters as shown below.

Step 4 - Fill in the Cluster Details

Fill in the cluster details. We are going to create a 4-node cluster with 1 master node and 3 worker nodes. Please refer to the screenshots. The major points to take note of are -

  • Cluster zone info
  • Master Node & Worker Nodes
  • Master Node info (Disk size, vCPU & RAM)
  • Worker Node info (Disk size, vCPU & RAM)
  • Cloud Dataproc image (by default it is the latest, but you can choose older versions; version details are in the Dataproc documentation)

  • Once the details are filled in, click "CREATE"
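The same cluster can also be created from the gcloud CLI instead of the console form. The sketch below is illustrative, not part of the original walkthrough: the cluster name (spark-cluster1), region, zone, machine types, and disk sizes are assumptions - match them to whatever you chose in the form.

```shell
# Sketch only: create a 1-master / 3-worker Dataproc cluster from the CLI.
# The name, region, zone, machine types, and disk sizes are placeholders.
gcloud dataproc clusters create spark-cluster1 \
    --region=us-central1 \
    --zone=us-central1-a \
    --master-machine-type=n1-standard-4 \
    --master-boot-disk-size=100GB \
    --num-workers=3 \
    --worker-machine-type=n1-standard-4 \
    --worker-boot-disk-size=100GB
```

An `--image-version` flag can be added if you want one of the older Dataproc images mentioned above; otherwise the latest is used.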

Step 5 - New Cluster Dashboard

Once Steps 1-4 are done, you will land on the new cluster dashboard. The next time you log in to Google Cloud and click "Dataproc", you will find the newly created cluster listed there.
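If you would rather verify the cluster from a terminal than from the dashboard, the gcloud CLI can list it as well. The region and cluster name below are assumptions - use your own values.

```shell
# Sketch: list Dataproc clusters and show the details of one of them.
# Replace the region and cluster name with your own values.
gcloud dataproc clusters list --region=us-central1
gcloud dataproc clusters describe spark-cluster1 --region=us-central1
```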

Step 6 - Access the VMs

  • When you click on the cluster name in the screen above (spark-cluster1 in our case), the next screen opens, where you can see all 4 VM instances.

  • Click SSH to open a terminal.

  • Alternatively, go to the Compute Engine section, which lists all the VMs you have created. Click "Compute Engine" and then "VM instances" as shown below -
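The SSH button has a CLI equivalent too. Dataproc names the master VM `<cluster-name>-m` and the workers `<cluster-name>-w-0`, `-w-1`, and so on; the cluster name and zone below are assumptions from the earlier example.

```shell
# Sketch: open an SSH session to the master node from your own machine.
# Dataproc names the master VM "<cluster-name>-m" (here: spark-cluster1-m).
gcloud compute ssh spark-cluster1-m --zone=us-central1-a

# Once on the master, you can start an interactive Spark shell with:
#   spark-shell
```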
Hope you find this post helpful.

Additional Posts from this Blog -