How to Create a Multi-Node Hadoop/Spark Cluster in Google Cloud (GCP)?



This post explains how to create a multi-node Hadoop/Spark cluster in Google Cloud (GCP). We will create a 4-node cluster - 1 master node and 3 worker nodes - using Google Cloud Dataproc. Let's start step by step.

Step 1 - Create a Google Cloud Account


  • If you already have a Gmail account, you are good to go.
  • Go to https://cloud.google.com/
  • Log in using your Google account or Gmail account.
  • You will need your credit card details. But don't worry - you are not charged anything until your 1-year trial period ends or the trial credit is exhausted, and either way Google will prompt and notify you about it. (Please note: after your card details are entered, Google charges $1 and immediately cancels the transaction. This is done to ensure your card is valid and active.)
  • Fill in your details. It is a fairly straightforward process.
 

Step 2 - Open Google Cloud Platform


  • Once you are done with Step 1, logging in to https://cloud.google.com/ will land you on the Google Cloud dashboard, which looks like the screenshot below. Note the free credit information at the top (indicated by the arrow).
 

Step 3 - Open Dataproc


  • Select Dataproc from the dashboard.
  • Then select Clusters, as shown below.
 

Step 4 - Fill in the Cluster Details


Fill in the cluster details. We are going to create a 4-node cluster with 1 master node and 3 worker nodes. Please refer to the screenshots. Major points to take note of are mentioned below -

  • Cluster zone info
  • Master Node & Worker Nodes
  • Master Node info (Disk size, vCPU & RAM)
  • Worker Node info (Disk size, vCPU & RAM)
  • Cloud Dataproc image (by default it is the latest, but you can choose older versions; version details are at https://cloud.google.com/dataproc/docs/concepts/versioning/dataproc-versions)

  • Once the details are filled in, click "CREATE". (If you prefer to script this step, see the sketch just below.)
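
If you would rather create the cluster from code than from the console form, the google-cloud-dataproc Python client library can do the same thing. Below is a minimal sketch, not the exact setup from the screenshots: the project ID, region, zone, machine type, disk size, and image version are placeholder assumptions - substitute the values you chose in the form.

```python
# Minimal sketch: create the same 1-master / 3-worker Dataproc cluster
# programmatically. All concrete values below are placeholders.
from google.cloud import dataproc_v1

project_id = "my-project-id"   # placeholder - your GCP project ID
region = "us-central1"         # placeholder - the region from the form

# Dataproc is regional, so point the client at the regional endpoint.
client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": project_id,
    "cluster_name": "spark-cluster1",
    "config": {
        # Cluster zone info
        "gce_cluster_config": {"zone_uri": "us-central1-a"},
        # Master node info (count, machine type = vCPU & RAM, disk size)
        "master_config": {
            "num_instances": 1,
            "machine_type_uri": "n1-standard-4",
            "disk_config": {"boot_disk_size_gb": 100},
        },
        # Worker node info
        "worker_config": {
            "num_instances": 3,
            "machine_type_uri": "n1-standard-4",
            "disk_config": {"boot_disk_size_gb": 100},
        },
        # Pin a Dataproc image version instead of taking the default.
        "software_config": {"image_version": "2.1-debian11"},
    },
}

# create_cluster returns a long-running operation; result() waits for it.
operation = client.create_cluster(
    request={"project_id": project_id, "region": region, "cluster": cluster}
)
result = operation.result()
print(f"Cluster created: {result.cluster_name}")
```

This assumes the google-cloud-dataproc package is installed (pip install google-cloud-dataproc) and that you have authenticated, for example with gcloud auth application-default login.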
   

Step 5 - New Cluster Dashboard


Once Steps 1-4 are done, you will land on the new cluster dashboard. The next time you log in to Google Cloud and click on "Dataproc", you will also find the newly created cluster listed there.
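
If you want to verify the cluster from code instead of the dashboard, the same client library can list the clusters in a region. A minimal sketch, reusing the placeholder project ID and region from the creation example:

```python
# List Dataproc clusters in a region and print their status.
from google.cloud import dataproc_v1

project_id = "my-project-id"   # placeholder
region = "us-central1"         # placeholder

client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

for cluster in client.list_clusters(
    request={"project_id": project_id, "region": region}
):
    # A healthy cluster reports state RUNNING once provisioning finishes.
    print(cluster.cluster_name, cluster.status.state.name)
```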

Step 6 - Access the VMs


  • When you click on the cluster name in the above screen (spark-cluster1 in our case), it will open the next screen, where you can see all 4 VM instances.
 

  • Click on SSH to open the terminal.
 

  • Alternatively, you can go to the Compute Engine section, which lists all VMs created by you. Click "Compute Engine" and then "VM instances" as shown below. (A programmatic way to list the same VMs is sketched below.)
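
The same VM list can also be fetched programmatically with the google-cloud-compute client library. A minimal sketch, again with a placeholder project ID and the zone chosen in Step 4 (Dataproc names the instances after the cluster: spark-cluster1-m for the master and spark-cluster1-w-0, spark-cluster1-w-1, ... for the workers):

```python
# List the Compute Engine VM instances backing the cluster.
from google.cloud import compute_v1

project_id = "my-project-id"   # placeholder
zone = "us-central1-a"         # placeholder - the zone from Step 4

client = compute_v1.InstancesClient()
for instance in client.list(project=project_id, zone=zone):
    # Dataproc VMs are named <cluster>-m (master) and <cluster>-w-N (workers).
    print(instance.name, instance.status)
```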

Hope you find this post helpful.
