How to Create a Multi-Node Hadoop/Spark Cluster in Google Cloud (GCP)?
This post explains how to create a multi-node Hadoop/Spark cluster in Google Cloud (GCP). We will create a 4-node cluster with 1 master node and 3 worker nodes. Let's go through it step by step.
Step1 - Create a Google Cloud Account
- If you already have a Gmail account, you are good to go.
- Go to https://cloud.google.com/
- Log in using your Google account or Gmail account.
- You will need your credit card details. But don't worry, you are not charged anything until your 1-year trial period ends or the trial credit is exhausted. Either way, Google will prompt and notify you about it. (Please note: after your credit card details are entered, Google charges $1 and immediately cancels the transaction. This is done to ensure your card is valid and active.)
- Fill in your details. It is a fairly straightforward process.
Step2 - Google Cloud Dashboard
Once you are done with Step1, logging in to https://cloud.google.com/ will land you on the Google Cloud Dashboard. It looks like the screenshot below. Note the free credit information at the top (indicated by the arrow mark).
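As a side note, everything done through the console in the rest of this post can also be done from Cloud Shell or a locally installed gcloud CLI. A minimal one-time setup sketch follows; the project ID my-gcp-project is a placeholder, not a value from this post.

    # Authenticate and point gcloud at your project (my-gcp-project is a placeholder)
    gcloud auth login
    gcloud config set project my-gcp-project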
Step3 - Open Dataproc
- Select Dataproc from the Dashboard
- Then select "Clusters" as shown below
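The first time you use Dataproc in a project, the console will prompt you to enable the Dataproc API. If you prefer the command line, a sketch of the equivalent:

    # Enable the Dataproc API for the current project
    gcloud services enable dataproc.googleapis.com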
Step4 - Fill in the Cluster Details
Fill in the cluster details. We are going to create a 4-node cluster with 1 master node and 3 worker nodes. Please refer to the screenshots. The major points to take note of are mentioned below (a gcloud equivalent is sketched after this list) -
- Cluster zone info
- Master Node & Worker Nodes
- Master Node info (Disk size, vCPU & RAM)
- Worker Node info (Disk size, vCPU & RAM)
- Cloud Dataproc image version (by default it is the latest, but you can choose older versions; version details are at https://cloud.google.com/dataproc/docs/concepts/versioning/dataproc-versions)
- Once the details are filled in, click "CREATE"
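For reference, the same cluster can be created from the gcloud CLI. This is a minimal sketch only; the region, zone, machine types, disk sizes, and image version below are illustrative assumptions, not values taken from the screenshots.

    # Create a 4-node cluster: 1 master + 3 workers (values are assumptions)
    gcloud dataproc clusters create spark-cluster1 \
        --region=us-central1 \
        --zone=us-central1-a \
        --master-machine-type=n1-standard-4 \
        --master-boot-disk-size=100 \
        --num-workers=3 \
        --worker-machine-type=n1-standard-4 \
        --worker-boot-disk-size=100 \
        --image-version=1.4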
Step5 - New Cluster Dashboard
Once Step1 - Step4 are done, you will land on the new cluster dashboard. Alternatively, the next time you log in to Google Cloud and click on "Dataproc", you will find the newly created cluster listed there.
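You can also confirm the cluster's status from the CLI (a sketch; the region is assumed to be the one used at creation):

    # List clusters in the region and inspect the new one
    gcloud dataproc clusters list --region=us-central1
    gcloud dataproc clusters describe spark-cluster1 --region=us-central1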
Step6 - Access the VMs
- When you click on the cluster name in the above screen (spark-cluster1 in our case), it will open the next screen, where you can see all 4 VM instances.
- Click on SSH to open the terminal.
- Alternatively, you can go to the Compute Engine section, which lists all the VMs you have created. Click "Compute Engine" and then "VM instances" as shown below. (A terminal-based sketch follows this list.)
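If you prefer the terminal end to end, you can SSH into the master with gcloud and check that all 3 workers have registered. Dataproc names the master VM with an -m suffix; the zone below is an assumption carried over from the earlier sketch.

    # List the cluster's VM instances
    gcloud compute instances list

    # SSH into the master node (Dataproc appends -m to the master VM's name)
    gcloud compute ssh spark-cluster1-m --zone=us-central1-a

    # On the master: verify 3 HDFS datanodes and 3 YARN node managers are up
    hdfs dfsadmin -report
    yarn node -list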
Hope you find this post helpful.