How To Take Hive Backup & Disaster Recovery In Production Cluster?



In this post, we will explore how to take an Apache Hive backup and perform disaster recovery in a production cluster.

Scenario


  • Let's assume we have two existing clusters - M1 & B1.
  • M1 is the Main or Production cluster.
  • B1 is the Backup cluster.
  • Our goal is to take a backup of Hive from M1 to B1.
  • We are using Cloudera Manager to take the Hive backup.
 

Assumptions


  • Both Main & Backup clusters are running.
  • Both clusters have the Hive service installed.
  • This procedure is ONLY valid for Cloudera Enterprise Edition.
 

Prerequisites


  • Enable Hive snapshots in the Main Cluster M1.

Let's start the step-by-step process.

Step 1 - Add Peer Cluster


Please note - we always add the Production/Main/Master cluster as a peer in the Backup cluster, as explained below, NOT the other way round.

  • Open Cloudera Manager of Backup B1 Cluster
 

  • Add Peer. Fill in the details:
    • Peer Name - The Prod cluster name, M1 in our case. You can give any name.
    • Peer URL - Private URL of the M1 Cluster. Use the command below to find the host name:
      • $ hostname -f
    • Peer Admin - M1 Cloudera Manager admin user ID
    • Peer Admin Password - Password of the above ID
  • Check the connectivity. Use the Action tab at the middle right side of the page.
  • Select "Test Connectivity".
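As a rough sketch of how the Peer URL could be assembled from the output of hostname -f: the helper name peer_url is hypothetical, and the ports are an assumption based on Cloudera Manager's defaults (7180 for HTTP, 7183 for HTTPS). Adjust them if your installation uses different ones.

```shell
#!/bin/sh
# Hypothetical helper: build a Cloudera Manager peer URL from a host name.
# Assumption: default Cloudera Manager ports, 7180 (HTTP) and 7183 (HTTPS).
peer_url() {
  pu_host="$1"
  pu_scheme="${2:-http}"   # second argument is optional, defaults to http
  case "$pu_scheme" in
    http)  pu_port=7180 ;;
    https) pu_port=7183 ;;
    *) echo "unsupported scheme: $pu_scheme" >&2; return 1 ;;
  esac
  printf '%s://%s:%s\n' "$pu_scheme" "$pu_host" "$pu_port"
}

# Example, run on the M1 Cloudera Manager host:
#   peer_url "$(hostname -f)"
#   peer_url "$(hostname -f)" https
```

The printed value is what would go into the Peer URL field when adding M1 as a peer in B1.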
 

Step 2 - Create Replication Schedule


Assuming the above steps went fine, let's create the replication schedule.

  • Select the Replication Schedule option.
     

  • Select Hive Replication

  • Fill in the details. The backup will be done from the Source to the Destination. You can also check the Resources & Advanced tabs to set additional parameters if needed. After filling in the details, select Save Schedule.
    • If you clear the "Replicate All" flag, you can specify a database name & table name as well.

  • Once the schedule is saved, you can see it listed. Select Run Now to start the backup process.
 

Step 3 - Verification


The next step is to verify that the Hive database is copied from the Source M1 Cluster to the Target B1 Cluster. First, check some table data in the M1 cluster so that we can compare it with B1 afterwards.

  • Log in to M1 through the shell.
  • Run the commands below step by step.
  • Open the Hive prompt

$ hive

hive> show databases;

hive> use DATABASE_NAME_FROM_ABOVE;

hive> show tables;

hive> select * from TABLE_NAME LIMIT 10;

We have checked the table data in the Source or Main Cluster M1. Let's verify that the same Hive table data is copied to the B1 Cluster.

  • Log in to B1 through the shell.
  • Run the commands below step by step.
  • Open the Hive prompt

$ hive

hive> show databases;

hive> use DATABASE_NAME_FROM_ABOVE;

hive> show tables;

hive> select * from TABLE_NAME LIMIT 10;
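The manual comparison above could also be scripted. The following is a minimal sketch, not a definitive procedure: the helper names spot_check_sql and same_output, the file names, and my_db / my_table are all hypothetical. Because a LIMIT query without ORDER BY is not guaranteed to return the same rows on both clusters, the sketch compares the table list and a row count instead. It assumes the hive CLI is available on a host in each cluster.

```shell
#!/bin/sh
# Hypothetical helper: print the HiveQL for a quick spot check of one table.
# It lists the tables of the database and counts the rows of the given table.
spot_check_sql() {
  printf 'USE %s; SHOW TABLES; SELECT COUNT(*) FROM %s;\n' "$1" "$2"
}

# Hypothetical helper: compare two saved outputs, ignoring line order.
# Returns 0 when both files hold the same lines, 1 when they differ,
# and 2 when one of the files cannot be read.
same_output() {
  so_a=$(sort "$1") || return 2
  so_b=$(sort "$2") || return 2
  [ "$so_a" = "$so_b" ]
}

# Usage (assumes the hive CLI is on the PATH):
#   on M1:  hive -S -e "$(spot_check_sql my_db my_table)" > m1.txt
#   on B1:  hive -S -e "$(spot_check_sql my_db my_table)" > b1.txt
#   then, with both files on one host:
#   same_output m1.txt b1.txt && echo "match" || echo "differ"
```

A matching row count does not prove the data is identical, so treat this as a quick sanity check alongside the manual look at the rows.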

   

Disaster Recovery


So far we have copied, or backed up, the Hive database of the Prod or Main Cluster M1 to the B1 Cluster, so B1 now holds a recovery copy of the M1 Hive database.

When you want to recover the Hive database from the B1 Cluster back to the Prod or Main Cluster M1, just follow Step 1 to Step 3 again. The only difference is that this time we copy the Hive database from B1 to M1, which means B1 is our Source & M1 is our Destination. Accordingly, the peer is added in the Cloudera Manager of M1 and points to B1.

I hope this post was helpful.