How To Take a Hive Backup & Do Disaster Recovery in a Production Cluster?
In this post, we will explore how to take an Apache Hive backup and perform disaster recovery in a production cluster.
Scenario
- Let's assume we have two existing clusters - M1 & B1.
- M1 is the Main or Production Cluster.
- B1 is the Backup Cluster.
- Our goal is to back up Hive from M1 to B1.
- We are using Cloudera Manager to take the Hive backup.
Assumptions
- Both Main & Backup Clusters are running.
- Both clusters have Hive services installed.
- This procedure is ONLY valid for Cloudera Enterprise Edition.
Prerequisites
- Enable Hive snapshots in the Main Cluster M1.
Let's start the step-by-step process.
Step 1 - Add Peer Cluster
Please note - we always add the Production/Main/Master Cluster as a peer of the Backup Cluster, as explained below - NOT the other way round.
- Open Cloudera Manager of the Backup Cluster B1.
- Add Peer. Fill in the details:
- Peer Name - The Prod Cluster name, M1 in our case. You can give any name.
- Peer URL - Private URL of the M1 Cluster. Use the command below to find it.
- Peer Admin - M1 Cloudera Manager admin user ID.
- Peer Admin Password - Password of the above ID.
- Check the connectivity. Use the Actions tab at the middle right side of the page.
- Select "Test Connectivity".
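If you want to confirm from the command line that the M1 Cloudera Manager is reachable before using "Test Connectivity", a minimal sketch could look like the one below. It assumes the M1 Cloudera Manager listens on the default HTTP port 7180 (7183 when TLS is enabled) and that curl is available on a B1 host. The host name and the helper names cm_version_url and check_peer are hypothetical, made up for this illustration.

```shell
# Hypothetical helper: builds the URL of the Cloudera Manager
# "/api/version" endpoint, which returns the highest supported API version.
# $1 = base URL of the peer's Cloudera Manager (a trailing slash is tolerated)
cm_version_url() {
  printf '%s/api/version\n' "${1%/}"
}

# Hypothetical helper: calls the endpoint with basic authentication.
# $1 = base URL, $2 = admin user (curl prompts for the password)
check_peer() {
  curl --silent --show-error --fail --max-time 10 \
    --user "$2" "$(cm_version_url "$1")"
}

# Example call (placeholder host and user):
# check_peer "http://m1-cm.example.internal:7180" admin
```

A non-zero exit status from check_peer points to a network, credential, or URL problem that would also make the peer test fail.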
Step 2 - Create Replication Schedule
Assuming the above steps went fine, let's create the replication schedule.
- Select the Replication Schedule option.
- Fill in the details. The backup is performed from the Source to the Destination. You can also check the Resources & Advanced tabs to set additional parameters if needed. After filling in the details, select Save Schedule.
- If you clear the "Replicate All" flag, you can give a specific database name & table name as well.
- Once the schedule is saved, you can see it listed. Select Run Now to start the backup process.
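A saved schedule can also be started from a script through the Cloudera Manager REST API instead of the Run Now button. The sketch below is an illustration under assumptions, not the method used in this post: the API version (v19), cluster name, Hive service name, schedule id, host name, and the helper names are all placeholders. Check the endpoint against the API documentation for your Cloudera Manager release before relying on it.

```shell
# Hypothetical helper: builds the URL that runs an existing replication
# schedule immediately.
# $1 = base URL of the Cloudera Manager where the schedule was saved
# $2 = API version (for example v19; assumption, depends on your release)
# $3 = cluster name, $4 = Hive service name, $5 = id of the saved schedule
# Names containing spaces must be URL-encoded by the caller.
replication_run_url() {
  printf '%s/api/%s/clusters/%s/services/%s/replications/%s/run\n' \
    "${1%/}" "$2" "$3" "$4" "$5"
}

# Hypothetical helper: sends the POST request.
# $6 = admin user (curl prompts for the password)
run_replication_now() {
  curl --silent --show-error --fail --request POST --user "$6" \
    "$(replication_run_url "$1" "$2" "$3" "$4" "$5")"
}

# Example call (all values are placeholders):
# run_replication_now "http://b1-cm.example.internal:7180" v19 B1 hive 12 admin
```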
Step 3 - Verification
The next step is to verify that the Hive databases were copied from the Source M1 Cluster to the Target B1 Cluster. First, check some table data in the M1 Cluster so that we have something to compare against on the B1 Cluster.
- Log in to M1 through the shell.
- Run the commands below step by step.
- Open the Hive prompt
$ hive
hive > show databases;
hive > use DATABASE\_NAME\_FROM\_ABOVE;
hive > show tables;
hive > select \* from TABLE\_NAME LIMIT 10 ;
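The same checks can be run non-interactively so the output can be kept in a file and compared later. The sketch below uses hive -S -e, which runs a query string in silent mode. The function name snapshot_hive_table, the database and table names, and the output file are placeholders chosen for this example.

```shell
# Hypothetical helper: writes the table list of a database and the row
# count of one table to a file.
# $1 = database name, $2 = table name, $3 = output file
snapshot_hive_table() {
  hive -S -e "USE $1; SHOW TABLES; SELECT COUNT(*) FROM $2;" > "$3"
}

# Example call on M1 (placeholder names):
# snapshot_hive_table sales_db orders /tmp/m1_sales_db.txt
```

A row count is a quicker comparison point than eyeballing the first 10 rows, although it only shows that the number of rows matches, not that the contents do.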
We have checked the table data in the Source or Main Cluster M1. Let's verify that the same Hive table data has been copied to the B1 Cluster.
- Log in to B1 through the shell.
- Run the commands below step by step.
- Open the Hive prompt
$ hive
hive > show databases;
hive > use DATABASE\_NAME\_FROM\_ABOVE;
hive > show tables;
hive > select \* from TABLE\_NAME LIMIT 10 ;
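If you save the output of the commands above from each cluster into a text file, the two files can be compared instead of being checked by eye. This is a minimal sketch: the function name compare_listings and the file names are hypothetical, and it assumes both files have already been copied to the same host.

```shell
# Hypothetical helper: compares two saved Hive outputs, ignoring line order.
# $1 = file captured on M1, $2 = file captured on B1
# Prints MATCH and returns 0 when identical; otherwise prints the diff,
# then MISMATCH, and returns 1.
compare_listings() {
  m1_sorted=$(mktemp) || return 2
  b1_sorted=$(mktemp) || return 2
  sort "$1" > "$m1_sorted"
  sort "$2" > "$b1_sorted"
  if diff "$m1_sorted" "$b1_sorted"; then
    echo "MATCH"
    status=0
  else
    echo "MISMATCH"
    status=1
  fi
  rm -f "$m1_sorted" "$b1_sorted"
  return "$status"
}

# Example call (placeholder file names):
# compare_listings /tmp/m1_sales_db.txt /tmp/b1_sales_db.txt
```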
Disaster Recovery
So far we have backed up the Hive database from the Prod or Main Cluster M1 to the B1 Cluster, so B1 now holds a recovery copy of the M1 Hive database. When you want to recover the Hive database from B1 back to the Prod or Main Cluster M1, just follow Step 1 to Step 3 again. The only difference is the direction of the copy: this time the B1 Cluster is our Source & the M1 Cluster is our Destination.
I hope this post was helpful.