How To Take Hive Backup & Disaster Recovery In Production Cluster?

In this post, we will explore how to back up Apache Hive and perform disaster recovery in a production cluster.

Scenario

Let's assume we have two existing clusters, M1 and B1.
M1 is the main (production) cluster.
B1 is the backup cluster.
Our goal is to back up Hive from M1 to B1.
We are using Cloudera Manager to take the Hive backup.

Assumptions

Both the main and backup clusters are running.
Both clusters have the Hive service installed.
This procedure is valid ONLY for Cloudera Enterprise Edition.

Prerequisites

Enable Hive snapshots on the main cluster M1.
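
From the command line, this usually comes down to making the Hive warehouse directory snapshottable in HDFS. The sketch below is only one way to do it and rests on assumptions: it assumes the warehouse sits at /user/hive/warehouse (check hive.metastore.warehouse.dir on M1 for the real value), and the helper name allow_snapshot_cmd is hypothetical. It only prints the hdfs dfsadmin command so that you can review it before running it.

```shell
# Assumption: the Hive warehouse lives at this common default path.
# Override WAREHOUSE_DIR if hive.metastore.warehouse.dir says otherwise.
WAREHOUSE_DIR="${WAREHOUSE_DIR:-/user/hive/warehouse}"

# Hypothetical helper: print the HDFS command that makes a directory
# snapshottable. Printing first keeps this a dry run.
allow_snapshot_cmd() {
    printf 'hdfs dfsadmin -allowSnapshot %s\n' "$1"
}

allow_snapshot_cmd "$WAREHOUSE_DIR"
```

Review the printed command, then run it on M1 as an HDFS superuser.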

Let's start the step-by-step process.

Step 1 - Add Peer Cluster

Please note - Always add the production (main) cluster M1 as a peer in the backup cluster B1, as explained below, NOT the other way round.

Open Cloudera Manager on the backup cluster B1.

Add Peer. Fill in the details:
- Peer Name - The production cluster name, M1 in our case. You can give any name.
- Peer URL - Private URL of the M1 cluster's Cloudera Manager. Use the command below on that host to find its fully qualified hostname
  - $ hostname -f
- Peer Admin - M1 Cloudera Manager admin user ID
- Peer Admin Password - Password of the above ID
Check the connectivity. Use the Action tab at the middle right of the page.
Select "Test Connectivity".
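
The hostname returned by hostname -f still needs a scheme and a port before it can be used as the Peer URL. Here is a minimal sketch, assuming Cloudera Manager on M1 listens on its default ports (7180 for HTTP, 7183 for HTTPS). The helper name peer_url is hypothetical; adjust the port if your installation differs.

```shell
# Hypothetical helper: build a Cloudera Manager peer URL from a hostname.
# Assumption: default Cloudera Manager ports (7180 for HTTP, 7183 for HTTPS).
peer_url() {
    host="$1"
    scheme="${2:-http}"
    case "$scheme" in
        https) port=7183 ;;
        *)     port=7180 ;;
    esac
    printf '%s://%s:%s\n' "$scheme" "$host" "$port"
}

# Typical use on the M1 Cloudera Manager host:
#   peer_url "$(hostname -f)"
#   peer_url "$(hostname -f)" https
```

Paste the resulting URL into the Peer URL field in Cloudera Manager on B1.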

Step 2 - Create Replication Schedule

Assuming the above steps went fine, let's create the replication.

Select the Replication Schedule option.

Select Hive Replication.

Fill in the details. The backup will be done from the source to the destination. You can also check the Resources and Advanced tabs to set additional parameters if needed. After filling in the details, select Save Schedule.
- If you clear the "Replicate All" flag, you can specify a database name and table name as well.

Once the schedule is saved, you can see it listed. Select Run Now to start the backup process.

Step 3 - Verification

The next step is to verify that the Hive database is copied from the source M1 cluster to the target B1 cluster. First, check the data of any table in the M1 cluster, which we will then compare against the B1 cluster.

Log in to M1 through the shell.
Run the commands below step by step.
Open the Hive prompt

$ hive

hive > show databases;

hive > use DATABASE_NAME_FROM_ABOVE;

hive > show tables;

hive > select * from TABLE_NAME LIMIT 10;

We have checked the table data in the source (main) cluster M1. Let's verify that the same Hive table data is copied to the B1 cluster.

Log in to B1 through the shell.
Run the commands below step by step.
Open the Hive prompt

$ hive

hive > show databases;

hive > use DATABASE_NAME_FROM_ABOVE;

hive > show tables;

hive > select * from TABLE_NAME LIMIT 10;
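
Eyeballing ten rows works for a spot check, but the comparison can also be scripted. The sketch below is one possible approach and is not part of the Cloudera workflow. The helper names dump_query and compare_dumps are hypothetical. It assumes the hive CLI is on the PATH on both clusters and that you copy the resulting files to one machine. The output is sorted because a query without ORDER BY does not guarantee row order, so a deterministic query such as SHOW TABLES or a row count compares better than a LIMIT 10 sample.

```shell
# Hypothetical helper: run one query in a database and save the sorted output.
# `hive -e` executes the quoted statements and then exits.
dump_query() {
    db="$1"
    query="$2"
    out="$3"
    hive -e "USE ${db}; ${query};" | sort > "$out"
}

# Hypothetical helper: compare two dump files byte for byte.
# Prints MATCH or MISMATCH and returns non-zero on a mismatch.
compare_dumps() {
    if cmp -s "$1" "$2"; then
        echo "MATCH"
    else
        echo "MISMATCH"
        return 1
    fi
}

# On M1:  dump_query DATABASE_NAME "SHOW TABLES" m1_tables.txt
# On B1:  dump_query DATABASE_NAME "SHOW TABLES" b1_tables.txt
# Then, with both files on one machine:
#         compare_dumps m1_tables.txt b1_tables.txt
```

The same pair of helpers can be reused with a SELECT COUNT(*) query per table to confirm that row counts agree on both clusters.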

Disaster Recovery

So far we have backed up the Hive database of the production cluster M1 to the B1 cluster, so B1 now holds a recovery copy of the M1 Hive database. When you want to recover the Hive database from B1 back to M1, follow Step 1 to Step 3 again. The only difference is the direction: this time B1 is the source and M1 is the destination, so the peer is added and the replication schedule is created in the Cloudera Manager of M1. I hope this post was helpful.
