Backing Up and Restoring MRS Cluster Data

Overview

Manager can back up system and user data by component. Backup data includes Manager data, component metadata, and service data.

For MRS 3.x and later, data can be backed up to local disks (LocalDir), local HDFS (LocalHDFS), remote HDFS (RemoteHDFS), NAS (NFS/CIFS), SFTP servers (SFTP), and OBS (supported in MRS 3.1.0 and later). For details, see Backing Up MRS Cluster Component Data.

Backup and restoration tasks are performed in the following scenarios:

- Routine backup is performed to ensure the data security of the system and components.
- If the system is faulty, the backup data can be used to restore the system.
- If the active cluster is completely faulty, a mirrored cluster identical to the active cluster must be created, and the backup data can be used to restore the active cluster's data to it.

**Table 1** Metadata (MRS 2.x and earlier versions)

| Backup Type | Backup Content |
|---|---|
| OMS | Database data (excluding alarm data) and configuration data of the cluster management system, backed up by default |
| LdapServer | User information, including the username, password, key, password policy, and user group information |
| DBService | Metadata of the components (Hive) managed by DBService |
| NameNode | HDFS metadata |

**Table 2** Manager configuration data (MRS 3.x and later)

| Backup Type | Backup Content | Backup Directory Type |
|---|---|---|
| OMS | Database data (excluding alarm data) and configuration data of the cluster management system, backed up by default | LocalDir, LocalHDFS, RemoteHDFS, NFS, CIFS, SFTP, OBS |

**Table 3** Component metadata or other data (MRS 3.x and later)

| Backup Type | Backup Content | Backup Directory Type |
|---|---|---|
| DBService | Metadata of the components (including Loader, Hive, Spark, Oozie, CDL, and Hue) managed by DBService. | LocalDir, LocalHDFS, RemoteHDFS, NFS, CIFS, SFTP, OBS |
| Flink (MRS 3.2.0 and later) | Flink metadata. | LocalDir, LocalHDFS, RemoteHDFS, OBS (MRS 3.5.0 and later) |
| Kafka | Kafka metadata. | LocalDir, LocalHDFS, RemoteHDFS, NFS, CIFS, OBS |
| NameNode | HDFS metadata. After multiple NameServices are added, all of them can be backed up and restored, using the same operations as for the default hacluster instance. | LocalDir, RemoteHDFS, NFS, CIFS, SFTP, OBS |
| Yarn | Information about the Yarn service resource pool. | |
| HBase | tableinfo files and data files of HBase system tables. | |
| IoTDB | IoTDB metadata. | LocalDir, NFS, RemoteHDFS, CIFS, SFTP |
| ClickHouse | ClickHouse metadata. | LocalDir, RemoteHDFS |

**Table 4** Service data of specific components (MRS 3.x and later)

| Backup Type | Backup Content | Backup Directory Type |
|---|---|---|
| HBase | Table-level user data. | RemoteHDFS, NFS, CIFS, SFTP, OBS (MRS 3.5.0 and later) |
| HDFS | Directories or files of user services. Encrypted directories cannot be backed up or restored. | |
| Hive | Table-level user data. | |
| IoTDB | IoTDB service data. | RemoteHDFS |
| ClickHouse | Table-level user data. | RemoteHDFS |
| Doris | Doris service data. This function is available for MRS 3.3.1 and later. | RemoteHDFS, OBS |

Note that some components in MRS 3.x and later versions do not provide data backup or restoration:

- Kafka supports replicas, and multiple replicas can be specified when a topic is created.
- For MRS 3.5.0 and later, Kafka, as a message channel, does not store data permanently. By default, only data of the latest seven days is stored. Independent data backup is not supported.
- CDL data is stored in DBService and Kafka. A system administrator can create DBService and Kafka backup tasks to back up this data.
- MapReduce and Yarn data is stored in HDFS, so these components rely on the backup and restoration provided by HDFS.
- ZooKeeper service data is backed up and restored by the upper-layer components that store it.

MRS Cluster Data Backup and Restoration Principles

Task

Before backing up or restoring data, create a backup or restoration task and set its parameters, such as the task name, the backup data source, and the type of directory for storing backup files. Then run the task to back up or restore data. While Manager is restoring HDFS, HBase (MRS 3.x and later), Hive, or NameNode data, the cluster cannot be accessed.

Each backup task can back up data of different data sources and generate an independent backup file for each data source. All the backup files generated in a backup task form a backup file set, which can be used in restoration tasks. Backup data can be stored on Linux local disks, local cluster HDFS, and standby cluster HDFS.

- For MRS 3.x and later versions, backup tasks support full and incremental backup policies. Cloud data backup tasks do not support incremental backup. Incremental backup is not recommended when the backup directory type is NFS or CIFS: each incremental backup to NFS or CIFS updates the latest full backup data, so no new recovery point is generated.
- For MRS 2.x and earlier versions, backup tasks support full and incremental backup policies. HDFS and Hive backup tasks support incremental backup, while OMS, LdapServer, DBService, and NameNode backup tasks support only full backup.
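The effect of the NFS/CIFS behavior on recovery points can be sketched as a small Python model. It is not an MRS API: `count_recovery_points`, `mrs2_supports_incremental`, and the "full"/"incremental" run labels are hypothetical, and the model assumes that on directory types other than NFS and CIFS every full or incremental run leaves its own recovery point.

```python
# Hypothetical model of the backup-policy rules above; not an MRS API.

# Directory types on which an incremental backup updates the latest full
# backup instead of creating a new recovery point.
IN_PLACE_INCREMENTAL_DIR_TYPES = {"NFS", "CIFS"}

# MRS 2.x and earlier: only these backup types support incremental backup.
MRS2_INCREMENTAL_BACKUP_TYPES = {"HDFS", "Hive"}


def count_recovery_points(runs: list[str], dir_type: str) -> int:
    """Count recovery points left by a sequence of "full"/"incremental" runs.

    Assumes every run on a directory type other than NFS or CIFS produces
    its own recovery point.
    """
    points = 0
    for run in runs:
        if run == "full":
            points += 1
        elif run == "incremental":
            if dir_type not in IN_PLACE_INCREMENTAL_DIR_TYPES:
                points += 1
            # On NFS/CIFS the latest full backup is updated; no new point.
        else:
            raise ValueError(f"unknown run type: {run!r}")
    return points


def mrs2_supports_incremental(backup_type: str) -> bool:
    """True if an MRS 2.x backup task of this type may use incremental backup."""
    return backup_type in MRS2_INCREMENTAL_BACKUP_TYPES


if __name__ == "__main__":
    runs = ["full", "incremental", "incremental"]
    print(count_recovery_points(runs, "RemoteHDFS"))  # 3
    print(count_recovery_points(runs, "NFS"))         # 1
```

Three runs to NFS leave a single recovery point, which is why incremental backup gives little benefit on NFS or CIFS targets.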

Task execution rules:

- If a task is running, it cannot be started again, and no other task can be started until it finishes.
- The interval between automatic runs of a periodic task must be greater than 120 seconds. Otherwise, the run is postponed to the next period. Manual tasks can be executed at any interval.
- When a periodic task runs automatically, the current time must not be more than 120 seconds later than the task's scheduled start time. Otherwise, the run is postponed to the next period.
- A locked periodic task cannot run automatically. Unlock it manually before it can start.
- The LocalBackup partition on the active management node must have at least 20 GB of free space before backup tasks for OMS, LdapServer (MRS 2.x or earlier), DBService, Kafka (MRS 3.x or later), and NameNode can start.
- When planning backup and restoration tasks, select the data to be backed up or restored strictly based on the service logic, data storage structure, and database or table associations.
- For MRS 2.x and earlier versions, the system automatically creates a default periodic backup task named default with a 24-hour execution interval. This task performs a full backup of OMS, LdapServer, DBService, and NameNode data to the Linux local disk.
- For MRS 3.x and later versions, the system automatically creates the default periodic backup tasks default-oms and default-<cluster ID> with a 1-hour execution interval. These tasks perform full backups of OMS metadata and cluster metadata, including DBService and NameNode data, to local disks.
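The start conditions in these rules can be expressed as a single check. The Python sketch below is an illustration only: `PeriodicTask`, `may_start_automatically`, and `local_backup_has_space` are hypothetical names, timestamps are assumed to be in seconds, and "20 GB" is assumed to mean 20 × 1024³ bytes.

```python
from dataclasses import dataclass

MIN_PERIOD_SECONDS = 120         # periodic interval must be greater than this
MAX_START_DELAY_SECONDS = 120    # allowed lateness relative to the planned start
MIN_LOCAL_BACKUP_FREE_BYTES = 20 * 1024**3  # "20 GB", assumed to mean GiB


@dataclass
class PeriodicTask:
    name: str
    period_seconds: int
    locked: bool = False


def may_start_automatically(task: PeriodicTask, planned_start: float,
                            now: float, any_task_running: bool) -> bool:
    """Return True if a periodic task may start automatically at `now`.

    A False result from the interval or lateness checks means the run is
    postponed to the next period.
    """
    if any_task_running:          # only one task runs at a time
        return False
    if task.locked:               # locked tasks must be unlocked manually first
        return False
    if task.period_seconds <= MIN_PERIOD_SECONDS:
        return False
    if now - planned_start > MAX_START_DELAY_SECONDS:
        return False
    return True


def local_backup_has_space(free_bytes: int) -> bool:
    """Free-space precondition for OMS, DBService, NameNode, ... backup tasks."""
    return free_bytes >= MIN_LOCAL_BACKUP_FREE_BYTES
```

On a node, `free_bytes` could be obtained with Python's `shutil.disk_usage(path).free`, where `path` is the LocalBackup partition; its exact location depends on the deployment.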

Snapshot (MRS 3.x and later versions)

The system uses the snapshot technology to quickly back up data. Snapshots include HBase and HDFS snapshots.

HBase snapshots
An HBase snapshot is a backup of HBase tables at a specified point in time. Creating it does not copy service data and does not affect RegionServers. The snapshot records table metadata, including the table descriptor, region information, and HFile references, which can be used to restore the table data to its state at the time the snapshot was created.
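Why a snapshot is cheap can be shown with a toy Python model that records HFile names instead of HFile contents. This is not HBase code: `ToyTable`, `ToySnapshot`, and their methods are hypothetical, and the model assumes HFiles are immutable and never deleted. In real HBase, HFiles that are still referenced by a snapshot are moved to an archive directory instead of being removed.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToySnapshot:
    """Stand-in for an HBase snapshot: table metadata plus HFile references."""
    descriptor: tuple            # table descriptor as immutable (key, value) pairs
    regions: tuple
    hfile_refs: frozenset        # HFile names only, no data bytes


@dataclass
class ToyTable:
    descriptor: dict
    regions: list
    hfiles: dict = field(default_factory=dict)   # HFile name -> bytes

    def flush(self, hfile_name: str, data: bytes) -> None:
        # New writes go to new, immutable HFiles; old HFiles are not modified.
        self.hfiles[hfile_name] = data

    def snapshot(self) -> ToySnapshot:
        # Record metadata and HFile names only; no service data is copied.
        return ToySnapshot(tuple(sorted(self.descriptor.items())),
                           tuple(self.regions), frozenset(self.hfiles))

    def restore(self, snap: ToySnapshot) -> None:
        # Roll the table back to the HFiles referenced by the snapshot.
        # (This toy never deletes HFiles, so every reference is still present.)
        self.descriptor = dict(snap.descriptor)
        self.regions = list(snap.regions)
        self.hfiles = {name: self.hfiles[name] for name in snap.hfile_refs}
```

Data flushed after the snapshot lands in new HFiles that the snapshot does not reference, so restoring drops it without any data having been copied at snapshot time.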
HDFS snapshots
An HDFS snapshot is a read-only backup of HDFS at a specified time point. The snapshot is used in data backup, misoperation protection, and disaster recovery scenarios.

The snapshot function can be enabled for any HDFS directory to create the related snapshot file. Before creating a snapshot for a directory, the system automatically enables the snapshot function for the directory. Creating a snapshot does not affect any HDFS operation. A maximum of 65,536 snapshots can be created for each HDFS directory.

After a snapshot is created for an HDFS directory, the directory cannot be deleted or renamed until all of its snapshots are deleted. Snapshots cannot be created for the parent directories or subdirectories of that directory.
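The HDFS snapshot rules in this section (automatic enabling, no nesting, 65,536 snapshots per directory, no deletion while snapshots exist) can be captured in a toy Python registry. It is a hypothetical model for reasoning about task planning, not the HDFS API; on a real cluster the equivalent operations are performed with commands such as `hdfs dfsadmin -allowSnapshot` and `hdfs dfs -createSnapshot`.

```python
from pathlib import PurePosixPath

MAX_SNAPSHOTS_PER_DIR = 65_536


class ToySnapshotRegistry:
    """Toy model of the HDFS snapshot rules in this section; not the HDFS API."""

    def __init__(self) -> None:
        # Snapshot-enabled directory -> names of its existing snapshots.
        self.snapshots: dict[PurePosixPath, list[str]] = {}

    def _nested_conflict(self, path: PurePosixPath):
        # An ancestor or descendant of `path` that already has snapshots enabled.
        for other in self.snapshots:
            if other != path and (other in path.parents or path in other.parents):
                return other
        return None

    def create_snapshot(self, directory: str, name: str) -> None:
        path = PurePosixPath(directory)
        conflict = self._nested_conflict(path)
        if conflict is not None:
            raise ValueError(f"{path} is nested with snapshot-enabled {conflict}")
        # The snapshot function is enabled automatically on first use.
        names = self.snapshots.setdefault(path, [])
        if len(names) >= MAX_SNAPSHOTS_PER_DIR:
            raise ValueError(f"{path} already has {MAX_SNAPSHOTS_PER_DIR} snapshots")
        names.append(name)

    def delete_snapshot(self, directory: str, name: str) -> None:
        self.snapshots[PurePosixPath(directory)].remove(name)

    def can_delete_or_rename(self, directory: str) -> bool:
        # Allowed only once every snapshot of the directory has been deleted.
        return not self.snapshots.get(PurePosixPath(directory))
```

For example, once /user/data has a snapshot, creating snapshots for /user or /user/data/sub fails, and /user/data cannot be deleted until its snapshots are removed.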

DistCp (MRS 3.x and later versions)

Distributed copy (DistCp) is a tool for copying large amounts of data within the HDFS of one cluster or between the HDFS of different clusters. When a backup or restoration task for HBase, HDFS, or Hive backs up data to the HDFS of the standby cluster, the system invokes DistCp to perform the copy. The active and standby clusters must be installed with the same version of MRS software.

DistCp uses MapReduce to distribute the copy work and to handle errors, recovery, and reporting. It divides the list of source files and directories among multiple map tasks, and each map task copies the files in its share of the list.
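DistCp's default `uniformsize` strategy aims to give each map task roughly the same number of bytes. The Python sketch below illustrates that idea by cutting the copy list into contiguous groups of similar total size. `split_copy_listing` is a hypothetical helper and is much simpler than DistCp's actual input formats.

```python
def split_copy_listing(files: list[tuple[str, int]], num_maps: int) -> list[list[str]]:
    """Split (path, size_in_bytes) entries into at most num_maps contiguous
    groups whose total sizes are roughly equal."""
    if num_maps < 1:
        raise ValueError("num_maps must be >= 1")
    total = sum(size for _, size in files)
    target = total / num_maps          # ideal number of bytes per map task
    splits: list[list[str]] = []
    current: list[str] = []
    current_bytes = 0
    for path, size in files:
        # Close the current group once this file would push it past the
        # target, as long as there are map tasks left to receive a new group.
        if current and current_bytes + size > target and len(splits) < num_maps - 1:
            splits.append(current)
            current, current_bytes = [], 0
        current.append(path)
        current_bytes += size
    if current:
        splits.append(current)
    return splits
```

Each inner list stands for the work of one map task; with four 100-byte files and two maps, each map copies two files.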

If you use DistCp to replicate data between HDFSs of two clusters, configure the cross-cluster mutual trust (mutual trust does not need to be configured for clusters managed by the same FusionInsight Manager) and cross-cluster replication for both clusters. When backing up the cluster data to HDFS in another cluster, you need to install the Yarn component. Otherwise, the backup fails.

Local Fast Restoration (MRS 3.x and later versions)

After DistCp is used to back up the HBase, HDFS, and Hive data of the local cluster to the HDFS of the standby cluster, the HDFS of the local cluster retains snapshots of the backed-up data. You can create local fast restoration tasks to restore data from these snapshot files in the HDFS of the local cluster.

NAS (MRS 3.x and later versions)

Network Attached Storage (NAS) is a dedicated data storage server which includes the storage components and embedded system software. It provides the cross-platform file sharing function. By using NFS (supporting NFSv3 and NFSv4) and CIFS (supporting SMBv2 and SMBv3), you can connect the service plane of MRS to the NAS server to back up data to the NAS or restore data from the NAS.

- Before data is backed up to the NAS, the system automatically mounts the NAS shared address to a local partition on the node that runs the backup task. After the backup is complete, the system unmounts the NAS shared partition from that node.
- To prevent backup and restoration failures, do not access the local mount point of the NAS shared address (for example, /srv/BigData/LocalBackup/nas) while data is being backed up or restored.
- DistCp is used to back up service data to the NAS.
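The mount-then-always-unmount lifecycle maps naturally onto a context manager. In this Python sketch the `mount` and `unmount` callables are hypothetical hooks standing in for whatever actually mounts the share on the node, and the share address `nas-server:/export/backup` is made up for the example.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def nas_mounted(share: str, mount_point: str,
                mount: Callable[[str, str], None],
                unmount: Callable[[str], None]) -> Iterator[str]:
    """Mount `share` at `mount_point` for one backup, then always unmount it.

    `mount` and `unmount` are hypothetical hooks supplied by the caller.
    """
    mount(share, mount_point)
    try:
        yield mount_point          # the backup or restoration runs here
    finally:
        unmount(mount_point)       # runs even if the backup raises


if __name__ == "__main__":
    log = []
    with nas_mounted("nas-server:/export/backup", "/srv/BigData/LocalBackup/nas",
                     mount=lambda s, m: log.append(f"mount {s} {m}"),
                     unmount=lambda m: log.append(f"umount {m}")) as path:
        log.append(f"copy to {path}")
    print("\n".join(log))
```

The `finally` clause ensures the share is unmounted even when the copy fails, so a failed task does not leave the NAS mounted on the node.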

Specifications of MRS Cluster Data Backup and Restoration

**Table 5** Specifications of the backup and restoration feature

| Item | Specification |
|---|---|
| Maximum number of backup or restoration tasks | 100 |
| Number of concurrent tasks in a cluster | 1 |
| Maximum number of waiting tasks | 199 |
| Maximum size of backup files on a Linux local disk (GB) | 600 |

In MRS 3.x and later versions, when upper-layer components store service data in ZooKeeper, do not include too many znodes in a single backup or restoration task. Too many znodes may cause the task to fail and degrade ZooKeeper service performance.

Ensure that the number of znodes in any backup or restoration task stays below the operating system's maximum file handle threshold. To check this threshold:
1. To check the system-level threshold, run the cat /proc/sys/fs/file-max command.
2. To check the user-level threshold, run the ulimit -n command.
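The two checks can also be scripted. The Python sketch below reads the same values on Linux: /proc/sys/fs/file-max directly, and the per-process soft limit through `resource.getrlimit(resource.RLIMIT_NOFILE)`, which is the value `ulimit -n` reports in bash. Comparing the znode count against the smaller of the two limits is an assumption of this sketch, not a documented MRS rule.

```python
import resource


def file_handle_thresholds() -> tuple[int, int]:
    """Return (system-level file-max, user-level soft open-file limit) on Linux."""
    with open("/proc/sys/fs/file-max") as f:       # same as: cat /proc/sys/fs/file-max
        system_max = int(f.read().strip())
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)   # same as: ulimit -n
    if soft == resource.RLIM_INFINITY:
        soft = system_max
    return system_max, soft


def znode_count_within_threshold(node_count: int, thresholds: tuple[int, int]) -> bool:
    """Assumption: the znode count must stay below the smaller threshold."""
    return node_count < min(thresholds)
```

For example, `znode_count_within_threshold(node_count, file_handle_thresholds())` can be run on the node where the client is installed, with `node_count` taken from the getusage output.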
If the number of znodes in a parent directory exceeds this threshold, back up and restore its subdirectories in separate batches. To check the number of znodes with the ZooKeeper client script, perform the following operations:
1. On the FusionInsight Manager homepage, choose Cluster > Services > ZooKeeper. Click Instances and check the management IP address of each ZooKeeper role instance.
2. Log in to the node where the client is installed, configure environment variables, authenticate the user (skip this operation for clusters with Kerberos authentication disabled), and run the following command:
  zkCli.sh -server ip:port
  
  ip is the management IP address of any ZooKeeper instance obtained in step 1. The default value of port is 2181.
3. If the following information is displayed, you have logged in to the ZooKeeper server:
```
WatchedEvent state:SyncConnected type:None path:null
[zk: ip:port(CONNECTED) 0]
```
4. Run the getusage command to check the number of znodes in the directory to be backed up.
  getusage /hbase/region
  
  In the command output, Node count=xxxxxx indicates the number of znodes stored in the region directory.
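Once getusage has been run for each subdirectory of an oversized parent, grouping the subdirectories into batches is simple arithmetic. The Python sketch below is hypothetical: `plan_batches` is not an MRS tool, and the `child_counts` values are assumed to be the Node count figures collected manually with getusage.

```python
def plan_batches(child_counts: dict[str, int], threshold: int) -> list[list[str]]:
    """Group child znode paths into batches whose total znode count stays
    below `threshold`.

    child_counts maps a subdirectory path to its getusage Node count.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    current_total = 0
    for path, count in sorted(child_counts.items()):
        if count >= threshold:
            # This subdirectory alone is too large; split it by its own children.
            raise ValueError(f"{path} has {count} znodes; split it into its subdirectories")
        if current_total + count >= threshold:
            batches.append(current)       # start a new backup/restoration task
            current, current_total = [], 0
        current.append(path)
        current_total += count
    if current:
        batches.append(current)
    return batches
```

Each returned batch corresponds to one backup or restoration task.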

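**Table 6** Specifications of the **default** task (MRS 2.x and earlier)

| Item | OMS | LdapServer | DBService | NameNode |
|---|---|---|---|---|
| Backup period | 1 hour | 1 hour | 1 hour | 1 hour |
| Maximum number of backup copies | 2 | 2 | 2 | 2 |
| Maximum size of a backup file | 10 MB | 20 MB | 100 MB | 1.5 GB |
| Maximum size of disk space used | 20 MB | 40 MB | 200 MB | 3 GB |
| Save path of backup data | Data save path/LocalBackup/ of the active and standby management nodes | Data save path/LocalBackup/ of the active and standby management nodes | Data save path/LocalBackup/ of the active and standby management nodes | Data save path/LocalBackup/ of the active and standby management nodes |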

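**Table 7** Specifications of the **default** task (MRS 3.x and later)

| Item | OMS | HBase | Kafka | DBService | NameNode |
|---|---|---|---|---|---|
| Backup period | 1 hour | 1 hour | 1 hour | 1 hour | 1 hour |
| Maximum number of backups | 168 (7-day historical data) | 168 (7-day historical data) | 168 (7-day historical data) | 168 (7-day historical data) | 24 (one-day historical data) |
| Maximum size of a backup file | 10 MB | 10 MB | 512 MB | 100 MB | 20 GB |
| Maximum size of disk space used | 1.64 GB | 1.64 GB | 84 GB | 16.41 GB | 480 GB |
| Storage path of backup data | Data storage path/LocalBackup/ of the active and standby management nodes | Data storage path/LocalBackup/ of the active and standby management nodes | Data storage path/LocalBackup/ of the active and standby management nodes | Data storage path/LocalBackup/ of the active and standby management nodes | Data storage path/LocalBackup/ of the active and standby management nodes |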
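The "Maximum size of disk space used" row is the maximum number of backups multiplied by the maximum backup file size. The Python sketch below reproduces the Table 7 figures; it assumes binary units (1 GB = 1024 MB), which is what makes the table's values come out exactly, and `max_disk_usage_gb` is a hypothetical helper.

```python
MB_PER_GB = 1024   # assumption: the table's figures use binary units


def max_disk_usage_gb(max_backups: int, max_file_size_mb: float) -> float:
    """Worst-case space used = maximum number of backups x maximum file size."""
    return max_backups * max_file_size_mb / MB_PER_GB


# Table 7 (MRS 3.x): backup type -> (maximum number of backups, max file size in MB).
# 168 = 24 hourly backups x 7 days; 24 = one day of hourly backups.
DEFAULT_TASK_MRS3 = {
    "OMS": (168, 10),
    "HBase": (168, 10),
    "Kafka": (168, 512),
    "DBService": (168, 100),
    "NameNode": (24, 20 * MB_PER_GB),
}

if __name__ == "__main__":
    for backup_type, (copies, size_mb) in DEFAULT_TASK_MRS3.items():
        print(f"{backup_type}: {max_disk_usage_gb(copies, size_mb):.2f} GB")
```

The printed values are 1.64, 1.64, 84.00, 16.41, and 480.00 GB, matching the table. The same formula gives the Table 6 figures, for example 2 × 1.5 GB = 3 GB for NameNode.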

- The backup data generated by the default backup tasks must be periodically transferred to and preserved outside the cluster, based on enterprise O&M requirements.
- In MRS 3.x and later, administrators can create DistCp backup tasks to save OMS, DBService, and NameNode data to external clusters.
- To estimate the execution time of a backup task, divide the volume of data to be backed up by the network bandwidth between the cluster and the backup device, then multiply the result by 1.5 to obtain a reference execution time.
- A running backup task consumes cluster I/O capacity and lowers the peak I/O performance available to other workloads. Therefore, you are advised to run backup tasks during off-peak hours.
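The estimate is easy to get wrong when the data volume is given in bytes and the bandwidth in bits. A minimal Python sketch, assuming decimal units (1 GB = 8 Gbit) and that the full link bandwidth is available to the backup; `estimate_backup_seconds` is a hypothetical helper:

```python
def estimate_backup_seconds(data_gb: float, bandwidth_gbit_per_s: float,
                            factor: float = 1.5) -> float:
    """Reference execution time = data volume / network bandwidth x 1.5.

    data_gb is in gigabytes; bandwidth_gbit_per_s is in gigabits per second.
    """
    if bandwidth_gbit_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    transfer_seconds = data_gb * 8 / bandwidth_gbit_per_s   # 1 byte = 8 bits
    return transfer_seconds * factor


if __name__ == "__main__":
    # 100 GB over a 1 Gbit/s link: 800 s raw transfer, 1200 s (20 min) reference.
    print(estimate_backup_seconds(100, 1))   # 1200.0
```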
