Restoring ClickHouse Service Data
Scenarios
ClickHouse data restoration is required in the following scenarios:
- Data is unexpectedly modified or deleted and needs to be retrieved.
- A major ClickHouse operation (such as an upgrade or significant adjustment) causes exceptions in system data or fails to achieve the expected result.
- All modules fail and become unavailable.
- Data is migrated to a new cluster.
ClickHouse service data restoration tasks can be created on FusionInsight Manager. The system supports manual data restoration only.
The ClickHouse backup and restoration functions cannot identify the service and structure relationships between objects such as ClickHouse tables, indexes, and views. When executing backup and restoration tasks, manage unified restoration points based on your service scenarios so that services keep running properly.
MRS clusters support multiple data path types for restoring ClickHouse service data.
- RemoteHDFS: indicates that data is restored from the HDFS directory of the standby cluster.
- OBS: indicates that data is restored from OBS.
If you restore data while the service is running properly, you are advised to manually back up the latest management data before performing the restoration. Otherwise, the ClickHouse data generated after the data backup and before the data restoration will be lost.
Notes and Constraints
- MRS 3.1.0 or later supports this function.
- Data restoration can be performed only when the system version is consistent with the version used during data backup.
- ClickHouse metadata restoration and service data restoration cannot be performed simultaneously. Otherwise, service data restoration fails. You are advised to restore service data only after metadata restoration is complete.
- MRS 3.3.0-LTS.1 and later versions support the storage of ClickHouse service data backup files to OBS.
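The version constraints above can be sketched as a small lookup. This is a minimal illustration, not part of MRS: the function names are hypothetical, and the sketch assumes that only the numeric major.minor.patch part of the version string matters, so a suffix such as -LTS.1 is ignored. That means it cannot distinguish MRS 3.3.0-LTS.1 from other 3.3.0 builds.

```python
from __future__ import annotations

import re


def mrs_numeric_version(version: str) -> tuple[int, ...]:
    """Extract the leading major.minor.patch numbers from an MRS version string.

    Assumption: suffixes such as "-LTS.1" are ignored for comparison.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized MRS version: {version!r}")
    return tuple(int(part) for part in match.groups())


def supported_path_types(version: str) -> list[str]:
    """Return the restoration path types available for the given MRS version."""
    numeric = mrs_numeric_version(version)
    if numeric < (3, 1, 0):
        # ClickHouse service data restoration requires MRS 3.1.0 or later.
        return []
    path_types = ["RemoteHDFS"]
    if numeric >= (3, 3, 0):
        # Backup files in OBS are supported from MRS 3.3.0-LTS.1.
        path_types.append("OBS")
    return path_types


if __name__ == "__main__":
    for candidate in ("3.1.0", "3.2.0", "3.3.0-LTS.1"):
        print(candidate, supported_path_types(candidate))
```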
Impact on the System
- During data restoration, user authentication stops and users cannot create new connections.
- After the data is restored, the data generated after the data backup and before the data restoration is lost.
- After the data is restored, the ClickHouse upper-layer applications need to be started.
Prerequisites
- If you need to restore data from a remote HDFS, a standby cluster has been created and the data has been backed up. For details, see Backing Up ClickHouse Service Data.
  - If the active and standby clusters are deployed in security mode and are not managed by the same FusionInsight Manager, mutual trust has been configured. For details, see Configuring Mutual Trust Between MRS Clusters.
  - If the active and standby clusters are deployed in normal mode, mutual trust is not required.
- Time is consistent between the active and standby clusters, with the NTP services on both clusters configured to use the same time source.
- The database for storing restored data tables, the HDFS save path of data tables, and the list of users who can access restored data are planned.
- The ClickHouse backup file save path is correct.
- The ClickHouse upper-layer applications are stopped.
- In an active/standby cluster, when you restore data from the remote HDFS to the local host, the value of HADOOP_RPC_PROTECTION in ClickHouse must be the same as the value of hadoop.rpc.protection in HDFS.
- If the cluster to be restored does not contain the metadata of the backup service data, you must restore the corresponding backup metadata before restoring the service data, or restore both the backup metadata and backup service data simultaneously.
- To restore backup data from another MRS ClickHouse cluster to this cluster, the following requirements must be met:
  - The two clusters are of the same MRS version.
  - The two clusters are in the same mode.
  - The two clusters share the same ClickHouse topology, including shards and replicas.
  - The number of ClickHouse disk partitions and the disk capacity of the cluster to be restored are greater than or equal to those of the backup cluster.
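The cross-cluster requirements above can be expressed as a simple pre-flight check. The sketch below is illustrative only: ClusterProfile and cross_cluster_restore_problems are hypothetical names, the values are entered by hand rather than read from any MRS API, and modeling the topology as a shard count plus replicas per shard is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterProfile:
    """Hypothetical, hand-filled summary of one MRS ClickHouse cluster."""
    mrs_version: str
    security_mode: bool       # True = security mode, False = normal mode
    shards: int               # assumed topology model: number of shards
    replicas_per_shard: int   # assumed topology model: replicas in each shard
    disk_partitions: int
    disk_capacity_gb: float


def cross_cluster_restore_problems(
    backup: ClusterProfile, target: ClusterProfile
) -> list[str]:
    """List every requirement the target cluster fails; empty means all pass."""
    problems = []
    if backup.mrs_version != target.mrs_version:
        problems.append("MRS versions differ")
    if backup.security_mode != target.security_mode:
        problems.append("cluster modes differ")
    if (backup.shards, backup.replicas_per_shard) != (
        target.shards,
        target.replicas_per_shard,
    ):
        problems.append("ClickHouse topologies differ")
    if target.disk_partitions < backup.disk_partitions:
        problems.append("target has fewer disk partitions")
    if target.disk_capacity_gb < backup.disk_capacity_gb:
        problems.append("target has less disk capacity")
    return problems


if __name__ == "__main__":
    source = ClusterProfile("3.3.0-LTS.1", True, 2, 2, 4, 500.0)
    destination = ClusterProfile("3.3.0-LTS.1", True, 2, 2, 4, 400.0)
    print(cross_cluster_restore_problems(source, destination))
```

Returning a list of problems rather than a single boolean makes it easier to see which requirement to fix before creating the restoration task.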
Restoring ClickHouse Service Data
- Log in to MRS Manager.
For details about how to log in to MRS Manager, see Accessing MRS Manager.
- Choose O&M > Backup and Restoration > Backup Management.
- In the row containing the specified backup task, choose More > View History in the Operation column to display the task's historical execution records.
In the displayed window, locate the desired successful execution record and click View in the Backup Path column to display the task's backup path information and obtain the following details:
- Backup Object: indicates the backup data source.
- Backup Path: indicates the full path where the backup files are stored.
Locate the correct path, and manually copy the full path of the backup files from the Backup Path column.
- On FusionInsight Manager, choose O&M > Backup and Restoration > Restoration Management.
- Click Create.
- Set Task Name to the name of the restoration task.
- Select the cluster to be operated from Recovery Object.
- In Restoration Configuration, select ClickHouse under Service data.
- Select a backup directory type for Path Type of ClickHouse.
Table 1 Path for data restoration (parameters and descriptions, grouped by path type)
  - RemoteHDFS
    - Source NameService Name: NameService name of the backup data cluster. You can obtain it from the NameService Management page of HDFS in the standby cluster. For example, the name is hacluster.
    - IP Mode: IP version of the target IP address. The system automatically determines the IP version, such as IPv4 or IPv6, based on the cluster network type.
    - Source Active NameNode IP Address: Service plane IP address of the active NameNode in the standby cluster. Log in to MRS Manager of the standby cluster, choose Cluster > Services > HDFS, click the Instances tab, and check the service plane IP address of the active NameNode. This parameter is available for clusters of MRS 3.2.0 or later.
    - Source Standby NameNode IP Address: Service plane IP address of the standby NameNode in the standby cluster. Log in to MRS Manager of the standby cluster, choose Cluster > Services > HDFS, click the Instances tab, and check the service plane IP address of the standby NameNode. This parameter is available for clusters of MRS 3.2.0 or later.
    - Source NameNode IP Address: Service plane IP address of the active or standby NameNode in the standby cluster. Log in to MRS Manager of the standby cluster, choose Cluster > Services > HDFS, click the Instances tab, and check the service plane IP address of the NameNode. This parameter is available only for MRS 3.1.0 and MRS 3.1.2 clusters.
    - Source NameNode RPC Port: Value of dfs.namenode.rpc.port in the HDFS basic configuration of the destination cluster. This parameter is available for clusters of MRS 3.2.0 or later.
    - Source Path: Full path of the HDFS directory storing backup data in the standby cluster. For details, see Backup Path obtained in Step 3. Path format: Backup path/Backup task name_Data source_Task creation time
    - Maximum Number of Maps: Maximum number of maps in a MapReduce task. The default value is 20. This parameter is available only for MRS 3.1.0 and MRS 3.1.2 clusters.
    - Maximum Map Bandwidth (MB/s): Maximum bandwidth of a map. The default value is 100. This parameter is available only for MRS 3.1.0 and MRS 3.1.2 clusters.
  - OBS
    - Source Path: Full path of the OBS directory storing backup files. Path format: Backup path/Backup task name_Data source_Task creation time/Version_Data source_Task execution time.tar.gz
- Click OK.
- In the restoration task list, locate the row containing the created task, and click Start in the Operation column to execute the restoration task.
  - After the restoration succeeds, the progress bar turns green.
  - After the restoration succeeds, the restoration task cannot be executed again.
  - If the restoration task fails during the first execution, rectify the fault and click Retry to execute the task again.
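The Source Path formats listed in Table 1 can be illustrated with two small helpers. This is only a sketch for sanity-checking a path copied from the Backup Path column; the function names and every sample value are hypothetical, and the timestamp format is not specified in this guide, so the times are passed through as opaque strings.

```python
def remote_hdfs_source_path(
    backup_path: str, task_name: str, data_source: str, creation_time: str
) -> str:
    """Compose: Backup path/Backup task name_Data source_Task creation time."""
    return f"{backup_path.rstrip('/')}/{task_name}_{data_source}_{creation_time}"


def obs_source_path(
    backup_path: str,
    task_name: str,
    data_source: str,
    creation_time: str,
    version: str,
    execution_time: str,
) -> str:
    """Compose the OBS format, which appends
    Version_Data source_Task execution time.tar.gz to the task directory."""
    directory = remote_hdfs_source_path(
        backup_path, task_name, data_source, creation_time
    )
    return f"{directory}/{version}_{data_source}_{execution_time}.tar.gz"


if __name__ == "__main__":
    # All values below are placeholders, not real task names or timestamps.
    print(remote_hdfs_source_path("/example/backup", "task1", "ClickHouse", "T0"))
    print(obs_source_path("/example/backup", "task1", "ClickHouse", "T0", "V1", "T1"))
```

In practice, copying the full path from the backup task history, as described in the procedure, remains the reliable way to obtain this value.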