Updated on 2024-11-29 GMT+08:00

Restoring ClickHouse Service Data

Scenario

ClickHouse data needs to be restored in the following scenarios:

  • Data is modified or deleted unexpectedly and needs to be restored.
  • After a user performs major operations (such as an upgrade or migration) on ClickHouse, an exception occurs or the expected result is not achieved.
  • All modules are faulty and become unavailable.
  • Data is migrated to a new cluster.

You can create a restoration task on FusionInsight Manager to restore ClickHouse data. Only manual restoration tasks are supported.

The ClickHouse backup and restoration functions cannot identify the service and structural relationships between objects such as ClickHouse tables, indexes, and views. When you execute backup and restoration tasks, manage unified restoration points based on your service scenarios to ensure proper service running.

  • Data can be restored only when the system version at the time of data backup is the same as the current system version.
  • To restore the data when services are running properly, manually back up the latest management data first and then restore the data. Otherwise, the ClickHouse data generated after the data backup and before the data restoration will be lost.
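Because the backup and restoration functions cannot track these object relationships, it can help to inventory the tables and views that belong to one service before you define a restoration point. The following Python sketch is only an illustration and is not part of the product; it lists non-system tables and views through the ClickHouse HTTP interface, and the endpoint and credentials are placeholders for your environment.

    # Minimal sketch: list ClickHouse tables and views per database so that
    # objects belonging to one service can share a unified restoration point.
    # The endpoint and credentials below are placeholders, not product defaults.
    import requests

    CLICKHOUSE_URL = "http://clickhouse-node-1:8123/"   # placeholder HTTP endpoint
    AUTH = ("backup_user", "backup_password")           # placeholder credentials

    query = (
        "SELECT database, name, engine "
        "FROM system.tables "
        "WHERE database != 'system' "
        "ORDER BY database, name "
        "FORMAT TSV"
    )

    response = requests.get(CLICKHOUSE_URL, params={"query": query}, auth=AUTH, timeout=30)
    response.raise_for_status()

    # Group the output by database so that dependent tables and views
    # are backed up and restored together.
    for line in response.text.splitlines():
        database, name, engine = line.split("\t")
        print(f"{database}.{name} ({engine})")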

Impact on the System

  • During data restoration, user authentication stops and users cannot create new connections.
  • After the data is restored, the data generated after the data backup and before the data restoration is lost.
  • After the data is restored, the ClickHouse upper-layer applications need to be started.

Prerequisites

  • You have prepared a standby cluster if you need to restore data remotely from HDFS. If the active and standby clusters are deployed in security mode and they are not managed by the same FusionInsight Manager, mutual trust must be configured. For details, see Configuring Cross-Manager Mutual Trust Between Clusters. If the active and standby clusters are deployed in normal mode, no mutual trust is required.
  • The time on the active and standby clusters must be the same, and the NTP services on the active and standby clusters must use the same time source.
  • The database for storing restored data tables, the location for storing the data tables in HDFS, and the list of users who can access the restored data have been planned.
  • The ClickHouse backup file save path is correct.
  • The ClickHouse upper-layer applications are stopped.
  • You have logged in to FusionInsight Manager. For details, see Logging In to FusionInsight Manager.
  • In an active/standby cluster deployment, when restoring data from the remote HDFS to the local cluster, ensure that the HADOOP_RPC_PROTECTION value of ClickHouse is the same as the hadoop.rpc.protection value of HDFS, as shown in the sketch after this list.
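For the last prerequisite, the following Python sketch shows one way to compare the two values. It is only an illustration; the configuration file path and the ClickHouse value are placeholders for your environment.

    # Minimal sketch: read hadoop.rpc.protection from an HDFS client
    # configuration file and compare it with the HADOOP_RPC_PROTECTION
    # value configured for ClickHouse. Both values below are placeholders.
    import xml.etree.ElementTree as ET

    CORE_SITE_PATH = "/opt/hadoopclient/HDFS/hadoop/etc/hadoop/core-site.xml"  # placeholder path
    CLICKHOUSE_RPC_PROTECTION = "privacy"                                      # placeholder value

    def read_hadoop_property(xml_path, key):
        """Return the value of a Hadoop <property> entry, or None if it is absent."""
        root = ET.parse(xml_path).getroot()
        for prop in root.iter("property"):
            if prop.findtext("name") == key:
                return prop.findtext("value")
        return None

    hdfs_value = read_hadoop_property(CORE_SITE_PATH, "hadoop.rpc.protection")
    if hdfs_value != CLICKHOUSE_RPC_PROTECTION:
        raise SystemExit(
            f"Mismatch: HDFS hadoop.rpc.protection={hdfs_value!r}, "
            f"ClickHouse HADOOP_RPC_PROTECTION={CLICKHOUSE_RPC_PROTECTION!r}"
        )
    print("hadoop.rpc.protection values match:", hdfs_value)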

Procedure

  1. On FusionInsight Manager, choose O&M > Backup and Restoration > Backup Management.
  2. In the row where the specified backup task is located, choose More > View History in the Operation column to display the historical execution records of the backup task.

    In the window that is displayed, select a successful record and click View in the Backup Path column to view its backup path information and find the following:

    • Backup Object: indicates the backup data source.
    • Backup Path: indicates the full path for storing backup files.

      Select the correct path and copy the full path of backup files in Backup Path.

  3. On FusionInsight Manager, choose O&M > Backup and Restoration > Restoration Management.
  4. Click Create.
  5. Set Task Name to the name of the restoration task.
  6. Select the desired cluster from Recovery Object.
  7. In Restoration Configuration, select ClickHouse under Service data.
  8. Set Path Type of ClickHouse to a restoration directory type.

    Table 1 Path for data restoration

    Directory Type: RemoteHDFS

    Description: Indicates that the backup files are stored in the HDFS directory of the standby cluster. If you select this option, you also need to configure the following parameters:

      • Source NameService Name: indicates the NameService name of the backup data cluster, for example, hacluster. You can obtain it on the NameService Management page of HDFS of the standby cluster.
      • IP Mode: indicates the mode of the target IP address. The system automatically selects an IP address mode based on the cluster network type, for example, IPv4 or IPv6.
      • Source Active NameNode IP Address: indicates the service plane IP address of the active NameNode in the standby cluster.
      • Source Standby NameNode IP Address: indicates the service plane IP address of the standby NameNode in the standby cluster.
      • Source NameNode RPC Port: indicates the value of dfs.namenode.rpc.port in the HDFS basic configuration of the destination cluster.
      • Source Path: indicates the full path of the HDFS directory for storing backup data in the standby cluster, that is, the Backup Path obtained in 2, for example, Backup path/Backup task name_Data source_Task creation time/.

    Directory Type: OBS

    Description: Indicates that data is restored from OBS. If you select this option, you also need to configure the following parameter:

      • Source Path: indicates the full OBS path of a backup file, for example, Backup path/Backup task name_Data source_Task creation time/Version_Data source_Task execution time.tar.gz.
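    For both directory types, the Source Path follows the naming pattern shown above. The following Python sketch only illustrates how such a path is composed; all values, including the time format, are placeholders taken from your own backup task and are not product defaults.

        # Minimal sketch: compose the expected Source Path from backup task
        # metadata, following the pattern
        # Backup path/Backup task name_Data source_Task creation time/.
        # All values below, including the time format, are placeholders.
        backup_path = "/user/hdfs/backup"        # Backup Path obtained in step 2
        task_name = "ClickHouse_backup_task"     # backup task name
        data_source = "ClickHouse"               # data source
        creation_time = "20241129103000"         # task creation time

        source_path = f"{backup_path}/{task_name}_{data_source}_{creation_time}/"
        print("Source Path:", source_path)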

  9. Click OK.
  10. In the restoration task list, locate a created task and click Start in the Operation column to execute the restoration task.

    • After the restoration is successful, the progress bar is displayed in green.
    • After the restoration is successful, the restoration task cannot be re-executed.
    • If the restoration task fails during the first execution, rectify the fault and click Retry to re-execute the task.
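    Before starting the ClickHouse upper-layer applications after a successful restoration, you may want to confirm that the restored objects are visible. The following Python sketch is only an illustration; the endpoint, credentials, and database name are placeholders for your environment.

        # Minimal sketch: confirm that a restored database is visible and print
        # the reported row counts of its tables before restarting applications.
        # The endpoint, credentials, and database name below are placeholders.
        import requests

        CLICKHOUSE_URL = "http://clickhouse-node-1:8123/"   # placeholder HTTP endpoint
        AUTH = ("backup_user", "backup_password")           # placeholder credentials
        RESTORED_DB = "example_db"                          # placeholder database name

        query = (
            f"SELECT name, total_rows FROM system.tables "
            f"WHERE database = '{RESTORED_DB}' FORMAT TSV"
        )

        response = requests.get(CLICKHOUSE_URL, params={"query": query}, auth=AUTH, timeout=30)
        response.raise_for_status()

        if not response.text.strip():
            raise SystemExit(f"No tables found in database {RESTORED_DB!r} after restoration.")
        for line in response.text.splitlines():
            name, total_rows = line.split("\t")
            print(f"{RESTORED_DB}.{name}: total_rows={total_rows}")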