Updated on 2024-11-29 GMT+08:00

Restoring Doris Data

Doris data can be backed up in form of files to a remote storage system through Broker. Then, you can run the restoration command to restore data from the remote storage system to any Doris cluster. You can periodically back up and migrated data using snapshots.

  • Currently, only users with the ADMIN permission can perform backup and restoration operations.
  • There can be only one restoration job that is being executed in a database.
  • Doris data can be restored with smallest partition granularity. If a table contains a large amount of data, you can restore data by partition to reduce the retry cost upon failure.
  • The backup and restoration operations are performed on the actual data files. When a table has too many shards, or a shard has too many small versions, it may take a long time to restore data even if the total amount of data is small.
  • You can run the SHOW BACKUP or SHOW RESTORE command to view the job status. An error information may be displayed in the TaskErrMsg column. If the value in the State column is not CANCELLED, the job continues. Some tasks may be retried successfully. However, some tasks are in error, causing job failures.
  • If a restoration job overwrites data (restoring data to an existing table or partition), the overwritten data in the cluster may not be restored since the COMMIT phase of the restoration job. If a restoration job fails or is canceled, the overwritten data may be damaged and cannot be accessed. In this case, you need to perform the restoration operation again and wait until the job is complete. You are not advised to restore data by overwriting data unless you confirm that the data is no longer used.

Data Restoration Principles

To restore Doris data, you need to specify an existing backup in a remote repository and then restore the backup data to the local cluster. After a restore request is submitted, the system performs the following operations:

  1. Create the corresponding metadata locally.

    This step will first create and restore the corresponding table partition and other structures in the local cluster. After creation, the table is visible, but not accessible.

  2. Create a local snapshot

    This step is to take a snapshot of the table created in the previous step. This is actually an empty snapshot (the table just created has no data), and its purpose is to generate the corresponding snapshot directory on the Backend for receiving the snapshot file downloaded from the remote warehouse.

  3. Download a snapshot

    The snapshot files in the remote warehouse will be downloaded to the corresponding snapshot directory generated in the previous step. This step is done concurrently by each Backend.

  4. Make a snapshot take effect

    After the snapshot download is complete, you need to map each snapshot to the metadata of the current local table. These snapshots are then reloaded to take effect, completing the final recovery job.

Prerequisite

  • A cluster containing the Doris service has been created, and all services in the cluster are running properly.
  • The node to be connected to the Doris database can communicate with the MRS cluster.
  • The MySQL client has been installed. For details, see Installing a MySQL Client.
  • Create a user with the Doris management permission.
    • Kerberos authentication is enabled for the cluster (the cluster is in security mode)

      On FusionInsight Manager, create a human-machine user, for example, dorisuser, create a role with Doris administrator permission, and bind the role to the user.

      Log in to FusionInsight Manager as the new user dorisuser and change the initial password.

    • Kerberos authentication is disabled for the cluster (the cluster is in normal mode)

      After connecting to Doris as user admin, create a role with administrator permissions, and bind the role to the user.

Restoring Doris Data

  1. Log in to the node where MySQL is installed and connect the Doris database.

    If Kerberos authentication is enabled for the cluster (the cluster is in security mode), run the following command to connect to the Doris database:

    export LIBMYSQL_ENABLE_CLEARTEXT_PLUGIN=1

    mysql -uDatabase login username -pDatabase login password -PConnection port for FE queries -hIP address of the Doris FE instance

    • To obtain the query connection port of the Doris FE instance, you can log in to FusionInsight Manager, choose Cluster > Services > Doris > Configurations, and query the value of query_port of the Doris service.
    • To obtain the IP address of the Doris FE instance, log in to FusionInsight Manager of the MRS cluster and choose Cluster > Services > Doris > Instances to view the IP address of any FE instance.
    • You can also use the MySQL connection software or Doris web UI to connect the database.

  2. Restore table or partition data from the remote warehouse where data has been backed up.

    • Run the following command to restore the example_tbl table in snapshot_label1 from example_repo to the example_db2 database. Set the time version to 2018-05-04-16-45-08. One copy will be restored.

      RESTORE SNAPSHOT example_db2.`snapshot_label1`

      FROM `example_repo`

      ON ( `example_tbl` )

      PROPERTIES

      (

      "backup_timestamp"="2023-08-16-20-13-55",

      "replication_num" = "1"

      );

      You can run the SHOW SNAPSHOT ON example_repo WHERE SNAPSHOT = "snapshot_label1"; command to obtain the value of backup_timestamp.

    • Restore partitions p1 and p2 of the example_tbl table in the backup snapshot_label2 from example_repo, restore the example_tbl2 table to the example_db1 database, rename the table new_tbl, and set the time version to 2018-05-04-17-11-01. By default, three copies will be restored.

      RESTORE SNAPSHOT example_db1.`snapshot_2`

      FROM `example_repo`

      ON

      (

      `backup_tbl` PARTITION (`p1`, `p2`),

      `backup_tbl2` AS `new_tbl`

      )

      PROPERTIES

      (

      "backup_timestamp"="2023-08-16-20-13-55"

      );

      Note: You can run the SHOW SNAPSHOT ON example_repo WHERE SNAPSHOT = "snapshot_label1"; command to obtain the value of backup_timestamp.

  3. Run the following command to check the execution status of the restoration job:

    SHOW RESTORE\G;