Updated on 2024-10-25 GMT+08:00

Restoring Doris Data

Doris can back up current data as files to a remote storage system through Broker. Run the restoration command to restore data from the remote storage system to any Doris cluster. Doris data can be periodically backed up using snapshots and migrated.

  • Currently, only users with the ADMIN permission can perform backup and restoration operations.
  • Only one recovery job can be being executed in a database.
  • Doris data restoration supports operations at the minimum partition level. When a table contains a large amount of data, you are advised to perform operations by partition to reduce the cost of retry upon failure.
  • The backup and restoration operations are performed on the actual data files. Therefore, when a table has too many shards or a shard has too many minor versions, it may take a long time to restore the table even if the total data volume is small.
  • When you run the SHOW BACKUP or SHOW RESTORE command to view the job status, error information may be displayed in the TaskErrMsg column. If the value in the State column is not CANCELLED, the job continues. These tasks may be retried successfully. However, some tasks are incorrect, causing job failures.
  • If the recovery job is an overwrite operation (specifying that data is restored to an existing table or partition), the overwritten data in the current cluster may not be restored from the COMMIT phase of the recovery job. If the recovery job fails or is canceled, the previous data may be damaged and cannot be accessed. In this case, you need to perform the restoration operation again and wait until the job is complete. Therefore, you are not advised to restore data by overwriting data unless you confirm that the current data is no longer used.

Data Restoration Principles

To restore Doris data, you need to specify an existing backup data in a remote repository and then restore the backup data to the local cluster. After a restore request is submitted, the system performs the following operations:

  1. Create the corresponding metadata locally.

    Structures such as table partitions are created and restored in the local cluster. After the table is created, it is visible but inaccessible.

  2. Local snapshot

    Create a snapshot for the table created in the local cluster. The snapshot is an empty snapshot (the newly created table has no data). The snapshot is used to generate the corresponding snapshot directory on the Backend and receive the snapshot file downloaded from the remote repository.

  3. Downloading a snapshot

    Snapshot files in the remote repository are downloaded to the corresponding snapshot directory and concurrently completed by each backend.

  4. Making a Snapshot Take Effect

    After the snapshot is downloaded, map each snapshot to the metadata of the current local table. Then reload these snapshots for them to take effect and complete the final recovery job.

Prerequisite

  • A cluster containing the Doris service has been created, and all services in the cluster are running properly.
  • The node to be connected to the Doris database can communicate with the MRS cluster.
  • The MySQL client has been installed. For details about related operations, see Using the MySQL Client to Connect to Doris
  • Create a user with the Doris management permission.
    • Kerberos authentication is enabled for the cluster (the cluster is in security mode)

      On FusionInsight Manager, create a human-machine user, for example, dorisuser, create a role with the Doris administrator rights, and bind the role to the user.

      Log in to FusionInsight Manager as the new user dorisuser and change the initial password of the user.

    • Kerberos authentication is disabled for the cluster (the cluster is in normal mode)

      After connecting to Doris as user admin, create a role with administrator permissions, and bind the role to the user.

  • You have backed up the Doris table or partition data to be restored by referring to Backing Up Doris Data.

Restoring Doris Data

  1. Log in to the node where MySQL is installed and run the following command to connect to the Doris database:

    If Kerberos authentication is enabled for the cluster (the cluster is in security mode), run the following command to connect to the Doris database:

    export LIBMYSQL_ENABLE_CLEARTEXT_PLUGIN=1

    mysql -uDatabase login username -pDatabase login password -PDatabase connection port -hIP address of the Doris FE instance

    • The database connection port is the query connection port of the Doris FE. You can log in to FusionInsight Manager, choose Cluster > Services > Doris > Configurations, and query the value of query_port of the Doris service.
    • To obtain the IP address of the Doris FE instance, log in to FusionInsight Manager of the MRS cluster and choose Cluster > Services > Doris > Instances to view the IP address of any FE instance.
    • You can also use the MySQL connection software or Doris WebUI to connect to the database.

  2. Restore table or partition data from the remote warehouse where data has been backed up. For example:

    • Run the following command to restore the example_tbl table in snapshot_label1 from example_repo to the example_db2 database. The table is restored to one backup.

      RESTORE SNAPSHOT example_db2.`snapshot_label1`

      FROM `example_repo`

      ON ( `example_tbl` )

      PROPERTIES

      (

      "backup_timestamp"="2023-08-16-20-13-55",

      "replication_num" = "1"

      );

      You can run the SHOW SNAPSHOT ON example_repo WHERE SNAPSHOT = "snapshot_label1"; command to obtain the value of backup_timestamp.

    • Restore the partitions p1 and p2 of the example_tbl table in the backup snapshot_label2 from example_repo, restore the example_tbl2 table to the example_db1 database, rename the table new_tbl. By default, three copies are restored.

      RESTORE SNAPSHOT example_db1.`snapshot_2`

      FROM `example_repo`

      ON

      (

      `backup_tbl` PARTITION (`p1`, `p2`),

      `backup_tbl2` AS `new_tbl`

      )

      PROPERTIES

      (

      "backup_timestamp"="2023-08-16-20-13-55"

      );

      Note: You can run the SHOW SNAPSHOT ON example_repo WHERE SNAPSHOT = "snapshot_label2"; command to obtain the value of backup_timestamp.

  3. Run the following command to check the execution status of the recovery job:

    SHOW RESTORE\G;