Updated on 2024-05-29 GMT+08:00

Backing Up Doris Data

Doris can back up current data as files to a remote storage system through Broker. Doris data can be periodically backed up using snapshots and migrated.

  • Currently, only users with the ADMIN permission can perform backup and restoration operations.
  • Only one backup job can be being executed in a database.
  • Doris data backup supports operations at the minimum partition level. When a table contains a large amount of data, you are advised to perform operations by partition to reduce the cost of retry upon failure.
  • The backup and restoration operations are performed on the actual data files. Therefore, when a table has too many shards or a shard has too many minor versions, the backup may take a long time even if the total data volume is small.
  • When you run the SHOW BACKUP or SHOW RESTORE command to view the job status, error information may be displayed in the TaskErrMsg column. If the value in the State column is not CANCELLED, the job continues. These tasks may be retried successfully. However, some tasks are incorrect, causing job failures.

Data Backup Principles

The backup operation is to upload the data of a specified table or partition to the remote repository as a file stored in the Doris. After a user submits a backup request, the system performs the following operations:

  1. Snapshot and Snapshot Upload

    Backup is performed on snapshots. In the snapshot phase, snapshots are taken for specified tables or partition data files. A snapshot only generates a hard link for the current data file, which is time-consuming. After a snapshot is created, operations such as modification and import on the table do not affect the backup result. After the snapshot is complete, the system starts to upload the snapshot files one by one. The snapshot files are concurrently uploaded by each backend.

  2. Preparing and Uploading Metadata

    After the data file snapshot is uploaded, Frontend writes the corresponding metadata into a local file and uploads the local metadata file to the remote repository through Broker to complete the backup job.

  • If the table to be backed up is a dynamic partition table, the dynamic partition attribute is automatically disabled after the backup. Before restoring data, run the following command to manually enable the dynamic partition attribute of the table:

    ALTER TABLE tbl1 SET ("dynamic_partition.enable"="true")

  • The data backup operation does not retain the colocate_with attribute of the table.

Prerequisite

  • A cluster containing the Doris service has been created, and all services in the cluster are running properly.
  • The node to be connected to the Doris database can communicate with the MRS cluster.
  • The MySQL client has been installed. For details about related operations, see Installing a MySQL Client
  • Create a user with the Doris management permission.
    • Kerberos authentication is enabled for the cluster (the cluster is in security mode)

      On FusionInsight Manager, create a human-machine user, for example, dorisuser, create a role with the Doris administrator rights, and bind the role to the user.

      Log in to FusionInsight Manager as the new user dorisuser and change the initial password of the user.

    • Kerberos authentication is disabled for the cluster (the cluster is in normal mode)

      After connecting to Doris as user admin, create a role with administrator permissions, and bind the role to the user.

Backing Up Doris Data

  1. Log in to the node where MySQL is installed and run the following command to connect to the Doris database:

    If Kerberos authentication is enabled for the cluster (the cluster is in security mode), run the following command to connect to the Doris database:

    export LIBMYSQL_ENABLE_CLEARTEXT_PLUGIN=1

    mysql -uDatabase login username -pDatabase login password -PDatabase connection port -hIP address of the Doris FE instance

    • The database connection port is the query connection port of the Doris FE. You can log in to FusionInsight Manager, choose Cluster > Services > Doris > Configurations, and query the value of query_port of the Doris service.
    • To obtain the IP address of the Doris FE instance, log in to FusionInsight Manager of the MRS cluster and choose Cluster > Services > Doris > Instances to view the IP address of any FE instance.
    • You can also use the MySQL connection software or Doris WebUI to connect to the database.

  1. Run the following command to create a remote repository example_repo in HDFS:

    Kerberos authentication is enabled for the cluster (the cluster is in security mode)

    CREATE REPOSITORY `example_repo`

    WITH BROKER `hdfs_broker`

    ON LOCATION "hdfs://hadoop-name-node:25000/path/to/repo/"

    PROPERTIES

    (

    "hadoop.security.authentication"="kerberos",

    "kerberos_principal"="doris/hadoop.hadoop.com@HADOOP.COM",

    "kerberos_keytab"="/opt/huawei/Bigdata/FusionInsight_Doris_8.3.0/install/FusionInsight-Doris-1.2.3/doris-fe/bin/doris.keytab"

    );

    Kerberos authentication is disabled for the cluster (the cluster is in normal mode)

    CREATE REPOSITORY `example_repo`

    WITH BROKER `hdfs_broker`

    ON LOCATION "hdfs://hadoop-name-node:25000/path/to/repo/"

    PROPERTIES

    (

    "username" = "hdfs",

    "password" = ""

    );

  2. View the created repository.

    SHOW REPOSITORIES;

  1. Back up data to the example_repo. You can back up table data or partition data. For example:

    • To fully back up data in the example_tbl table in example_db to example_repo, run the following command:

      BACKUP SNAPSHOT example_db.snapshot_label1

      TO example_repo

      ON (example_tbl)

      PROPERTIES ("type" = "full");

    • Fully back up the p1 and p2 partitions of the example_tbl table in the example_db and the example_tbl2 table to the example_repo.

      BACKUP SNAPSHOT example_db.snapshot_label2

      TO example_repo

      ON

      (

      example_tbl PARTITION (p1,p2),

      example_tbl2

      );

  2. Run the following command to check the execution status of the backup job:

    show BACKUP;

  3. Check whether the backup is successful in the remote repository.

    SHOW SNAPSHOT ON example_repo WHERE SNAPSHOT = "snapshot_label1";

    +-----------------+---------------------+--------+
    | Snapshot        | Timestamp           | Status |
    +-----------------+---------------------+--------+
    | snapshot_label1 | 2022-04-08-15-52-29 | OK     |
    +-----------------+---------------------+--------+
    1 row in set (0.15 sec)