Updated on 2024-12-13 GMT+08:00

Storing Cold and Hot Data Separately in Doris

This topic describes how to configure cold and hot data separation in Doris.

Prerequisite

The Doris cluster can communicate with OBS. For details, see Initial Configuration.

Creating an OBS Parallel File System and Obtaining the AK/SK and Domain ID

Create an OBS parallel file system.

  1. Log in to the OBS console.
  2. Choose Parallel File Systems > Create Parallel File System.
  3. Enter a file system name, for example, doris-obs.

    The name of an enterprise project must be the same as that of the MRS cluster. Set other parameters.

  4. Click Create Now.
  5. In the parallel file system list, click the name of the one you just created and click Overview to obtain the endpoint information.

    After a service is deleted or a cluster is uninstalled, dirty data may remain in the parallel file system created in 2 to 4. You need to delete the dirty data.

Obtain AK/SK information.

  1. Move the cursor on the username in the upper right corner, and select My Credentials from the drop-down list.
  2. On the API Credentials page, obtain the Account ID whish is used as the domain ID.
  3. Click Access Keys, click Create Access Key, and enter the verification code or password. Click OK and download the access keys. Obtain the AK/SK information from the .csv file.

Creating a Cloud Service Agency and Binding It to a Cluster

  1. Log in to the MRS management console.
  2. In the service list, choose Management & Governance > Identity and Access Management.
  3. In the navigation pane on the left, choose Agencies and click Create Agency. On the displayed page, set the following parameters and click Next:

    • Agency Name: Enter an agency name, for example, mrs_ecs_obs.
    • Agency Type: Select Cloud service.
    • Cloud Service: Select Elastic Cloud Server (ECS) and Bare Metal Server (BMS).
    • Validity Period: Select Unlimited.

  4. On the displayed page, search for the OBS OperateAccess policy and select it.
  5. Click Next, click Show More, select Global services, and click OK.
  6. In the displayed dialog box, click OK to start authorization. Click Finish after the message "Authorization successful." is displayed.
  7. On the MRS console, choose Active Clusters in the navigation pane on the left and click the name of the target cluster to access its details page.
  8. On the Dashboard page, click Synchronize on the right of IAM User Sync to synchronize the IAM user.
  9. Click Manage Agency on the right of Agency, select the created agency, for example, mrs_ecs_obs, and click OK to bind the agency to the cluster.

Creating a Common Account Agency and Binding It to a Cluster

  1. Log in to the management console.
  2. In the service list, choose Management & Governance > Identity and Access Management.
  3. Choose Permissions > Policies/Roles, and click Create Custom Policy. Set the following parameters, and click OK:

    • Policy Name: Enter a policy name, for example, doris-policy.
    • Policy View: Select JSON.
    • Policy Content: Enter the following content.
      {
          "Version": "1.1",
          "Statement": [
              {
                  "Effect":"Allow",
                  "Action":[
                      "OBS:*:*"
                      ]
              }
              ]
      }

  4. In the navigation pane on the left, click Agencies. Click Create Agency. On the displayed page, set the following parameters and click Next:

    • Agency Name: Enter an agency name, for example, agency-MRS-to-OBS.
    • Agency Type: Select Account.
    • Enter your cloud account in the Delegated Account field, that is, the account you register using your mobile phone number. It cannot be a federated user or an IAM user created using your cloud account.
    • Validity Period: Select Unlimited.

  5. In the search box on the displayed Authorize Agency page, search for the custom policy created in 3 and select it, for example, doris-policy.
  6. Click Next, click Show More, select Global services, and click OK.
  7. Check and record the agency ID.
  8. On the Identity and Access Management page, click Agencies.
  9. Click the name of the cloud service agency created in 3, for example, mrs_ecs_obs.
  10. Click the Permissions tab and click Authorize. On the displayed page, click Create Policy in the upper right corner, and set the parameters as follows:

    • Policy Name: Enter a policy name, for example, doris-assume-policy.
    • Policy View: Select JSON.
    • Policy Content: Enter the following content. {Agency ID} indicates the ID recorded in 7.
      {
          "Version": "1.1",
          "Statement": [
              {
                  "Action": [
                      "iam:agencies:assume"
                  ],
                  "Resource": {
                      "uri": [
                          "/iam/agencies/{Agency ID}"
                      ]
                  },
                  "Effect": "Allow"
              }
          ]
      }

  11. Click Next. On the Select Policy/Role page, select the policy created in 10.
  12. Click Next, click Show More, select Global services, and click OK.

Enabling Cold and Hot Data Separation

By default, cold and hot data separation is disabled. To use this function, perform the following operations:

  1. Log in to FusionInsight Manager and choose Cluster > Services > Doris. Click the Configurations tab.
  2. Search for the obs_cooldown_enable parameter and set it to true.
  3. (Optional) If the data on the local disk is cooled down and stored on OBS, and related data needs to be stored on the local disk in a certain period of time, select All Configurations > BE(Role) > Customization, add the moveback_enable parameter to the customized parameter be.conf.customized.configs and set the parameter value to true.
  4. Click Save and then OK.
  5. Click Instances, select the affected FE and BE instances, choose More > Restart Instance, and enter the password of the current user to restart the FE and BE instances.

Use Case

  1. Log in to the node where MySQL is installed and run the following command to connect to the Doris database:

    If Kerberos authentication is enabled for the cluster (the cluster is in security mode), run the following command to connect to the Doris database:

    export LIBMYSQL_ENABLE_CLEARTEXT_PLUGIN=1

    mysql -uDatabase login user -p -PConnection port for FE queries -hIP address of the Doris FE instance

    Enter the password for logging in to the database.

    • To obtain the query connection port of the Doris FE instance, you can log in to FusionInsight Manager, choose Cluster > Services > Doris > Configurations, and query the value of query_port of the Doris service.
    • To obtain the IP address of the Doris FE instance, log in to FusionInsight Manager of the MRS cluster and choose Cluster > Services > Doris > Instances to view the service IP address of any FE instance.
    • You can also use the MySQL connection software or Doris web UI to connect to the database.

  2. Create a resource.

    • Create a resource that supports an agency for OBS authentication.
      1. Log in to FusionInsight Manager and choose Cluster > Services > Doris. Click Configurations then All Configurations, choose FE(Role) > OBS, and modify the following parameters and save the changes:
        • obs_authentication_method: Change the value to agency.
        • obs_endpoint: endpoint information queried in 5, for example, obs.XXX.
        • object_storage_security_provider: Retain the default value com.huawei.mrs.MrsObsCredentialsProvider.
      2. On the Dashboard page, click More > Restart Service. Enter the password of the current user and click OK to restart the Doris service.
      3. Connect the node where the MySQL client is installed to Doris. For details, see Using the MySQL Client to Connect to Doris.
      4. Run the following statement to create a resource:

        CREATE RESOURCE IF NOT EXISTS resource_obs_hot_cold PROPERTIES (

        "type" = "obs",

        "obs.bucket" = "Name of the OBS parallel file system created in 3",

        "obs.root.path" = "Root directory for storing data"

        "obs.region" = " Region of the cluster where the Doris service is deployed"

        );

    • Create a resource that supports permanent OBS authentication.
      1. Log in to FusionInsight Manager and choose Cluster > Services > Doris. Click Configurations then All Configurations, choose FE(Role) > OBS, and modify the following parameters and save the changes:
        • obs_authentication_method: Change the value to permanent.
        • object_storage_security_provider: Retain the default value com.huawei.mrs.MrsObsCredentialsProvider.
      2. On the Dashboard page, click More > Restart Service. Enter the password of the current user and click OK to restart the Doris service.
      3. Connect the node where the MySQL client is installed to Doris. For details, see Using the MySQL Client to Connect to Doris.
      4. Run the following statement to create a resource:

        CREATE RESOURCE IF NOT EXISTS resource_obs_hot_cold PROPERTIES (

        "type" = "obs",

        "obs.endpoint" = "xxx",

        "obs.region" = "xxx",

        "obs.bucket" = "xxx",

        "obs.root.path" = "xxx",

        "obs.access_key" = "xxx",

        "obs.secret_key" = "xxx"

        );

      • type: data storage type. The value is obs.
      • obs.endpoint: endpoint information viewed in 5
      • obs.region: region of the cluster where the Doris service is deployed
      • obs.bucket: name of the OBS parallel file system created in 3
      • obs.root.path: root directory for storing data
      • obs.access_key: AK information obtained in 8
      • obs.secret_key: SK information obtained in 8

  3. Set the data cooling policy using the storage policy.

    • Set the time to live (TTL) to cool down data.

      CREATE STORAGE POLICY IF NOT EXISTS policy_doris_hot_cold PROPERTIES("storage_resource" = "resource_obs_hot_cold", "cooldown_ttl" = "1d");

      If the value of cooldown_ttl is 1d, newly imported data will be cooled one day later and the cooled data will be stored in the OBS path configured during resource creation in 2.

    • Set a time point to cool down data.

      In addition to setting the TTL, you can also set a time point in the cooling policy.

      CREATE STORAGE POLICY IF NOT EXISTS policy_doris_hot_cold2 PROPERTIES("storage_resource" = "resource_obs_hot_cold", "cooldown_datetime" = "2024-01-01 10:00:00");

  4. Set the storage policy of a table or partition.

    • Set the storage policy when creating a table.

      CREATE TABLE ORDERS (

      ORDER_ID VARCHAR(50),

      USER_ID BIGINT,

      PRODUCT_ID VARCHAR(10),

      PRICE DECIMAL(15,2),

      CHANNEL VARCHAR(20),

      CREATE_DT DATE

      )

      DUPLICATE KEY(`ORDER_ID`)

      PARTITION BY RANGE(`CREATE_DT`)

      (

      PARTITION `p202401` VALUES LESS THAN ("2024-02-01"),

      PARTITION `p202402` VALUES LESS THAN ("2024-03-01")

      )

      DISTRIBUTED BY HASH(`ORDER_ID`) BUCKETS 3

      PROPERTIES (

      "replication_num" = "3",

      "storage_policy" = "policy_doris_hot_cold "

      );

    • Modify the properties of an existing table.

      ALTER TABLE ORDERS SET("storage_policy" = "policy_doris_hot_cold");

    • Set the cold and hot separation policy for a partition when creating a table.

      CREATE TABLE ORDERS (

      ORDER_ID VARCHAR(50),

      USER_ID BIGINT,

      PRODUCT_ID VARCHAR(10),

      PRICE DECIMAL(15,2),

      CHANNEL VARCHAR(20),

      CREATE_DT DATE

      )

      DUPLICATE KEY(`ORDER_ID`)

      PARTITION BY RANGE(`CREATE_DT`)

      (

      PARTITION `p202401` VALUES LESS THAN ("2024-02-01") ("storage_policy" = "policy_doris_hot_cold"),

      PARTITION `p202402` VALUES LESS THAN ("2024-03-01")

      )

      DISTRIBUTED BY HASH(`ORDER_ID`) BUCKETS 3

      PROPERTIES (

      "replication_num" = "3"

      );

    • Modify the partition properties of an existing table.

      ALTER TABLE ORDERS MODIFY PARTITION (`p202401`) SET("storage_policy"="policy_doris_hot_cold");

    • A single table or partition can be associated with only one storage policy. Associated storage policies cannot be deleted before disassociation.
    • Information about the object associated with a storage policy cannot be modified, such as bucket, endpoint, and root_path.
    • A storage policy can be created, modified, and deleted. Before deleting a storage policy, ensure that no table references the storage policy.
    • When the Merge-on-Write feature is enabled for the Unique model, storage policies cannot be set.

  5. Run the following statement to query data:

    show tablets from ORDERS;

    This command views the tablet information of the table. In the tablet information, LocalDataSize and RemoteDataSize are distinguished. LocalDataSize indicates the data stored locally, and RemoteDataSize indicates the data that has been cooled and stored on OBS.

    Before the data is cooled, the tablet information of the table is as follows.

    After the data is cooled, the tablet information of the table is as follows.