Updated on 2024-06-07 GMT+08:00

Best Practices of Resource Management

This practice demonstrates how to use GaussDB(DWS) resource management to help enterprises eliminate bottlenecks in concurrent query performance. SQL jobs can run smoothly without affecting each other and consume fewer resources than before.

This practice takes about 60 minutes. The process is as follows:

  1. Step 1: Creating a Cluster
  2. Step 2: Connecting to a Cluster and Importing Data
  3. Step 3: Creating a Resource Pool
  4. Step 4: Verifying Exception Rules

Scenarios

When multiple database users execute SQL jobs on GaussDB(DWS) at the same time, the following situations may occur:

  1. Some complex SQL statements occupy cluster resources for a long time, affecting the performance of other queries. For example, a group of database users continuously submit complex and time-consuming queries, and another group of users frequently submit short queries. In this case, short queries may have to wait in the resource pool for the time-consuming queries to complete.
  2. Some SQL statements occupy too much memory or disk space due to data skew or unoptimized execution plans. As a result, the statements that fail to apply for memory report errors, or the cluster switches to the read-only mode.

To increase system throughput and improve SQL performance, you can use the workload management feature of GaussDB(DWS). For example, create a resource pool for users who frequently submit complex query jobs, and allocate more resources to this resource pool. The complex jobs submitted by these users can use only the resources of this resource pool. Create another resource pool that occupies fewer resources and add users who submit short queries to this resource pool. In this way, the two types of jobs can be executed smoothly at the same time.
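The effect of separate resource pools on short queries can be illustrated with a minimal Python sketch. It is a simplified model, not GaussDB(DWS) behavior or API: the function admit, the pool name shared, the job names, and the small concurrency limits are all hypothetical and chosen only for illustration.

```python
def admit(jobs, limits):
    """Mark each submitted job as running or queued.

    jobs   -- list of (job_name, pool_name) in submission order
    limits -- dict mapping pool_name to its concurrency limit (hypothetical values)
    """
    running = {pool: 0 for pool in limits}
    states = []
    for job_name, pool_name in jobs:
        if running[pool_name] < limits[pool_name]:
            running[pool_name] += 1
            states.append((job_name, "running"))
        else:
            states.append((job_name, "queued"))
    return states


# One shared pool: two long jobs fill it, so the short query has to wait.
shared = admit(
    [("long_1", "shared"), ("long_2", "shared"), ("short_1", "shared")],
    {"shared": 2},
)

# Separate pools: the short query is admitted immediately.
separate = admit(
    [("long_1", "pool_1"), ("long_2", "pool_1"), ("short_1", "pool_2")],
    {"pool_1": 2, "pool_2": 2},
)

print(shared[-1])    # ('short_1', 'queued')
print(separate[-1])  # ('short_1', 'running')
```

The model ignores job completion, CPU, and memory. It only shows why a concurrency limit shared with long-running jobs can delay short ones.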

For example, user A runs both online transaction processing (OLTP) and online analytical processing (OLAP) services. The priority of the OLAP service is lower than that of the OLTP service. A large number of concurrent complex SQL queries may cause server resource contention, whereas a large number of concurrent simple SQL queries can be processed quickly without being queued. Resources must be properly allocated and managed so that both OLAP and OLTP services can run smoothly.

OLAP services are often complex, and do not require high priority or real-time response. OLAP and OLTP services are operated by different users. For example, the database user budget_config_user is used for core transaction services, and the database user report_user is used for report services. The users are under independent CPU and concurrency management to improve database stability.

Based on the workload survey, routine monitoring, and test and verification of OLAP services, fewer than 50 concurrent SQL queries do not cause server resource contention or slow service system response. OLAP users can use 20% of CPU resources.

Based on the workload survey, routine monitoring, and test and verification of OLTP services, fewer than 100 concurrent SQL queries do not put sustained pressure on the system. OLTP users can use 60% of CPU resources.

  • Resource configuration for OLAP users (corresponding to pool_1): CPU = 20%, memory = 20%, storage = 1,024,000 MB, concurrency = 20.
  • Resource configuration for OLTP users (corresponding to pool_2): CPU = 60%, memory = 60%, storage = 1,024,000 MB, concurrency = 200.
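As a quick sanity check on these figures, the following Python sketch records the two pool configurations and computes the share left unallocated. The class PoolConfig and the function remaining_share are hypothetical helpers, and treating the percentages as parts of a 100% total is an assumption made for this illustration, not a statement about how GaussDB(DWS) enforces them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolConfig:
    """Planned settings for one resource pool (hypothetical helper type)."""
    name: str
    cpu_percent: int
    mem_percent: int
    storage_mb: int
    concurrency: int


# Values copied from the two bullet points above.
POOLS = [
    PoolConfig("pool_1", cpu_percent=20, mem_percent=20,
               storage_mb=1_024_000, concurrency=20),
    PoolConfig("pool_2", cpu_percent=60, mem_percent=60,
               storage_mb=1_024_000, concurrency=200),
]


def remaining_share(pools, attr):
    """Return 100 minus the summed percentage; fail if the plan exceeds 100%."""
    used = sum(getattr(pool, attr) for pool in pools)
    if used > 100:
        raise ValueError(f"{attr} is over-allocated: {used}%")
    return 100 - used


print(remaining_share(POOLS, "cpu_percent"))  # 20
print(remaining_share(POOLS, "mem_percent"))  # 20
```

Under this assumption, 20% of CPU and 20% of memory remain outside pool_1 and pool_2.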

Set the maximum memory that can be used by a single statement. An error will be reported if the memory usage exceeds the value.

In Exception Rule, set Blocking Time to 1200s and Execution Time to 1800s. A query job will be terminated after being executed for more than 1800 seconds.

Step 1: Creating a Cluster

Create a cluster by referring to Creating a cluster.

Step 2: Connecting to a Cluster and Importing Data

  1. Connect to a cluster by referring to Using the gsql CLI Client to Connect to a Cluster.
  2. Import sample data. For details, see Importing TPC-H Data.
  3. Run the following statements to create the OLTP user budget_config_user and OLAP user report_user.

    CREATE USER budget_config_user PASSWORD 'password';
    CREATE USER report_user PASSWORD 'password';
    

  4. For test purposes, grant all permissions on all tables in schema tpch to both users.

    GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA tpch TO budget_config_user, report_user;
    

  5. Check the resource allocation of the two users.

    SELECT * FROM PG_TOTAL_USER_RESOURCE_INFO WHERE username IN ('budget_config_user', 'report_user');
    

Step 3: Creating a Resource Pool

  1. Log in to the GaussDB(DWS) management console and click a cluster name in the cluster list. The Resource Management Configurations page is displayed.
  2. Click Add Workload Queue. Create the report resource pool pool_1 and the transaction resource pool pool_2 with the settings listed in Scenarios.

  3. Modify the exception rules.

    1. Click the created pool_1.
    2. In the Exception Rule area, set Blocking Time to 1200s and Execution Time to 1800s.
    3. Click Save.
    4. Repeat the preceding steps to configure pool_2.

  4. Associate users.

    1. Click pool_1 on the left.
    2. Click Add on the right of User Association.
    3. Select report_user and click OK.
    4. Repeat the preceding steps to add budget_config_user to pool_2.

Step 4: Verifying Exception Rules

  1. Log in to the database as user report_user.
  2. Run the following statement to check the resource pool to which user report_user belongs:

    SELECT usename, respool FROM pg_user WHERE usename = 'report_user';
    

    The query result shows that user report_user belongs to resource pool pool_1.

  3. Verify the exception rule bound to the resource pool pool_1.

    SELECT respool_name, mem_percent, active_statements, except_rule FROM pg_resource_pool WHERE respool_name = 'pool_1';
    

    The result confirms that the exception rule rule_1 is bound to pool_1.

  4. View the rule type and threshold of the exception rule for the current user.

    SELECT * FROM pg_except_rule WHERE name = 'rule_1';
    

    The result shows that rule_1 has a blocking time of 1200 seconds and an execution time of 1800 seconds.

    • PG_EXCEPT_RULE records information about exception rules and is supported only in cluster 8.2.0 or later.
    • The relationship between parameters in the same exception rule is AND.

  5. When the blocking time of a job exceeds 1200 seconds and its execution time exceeds 1800 seconds, an error message is displayed, indicating that the exception rule is triggered and the job is canceled.

    If error information similar to "ERROR: canceling statement due to workload manager exception." is displayed during job execution, the job was terminated because it exceeded the thresholds of the exception rule. If the rule thresholds are appropriate and do not need to be modified, optimize the service statements to reduce their execution time.
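The AND relationship between the parameters of an exception rule can be sketched in a few lines of Python. This is only a model of the rule logic described in this step, not GaussDB(DWS) code: the class ExceptRule and the function is_canceled are hypothetical names, and the thresholds are the ones configured for rule_1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExceptRule:
    """Thresholds of one exception rule, in seconds (hypothetical helper type)."""
    block_time_s: int
    execution_time_s: int


def is_canceled(rule, blocked_s, executed_s):
    """A job is canceled only when every threshold in the rule is exceeded (AND)."""
    return blocked_s > rule.block_time_s and executed_s > rule.execution_time_s


rule_1 = ExceptRule(block_time_s=1200, execution_time_s=1800)

print(is_canceled(rule_1, blocked_s=1300, executed_s=1900))  # True
print(is_canceled(rule_1, blocked_s=1300, executed_s=600))   # False
print(is_canceled(rule_1, blocked_s=100, executed_s=1900))   # False
```

In this model, a job that exceeds only one of the two thresholds is not canceled.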