Help Center> Data Warehouse Service (DWS)> Management Guide> Monitoring and Alarms> Alarms> Alarm Handling> DWS_2000000016 Data Spilled to Disks for a Query Statement Exceeds the Threshold
Updated on 2024-06-14 GMT+08:00

DWS_2000000016 Data Spilled to Disks for a Query Statement Exceeds the Threshold

Description

During the execution of service queries, the database may choose to store the temporary result to the disk, which is called Operator Spilling.

GaussDB(DWS) checks the load management records of jobs being executed on CNs through the GS_WLM_SESSION_STATISTICS view every 60 seconds and calculates the maximum amount of data spilled to DNs.

If the number of SQL statements spilled to disks exceeds 5 GB (configurable) within 10 minutes (configurable), an alarm is reported indicating that a query statement triggers the data spill threshold. This alarm is automatically cleared when the data spill drop below the alarm conditions. For details about how to modify alarm configurations, see Modifying Alarm Rules.

If blocked SQL statements that can trigger the alarm persists, the alarm is generated again after 24 hours (configurable).

Attributes

Alarm ID

Alarm Severity

Auto Clear

DWS_2000000016

Critical

Yes

Parameters

Parameter

Description

Source

Indicates the name of the system for which the alarm is generated and the detailed alarm type.

Cluster Name

ID of the cluster for which the alarm is generated

Location Information

ID and name of the cluster for which the alarm is generated

Other Information

CloudService indicates the cloud service for which the alarm is generated, including the service name, resource ID, resource name, database name, username connecting to the backend, and query ID. first_alarm_time indicates the time when the alarm is generated for the first time. query statement indicates the query statement that triggers the alarm, along with the amount of data spilled to disks caused by the query statement. Example: CloudService=DWS,resourceId: xxxx-xxxx-xxxx-xxxx, resourceIdName: test_dws, db_name: test_db, user_name: test_user, query_id: 756942385413326696, first_alarm_time: 2022-12-30:12:42:77: query statement (ID= 756942385413326696) select num,value,cnt,rk,cnt/sumcnt as ratio,sum(ratio) over (over by rk) as cumuratio from...; The result set is spilled to disks, and the spill size is 15 GB.

Time

Indicates the time when the alarm was generated.

Status

Status of an alarm.

You can connect to the database and run the SELECT * FROM GS_WLM_SESSION_STATISTICS command to view the max_spill_size column in the view.

Impact on the System

If a large amount of data spills to disks, a large number of system I/O resources are occupied. As a result, the data disk space may be insufficient or exhausted, triggering the database to become read-only and interrupting services.

Possible Causes

  • The amount of service data spilled to disks exceeds the alarm threshold.
  • The performance of the SQL query plan is poor, causing a large amount of data to be imported to the memory and spilled to disks.
  • Expired data is not cleared in a timely manner. As a result, too much invalid data is scanned and spilled to disks.

Handling Procedure

  1. Check whether the execution plan is poor in performance.

    1. Obtain the SQL statement from the additional information of the alarm, run the ANALYZE statement on the involved tables. Run the SQL statement again, and check whether the amount of data spilled to disks decreases.
    2. If there is no obvious effect, run the EXPLAIN PERFORMANCE command to view the actual execution information of the alarm SQL statement. For details, see SQL Execution Plan. Based on the execution information, if both the estimated memory usage (operator memory) and peak memory are large, for example, greater than 20% of max_process_memory, you need to optimize the query. For details, see Optimization Process.

  2. Check whether the alarm configuration is proper.

    1. Return to the GaussDB(DWS) console and choose Alarms > Alarm Rule.
    2. Click Modify in the Operation column of the row that contains Data Flushed to Disks of the Query Statement Exceeds the Threshold. The Modifying an Alarm Rule page is displayed.
    3. If the cluster disk capacity is high, you can increase the alarm reporting threshold. It is recommended that the alarm reporting threshold be less than or equal to 5% of the capacity of a single data disk.

      If the threshold is too large, data spilled to a disk may cause disk usage alarms or even the cluster to be read-only. If the data disk usage is close to or exceeds 80%, you are advised to clear unnecessary data when adjusting the threshold. For details about the GUI configuration, see Alarm Rules.

  3. Kill the SQL statements that cause large data spills.

    1. Return to the GaussDB(DWS) management console.
    2. On the Clusters > Dedicated Clusters page, locate the row that contains the cluster for which the alarm is generated and click Monitoring Panel in the Operation column.
    3. Choose Monitoring > Queries. Click to see the data spill in the Max. DN Data Spill (MB) column.
    4. After confirming with the service side, select the query ID of the query to be stopped and click Stop Query.
    5. Adjust the database parameters for controlling the disk space of service statements. For details about the parameters, see Statement Disk Space Control. For details about the procedure, see Modifying Database Parameters

      For example, the default value of sql_use_spacelimit is 10% of the total storage space of the DB instance. If the storage space is sufficient, you can increase the value. If the disk write volume of a single DN exceeds the value, GaussDB(DWS) stops the query and displays a message indicating that the disk write volume of a single DN exceeds the threshold.

Alarm Clearance

This alarm is automatically cleared when data spill drops down to a low level.