DWS_2000000016 Data Spilled to Disks for a Query Statement Exceeds the Threshold

Description

During the execution of service queries, the database may choose to store the temporary result to the disk, which is called operator spilling.

GaussDB(DWS) checks the load management records of jobs being executed on CNs through the GS_WLM_SESSION_STATISTICS view every 60 seconds and calculates the maximum amount of data spilled to DNs.

If the number of SQL statements spilled to disks exceeds 5 GB (configurable) within 10 minutes (configurable), an alarm is reported indicating that a query statement triggers the data spill threshold. This alarm is automatically cleared when the data spill drop below the alarm conditions. For details about how to modify alarm configurations, see Modifying Alarm Rules.

If blocked SQL statements that can trigger the alarm persists, the alarm is generated again after 24 hours (configurable).

Attributes

Alarm ID	Alarm Category	Alarm Severity	Alarm Type	Service Type	Auto Cleared
DWS_2000000016	Management plane alarm	Urgent: > 5 GB	Operation alarm	GaussDB(DWS)	Yes

Parameters

Category	Name	Description
Location information	Name	Data Spilled to Disks of the Query Statement Exceeds the Threshold
	Type	Operation alarm
	Generation time	Time when the alarm is generated
Other information	Cluster ID	Cluster details such as resourceId and domain_id

You can connect to the database and run the SELECT * FROM GS_WLM_SESSION_STATISTICS command to view the max_spill_size column in the view.

Impact on the System

If a large amount of data spills to disks, a large number of system I/O resources are occupied. As a result, the data disk space may be insufficient or exhausted, triggering the database to become read-only and interrupting services.

Possible Causes

The amount of service data spilled to disks exceeds the alarm threshold.
The performance of the SQL query plan is poor, causing a large amount of data to be imported to the memory and spilled to disks.
Expired data is not cleared in a timely manner. As a result, too much invalid data is scanned and spilled to disks.

Handling Procedure

Check whether the execution plan is poor in performance.
1. Obtain the SQL statement from the additional information of the alarm, run the ANALYZE statement on the involved tables. Run the SQL statement again and check whether the amount of data spilled to disks decreases.
2. If there is no obvious effect, run the EXPLAIN PERFORMANCE command to view the actual execution information of the alarm SQL statement. For details, see SQL Execution Plan. If the estimated (operator memory) and peak memory usage (Peak Memory) are high, exceeding 20% of max_process_memory, optimization of the query is necessary based on execution information. For details, see Optimization Process.
Check whether the alarm configuration is proper.
1. Log in to the GaussDB(DWS) console, choose Management > Alarms and click View Alarm Rule.
2. Click Modify in the Operation column of the row that contains Data Flushed to Disks of the Query Statement Exceeds the Threshold. The Modifying an Alarm Rule page is displayed.
3. If the cluster disk capacity is high, you can increase the alarm reporting threshold. It is recommended that the alarm reporting threshold be less than or equal to 5% of the capacity of a single data disk.
  
  If the threshold is too large, data spilled to a disk may cause disk usage alarms or even the cluster to be read-only. If the data disk usage is close to or exceeds 80%, you are advised to clear unnecessary data when adjusting the threshold. For details about the GUI configuration, see Alarm Rules.
Kill the SQL statements that cause large data spills.
1. Return to the GaussDB(DWS) console.
2. On the Clusters > Dedicated Clusters page, locate the row that contains the cluster for which the alarm is generated and click Monitoring Panel in the Operation column.
3. Choose Monitoring > Queries. Click to see the data spill in the Max. DN Data Spill (MB) column.
4. After confirming with the service side, select the query ID of the query to be stopped and click Stop Query.
5. Adjust the database parameters for controlling the disk space of service statements. For details about the parameters, see Statement Disk Space Control. For details about the procedure, see Modifying Database Parameters.
  For example, the default value of sql_use_spacelimit is 10% of the total storage space of the DB instance. If the storage space is sufficient, you can increase the value. If the disk write volume of a single DN exceeds the value, GaussDB(DWS) stops the query and displays a message indicating that the disk write volume of a single DN exceeds the threshold.