Updated on 2024-10-14 GMT+08:00

Detecting Storage Skew in Real Time During Data Import

During an import, the system counts the rows imported on each DN. After the import is complete, the system calculates the skew ratio and immediately generates an alarm if the ratio exceeds the specified threshold. The skew ratio is calculated as follows: Skew ratio = (Maximum number of rows imported on a DN – Minimum number of rows imported on a DN)/Total number of imported rows. Currently, only imports performed by running INSERT or COPY are supported.
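As a worked illustration of the formula, the following Python sketch computes the skew ratio from a list of per-DN row counts. The function name `skew_ratio` and the sample counts are hypothetical and chosen only for illustration; the database performs the actual calculation internally, and treating an empty import as having no skew is an assumption of this sketch.

```python
def skew_ratio(rows_per_dn):
    """Skew ratio = (max rows on a DN - min rows on a DN) / total imported rows."""
    total = sum(rows_per_dn)
    if total == 0:
        # Assumption for this sketch: an empty import has no skew.
        return 0.0
    return (max(rows_per_dn) - min(rows_per_dn)) / total


# Hypothetical import of 1,000,000 rows across four DNs.
counts = [700_000, 100_000, 100_000, 100_000]
print(skew_ratio(counts))  # (700000 - 100000) / 1000000 = 0.6
```

A perfectly even distribution gives a ratio of 0, and the ratio approaches 1 as more of the data lands on a single DN.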

enable_stream_operator must be set to on so that the DNs can return their imported row counts at once when the plan is delivered to them. The CN then calculates the skew ratio based on the returned values.

Procedure

  1. Set parameters table_skewness_warning_threshold (threshold for triggering a table skew alarm) and table_skewness_warning_rows (minimum number of rows for triggering a table skew alarm).
    • The value of table_skewness_warning_threshold ranges from 0 to 1. The default value is 1, indicating that the alarm is disabled. Other values indicate that the alarm is enabled.
    • The value of table_skewness_warning_rows ranges from 0 to 2147483647. The default value is 100,000. The alarm is triggered only when the following condition is met: Total number of imported rows > Value of table_skewness_warning_rows × Number of DNs involved in the import.
    show table_skewness_warning_threshold;
    set table_skewness_warning_threshold = xxx;
    show table_skewness_warning_rows;
    set table_skewness_warning_rows = xxx;
    
  2. Import data by running the INSERT or COPY statement.
  3. Detect and handle alarms. The alarm information includes the table name, minimum number of rows, maximum number of rows, total number of rows, average number of rows, skew ratio, and a hint to check the data distribution or modify the warning threshold.
    WARNING: Skewness occurs, table name: xxx, min value: xxx, max value: xxx, sum value: xxx, avg value: xxx, skew ratio: xxx
    HINT: Please check data distribution or modify warning threshold
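
The way the two parameters combine can be sketched in Python as follows. This is a minimal illustration of the conditions described in the procedure, not the database's actual implementation: the function name `should_warn` and the sample row counts are hypothetical, and the sketch assumes that the number of DNs involved in the import equals the number of per-DN counts supplied.

```python
def should_warn(rows_per_dn, threshold=1.0, warning_rows=100_000):
    """Return True if an import with these per-DN row counts would trigger the alarm.

    threshold    -- stands in for table_skewness_warning_threshold (default 1: disabled)
    warning_rows -- stands in for table_skewness_warning_rows (default 100,000)
    """
    total = sum(rows_per_dn)
    # Row-count condition: total imported rows must exceed
    # warning_rows multiplied by the number of DNs involved in the import.
    if total <= warning_rows * len(rows_per_dn):
        return False
    ratio = (max(rows_per_dn) - min(rows_per_dn)) / total
    # The ratio can never exceed 1, so the default threshold of 1 never fires.
    return ratio > threshold


counts = [700_000, 100_000, 100_000, 100_000]    # hypothetical skewed import
print(should_warn(counts, threshold=0.5))         # True: ratio 0.6 > 0.5
print(should_warn(counts))                        # False: default threshold 1 disables the alarm
print(should_warn([7, 1, 1, 1], threshold=0.5))   # False: too few rows to be checked
```

In this sketch the row-count check runs first, so small imports never raise the alarm regardless of how unevenly their rows are distributed.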