Help Center/ GaussDB(DWS)/ Performance Tuning/ Optimization Cases/ Case: Adding NOT NULL for JOIN Columns
Updated on 2023-06-25 GMT+08:00

Case: Adding NOT NULL for JOIN Columns

If there are many NULL values in the JOIN columns, you can add the filter criterion IS NOT NULL to filter data in advance to improve the JOIN efficiency.

Before optimization

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
SELECT
 * 
FROM
( ( SELECT
  STARTTIME STTIME,
  SUM(NVL(PAGE_DELAY_MSEL,0)) PAGE_DELAY_MSEL,
  SUM(NVL(PAGE_SUCCEED_TIMES,0)) PAGE_SUCCEED_TIMES,
  SUM(NVL(FST_PAGE_REQ_NUM,0)) FST_PAGE_REQ_NUM,
  SUM(NVL(PAGE_AVG_SIZE,0)) PAGE_AVG_SIZE,
  SUM(NVL(FST_PAGE_ACK_NUM,0)) FST_PAGE_ACK_NUM,
  SUM(NVL(DATATRANS_DW_DURATION,0)) DATATRANS_DW_DURATION,
  SUM(NVL(PAGE_SR_DELAY_MSEL,0)) PAGE_SR_DELAY_MSEL 
 FROM
  PS.SDR_WEB_BSCRNC_1DAY SDR
  INNER JOIN (SELECT
      BSCRNC_ID,
      BSCRNC_NAME,
      ACCESS_TYPE,
      ACCESS_TYPE_ID 
     FROM
      nethouse.DIM_LOC_BSCRNC 
     GROUP BY
      BSCRNC_ID,
      BSCRNC_NAME,
      ACCESS_TYPE,
      ACCESS_TYPE_ID) DIM 
  ON SDR.BSCRNC_ID = DIM.BSCRNC_ID 
  AND DIM.ACCESS_TYPE_ID IN (0,1,2) 
  INNER JOIN nethouse.DIM_RAT_MAPPING RAT 
  ON (RAT.RAT = SDR.RAT)
 WHERE
  ( (STARTTIME >= 1461340800 
  AND STARTTIME < 1461427200) ) 
  AND RAT.ACCESS_TYPE_ID IN (0,1,2) 
 GROUP BY STTIME ) ) ;

Figure 1 shows the execution plan.

Figure 1 Adding NOT NULL for JOIN columns (1)

After optimization

  1. As shown in Figure 1, the sequential scan phase is time consuming.
  2. The JOIN performance is poor because a large number of null values exist in the JOIN column BSCRNC_ID of the PS.SDR_WEB_BSCRNC_1DAY table.

    Therefore, you are advised to manually add NOT NULL for JOIN columns in the statement, as shown below:

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    SELECT
     * 
    FROM
    ( ( SELECT
      STARTTIME STTIME,
      SUM(NVL(PAGE_DELAY_MSEL,0)) PAGE_DELAY_MSEL,
      SUM(NVL(PAGE_SUCCEED_TIMES,0)) PAGE_SUCCEED_TIMES,
      SUM(NVL(FST_PAGE_REQ_NUM,0)) FST_PAGE_REQ_NUM,
      SUM(NVL(PAGE_AVG_SIZE,0)) PAGE_AVG_SIZE,
      SUM(NVL(FST_PAGE_ACK_NUM,0)) FST_PAGE_ACK_NUM,
      SUM(NVL(DATATRANS_DW_DURATION,0)) DATATRANS_DW_DURATION,
      SUM(NVL(PAGE_SR_DELAY_MSEL,0)) PAGE_SR_DELAY_MSEL 
     FROM
      PS.SDR_WEB_BSCRNC_1DAY SDR
      INNER JOIN (SELECT
          BSCRNC_ID,
          BSCRNC_NAME,
          ACCESS_TYPE,
          ACCESS_TYPE_ID 
         FROM
          nethouse.DIM_LOC_BSCRNC 
         GROUP BY
          BSCRNC_ID,
          BSCRNC_NAME,
          ACCESS_TYPE,
          ACCESS_TYPE_ID) DIM 
      ON SDR.BSCRNC_ID = DIM.BSCRNC_ID 
      AND DIM.ACCESS_TYPE_ID IN (0,1,2) 
      INNER JOIN nethouse.DIM_RAT_MAPPING RAT 
      ON (RAT.RAT = SDR.RAT)
     WHERE
      ( (STARTTIME >= 1461340800 
      AND STARTTIME < 1461427200) ) 
      AND RAT.ACCESS_TYPE_ID IN (0,1,2) 
      and SDR.BSCRNC_ID is not null
     GROUP BY
      STTIME ) ) A;
    

    Figure 2 shows the execution plan.

    Figure 2 Adding NOT NULL for JOIN columns (2)