Updated on 2023-10-23 GMT+08:00

Data Damage Detection and Repair Functions

  • gs_verify_data_file(verify_segment bool)

    Description: Checks whether data files in the current database of the current instance are lost. Only the main file of each data table is checked, and only for lost intermediate segments. verify_segment defaults to false, in which case segment-page table data files are not checked. If verify_segment is set to true, only segment-page table files are checked. By default, only initial users, users with the sysadmin permission, and users with the O&M administrator permission in the O&M mode can view the information. Other users can view the information only after being granted the required permissions.

    The returned result is as follows:

    • Non-segment-page table: rel_oid and rel_name indicate the table OID and table name of the corresponding file, and miss_file_path indicates the relative path of the lost file.
    • Segment-page table: All tables are stored in the same file, so rel_oid and rel_name cannot identify a specific table. If the first file of a segment-page table is damaged, the subsequent files such as .1 and .2 are not checked. For example, if files 3, 3.1, and 3.2 are damaged, only the damage to file 3 is detected. If fewer than five segment-page files exist, the files that have not been generated are also checked. For example, if only files 1 and 2 exist, files 3, 4, and 5 are also checked during segment-page file detection. Of the examples below, the first checks non-segment-page tables and the second checks segment-page tables.

    Parameter description:

    • verify_segment

      Specifies the range of files to be checked. false indicates that non-segment-page tables are checked. true indicates that segment-page tables are checked.

      The value can be true or false (default value).

    Return type: record

    Example:

    Check a non-segment-page table.

    openGauss=# select * from gs_verify_data_file();
    node_name         | rel_oid |  rel_name    |  miss_file_path
    ------------------+---------+--------------+------------------
    dn_6001_6002_6003 |   16554 |     test     | base/16552/24745

    Check a segment-page table.

    openGauss=# select * from gs_verify_data_file(true);
         node_name     | rel_oid | rel_name | miss_file_path
    -------------------+---------+----------+----------------
     dn_6001_6002_6003 |       0 | none     | base/16573/2
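
    The two segment-page rules described above can be modelled in a few lines. The following Python sketch is illustrative only and does not call the database. The function names files_checked and reported_damage are hypothetical, file names are treated as plain strings such as 3 or 3.1, and the assumption that extension files are still reported when the first file is intact is not confirmed by this document.

```python
def files_checked(existing):
    """Return the segment-page file numbers that a check would cover.

    Files 1 to 5 are always included, even if they have not been
    generated yet (for example, only files 1 and 2 exist).
    """
    return sorted(set(existing) | set(range(1, 6)))


def reported_damage(damaged):
    """Keep only the damaged files that the check is able to report.

    If the first file of a chain (for example "3") is damaged, its
    extension files ("3.1", "3.2") are not checked, so they are dropped.
    Assumption: an extension is still reported when its first file is intact.
    """
    damaged = set(damaged)
    visible = [
        name for name in damaged
        if "." not in name or name.split(".")[0] not in damaged
    ]
    return sorted(visible)


if __name__ == "__main__":
    print(files_checked([1, 2]))
    print(reported_damage(["3", "3.1", "3.2"]))
```

    In this model, damage to files 3, 3.1, and 3.2 collapses to a single report for file 3, which matches the example given for segment-page tables.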

  • gs_repair_file(tableoid Oid, path text, timeout int)

    Description: Repairs a file based on the input parameters. This function can be used only on a primary DN that is properly connected to the standby DN. Set the parameters based on the OID and path returned by the gs_verify_data_file function. For a segment-page table, the table OID can be any value from 0 to 4294967295, because the internal verification determines whether a file belongs to a segment-page table from the file path and does not use the table OID. If the repair is successful, true is returned. If the repair fails, the failure cause is displayed. By default, only initial users, users with the sysadmin permission, and users with the O&M administrator permission in the O&M mode can use this function on the primary DN. Other users can use it only after being granted the required permissions.

    1. If a file on a DN is damaged, a verification error at the PANIC level is reported when the DN is promoted to primary, and the promotion fails. This is expected behavior.
    2. If a file exists but its size is 0, the file is not repaired. To repair it, delete the zero-size file first and then run the repair.
    3. A file can be deleted only after its file descriptor has been closed. To close the descriptor, manually restart the process or perform a primary/standby switchover.

    Parameter description:

    • tableoid

      Specifies the OID of the table corresponding to the file to be repaired. Set this parameter based on the rel_oid column in the list returned by the gs_verify_data_file function.

      Value range: OID ranging from 0 to 4294967295. Note: A negative value will be forcibly converted to a non-negative integer.

    • path

      Specifies the path of the file to be repaired. Set this parameter based on the miss_file_path column in the list returned by the gs_verify_data_file function.

      Value range: a string

    • timeout

      Specifies how long to wait for standby DN playback. Before the file can be repaired, the standby DN must have played back to the corresponding location on the current primary DN. Set this parameter based on the playback duration of the standby DN.

      Value range: 60s to 3600s

    Return type: Boolean

    Example:

    openGauss=# select * from gs_repair_file(16554,'base/16552/24745',360);
    gs_repair_file
    ----------------
    t
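
    When the repair is scripted, the statement can be assembled from a row returned by gs_verify_data_file. The following Python sketch only builds the SQL text and checks the documented value ranges. The helper name build_repair_file_sql and the default timeout of 360 (taken from the example above) are assumptions for illustration, and rejecting out-of-range values on the client side is a design choice of this sketch, not database behavior.

```python
MAX_OID = 4294967295  # upper bound of the documented OID range


def build_repair_file_sql(rel_oid, miss_file_path, timeout=360):
    """Build a gs_repair_file call from the rel_oid and miss_file_path columns.

    Raises ValueError if a value is outside the documented ranges
    (OID 0 to 4294967295, timeout 60s to 3600s).
    """
    if not 0 <= rel_oid <= MAX_OID:
        raise ValueError("tableoid must be between 0 and 4294967295")
    if not 60 <= timeout <= 3600:
        raise ValueError("timeout must be between 60 and 3600 seconds")
    # Double any single quote so the path stays a valid SQL string literal.
    path = miss_file_path.replace("'", "''")
    return f"select * from gs_repair_file({rel_oid},'{path}',{timeout});"


if __name__ == "__main__":
    print(build_repair_file_sql(16554, "base/16552/24745", 360))
```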

  • local_bad_block_info()

    Description: Displays the page damage of the instance. A record is added when a page is read from the disk and fails the CRC check. By default, only initial users, users with the sysadmin permission, users with the monitor administrator permission, users with the O&M administrator permission in the O&M mode, and monitor users can view the information. Other users can view the information only after being granted the required permissions. file_path indicates the relative path of the damaged file. If the table is a segment-page table, logical information is displayed instead of the actual physical file information. block_num indicates the number of the damaged page in the file. Page numbers start from 0. check_time indicates the time when the page damage was detected. repair_time indicates the time when the page was repaired.

    Return type: record

    Example:

    openGauss=# select * from local_bad_block_info();
    node_name         | spc_node | db_node | rel_node | bucket_node | fork_num | block_num |    file_path     |          check_time           |          repair_time
    ------------------+----------+---------+----------+-------------+----------+-----------+------------------+-------------------------------+-------------------------------
    dn_6001_6002_6003 |     1663 |   16552 |    24745 |          -1 |        0 |         0 | base/16552/24745 | 2022-01-13 20:19:08.385004+08 | 2022-01-13 20:19:08.407314+08
    
  • remote_bad_block_info()

    Description: When executed on the CN, queries the page damage of all instances other than the current instance. The recorded data is the same as that returned by the local_bad_block_info function executed on those instances. When executed on a DN, the result is empty. By default, only initial users, users with the sysadmin permission, users with the monitor administrator permission, users with the O&M administrator permission in the O&M mode, and monitor users can view the information. Other users can view the information only after being granted the required permissions.

    Return type: record

  • local_clear_bad_block_info()

    Description: Deletes the records of repaired pages from local_bad_block_info, that is, the records whose repair_time is not empty. By default, only initial users, users with the sysadmin permission, users with the O&M administrator permission in the O&M mode, and monitor users can use this function. Other users can use it only after being granted the required permissions.

    Return type: Boolean

    Example:

    openGauss=# select * from local_clear_bad_block_info();
    result
    --------
    t

  • remote_clear_bad_block_info()

    Description: When executed on the CN, clears the records of repaired pages on all instances other than the current instance, that is, the records whose repair_time is not empty. When executed on a DN, the result is empty. By default, only initial users, users with the sysadmin permission, users with the O&M administrator permission in the O&M mode, and monitor users can use this function. Other users can use it only after being granted the required permissions.

    Return type: record

  • gs_verify_and_tryrepair_page(path text, blocknum Oid, verify_mem bool, is_segment bool)

    Description: Verifies a specified page of the instance. By default, only initial users, users with the sysadmin permission, and users with the O&M administrator permission in the O&M mode on the primary DN can view the information. Other users can view the information only after being granted the required permissions. In the command output, disk_page_res indicates the verification result of the page on the disk, mem_page_res indicates the verification result of the page in the memory, and is_repair indicates whether the repair function was triggered during the verification. t indicates that the page was repaired, and f indicates that the page was not repaired.

    Note: If a page on a DN is damaged, a verification error at the PANIC level is reported when the DN is promoted to primary. The DN cannot be promoted to primary, which is normal. Damaged pages of hash bucket tables cannot be repaired.

    Parameter description:

    • path

      Specifies the path of the damaged file. Set this parameter based on the file_path column in local_bad_block_info.

      Value range: a string

    • blocknum

      Specifies the page number of the damaged file. Set this parameter based on the block_num column in local_bad_block_info.

      Value range: OID ranging from 0 to 4294967295. Note: A negative value will be forcibly converted to a non-negative integer.

    • verify_mem

      Specifies whether to verify a specified page in the memory. If this parameter is set to false, only pages on the disk are verified. If this parameter is set to true, pages in the memory and those on the disk are verified. If a page on the disk is damaged, the system verifies the basic information of the page in the memory and flushes the page to the disk to restore the page. If a page is not found in the memory during memory page verification, the page on the disk is read through the memory API. During this process, if the disk page is faulty, the automatic repair function through remote read is triggered.

      Value range: The value is of Boolean type and can be true or false.

    • is_segment

      Specifies whether the table is a segment-page table. Set this parameter based on the value of bucket_node in local_bad_block_info. If the value of bucket_node is -1, the table is not a segment-page table. In this case, set is_segment to false. If the value of bucket_node is not -1, set is_segment to true.

      Value range: The value is of Boolean type and can be true or false.

    Return type: record

    Example:

    openGauss=# select * from gs_verify_and_tryrepair_page('base/16552/24745',0,false,false);
    node_name         |       path       |  blocknum  |        disk_page_res         | mem_page_res | is_repair
    ------------------+------------------+------------+------------------------------+--------------+-----------
    dn_6001_6002_6003 | base/16552/24745 |     0      | page verification succeeded. |              | f
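
    The mapping from a local_bad_block_info row to the arguments of this function can be sketched as follows. This Python helper only builds the SQL text. The function names and the representation of a row as a dict keyed by column name are assumptions for illustration.

```python
def is_segment_from_bucket_node(bucket_node):
    """Return True for a segment-page table, that is, bucket_node is not -1."""
    return bucket_node != -1


def build_verify_page_sql(row, verify_mem=False):
    """Build a gs_verify_and_tryrepair_page call from one damage record.

    row is assumed to be a dict with the file_path, block_num, and
    bucket_node columns of local_bad_block_info.
    """
    is_segment = is_segment_from_bucket_node(row["bucket_node"])
    # Double any single quote so the path stays a valid SQL string literal.
    path = row["file_path"].replace("'", "''")
    flags = ",".join("true" if flag else "false" for flag in (verify_mem, is_segment))
    return (
        "select * from gs_verify_and_tryrepair_page("
        f"'{path}',{row['block_num']},{flags});"
    )


if __name__ == "__main__":
    record = {"file_path": "base/16552/24745", "block_num": 0, "bucket_node": -1}
    print(build_verify_page_sql(record))
```

    For the record shown in the local_bad_block_info example, where bucket_node is -1, the helper produces the same statement as the example above.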

  • gs_repair_page(path text, blocknum Oid, is_segment bool, timeout int)

    Description: Restores a specified page of the instance. This function can be used only on a primary DN that is properly connected to the standby DN. By default, only initial users, users with the sysadmin permission, and users with the O&M administrator permission in the O&M mode can use this function on the primary DN. Other users can use it only after being granted the required permissions. If the page is successfully restored, true is returned. If an error occurs during the restoration, an error message is displayed.

    Note: If a page on a DN is damaged, a verification error at the PANIC level is reported when the DN is promoted to primary. The DN cannot be promoted to primary, which is normal. Damaged pages of hash bucket tables cannot be repaired.

    Parameter description:

    • path

      Specifies the path of the damaged page. Set this parameter based on the file_path column in local_bad_block_info or the path column in gs_verify_and_tryrepair_page.

      Value range: a string

    • blocknum

      Specifies the number of the damaged page. Set this parameter based on the block_num column in local_bad_block_info or the blocknum column in gs_verify_and_tryrepair_page.

      Value range: OID ranging from 0 to 4294967295. Note: A negative value will be forcibly converted to a non-negative integer.

    • is_segment

      Specifies whether the table is a segment-page table. Set this parameter based on the value of bucket_node in local_bad_block_info. If the value of bucket_node is -1, the table is not a segment-page table. In this case, set is_segment to false. If the value of bucket_node is not -1, set is_segment to true.

      Value range: The value is of Boolean type and can be true or false.

    • timeout

      Specifies how long to wait for standby DN playback. Before the page can be repaired, the standby DN must have played back to the location of the current primary DN. Set this parameter based on the playback duration of the standby DN.

      Value range: 60s to 3600s

    Return type: Boolean

    Example:

    openGauss=# select * from gs_repair_page('base/16552/24745',0,false,60);
    result
    --------
    t
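
    When several damaged pages are recorded, the repair statements can be generated from the rows of local_bad_block_info whose repair_time is empty. The following Python sketch only produces the SQL text. The function name pending_repair_statements and the representation of rows as dicts keyed by column name are assumptions for illustration, and skipping already repaired rows is a choice made in this sketch.

```python
def pending_repair_statements(rows, timeout=60):
    """Build one gs_repair_page call per damage record that is not yet repaired.

    rows is assumed to be a list of dicts with the file_path, block_num,
    bucket_node, and repair_time columns of local_bad_block_info.
    """
    if not 60 <= timeout <= 3600:
        raise ValueError("timeout must be between 60 and 3600 seconds")
    statements = []
    for row in rows:
        if row.get("repair_time"):
            continue  # a non-empty repair_time means the page is already repaired
        # bucket_node other than -1 marks a segment-page table.
        is_segment = "true" if row["bucket_node"] != -1 else "false"
        path = row["file_path"].replace("'", "''")
        statements.append(
            f"select * from gs_repair_page('{path}',{row['block_num']},{is_segment},{timeout});"
        )
    return statements


if __name__ == "__main__":
    records = [
        {"file_path": "base/16552/24745", "block_num": 0,
         "bucket_node": -1, "repair_time": None},
    ]
    for statement in pending_repair_statements(records):
        print(statement)
```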