Data Damage Detection and Repair Functions
- gs_verify_data_file(verify_segment bool)
Description: Checks whether data files in the current database of the current instance are lost. The check covers only the main files of data tables and detects only lost intermediate segments. The verify_segment parameter defaults to false, which means that segment-page table data files are not checked. If this parameter is set to true, only segment-page table files are checked. By default, only the initial user, users with the SYSADMIN permission, and users with the OPRADMIN permission in the O&M mode can view the information. Other users can view the information only after being granted the required permissions.
The returned result is as follows:
- Non-segment-page table: rel_oid and rel_name indicate the table OID and table name of the corresponding file, and miss_file_path indicates the relative path of the lost file.
- Segment-page table: All tables are stored in the same file. Therefore, rel_oid and rel_name cannot display information about a specific table. For a segment-page table, if the first file is damaged, the subsequent files such as .1 and .2 are not checked. For example, if files 3, 3.1, and 3.2 are damaged, only the damage to file 3 is detected. When there are fewer than five segment-page files, the files that have not been generated are also checked. For example, if only files 1 and 2 exist, files 3, 4, and 5 are also checked during segment-page file detection. Of the examples that follow, the first checks a non-segment-page table and the second checks a segment-page table.
Parameter description:
- verify_segment
Specifies the range of files to be checked. false indicates that non-segment-page tables are checked. true indicates that segment-page tables are checked.
The value can be true or false (default value).
Return type: record.
Example (A row is returned only when an exception is detected. Otherwise, no rows are returned.):
Check a non-segment-page table.
openGauss=# select * from gs_verify_data_file();
     node_name     | rel_oid | rel_name |  miss_file_path
-------------------+---------+----------+------------------
 dn_6001_6002_6003 |   16554 | test     | base/16552/24745
Check a segment-page table.
openGauss=# select * from gs_verify_data_file(true);
     node_name     | rel_oid | rel_name | miss_file_path
-------------------+---------+----------+----------------
 dn_6001_6002_6003 |       0 | none     | base/16573/2
- gs_repair_file(tableoid Oid, path text, timeout int)
Description: Repairs a file based on the input parameters. This function is supported only on a primary DN whose primary/standby connection is normal. Set the parameters based on the OID and path returned by the gs_verify_data_file function. For a segment-page table, the table OID can be any value from 0 to 4294967295, because the internal verification determines whether a file is a segment-page table file from the file path and does not use the table OID. If the repair is successful, true is returned. If the repair fails, the failure cause is displayed. By default, only the initial user, users with the SYSADMIN permission, and users with the OPRADMIN permission in the O&M mode on the primary DN can view the information. Other users can view the information only after being granted the required permissions.
- If a file on a DN is damaged, a verification error at the PANIC level is reported when the DN is promoted to primary. The DN cannot be promoted to primary, which is normal.
- If a file exists but its size is 0, the file will not be repaired. To repair the file, you need to delete the file whose size is 0 and then repair it.
- A file can be deleted only after its file descriptor has been closed. To have the descriptor closed, you can manually restart the process or perform a primary/standby switchover.
Parameter description:
- tableoid
Specifies the OID of the table corresponding to the file to be repaired. Set this parameter based on the rel_oid column in the list returned by the gs_verify_data_file function.
Value range: OID ranging from 0 to 4294967295. Note: A negative value will be forcibly converted to a non-negative integer.
- path
Specifies the path of the file to be repaired. Set this parameter based on the miss_file_path column in the list returned by the gs_verify_data_file function.
Value range: a string
- timeout
Specifies the duration for waiting for the standby DN to replay. The file repair must wait until the standby DN has replayed to the corresponding location of the current primary DN. Set this parameter based on the replay duration of the standby DN.
Value range: 60s to 3600s.
Return type: Boolean.
Example (Set tableoid and path based on the output of gs_verify_data_file):
openGauss=# select * from gs_repair_file(16554,'base/16552/24745',360);
 gs_repair_file
----------------
 t
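The parameter constraints described above can be checked before the call is issued. The following is a minimal Python sketch, not part of the product. The helper name build_repair_file_call is hypothetical and it only assembles the SQL text. It also assumes that the forced conversion of a negative OID is the usual 32-bit unsigned wraparound, which the description above does not spell out.

```python
OID_MAX = 4294967295          # upper bound of the tableoid value range
TIMEOUT_RANGE = (60, 3600)    # documented timeout range, in seconds


def build_repair_file_call(rel_oid: int, miss_file_path: str, timeout: int) -> str:
    """Build a gs_repair_file call from one gs_verify_data_file row (hypothetical helper)."""
    if not TIMEOUT_RANGE[0] <= timeout <= TIMEOUT_RANGE[1]:
        raise ValueError("timeout must be between 60 and 3600 seconds")
    # Assumption: a negative OID wraps around as a 32-bit unsigned integer.
    oid = rel_oid % (OID_MAX + 1)
    # Double any single quote so the path stays a valid SQL string literal.
    path = miss_file_path.replace("'", "''")
    return f"select * from gs_repair_file({oid},'{path}',{timeout});"


# Values taken from the gs_verify_data_file example output.
print(build_repair_file_call(16554, "base/16552/24745", 360))
```

Rejecting an out-of-range timeout on the client side is a design choice of this sketch. The server performs its own validation in any case.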
- local_bad_block_info()
Description: Displays the page damage of the instance. A record is added when a page read from the disk fails the CRC check. By default, only the initial user, users with the SYSADMIN attribute, users with the MONADMIN attribute, users with the OPRADMIN attribute in the O&M mode, and monitor users can view the information. Other users can view the information only after being granted the required permissions. file_path indicates the relative path of the damaged file. If the table is a segment-page table, the logical information instead of the actual physical file information is displayed. block_num indicates the number of the damaged page in the file. The page number starts from 0. check_time indicates the time when the page damage is detected. repair_time indicates the time when the page is repaired.
Return type: record.
Example (Rows are displayed only when there are damage records. Otherwise, no rows are displayed.):
openGauss=# select * from local_bad_block_info();
     node_name     | spc_node | db_node | rel_node | bucket_node | fork_num | block_num |    file_path     |          check_time           |          repair_time
-------------------+----------+---------+----------+-------------+----------+-----------+------------------+-------------------------------+-------------------------------
 dn_6001_6002_6003 |     1663 |   16552 |    24745 |          -1 |        0 |         0 | base/16552/24745 | 2022-01-13 20:19:08.385004+08 | 2022-01-13 20:19:08.407314+08
- remote_bad_block_info()
Description: When executed on the CN, queries the page damage of all instances other than the current one. The recorded data is the same as that returned by the local_bad_block_info function executed on those instances. The execution result on the DN is empty. By default, only the initial user, users with the SYSADMIN attribute, users with the MONADMIN attribute, users with the OPRADMIN attribute in the O&M mode, and monitor users can view the information. Other users can view the information only after being granted the required permissions.
Return type: record.
- local_clear_bad_block_info()
Description: Deletes the data of repaired pages from local_bad_block_info, that is, entries whose repair_time is not empty. By default, only the initial user, users with the SYSADMIN permission, users with the OPRADMIN attribute in the O&M mode, and monitoring users can view the information. Other users can view the information only after being granted the required permissions.
Return type: Boolean.
Example:
openGauss=# select * from local_clear_bad_block_info();
 result
--------
 t
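The effect of this function can be pictured as a filter over the recorded entries. The sketch below is an illustrative Python model only, not the server implementation. The function name clear_repaired and the dictionary layout are hypothetical, an empty repair_time is represented as None, and the second sample entry (block 7) is made-up data for illustration.

```python
def clear_repaired(records):
    """Drop entries whose repair_time is set and keep the unrepaired ones."""
    return [r for r in records if r["repair_time"] is None]


records = [
    # Repaired entry, taken from the local_bad_block_info example.
    {"file_path": "base/16552/24745", "block_num": 0,
     "repair_time": "2022-01-13 20:19:08.407314+08"},
    # Made-up unrepaired entry: repair_time is still empty.
    {"file_path": "base/16552/24745", "block_num": 7, "repair_time": None},
]

remaining = clear_repaired(records)
print(remaining)
```

Only the unrepaired entry remains after the filter, which mirrors the rule that entries with a non-empty repair_time are the ones removed.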
- remote_clear_bad_block_info()
Description: When executed on the CN, clears the data of repaired pages on all instances other than the current one, that is, entries whose repair_time is not empty. The execution result on the DN is empty. By default, only the initial user, users with the SYSADMIN permission, users with the OPRADMIN attribute in the O&M mode, and monitoring users can view the information. Other users can view the information only after being granted the required permissions.
Return type: record.
- gs_verify_and_tryrepair_page(path text, blocknum Oid, verify_mem bool, is_segment bool)
Description: Verifies a specified page of the instance. By default, only the initial user, users with the SYSADMIN permission, and users with the OPRADMIN permission in the O&M mode on the primary DN can view the information. Other users can view the information only after being granted the required permissions. In the command output, disk_page_res indicates the verification result of the page on the disk, mem_page_res indicates the verification result of the page in the memory, and is_repair indicates whether the repair function was triggered during the verification. t indicates that the page is repaired, and f indicates that the page is not repaired.
Note:
1. If a page on a DN is damaged, a verification error at the PANIC level is reported when the DN is promoted to primary. The DN cannot be promoted to primary, which is normal. Damaged pages of hash bucket tables cannot be repaired.
2. The repair triggered by this function can only repair pages in the memory. The repair takes effect only after the memory pages are flushed to disks.
Parameters:
- path
Path of the damaged file. Set this parameter based on the file_path column in local_bad_block_info. To verify the undo pages of the Ustore table, enter the path of the undo pages to be verified.
Value range: a string
- blocknum
Page number of the damaged file. Set this parameter based on the block_num column in local_bad_block_info. If you want to verify the undo pages of the Ustore table, enter the block number of the undo pages to be verified.
Value range: OID ranging from 0 to 4294967295. Note: A negative value will be forcibly converted to a non-negative integer.
- verify_mem
Specifies whether to verify a specified page in the memory. If this parameter is set to false, only pages on the disk are verified. If this parameter is set to true, pages in the memory and those on the disk are verified. If a page on the disk is damaged, the system verifies the basic information of the page in the memory and flushes the page to the disk to restore the page. If a page is not found in the memory during memory page verification, the page on the disk is read through the memory API. During this process, if the disk page is faulty, the automatic repair function through remote read is triggered.
Value range: The value is of a Boolean type and can be true or false.
- is_segment
Specifies whether the table is a segment-page table. Set this parameter based on the value of bucket_node in local_bad_block_info. If the value of bucket_node is -1, the table is not a segment-page table. In this case, set is_segment to false. If the value of bucket_node is not -1, set is_segment to true.
Value range: The value is of a Boolean type and can be true or false.
Return type: record.
Examples (Set parameters based on the output of local_bad_block_info. Otherwise, an error is reported.):
openGauss=# select * from gs_verify_and_tryrepair_page('base/16552/24745',0,false,false);
     node_name     |       path       | blocknum |        disk_page_res         | mem_page_res | is_repair
-------------------+------------------+----------+------------------------------+--------------+-----------
 dn_6001_6002_6003 | base/16552/24745 |        0 | page verification succeeded. |              | f
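The rule for is_segment can be written down directly. The following Python sketch is illustrative only. The dictionary keys mirror the local_bad_block_info columns, but the helpers is_segment_table and build_verify_call are hypothetical and only assemble the SQL text.

```python
def is_segment_table(bucket_node: int) -> bool:
    """bucket_node of -1 means a non-segment-page table; any other value means segment-page."""
    return bucket_node != -1


def build_verify_call(row: dict, verify_mem: bool = False) -> str:
    """Build a gs_verify_and_tryrepair_page call from one local_bad_block_info row."""
    # Double any single quote so the path stays a valid SQL string literal.
    path = row["file_path"].replace("'", "''")
    is_segment = is_segment_table(row["bucket_node"])
    # SQL boolean literals are written in lowercase here: true / false.
    return ("select * from gs_verify_and_tryrepair_page("
            f"'{path}',{row['block_num']},"
            f"{str(verify_mem).lower()},{str(is_segment).lower()});")


# Values taken from the local_bad_block_info example output.
row = {"file_path": "base/16552/24745", "block_num": 0, "bucket_node": -1}
print(build_verify_call(row))
```

Deriving is_segment from the same row that supplies path and blocknum keeps the three arguments consistent with one another.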
- gs_repair_page(path text, blocknum oid, is_segment bool, timeout int)
Description: Restores the specified page of the instance. This function can be used only on a primary DN whose primary/standby connection is normal. If the page is successfully restored, true is returned. If an error occurs during the restoration, an error message is displayed. By default, only the initial user, users with the SYSADMIN permission, and users with the OPRADMIN permission in the O&M mode on the primary DN can view the information. Other users can view the information only after being granted the required permissions.
Note: If a page on a DN is damaged, a verification error at the PANIC level is reported when the DN is promoted to primary. The DN cannot be promoted to primary, which is normal. Damaged pages of hash bucket tables cannot be repaired.
Parameters:
- path
Specifies the path of the damaged page. Set this parameter based on the file_path column in local_bad_block_info or the path column in gs_verify_and_tryrepair_page.
Value range: a string
- blocknum
Specifies the number of the damaged page. Set this parameter based on the block_num column in local_bad_block_info or the blocknum column in gs_verify_and_tryrepair_page.
Value range: OID ranging from 0 to 4294967295. Note: A negative value will be forcibly converted to a non-negative integer.
- is_segment
Specifies whether the table is a segment-page table. The value of this parameter is determined by the value of bucket_node in local_bad_block_info. If the value of bucket_node is -1, the table is not a segment-page table and is_segment is set to false. If the value of bucket_node is not -1, is_segment is set to true.
Value range: The value is of Boolean type and can be true or false.
- timeout
Duration of waiting for standby DN replay. The page to be repaired needs to wait for the standby DN to be replayed to the location of the current primary DN. Set this parameter based on the replay duration of the standby DN.
Value range: 60s to 3600s.
Return type: Boolean.
Examples (Set parameters based on the output of local_bad_block_info. Otherwise, an error is reported.):
openGauss=# select * from gs_repair_page('base/16552/24745',0,false,60);
 result
--------
 t