Data Damage Detection and Repair Functions
File Type | File/Page | Primary/Standby | Detection and Repair
Common row-store tables (including Astore, Ustore and compressed tables) (excluding index) | File and page | Primary | Manual detection and repair.
Undo files (excluding undo meta) | Page | Primary | Manual detection and repair (excluding analyse verify).
init fork file for unlogged tables | File | Primary | Manual detection and repair.
- gs_verify_data_file(verify_segment bool)
Description: Checks whether files in the current database of the current instance are lost. The verification only checks whether intermediate segments of a table's main data file are lost. The verify_segment parameter defaults to false. The value true is reserved and not supported currently. By default, only initial users, users with the sysadmin permission, and users with the O&M administrator attribute in the O&M mode can view the information. Other users can view the information only after being granted permissions.
In the returned result, rel_oid and rel_name indicate the OID and name of the table that the file belongs to, and miss_file_path indicates the relative path of the lost file.
Parameter description:
- verify_segment
Specifies the range of files to be checked.
The value can be true or false (default value). The value true is reserved and not supported currently.
Return type: record
Example (A row is displayed only when a lost file is detected. Otherwise, no rows are displayed.):
gaussdb=# select * from gs_verify_data_file();
     node_name     | rel_oid | rel_name |  miss_file_path
-------------------+---------+----------+------------------
 dn_6001_6002_6003 |   16554 | test     | base/16552/24745
- gs_repair_file(tableoid Oid, path text, timeout int)
Description: Repairs a file based on the input parameters. This function is supported only on a primary DN whose primary/standby connection is normal. Set the parameters based on the OID and path returned by the gs_verify_data_file function. If the repair is successful, true is returned. If the repair fails, the failure cause is displayed. By default, only initial users, users with the sysadmin permission, and users with the O&M administrator permission in the O&M mode on the primary DN can view the information. Other users can view the information only after being granted permissions.
- If a file on a DN is damaged, a verification error at the PANIC level is reported when the DN is promoted to primary. The DN cannot be promoted to primary, which is expected.
- If a file exists but its size is 0, the file will not be repaired. To repair it, first delete the 0-byte file and then run the repair.
- A file can be deleted only after its file descriptor has been closed. To have it closed, you can restart the process or perform a primary/standby switchover.
Parameter description:
- tableoid
OID of the table corresponding to the file to be repaired. Set this parameter based on the rel_oid column in the list returned by the gs_verify_data_file function.
Value range: OID ranging from 0 to 4294967295. Note: A negative value will be forcibly converted to a non-negative integer.
- path
Path of the file to be repaired. Set this parameter based on the miss_file_path column in the list returned by the gs_verify_data_file function.
Value range: a string
- timeout
Specifies how long to wait for the standby DN to replay. File repair must wait until the standby DN has replayed to the corresponding location of the current primary DN. Set this parameter based on the replay duration of the standby DN.
Value range: 60s to 3600s.
Return type: Boolean
Example (Set tableoid and path based on the output of gs_verify_data_file):
gaussdb=# select * from gs_repair_file(16554,'base/16552/24745',360);
 gs_repair_file
----------------
 t
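The tableoid description notes that a negative value is forcibly converted to a non-negative integer. The source does not state the exact conversion rule. The following minimal sketch assumes the value is reinterpreted as an unsigned 32-bit integer, which keeps the result inside the documented range of 0 to 4294967295. The helper name as_oid is hypothetical and is not part of GaussDB.

```python
OID_MAX = 4294967295  # upper bound of the documented OID range


def as_oid(value):
    """Map an integer to the OID range 0..4294967295.

    Assumption (not confirmed by the source): negative inputs are
    reinterpreted as unsigned 32-bit integers, so -1 becomes 4294967295.
    """
    if not -(2 ** 31) <= value <= OID_MAX:
        raise ValueError("value does not fit in 32 bits: %d" % value)
    # Masking with 0xFFFFFFFF keeps the low 32 bits, which is the
    # unsigned reinterpretation of a negative two's-complement value.
    return value & 0xFFFFFFFF


print(as_oid(16554))
print(as_oid(-1))
```

Under this assumption, a mistyped negative OID would silently address a very large OID instead of raising an error, so it is safer to copy rel_oid directly from the gs_verify_data_file output.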
- local_bad_block_info()
Description: Displays the page damage of the instance. Page CRC failures found when pages are read from the disk are recorded here. By default, only initial users, users with the sysadmin permission, users with the monitoring administrator attribute, users with the O&M administrator attribute in the O&M mode, and monitoring users can view the information. Other users can view the information only after being granted permissions.
In the displayed information, file_path indicates the relative path of the damaged file. block_num indicates the number of the page where the file is damaged. The page number starts from 0. check_time indicates the time when the page damage is detected. repair_time indicates the time when the page is repaired.
Return type: record
Example (Rows are displayed only when there are damage records. Otherwise, no rows are displayed.):
gaussdb=# select * from local_bad_block_info();
     node_name     | spc_node | db_node | rel_node | bucket_node | fork_num | block_num |    file_path     |          check_time           |          repair_time
-------------------+----------+---------+----------+-------------+----------+-----------+------------------+-------------------------------+-------------------------------
 dn_6001_6002_6003 |     1663 |   16552 |    24745 |          -1 |        0 |         0 | base/16552/24745 | 2022-01-13 20:19:08.385004+08 | 2022-01-13 20:19:08.407314+08
- local_clear_bad_block_info()
Description: Deletes the records of repaired pages from local_bad_block_info, that is, records whose repair_time is not empty. By default, only initial users, users with the sysadmin permission, users with the O&M administrator attribute in the O&M mode, and monitoring users can view the information. Other users can view the information only after being granted permissions.
Return type: Boolean
Example:
gaussdb=# select * from local_clear_bad_block_info();
 result
--------
 t
- gs_verify_and_tryrepair_page (path text, blocknum oid, verify_mem bool, is_segment bool)
Description: Verifies a specified page of the instance. By default, only the initial user, users with the sysadmin permission, and users with the O&M administrator attribute in O&M mode on the primary DN can view the information. Other users can view the information only after being granted permissions.
In the command output, disk_page_res indicates the verification result of the page on the disk, mem_page_res indicates the verification result of the page in the memory, and is_repair specifies whether the repair function is triggered during the verification. t indicates that the page is repaired, and f indicates that the page is not repaired.
Note:
- If a page on a DN is damaged, a verification error at the PANIC level is reported when the DN is promoted to primary. The DN cannot be promoted to primary, which is normal. Damaged pages of hash bucket tables cannot be repaired.
- The repair triggered by this function can only repair pages in the memory. The repair takes effect only after the memory pages are flushed to disks.
Parameter description:
- path
Path of the damaged file. Set this parameter based on the file_path column in the output of local_bad_block_info. To verify the undo pages of a Ustore table, enter the path of the undo pages to be verified.
Value range: a string
- blocknum
Page number of the damaged file. Set this parameter based on the block_num column in the output of local_bad_block_info. To verify the undo pages of a Ustore table, enter the block number of the undo pages to be verified.
Value range: OID ranging from 0 to 4294967295. Note: A negative value will be forcibly converted to a non-negative integer.
- verify_mem
Specifies whether to verify a specified page in the memory. If this parameter is set to false, only pages on the disk are verified. If this parameter is set to true, pages in the memory and on the disk are verified. If a page on the disk is damaged, the system verifies the basic information of the page in the memory and flushes the page to the disk to restore the page. If a page is not found in the memory during memory page verification, the page on the disk is read through the memory API. During this process, if the disk page is faulty, the remote read automatic repair function is triggered.
Value range: The value is of a Boolean type and can be true or false.
- is_segment
Determines whether the table is a segment-page table. false indicates that the table is not a segment-page table. true indicates a reserved parameter value, which is not supported currently.
Value range: The value is of a Boolean type and can be true or false.
Return type: record
Example (Pass parameters based on the output of local_bad_block_info. Otherwise, an error is reported.):
gaussdb=# select * from gs_verify_and_tryrepair_page('base/16552/24745',0,false,false);
     node_name     |       path       | blocknum |        disk_page_res         | mem_page_res | is_repair
-------------------+------------------+----------+------------------------------+--------------+-----------
 dn_6001_6002_6003 | base/16552/24745 |        0 | page verification succeeded. |              | f
- gs_repair_page(path text, blocknum oid, is_segment bool, timeout int)
Description: Repairs the specified page of the instance. This function can be used only on a primary DN whose primary/standby connection is normal. If the page is successfully repaired, true is returned. If an error occurs during the repair, an error message is displayed. By default, only the initial user, users with the sysadmin permission, and users with the O&M administrator attribute in O&M mode on the primary DN can view the information. Other users can view the information only after being granted permissions.
Note: If a page on a DN is damaged, a verification error at the PANIC level is reported when the DN is promoted to primary. The DN cannot be promoted to primary, which is normal. Damaged pages of hash bucket tables cannot be repaired.
Parameter description:
- path
Path of the damaged page. Set this parameter based on the file_path column in local_bad_block_info or the path column in the gs_verify_and_tryrepair_page function.
Value range: a string
- blocknum
Number of the damaged page. Set this parameter based on the block_num column in local_bad_block_info or the blocknum column in the gs_verify_and_tryrepair_page function.
Value range: OID ranging from 0 to 4294967295. Note: A negative value will be forcibly converted to a non-negative integer.
- is_segment
Determines whether the table is a segment-page table. false indicates that the table is not a segment-page table. true indicates a reserved parameter value, which is not supported currently.
Value range: The value is of a Boolean type and can be true or false.
- timeout
Specifies how long to wait for the standby DN to replay. Page repair must wait until the standby DN has replayed to the location of the current primary DN. Set this parameter based on the replay duration of the standby DN.
Value range: 60s to 3600s.
Return type: Boolean
Example (Pass parameters based on the output of local_bad_block_info. Otherwise, an error is reported.):
gaussdb=# select * from gs_repair_page('base/16552/24745',0,false,60);
 result
--------
 t
- gs_edit_page_bypath(path text, blocknum int64, offset int, data text, data_size int, read_backup bool, storage_type text)
Description: Takes the path of the target table file, a block number, an in-page offset, the data to be written, and the data length, and writes the data to the corresponding position on the page. The read_backup parameter determines the file reading mode, and the storage_type parameter indicates the file storage mode (for example, page storage). To prevent incorrect modification, this function does not modify the original page directly. Instead, it modifies a copy of the page and flushes the modified copy to the specified path. Only the system administrator or O&M administrator in O&M mode can execute this function.
Return type: text
Table 1 gs_edit_page_bypath parameters
Category | Parameter | Type | Description
Input parameter | path | text | Physical path of the file to be modified. Its form depends on the read_backup parameter: it can be the relative path of a file in the database directory or the absolute path of a file such as a backup file. If the target file does not exist or fails to be read, an error message is displayed. If read_backup is false, the path format is tablespace name/database oid/table relfilenode (physical file name), for example, base/16603/16394. If read_backup is true, path is any valid path. Note: Only U-page and UB-tree data pages can be edited and modified. Tables with tablespaces are not supported. Other information about the input file cannot be obtained, so you need to ensure that the input data and its type are correct.
Input parameter | blocknum | bigint | Block number of the page to be repaired. Value range: 0 to MaxBlockNumber. The page is read by physical or logical block number, depending on the read_backup parameter. If the specified block number is out of range, an error message is returned.
Input parameter | offset | int | In-page offset of the field to be modified. Value range: 0 to BLCKSZ. If the specified value is less than 0 or greater than BLCKSZ, an error message is returned.
Input parameter | data | text | Target value to be written. The prefix specifies the format: '0x' for hexadecimal, '0b' for binary, and '0s' for a character string. A value without one of these prefixes is treated as a decimal character string.
Input parameter | data_size | int | Length of the written data, in bytes. Value range: 1 to 8. If the specified length is less than 1 byte or greater than 8 bytes, or the sum of offset and data_size is greater than BLCKSZ, an error message is returned.
Input parameter | read_backup | bool | Specifies whether to read pages from the backup directory. If this parameter is set to false, the target page is read based on the logical block number. Otherwise, the page is read based on the physical block number.
Input parameter | storage_type | text | File storage mode. Currently, only the page storage mode is supported. This parameter is optional. 'page': page mode. 'segment': segment-page mode, which is reserved and not supported currently.
Output parameter | output_msg | text | If the modification is successful, the absolute path of the modified file is returned. The modified file is stored in the pg_log/dump directory. If the modification fails, a failure message is returned.
Note: When running the examples, pass parameters based on the parameter description and use the actual physical path.
Example 1: Write the 2-byte value 0x1FFF at an offset of 16 bytes on page 0 of the file base/15808/25075.
gaussdb=# select gs_edit_page_bypath('base/15808/25075',0,16,'0x1FFF', 2, false, 'page');
             gs_edit_page_bypath
----------------------------------------------
 /pg_log_dir/dump/1663_15808_25075_0.editpage
(1 row)
Example 2: If the input parameter does not comply with the specifications, an error message is returned.
gaussdb=# select gs_edit_page_bypath('base/15808/25075', 0,16,'@1231!', 8, false, 'page');
            gs_edit_page_bypath
--------------------------------------------
 Error: the parameter 'data' decode failed.
(1 row)
Example 3: If the data to be written is the same as the original value, a warning is returned.
gaussdb=# select gs_edit_page_bypath('/pg_log_dir/dump/1663_15808_25075_0.editpage', 0,16,'0x1FFF', 2, true, 'page');
                   gs_edit_page_bypath
----------------------------------------------------------
 Warning: source buffer is consistent with target buffer.
(1 row)
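The range rules for offset and data_size and the prefix rules for data can be checked on the client side before calling gs_edit_page_bypath. The following is a minimal sketch of those documented rules only. It assumes BLCKSZ is 8192 (the actual value depends on the build), the helper names are hypothetical, and it does not reproduce how the server decodes the data value.

```python
BLCKSZ = 8192  # assumed page size; check the actual value for your build

# Prefixes listed in the data parameter description.
DATA_PREFIXES = {"0x": "hexadecimal", "0b": "binary", "0s": "string"}


def data_format(data):
    """Classify the data argument by its two-character prefix."""
    return DATA_PREFIXES.get(data[:2], "decimal")


def check_edit_args(offset, data_size):
    """Return a list of violated rules (an empty list means the values pass)."""
    problems = []
    if not 0 <= offset <= BLCKSZ:
        problems.append("offset out of range")
    if not 1 <= data_size <= 8:
        problems.append("data_size out of range")
    if offset + data_size > BLCKSZ:
        problems.append("offset + data_size exceeds BLCKSZ")
    return problems


print(data_format("0x1FFF"), check_edit_args(16, 2))
```

A value such as '@1231!' has none of the listed prefixes, so this sketch classifies it as decimal. It is not a valid decimal number, which is consistent with the decode error shown in Example 2.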
- gs_repair_page_bypath(src_path text, src_blkno int64, dest_path text, dest_blkno int64, storage_type text)
Description: Takes the path and page number of a source file and writes that page to the specified page number of the target file. This allows pages on the primary node to be repaired from the standby node. The function can also be used to initialize bad blocks.
- The target page is overwritten and synchronized to the standby node. Page-based modification supports U-heap and UB-tree pages. The Undo Record page, Undo Slot page, compressed table, and Astore page will be supported later. System catalog files and data sections cannot be modified.
- With this function, you can overwrite target pages during the write operation. Before overwriting, the target page is backed up and flushed to a specified directory. The backup page can be rewritten back to the target page. If an ordinary table is modified on the primary node, a new WAL is generated and synchronized to the standby node. If an ordinary table is modified on the standby node, no WAL is recorded.
- This function applies only to the primary node in a centralized or distributed system, or to the standby node when the read function is enabled on the standby node. Only the system administrator or O&M administrator in O&M mode can use this function. All modifications will be recorded in database logs. In addition, you are advised to enable the audit logging function of system functions before using this function to record audit information.
- The LSNs of the source and target pages must be the same. Otherwise, the repair fails.
Return type: text
Calling this system function is a high-risk operation. Exercise caution when performing this operation.
Category | Parameter | Type | Description
Input parameter | src_path | text | Path of the source file. The following types of values are supported: a data file or index file, for example, pg_log/dump/1663_15808_25075_0.editpage; 'standby' on the primary node, which reads pages from the standby node to repair the primary node; 'init_block' on the primary node, which allows bad blocks to be skipped in extreme scenarios.
Input parameter | src_blkno | bigint | Physical block number of the source page. Value range: 0 to MaxBlockNumber.
Input parameter | dest_path | text | Relative path of the target file. For example, base/15808/25075.
Input parameter | dest_blkno | bigint | Logical block number of the target page. Value range: 0 to MaxBlockNumber.
Input parameter | storage_type | text | Storage mode of the target file. Currently, only the page storage mode is supported. This parameter is optional. 'page': page mode. 'segment': segment-page mode, which is reserved and not supported currently.
Output parameter | output_msg | text | If the overwrite operation is successful, the backup path of the target page is returned. If the overwrite operation fails, an error message is returned. The format of the flushed file name is relfilepath_blocknum_timestamp.repairpage.
Note: Pass parameters based on the preceding table and ensure that the physical file exists. If an input parameter is invalid or the repair fails, an error is reported.
Example 1: Enter a file in a specified path to overwrite the target file.
gaussdb=# select * from gs_repair_page_bypath('pg_log/dump/1663_15991_16767_0.editpage', 0, 'base/15991/16767', 0, 'page');
                           output_msg
----------------------------------------------------------------
 /pg_log_dir/dump/1663_15991_16767_0_738039702421788.repairpage
(1 row)
Example 2: Read pages from the standby node to repair the primary node.
gaussdb=# select * from gs_repair_page_bypath('standby', 0, 'base/15990/16768', 0, 'page');
                           output_msg
----------------------------------------------------------------
 /pg_log_dir/dump/1663_15990_16768_0_738040397197907.repairpage
(1 row)
Example 3: Initialize the target page and skip bad blocks.
gaussdb=# select * from gs_repair_page_bypath('init_block', 0, 'base/15990/16768', 0, 'page');
                           output_msg
----------------------------------------------------------------
 /pg_log_dir/dump/1663_15990_16768_0_738040768010281.repairpage
(1 row)
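The backup file name returned in output_msg follows the format relfilepath_blocknum_timestamp.repairpage. The following sketch splits such a name into its parts. It assumes, based only on the example outputs above, that relfilepath is rendered as three underscore-separated numbers (tablespace, database, and relfilenode). The function name and the dictionary keys are hypothetical.

```python
from pathlib import PurePosixPath


def parse_repairpage_name(path):
    """Split a .repairpage backup file name into its numeric parts.

    Assumed layout, inferred from the examples:
    spcnode_dbnode_relnode_blocknum_timestamp.repairpage
    """
    name = PurePosixPath(path).name
    stem, dot, ext = name.rpartition(".")
    if not dot or ext != "repairpage":
        raise ValueError("not a .repairpage file: %s" % name)
    parts = stem.split("_")
    if len(parts) != 5:
        raise ValueError("unexpected file name layout: %s" % name)
    # int() raises ValueError if any part is not a number.
    spc, db, rel, blk, ts = (int(p) for p in parts)
    return {"spc_node": spc, "db_node": db, "rel_node": rel,
            "block_num": blk, "timestamp": ts}


info = parse_repairpage_name(
    "/pg_log_dir/dump/1663_15991_16767_0_738039702421788.repairpage")
print(info["rel_node"], info["block_num"])
```

Parsing the name this way makes it easier to match a backup page to its target file and block number before writing it back with gs_repair_page_bypath.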
- gs_repair_undo_byzone(zone_id int)
Description: Takes the ID of the undo zone to be repaired, repairs the metadata of the target undo zone, and returns the repair result details. If the undo zone metadata is not damaged, no information is output.
Return type: record
Note: Currently, the function can be called only on the primary node. After the repair is successful, the repair will be synchronized to the standby node by recording Xlogs. The caller must be a system administrator or an O&M administrator in O&M mode. You are advised to enable the audit logging function before using the function to record audit information.
Calling this system function is a high-risk operation. Exercise caution when performing this operation.
Table 2 gs_repair_undo_byzone parameters
Category | Parameter | Type | Description
Input parameter | zone_id | int | Undo zone ID. -1: repairs the metadata of all undo zones. 0 to 1048575: repairs the metadata of the undo zone corresponding to the zone ID.
Output parameter | zone_id | int | Undo zone ID.
Output parameter | repair_detail | text | Repair result of the undo zone metadata corresponding to the zone ID. If the repair is successful, "rebuild undo meta succeed." is displayed. If the repair fails, "rebuild undo meta failed." as well as the failure cause is displayed.
Note: The output is one of the three cases based on the repair result.
Example 1: If the undo zone meta information corresponding to the entered zone_id is not damaged, no output is expected.
gaussdb=# select * from gs_repair_undo_byzone(4);
 zone_id | repair_detail
---------+---------------
(0 rows)
Example 2: If the undo zone metadata corresponding to the entered zone_id is successfully restored, the system displays a message indicating that the restoration is successful.
gaussdb=# select * from gs_repair_undo_byzone(78);
 zone_id |       repair_detail
---------+----------------------------
      78 | rebuild undo meta succeed.
(1 row)
Example 3: If the undo zone metadata corresponding to the entered zone ID fails to be repaired, the detailed information about the repair failure is displayed.
gaussdb=# select * from gs_repair_undo_byzone(0);
 zone_id |                      repair_detail
---------+---------------------------------------------------------
       0 | rebuild undo meta failed. try lock undo zone_id failed.
(1 row)
If the undo zone to be repaired is damaged and the zone ID is occupied by another active thread, the active thread that occupies the zone ID automatically ends when the repair function is called to forcibly repair the damaged undo zone metadata.
- gs_verify_urq(index_oid oid, partindex_oid oid, blocknum bigint, queue_type text)
Description: Verifies the correctness of the index recycling queue (potential queue/available queue/single page).
Parameter description: See Table 3.
Return type: record
Table 3 gs_verify_urq parameters
Category | Parameter | Type | Description
Input parameter | index_oid | oid | UB-tree index OID. Common index: index OID. Global index: GPI OID. Local index: OID of the primary index.
Input parameter | partindex_oid | oid | UB-tree partitioned index OID. Common index: 0. Global index: 0. Local index: OID of the partitioned index (primary or secondary).
Input parameter | blocknum | bigint | Page number. If the queue type is single page, all tuples on page blocknum are verified. The value range is [0, queue file size/8192). If the queue type is empty queue or free queue, blocknum is treated as an invalid value and is not used.
Input parameter | queue_type | text | Queue type. 'empty queue': potential queue. 'free queue': available queue. 'single page': single-page queue.
Output parameter | error_code | text | Error code.
Output parameter | detail | text | Detailed error information and other key information.
Example 1: Verify the free queue. Pass parameters based on the parameter description and use the actual OID and blocknum. Otherwise, an error is reported.
gaussdb=# select * from gs_verify_urq(16387, 0, 1, 'free queue');
 error_code | detail
------------+--------
(0 rows)
Example 2: Verify the empty queue. Pass parameters based on the parameter description and use the actual OID and blocknum. Otherwise, an error is reported.
gaussdb=# select * from gs_verify_urq(16387, 0, 1, 'empty queue');
      error_code       | detail
-----------------------+---------------------------------------------------------------------------------------------------------------
 VERIFY_URQ_PAGE_ERROR | invalid urq meta: oid 16387, blkno 1, head_blkno = 1, tail_blkno = 3, nblocks_upper = 4294967295, nblocks_lower = 1; urq_blocks = 6, index_blocks = 12
(1 row)
Currently, this API supports only Ustore index tables. If the index recycling queue passes verification, no error code or error details are displayed. Otherwise, the error code and error details are displayed. The error codes include "VERIFY_URQ_PAGE_ERROR", "VERIFY_URQ_LINK_ERROR", "VERIFY_URQ_HEAD_MISSED_ERROR", and "VERIFY_URQ_TAIL_MISSED_ERROR". If any of the preceding error codes is displayed, contact Huawei engineers to locate the fault.
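For single-page verification, the valid blocknum range is [0, queue file size/8192). The following sketch expresses that half-open range as a check. How the queue file size is obtained is outside the scope of the source, so it is passed in as a plain number of bytes. The function name is hypothetical.

```python
URQ_PAGE_SIZE = 8192  # page size used in the documented range formula


def blocknum_in_range(blocknum, queue_file_size):
    """Check blocknum against the half-open range [0, file size / 8192)."""
    page_count = queue_file_size // URQ_PAGE_SIZE
    return 0 <= blocknum < page_count


# A queue file holding 6 pages accepts block numbers 0 through 5.
print(blocknum_in_range(5, 6 * URQ_PAGE_SIZE))
print(blocknum_in_range(6, 6 * URQ_PAGE_SIZE))
```

Because the range is half-open, the page count itself is never a valid block number.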
- gs_urq_dump_stat(index_oid oid, partindex_oid oid)
Description: Queries information about a specified index recycling queue.
In the returned result, recentGlobalDataXmin and globalFrozenXid are the two oldest xmin values that the recycling queue uses to determine whether an index page can be recycled, next_xid is the XID of the next transaction, and urq_blocks is the total number of pages in the recycling queue. The result also contains information about the valid pages in the free queue (available queue) and the empty queue (potential queue).
Parameter description: See Table 4.
Table 4 gs_urq_dump_stat parameters
Category | Parameter | Type | Description
Input parameter | index_oid | oid | UB-tree index OID. Common index: index OID. Global index: GPI OID. Local index: OID of the primary index.
Input parameter | partindex_oid | oid | UB-tree partitioned index OID. Common index: 0. Global index: 0. Local index: OID of the partitioned index (primary or secondary).
Output parameter | result | text | Detailed statistics about the index recycling queue.
Example: Pass parameters based on the parameter description and use the actual OID. Otherwise, an error is reported.
gaussdb=# select * from gs_urq_dump_stat(16387, 0);
                                                             result
---------------------------------------------------------------------------------------------------------------------------------
 urq stat info: recentGlobalDataXmin = 213156, globalFrozenXid = 213156, next_xid = 214157, urq_blocks = 6, +
 free queue: head page blkno = 0 min_xid = 211187 max_xid = 214157, tail page blkno = 0 min_xid = 211187 max_xid = 214157,+
 middle page min_xid = 1152921504606846975 max_xid = 0, valid_pages = 1, valid_items = 6, can_use_item = 3 +
 empty queue: head page blkno = 1 min_xid = 212160 max_xid = 213160, tail page blkno = 3 min_xid = 213162 max_xid = 214156,+
 middle page min_xid = 1152921504606846975 max_xid = 0, valid_pages = 2, valid_items = 999, can_use_item = 498 +
(1 row)
Currently, this API supports only Ustore index tables.
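The first line of the gs_urq_dump_stat result is a list of "name = value" pairs. The following sketch extracts them into a dictionary for scripted checks. It relies only on the text layout shown in the example above, which may differ between versions. The function name is hypothetical.

```python
import re


def parse_urq_header(result_text):
    """Extract the 'name = value' pairs from the first line of the result."""
    lines = result_text.splitlines()
    first_line = lines[0] if lines else ""
    # Each pair looks like 'next_xid = 214157' with a numeric value.
    return {name: int(value)
            for name, value in re.findall(r"(\w+) = (\d+)", first_line)}


sample = ("urq stat info: recentGlobalDataXmin = 213156, "
          "globalFrozenXid = 213156, next_xid = 214157, urq_blocks = 6,\n"
          "free queue: head page blkno = 0 min_xid = 211187 max_xid = 214157")
print(parse_urq_header(sample))
```

Only the header line is parsed here. The per-queue lines repeat names such as min_xid and max_xid, so they would need a structure that keeps the head, tail, and middle pages apart.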
- gs_repair_urq(index_oid oid, partindex_oid oid)
Description: Repairs (with loss) the index recycling queues (potential and available queues). The recycling queue file of the current index is deleted and an empty recycling queue file is created. If the repair is successful, the message "reinitial the recycle queue of index relation successfully" is displayed.
Parameter description: See Table 5.
Note: The current function can be called only on the primary node.
Table 5 gs_repair_urq parameters
Category | Parameter | Type | Description
Input parameter | index_oid | oid | UB-tree index OID. Common index: index OID. Global index: GPI OID. Local index: OID of the primary index.
Input parameter | partindex_oid | oid | UB-tree partitioned index OID. Common index: 0. Global index: 0. Local index: OID of the partitioned index (primary or secondary).
Output parameter | result | text | If the repair is successful, the message "reinitial the recycle queue of index relation successfully" is displayed.
Example: Pass parameters based on the parameter description and use the actual OID. Otherwise, an error is reported.
gaussdb=# select * from gs_repair_urq(16387, 0);
                           result
------------------------------------------------------------
 reinitial the recycle queue of index relation sucessfully.
(1 row)
Currently, this API supports only Ustore index tables.
- gs_get_standby_bad_block_info()
Description: Displays pages that have been detected as invalid on the standby node but have not yet been repaired. By default, only initial users, users with the sysadmin permission, users with the O&M administrator permission in the O&M mode, and users with the monitor administrator permission on the standby DN can view the information. Other users can view the information only after being granted permissions. The invalid_type column has four possible values: NOT_PRESENT (the page does not exist), NOT_INITIALIZED (the page initialization fails), LSN_CHECK_ERROR (the LSN check fails), and CRC_CHECK_ERROR (the CRC check fails).
Return type: record
Example (If there are no detected but unrepaired pages, no rows are displayed.):
gaussdb=# select * from gs_get_standby_bad_block_info();
 spc_node | db_node | rel_node | bucket_node | fork_num | block_num |  invalid_type   | master_page_lsn
----------+---------+----------+-------------+----------+-----------+-----------------+-----------------
     1663 |   16552 |    24745 |          -1 |        0 |         0 | CRC_CHECK_ERROR | 0/B2009E8
(1 row)