Updated on 2024-11-29 GMT+08:00

How Do I Restore the Latest tablestatus File That Has Been Lost or Damaged When TableStatus Versioning Is Enabled?

Question

When TableStatus versioning is enabled, how do I restore the latest tablestatus file if it is lost or damaged because of an exception?

Answer

Use the latest available tablestatus file to restore data in the following scenarios:

Scenario 1: The CarbonData data files and .segment files of the current batch are damaged and cannot be restored.

  1. Log in to the client node and run the following commands to list the tablestatus files of the table on HDFS and find the version number of the latest undamaged tablestatus file:

    cd Client installation path

    source bigdata_env

    source Spark/component_env

    kinit Component service user (You do not need to run the kinit command for normal clusters.)

    hdfs dfs -ls /user/hive/warehouse/hrdb.db/car01/Metadata

    In this example, the tablestatus_1669028899548 file of the current batch is damaged, so the previous file, tablestatus_1669028852132, is the one to restore from.

  2. Start Spark SQL and run the following command to set latestversion to the version number of the latest undamaged tablestatus file:

    alter table car01 set SERDEPROPERTIES ('latestversion'='1669028852132');

    Exit the current session, reconnect, and then run the query again. This method restores as much data as possible. Generally, segment data files in a production environment cannot be restored after a power failure.
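
The two steps above can be partly scripted. The following is a minimal sketch, not part of the product: the helper names list_tablestatus_versions and build_restore_sql are hypothetical, and the sketch assumes that versioned files are named tablestatus_<number>, as in the example above. The helpers only parse text and print a statement; they do not modify the table.

```shell
#!/bin/sh
# Hypothetical helpers for Scenario 1. They change nothing on HDFS or in Spark.

# Read `hdfs dfs -ls` output on stdin and print the numeric suffix of every
# tablestatus_<version> file, largest (newest) version first.
list_tablestatus_versions() {
  awk '{ print $NF }' \
    | sed -n 's#^.*/tablestatus_\([0-9][0-9]*\)$#\1#p' \
    | sort -rn
}

# Print the ALTER TABLE statement that points a table at a given version.
# $1 = table name, $2 = version number
build_restore_sql() {
  printf "alter table %s set SERDEPROPERTIES ('latestversion'='%s');\n" "$1" "$2"
}

# Example, run on the client node after the environment setup shown above:
#   hdfs dfs -ls /user/hive/warehouse/hrdb.db/car01/Metadata | list_tablestatus_versions
#   build_restore_sql car01 1669028852132
```

The first line of the list is the newest version. If that file is the damaged one, pass the next version in the list to build_restore_sql and run the printed statement in Spark SQL.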

Scenario 2: The CarbonData data files and .segment files of the current batch are complete and can be restored.

Use the TableStatusRecovery tool to restore non-partitioned tables. Log in to the Spark client node and run the following commands:

    cd Client installation path

    source bigdata_env

    source Spark/component_env

    kinit Component service user (You do not need to run the kinit command for normal clusters.)

    spark-submit --master yarn --class org.apache.carbondata.recovery.tablestatus.TableStatusRecovery Spark/spark/carbonlib/carbondata-spark_*.jar hrdb car01

Parameter description: hrdb is the database name and car01 is the table name.
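
Because the tool takes the database and table as two separate positional arguments, a small wrapper can guard against omitting one of them. The following is a minimal sketch under that assumption; the function name build_recovery_command is hypothetical, and it only prints the command for review rather than running it.

```shell
#!/bin/sh
# Hypothetical wrapper: prints the TableStatusRecovery command for review.
# $1 = database name, $2 = table name
build_recovery_command() {
  if [ "$#" -ne 2 ] || [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: build_recovery_command <database> <table>" >&2
    return 1
  fi
  # The jar path and class name are copied from the command above; the glob
  # is left unexpanded so the shell resolves it when the command is run.
  printf '%s %s %s\n' \
    'spark-submit --master yarn --class org.apache.carbondata.recovery.tablestatus.TableStatusRecovery Spark/spark/carbonlib/carbondata-spark_*.jar' \
    "$1" "$2"
}

# Example: build_recovery_command hrdb car01
```

Run the printed command from the client installation path after the environment setup, so that the relative jar path resolves.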

Restrictions on using TableStatusRecovery for restoration:

  • If the tablestatus file is lost or damaged after segments are merged, this tool cannot restore the segments that are in the merged state, because the segment merge information is stored only in the tablestatus file.
  • If the tablestatus file is lost or damaged after segments are deleted by ID or date, the deleted segment information cannot be restored, because the segment deletion information is stored only in the tablestatus file.
  • This tool cannot be used on materialized view tables.
  • If the latest tablestatus file is faulty and queries still fail after you use this tool for restoration, remove the latest file and restore from the previous tablestatus file.
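
For the last restriction, one cautious way to remove the faulty file is to move it out of the Metadata directory instead of deleting it. This is a minimal sketch and an assumption, not a documented procedure: the function name print_move_faulty_tablestatus and the backup directory are hypothetical, and the function only prints the hdfs commands so that they can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the commands that move a faulty
# tablestatus version out of the table's Metadata directory.
# $1 = Metadata directory, $2 = faulty version number, $3 = backup directory
print_move_faulty_tablestatus() {
  if [ "$#" -ne 3 ]; then
    echo "usage: print_move_faulty_tablestatus <metadata_dir> <version> <backup_dir>" >&2
    return 1
  fi
  # Reject an empty version or one containing non-digit characters.
  case "$2" in
    ''|*[!0-9]*) echo "version must be numeric: $2" >&2; return 1 ;;
  esac
  printf 'hdfs dfs -mkdir -p %s\n' "$3"
  printf 'hdfs dfs -mv %s/tablestatus_%s %s/\n' "$1" "$2" "$3"
}

# Example (the backup path is an arbitrary choice for illustration):
#   print_move_faulty_tablestatus /user/hive/warehouse/hrdb.db/car01/Metadata \
#     1669028899548 /tmp/car01_tablestatus_backup
```

Moving rather than deleting keeps the faulty file available for later inspection. After the move, the previous tablestatus version can be selected as described in Scenario 1.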