ClickHouse Fails to Start Due to Incorrect Data in ZooKeeper
Symptom
An instance node in the ClickHouse cluster fails to start. The startup log of the instance node contains error information similar to the following:
2021.03.15 21:01:19.816593 [ 11111 ] {} <Error> Application: DB::Exception: The local set of parts of table DEFAULT.lineorder doesn't look like the set ofdoesn't look like the set of parts in ZooKeeper: 59.99 million rows of 59.99 million total rows in filesystem are suspicious. There are 30 unexpected parts with 59986052 rows (14 of them is not just-written with 59986052 rows), 0 missing parts (with 0 blocks).: Cannot attach table `DEFAULT`.`lineorder` from metadata file ... : while loading database
Cause Analysis
When a ClickHouse instance is abnormal, the ReplicatedMergeTree engine table is repeatedly created in the cluster, and then deleted. The creation and deletion of the ReplicatedMergeTree engine table causes data error in ZooKeeper, which causes a start failure of ClickHouse.
Solution
- Back up all data tables in the database of the faulty node to another directory.
- Back up table data:
cd /srv/BigData/data1/clickhouse/data/Database name
mv Table name Directory to be backed up/data1
If there are multiple disks, back up data of data1 to dataN.
- Back up metadata information:
cd /srv/BigData/data1/clickhouse_path/metadata
mv Table name.sql Directory to be backed up
For example, to back up the lineorder table in the default database to the /home/backup directory, run the following command.
cd /srv/BigData/data1/clickhouse/data/default
mv lineorder /home/backup/data1
cd /srv/BigData/data1/clickhouse_path/metadata
mv lineorder.sql /home/backup
- Back up table data:
- Log in to MRS Manager, choose Cluster > Services > ClickHouse > Instance, select the target instance node, and click Start Instance.
- After the instance is started, use the ClickHouse client to log in to the faulty node.
clickhouse client --host Clickhouse instance IP address --user User name --password Password
- Run the following command to obtain the ZooKeeper path zookeeper_path of the current table and replica_num of the corresponding node.
SELECT zookeeper_path FROM system.replicas WHERE database = 'Database name' AND table = 'Table name';
SELECT replica_num,host_name FROM system.clusters;
- Run the following command to access the ZooKeeper command line interface:
zkCli.sh -server IP address of the ZooKeeper node:2181
- Locate the ZooKeeper path corresponding to the table data of the faulty node.
ls zookeeper_path/replicas/replica_num
- Run the following command to delete the replica data from ZooKeeper:
deleteall zookeeper_path/replicas/replica_num
- Use the ClickHouse client to log in to the node and create the ReplicatedMergeTree engine table of the cluster.
clickhouse client --host Clickhouse instance IP address --multiline --user Username --password Password
CREATE TABLE Database name.Table name ON CLUSTER Cluster name
...
ENGINE = ReplicatedMergeTree ...
The following error message is displayed on other replica nodes, which is normal and can be ignored.
Received exception from server (version 20.8.7): Code: 57. DB::Exception: Received from x.x.x.x:9000. DB::Exception: There was an error on [x.x.x.x:9000]: Code: 57, e.displayText() = DB::Exception: Table DEFAULT.lineorder already exists. (version 20.8.11.17 (official build)).
After the table is successfully created, the table data on the faulty node will be automatically synchronized. The data restoration is complete.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot