Help Center/ MapReduce Service/ Troubleshooting/ Using ClickHouse/ Failed to Start the ClickHouse Due to Data Disorder in the ZooKeeper

Updated on 2025-08-19 GMT+08:00

View PDF

Failed to Start the ClickHouse Due to Data Disorder in the ZooKeeper

Symptom

An instance node in the ClickHouse cluster fails to start. The startup log of the instance node contains error information similar to the following:

2021.03.15 21:01:19.816593 [ 11111 ] {} <Error> Application: DB::Exception:
The local set of parts of table DEFAULT.lineorder doesn't look like the set ofdoesn't look like the set of
parts in ZooKeeper: 59.99 million rows of 59.99 million total rows in
filesystem are suspicious. There are 30 unexpected parts with 59986052 rows
(14 of them is not just-written with 59986052 rows), 0 missing parts (with 0
blocks).: Cannot attach table `DEFAULT`.`lineorder` from metadata file
...
: while loading database

Cause Analysis

When a ClickHouse instance is abnormal, the ReplicatedMergeTree engine table is repeatedly created in the cluster, and then deleted. The creation and deletion of the ReplicatedMergeTree engine table causes data errors in ZooKeeper, which results in a start failure of ClickHouse.

Solution

Back up all table data in the database of the faulty node to another directory.
- Back up table data:
  MRS 3.0.5 or earlier:
  
  cd /srv/BigData/data1/clickhouse/data/Database name
  
  mkdir -p Backup directory/data1
  
  mv {Table name} Backup directory/data1/
  
  MRS 3.1.0 or later:
  
  head -1 /srv/BigData/data1/clickhouse_path/metadata/Database name/Table name.sql | awk -F ' ' '{print $5}' | sed "s/'//g" Target UUID
  
  cd /srv/BigData/data1/clickhouse/store/{First three characters of the UUID}
  
  mkdir -p Backup directory/data1
  
  mv {UUID} Backup directory/data1/
  
  If there are multiple disks, back up data of data1 to dataN.
- Back up metadata information:
  cd /srv/BigData/data1/clickhouse_path/metadata/Database name
  
  mv Table name.sql Backup directory
For example, to back up table lineorder in the default database to the /home/backup directory, run the following commands:

cd /srv/BigData/data1/clickhouse/data/default

mkdir -p /home/backup/data1

mv lineorder /home/backup/data1

cd /srv/BigData/data1/clickhouse_path/metadata

mv lineorder.sql /home/backup
Log in to MRS Manager, choose Cluster > Services > ClickHouse > Instance, select the target instance node, and click Start Instance.
After the instance is started, use the ClickHouse client to log in to the faulty node. There can be security risks if a command contains the authentication password. You are advised to disable the command recording function (history) before running the command.

clickhouse client --host ClickHouse instance IP address --user Username --password Password
Run the following commands to obtain the ZooKeeper path zookeeper_path of the current table and replica_num of the corresponding node:

SELECT zookeeper_path FROM system.replicas WHERE database = 'Database name' AND table = 'Table name';

SELECT replica_num,host_name FROM system.clusters;
Run the following command to access the ZooKeeper CLI:

zkCli.sh -server IP address of the ZooKeeper node:2181

For details about how to obtain the ZooKeeper IP address, see How Do I Obtain the ZooKeeper Address?
Locate the ZooKeeper path corresponding to the table data of the faulty node.

ls zookeeper_path/replicas/replica_num

zookeeper_path indicates the zookeeper_path value obtained in 4.

replica_num indicates the replica_num value obtained in 4.
Run the following command to delete the replica data from ZooKeeper:

deleteall zookeeper_path/replicas/replica_num
Use the ClickHouse client to log in to the faulty node and create the ReplicatedMergeTree engine table of the cluster.

clickhouse client --host ClickHouse instance IP address --multiline --user Username --password Password

CREATE TABLE Database name.Table name ON CLUSTER Cluster name

...

ENGINE = ReplicatedMergeTree ...

The following error message is displayed on other replica nodes, which is normal and can be ignored:
```
Received exception from server (version 20.8.7):
Code: 57. DB::Exception: Received from x.x.x.x:9000. DB::Exception:
There was an error on [x.x.x.x:9000]: Code: 57, e.displayText() =
DB::Exception: Table DEFAULT.lineorder already exists. (version 20.8.11.17
(official build)).
```
After the table is successfully created, the table data on the faulty node will be automatically synchronized. The data restoration is complete.