Help Center/ MapReduce Service/ User Guide (Kuala Lumpur Region)/ Troubleshooting/ Using ClickHouse/ ClickHouse Fails to Start Due to Incorrect Data in ZooKeeper
Updated on 2022-12-14 GMT+08:00

ClickHouse Fails to Start Due to Incorrect Data in ZooKeeper

Symptom

An instance node in the ClickHouse cluster fails to start. The startup log of the instance node contains error information similar to the following:

2021.03.15 21:01:19.816593 [ 11111 ] {} <Error> Application: DB::Exception:
The local set of parts of table DEFAULT.lineorder doesn't look like the set ofdoesn't look like the set of
parts in ZooKeeper: 59.99 million rows of 59.99 million total rows in
filesystem are suspicious. There are 30 unexpected parts with 59986052 rows
(14 of them is not just-written with 59986052 rows), 0 missing parts (with 0
blocks).: Cannot attach table `DEFAULT`.`lineorder` from metadata file
...
: while loading database

Cause Analysis

When a ClickHouse instance is abnormal, the ReplicatedMergeTree engine table is repeatedly created in the cluster, and then deleted. The creation and deletion of the ReplicatedMergeTree engine table causes data error in ZooKeeper, which causes a start failure of ClickHouse.

Solution

  1. Back up all data tables in the database of the faulty node to another directory.

    • Back up table data:

      cd /srv/BigData/data1/clickhouse/data/Database name

      mv Table name Directory to be backed up/data1

      If there are multiple disks, back up data of data1 to dataN.

    • Back up metadata information:

      cd /srv/BigData/data1/clickhouse_path/metadata

      mv Table name.sql Directory to be backed up

    For example, to back up the lineorder table in the default database to the /home/backup directory, run the following command.

    cd /srv/BigData/data1/clickhouse/data/default

    mv lineorder /home/backup/data1

    cd /srv/BigData/data1/clickhouse_path/metadata

    mv lineorder.sql /home/backup

  2. Log in to MRS Manager, choose Cluster > Services > ClickHouse > Instance, select the target instance node, and click Start Instance.
  3. After the instance is started, use the ClickHouse client to log in to the faulty node.

    clickhouse client --host Clickhouse instance IP address --user User name --password Password

  4. Run the following command to obtain the ZooKeeper path zookeeper_path of the current table and replica_num of the corresponding node.

    SELECT zookeeper_path FROM system.replicas WHERE database = 'Database name' AND table = 'Table name';

    SELECT replica_num,host_name FROM system.clusters;

  5. Run the following command to access the ZooKeeper command line interface:

    zkCli.sh -server IP address of the ZooKeeper node:2181

  6. Locate the ZooKeeper path corresponding to the table data of the faulty node.

    ls zookeeper_path/replicas/replica_num

    zookeeper_path indicates the value of zookeeper_path obtained in 4.

    replica_num indicates the value of replica_num corresponding to the host in 4.

  7. Run the following command to delete the replica data from ZooKeeper:

    deleteall zookeeper_path/replicas/replica_num

  8. Use the ClickHouse client to log in to the node and create the ReplicatedMergeTree engine table of the cluster.

    clickhouse client --host Clickhouse instance IP address --multiline --user Username --password Password

    CREATE TABLE Database name.Table name ON CLUSTER Cluster name

    ...

    ENGINE = ReplicatedMergeTree ...

    The following error message is displayed on other replica nodes, which is normal and can be ignored.

    Received exception from server (version 20.8.7):
    Code: 57. DB::Exception: Received from x.x.x.x:9000. DB::Exception:
    There was an error on [x.x.x.x:9000]: Code: 57, e.displayText() =
    DB::Exception: Table DEFAULT.lineorder already exists. (version 20.8.11.17
    (official build)).

    After the table is successfully created, the table data on the faulty node will be automatically synchronized. The data restoration is complete.