Updated: 2024-11-29 GMT+08:00

ClickHouseServer Process Core Dumps Because of a Filesystem Error While Loading Parts at Startup After the Instance Node Is Powered Off and On

Symptom

The ClickHouseServer instance fails to restart and reports the following errors.

The core dump stack trace produced while the process restarts contains the following key error messages:
2023.09.11 15:34:49.085595 [ 30174 ] {} <Fatal> BaseDaemon [BaseDaemon.cpp:338] : (version 23.3.2.1, build id: 86C97F3EED917A2F2D9A691B4FB845F860FE7FF2) (from thread 29814) (no query) Received signal Aborted (6)
2023.09.11 15:34:49.085636 [ 30174 ] {} <Fatal> BaseDaemon [BaseDaemon.cpp:354] : 
2023.09.11 15:34:49.085662 [ 30174 ] {} <Fatal> BaseDaemon [BaseDaemon.cpp:367] : Stack trace: 0x7f2ed263a207 0x7f2ed263b8f8 0xb97032b 0x7f2ed30c7b83 0x7f2ed30c7b18 0x16de788c 0x151ccd63 0x151cf0ea 0x151cf77d 0xb7b8958 0xb7bc720 0x7f2ed29d8dd5 0x7f2ed2701ead
2023.09.11 15:34:49.085739 [ 30174 ] {} <Fatal> BaseDaemon [BaseDaemon.cpp:371] : 3. gsignal @ 0x36207 in /usr/lib64/libc-2.17.so
2023.09.11 15:34:49.085775 [ 30174 ] {} <Fatal> BaseDaemon [BaseDaemon.cpp:371] : 4. __GI_abort @ 0x378f8 in /usr/lib64/libc-2.17.so
2023.09.11 15:34:49.085820 [ 30174 ] {} <Fatal> BaseDaemon [BaseDaemon.cpp:371] : 5. terminate_handler() @ 0xb97032b in /opt/AA/BB/Bigdata/FusionInsight_ClickHouse_8.3.0/install/FusionInsight-ClickHouse-v23.3.2.37-lts/clickhouse/bin/clickhouse
2023.09.11 15:34:49.085854 [ 30174 ] {} <Fatal> BaseDaemon [BaseDaemon.cpp:371] : 6. std::__terminate(void (*)()) @ 0x99b83 in /opt/AA/BB/Bigdata_func/comp/ck/lib_lemmagen.so
2023.09.11 15:34:49.085875 [ 30174 ] {} <Fatal> BaseDaemon [BaseDaemon.cpp:371] : 7. std::terminate() @ 0x99b18 in /opt/AA/BB/Bigdata_func/comp/ck/lib_lemmagen.so
2023.09.11 15:34:49.085898 [ 30174 ] {} <Fatal> BaseDaemon [BaseDaemon.cpp:371] : 8. DB::MergeTreeData::loadOutdatedDataParts(bool) @ 0x16de788c in /opt/AA/BB/Bigdata/FusionInsight_ClickHouse_8.3.0/install/FusionInsight-ClickHouse-v23.3.2.37-lts/clickhouse/bin/clickhouse
2023.09.11 15:34:49.085920 [ 30174 ] {} <Fatal> BaseDaemon [BaseDaemon.cpp:371] : 9. DB::BackgroundSchedulePoolTaskInfo::execute() @ 0x151ccd63 in /opt/AA/BB/Bigdata/FusionInsight_ClickHouse_8.3.0/install/FusionInsight-ClickHouse-v23.3.2.37-lts/clickhouse/bin/clickhouse

The core dump is caused by the following filesystem error:
2023.09.11 15:34:49.084809 [ 28762 ] {} <Fatal> BaseDaemon [BaseDaemon.cpp:280] : (version 23.3.2.1, build id: 86C97F3EED917A2F2D9A691B4FB845F860FE7FF2) (from thread 29814) Terminate called for uncaught exception:
2023.09.11 15:34:49.084883 [ 28762 ] {} <Fatal> BaseDaemon [BaseDaemon.cpp:291] : std::exception. Code: 1001, type: std::__1::__fs::filesystem::filesystem_error, e.what() = filesystem error: in directory_iterator::directory_iterator(...): Structure needs cleaning ["/srv/AA/BB/clickhouse/data1/clickhouse/store/b0b/b0b1f040-4bdb-4584-9be6-782e81fafeae/202309_46191_47131_657"]
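
The path in square brackets at the end of the filesystem error identifies the damaged part. As a minimal sketch, the path can be split into its components to see which table directory and which part are affected. The sketch assumes the common four-field MergeTree part name `<partition_id>_<min_block>_<max_block>_<level>` with no mutation-version suffix, and assumes the parent directory name is the table's UUID directory under `store/`. `PartInfo` and `describe_part` are hypothetical helper names, not part of ClickHouse or MRS.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class PartInfo(NamedTuple):
    table_dir: str      # name of the directory that contains the part
    partition_id: str   # first field of the part name
    min_block: int      # lowest block number covered by the part
    max_block: int      # highest block number covered by the part
    level: int          # merge level of the part


def describe_part(part_path: str) -> PartInfo:
    """Split a part directory path into its table directory and name fields.

    Assumption: the part name has exactly four underscore-separated fields
    at its end (no trailing mutation version).
    """
    path = PurePosixPath(part_path)
    # Split from the right so a partition ID that itself contains
    # underscores stays in one piece.
    partition_id, min_block, max_block, level = path.name.rsplit("_", 3)
    return PartInfo(
        table_dir=path.parent.name,
        partition_id=partition_id,
        min_block=int(min_block),
        max_block=int(max_block),
        level=int(level),
    )


info = describe_part(
    "/srv/AA/BB/clickhouse/data1/clickhouse/store/b0b/"
    "b0b1f040-4bdb-4584-9be6-782e81fafeae/202309_46191_47131_657"
)
print(info)
```

For the path in the log above, this reports partition `202309`, blocks 46191 to 47131, and level 657 inside the directory `b0b1f040-4bdb-4584-9be6-782e81fafeae`.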

Procedure

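  1. Log in to the node of the ClickHouseServer instance that failed to restart, and search “/var/log/Bigdata/clickhouse/clickhouseServer/clickhouse-server.log” for the keyword “Structure needs cleaning” to find the damaged part directory.
  2. Go to the damaged part directory, which is the path in square brackets at the end of the filesystem error above (/srv/AA/BB/clickhouse/data1/clickhouse/store/b0b/b0b1f040-4bdb-4584-9be6-782e81fafeae/202309_46191_47131_657), and delete that directory (202309_46191_47131_657). If the directory cannot be deleted, delete its parent directory (b0b1f040-4bdb-4584-9be6-782e81fafeae) instead.
  3. Log in to Manager and restart the ClickHouseServer instance.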
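
The log search in step 1 can be sketched as a small script. This is an illustrative sketch, not an MRS tool: it assumes the damaged path always appears as `["..."]` on the same line as the keyword, as in the log excerpt above, and it only reports directories without deleting anything. `find_damaged_parts` and `report` are hypothetical helper names.

```python
import re
from pathlib import PurePosixPath

KEYWORD = "Structure needs cleaning"
# Assumption: the damaged path is quoted inside square brackets, ["..."].
PATH_IN_BRACKETS = re.compile(r'\["([^"]+)"\]')


def find_damaged_parts(log_lines):
    """Return the unique part directories reported with KEYWORD, in first-seen order."""
    found = []
    for line in log_lines:
        if KEYWORD not in line:
            continue
        match = PATH_IN_BRACKETS.search(line)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found


def report(log_path):
    """Print each damaged part directory and the parent directory used as a fallback."""
    # errors="replace" keeps the scan going if the log contains invalid bytes.
    with open(log_path, errors="replace") as log:
        for part_dir in find_damaged_parts(log):
            print("damaged part directory:", part_dir)
            print("parent directory (fallback):", PurePosixPath(part_dir).parent)
```

On the affected node, calling `report("/var/log/Bigdata/clickhouse/clickhouseServer/clickhouse-server.log")` would list the candidates for step 2. Deleting the directories is left as a manual step so that each path can be checked first.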