
Solution to the "Too many parts" Error in Data Tables

Troubleshooting

  1. A faulty disk or storage medium causes the merge operation to slow down or stop.

    Log in to FusionInsight Manager and check whether an alarm indicating insufficient disk space or other disk faults has been reported. Handle the alarm as described in the alarm handling guide.

    If disk space is insufficient, ask the customer to delete expired data to free up space and quickly restore services.
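
    You can also check the remaining disk space of each node from the ClickHouse client. The following statement is a sketch that reads system.disks across the default_cluster logical cluster used by the other statements in this section:

    -- Shows the free and total disk space reported by each ClickHouseServer instance.
    select FQDN() as node, name, path, formatReadableSize(free_space) as free, formatReadableSize(total_space) as total from clusterAllReplicas(default_cluster, system.disks);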

  2. An abnormal ZooKeeper service interrupts the merge operation.

    Log in to FusionInsight Manager and check whether an alarm indicating that the ZooKeeper service is unavailable or that the ClickHouse quota in ZooKeeper is insufficient has been reported. Handle the alarm as described in the alarm handling guide.
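
    In addition to checking the alarm, you can verify that ClickHouse can reach ZooKeeper from the ClickHouse client. The following statement is a sketch that only lists the child znodes of the root path; it returns an error if the ZooKeeper service cannot be accessed:

    -- Lists the child znodes of the root path; the query fails if ZooKeeper is unreachable.
    select name from system.zookeeper where path = '/';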

  3. Run the following SQL statement to check whether tasks are piling up in the replica synchronization queue:

    select FQDN() as node,type,count() from clusterAllReplicas(default_cluster, system.replication_queue) group by node,type;

    Check whether any task in the replica synchronization queue reports an error and rectify the fault based on the error information.
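
    To view the error details of queued tasks, you can also query the exception columns of system.replication_queue. The following statement is a sketch:

    -- Shows queued tasks that have reported an exception, with the retry count and postpone reason.
    select FQDN() as node, database, table, type, num_tries, last_exception, postpone_reason from clusterAllReplicas(default_cluster, system.replication_queue) where last_exception != '' order by num_tries desc limit 10;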

  4. Run the following SQL statement to check whether the table structure is consistent across nodes:

    select FQDN(), create_table_query from clusterAllReplicas(default_cluster,system.tables) where name = '${table_name}' group by FQDN(),create_table_query;

    If the structures are inconsistent, modify the table structure on the affected nodes to match that of the other nodes.
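
    To quickly see how many nodes share each table structure, you can also group by the creation statement only. The following statement is a sketch that uses the same ${table_name} placeholder as the statement above:

    -- Counts how many nodes report each distinct creation statement for the table.
    select create_table_query, count() as node_count from clusterAllReplicas(default_cluster, system.tables) where name = '${table_name}' group by create_table_query;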

  5. Run the following SQL statement to check whether the mutation task is abnormal:

    select FQDN(), database, table, mutation_id, create_time, command from clusterAllReplicas(default_cluster, system.mutations) where is_done = '0' order by create_time asc;

    If the mutation task is normal, wait until it is complete. If it is abnormal, delete the task.
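
    An abnormal mutation task can be deleted with the KILL MUTATION statement. The following is a sketch in which ${database}, ${table}, and ${mutation_id} are placeholders for the values returned by the query above:

    -- Replace the placeholders with the database, table, and mutation_id of the abnormal task.
    KILL MUTATION WHERE database = '${database}' AND table = '${table}' AND mutation_id = '${mutation_id}';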

  6. Too many write requests cause the merge operation to lag behind the insert operations.

    You can run the following SQL statement on the node where the error is reported to check the number of records written in the last hour and the write frequency:

    select tables,written_rows,count() from system.query_log where type='QueryFinish' and query_start_time between (toUnixTimestamp(now()) - 3600) AND toUnixTimestamp(now()) and query_kind = 'Insert' group by tables,written_rows order by written_rows limit 10;

    Write data to only one partition at a time and do not write too frequently. Avoid frequently inserting data in small batches, and increase the interval between insert operations.
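
    You can also check how many active parts each partition currently contains, which indicates whether merges are keeping up with inserts. The following statement is a sketch:

    -- Shows the partitions with the most active parts on each node.
    select FQDN() as node, database, table, partition_id, count() as active_parts from clusterAllReplicas(default_cluster, system.parts) where active = 1 group by node, database, table, partition_id order by active_parts desc limit 10;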

  7. If merge operations are not triggered or are slow, adjust the following parameters to accelerate them.

    For details, see Accelerating Merge Operations.

    Configuration Item       Reference Value
    max_threads              Number of CPU cores x 2
    background_pool_size     Number of CPU cores
    merge_max_block_size     An integer multiple of 8192, adjusted based on the CPU and memory resources
    cleanup_delay_period     A value appropriately less than the default value 30
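
    You can check the current values of these parameters from the ClickHouse client before and after adjusting them. The following statements are a sketch; depending on the ClickHouse version, max_threads and background_pool_size are exposed in system.settings, while the merge tree settings are in system.merge_tree_settings:

    -- Current session/profile-level values.
    select name, value from system.settings where name in ('max_threads', 'background_pool_size');

    -- Current MergeTree-level values.
    select name, value from system.merge_tree_settings where name in ('merge_max_block_size', 'cleanup_delay_period');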

Changing the Value of parts_to_throw_insert

Increase the value of this parameter only in special scenarios. To some extent, this configuration serves as an early warning of potential issues. If the cluster hardware resources are insufficient and the value is raised without care, potential service issues cannot be detected in a timely manner, which may cause other faults and make fault recovery more difficult.

  • For versions earlier than MRS 3.2.0, log in to FusionInsight Manager and choose Cluster > Services > ClickHouse. Click Configurations and then All Configurations. Click ClickHouseServer > Customization, locate the clickhouse-config-customize parameter, add the configuration in Table 1, save the configuration, and restart the service.
  • For MRS 3.2.0 and later versions, log in to FusionInsight Manager, choose Cluster > ClickHouse > Configurations > All Configurations, search for and change the value of merge_tree.parts_to_throw_insert, save the configurations, and restart the service.
Table 1 Parameters

Name                                Value
merge_tree.parts_to_throw_insert    ClickHouse instance memory/32 GB x 300 (conservative estimate)
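
For example, based on this formula, a ClickHouse instance with 64 GB of memory would set merge_tree.parts_to_throw_insert to 64/32 x 300 = 600.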

Verify the modification.

Log in to the ClickHouse client and run the following statement:

select * from system.merge_tree_settings where name = 'parts_to_throw_insert';
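
In the command output, the value column should display the new value of parts_to_throw_insert.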