Updated on 2023-08-08 GMT+08:00

Configuring Thresholds for Alarms

MRS clusters provide easy-to-use alarming functions with intuitive monitoring metric views. You can quickly view statistics on key performance indicators (KPIs) of a cluster and evaluate the cluster health status. MRS allows you to configure metric thresholds to stay informed of cluster health status. If a metric reaches its threshold, the system generates an alarm and displays it on the metric dashboard.

If you have verified that the impact of certain alarms on services is negligible, or that an alarm threshold needs to be adjusted, you can customize the thresholds of cluster metrics or mask the alarms as required.

You can set thresholds for alarms of node information metrics and cluster service metrics. For details about these metrics, their impacts on the system, and default thresholds, see Monitoring Metric Reference.

These alarms may affect cluster functions or job running. If you want to mask or modify alarm rules, evaluate operation risks in advance.

Modifying Rules for Alarms with Custom Thresholds

Log in to FusionInsight Manager of the target MRS cluster. For details, see Accessing FusionInsight Manager (MRS 3.x or Later).
Choose O&M > Alarm > Thresholds.
Select a metric for a host or service in the cluster. For example, select Host Memory Usage.

Figure 1 Viewing an alarm threshold
- Switch: If this switch is turned on, an alarm will be triggered when the metric breaches this threshold.
- Trigger Count: Manager periodically checks whether the metric breaches the threshold. An alarm is generated when the number of consecutive checks in which the metric breaches the threshold equals the value of Trigger Count. The value can be customized. If an alarm is reported too frequently, set Trigger Count to a larger value to reduce the alarm frequency.
- Check Period (s): Interval, in seconds, between two consecutive checks.
- The rules to trigger alarms are listed on the page.
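
The interaction between Switch, Trigger Count, and Check Period can be illustrated with a minimal sketch. This is not Manager's implementation; the class `ThresholdChecker` and the helper `first_alarm_index` are hypothetical names, and the sketch assumes a maximum-type threshold where each call to `check` represents one check period.

```python
from dataclasses import dataclass


@dataclass
class ThresholdChecker:
    """Hypothetical model of the Switch / Trigger Count behavior (assumption)."""
    threshold: float               # alarm threshold, e.g. 90.0 for 90%
    trigger_count: int             # consecutive breaches needed for an alarm
    enabled: bool = True           # models the Switch
    consecutive_breaches: int = 0  # breaches seen in a row so far

    def check(self, value: float) -> bool:
        """Record one periodic check; return True if the alarm condition holds."""
        if not self.enabled:
            return False
        if value > self.threshold:
            self.consecutive_breaches += 1
        else:
            # A single healthy check resets the streak.
            self.consecutive_breaches = 0
        return self.consecutive_breaches >= self.trigger_count


def first_alarm_index(samples, threshold, trigger_count):
    """Return the index of the first check that raises an alarm, or None."""
    checker = ThresholdChecker(threshold=threshold, trigger_count=trigger_count)
    for index, value in enumerate(samples):
        if checker.check(value):
            return index
    return None
```

In this sketch, a larger Trigger Count means that isolated spikes above the threshold do not raise an alarm; only an unbroken run of breaches does.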

Modify an alarm rule.

Add a new rule.
1. Click Create Rule to add a rule that defines how an alarm will be triggered. For details, see Table 1.
2. Click OK to save the rule.
3. Locate the row that contains a rule that is in use, and click Cancel in the Operation column. If no rule is in use, skip this step.
4. Locate the row that contains the new rule, and click Apply in the Operation column. The value of Effective for this rule changes to Yes.
Modify an existing rule.
1. Click Modify in the Operation column of the row that contains the target rule.
2. Modify rule parameters by referring to Table 1.
3. Click OK.

The following table lists the rule parameters you need to set for triggering an alarm of Host Memory Usage.

**Table 1** Alarm rule parameters
Parameter	Description	Example Value
Rule Name	Rule name	mrs_test
Severity	Alarm severity. The options are Critical, Major, Minor, and Warning.	Major
Threshold Type	Whether the threshold is a maximum or minimum value of the metric. Max value: An alarm is generated when the metric value is greater than the threshold. Min value: An alarm is generated when the metric value is less than the threshold.	Max value
Date	How often the rule takes effect. The options are Daily, Weekly, and Others.	Daily
Add Date	Date when the rule takes effect. This parameter is available only when Date is set to Others. You can set multiple dates.	-
Thresholds	Start and End Time: Period when the rule takes effect.	00:00 - 23:59
Thresholds	Threshold: Alarm threshold value	85
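
As an illustration of how the parameters in Table 1 fit together, the following sketch models a rule as a small data structure. It is an assumption-laden example, not the product's data model: the class `AlarmRule` and its field names are hypothetical, and the Date and Add Date parameters are left out for brevity, so the rule is treated as taking effect daily.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class AlarmRule:
    """Hypothetical representation of one row set from Table 1 (assumption)."""
    name: str
    severity: str        # "Critical", "Major", "Minor", or "Warning"
    threshold_type: str  # "max" or "min"
    start: time          # start of the period when the rule takes effect
    end: time            # end of the period when the rule takes effect
    threshold: float

    def is_active(self, now: time) -> bool:
        """True if 'now' falls inside the rule's effective period."""
        return self.start <= now <= self.end

    def breached(self, value: float, now: time) -> bool:
        """True if the metric value breaches the threshold while the rule is active."""
        if not self.is_active(now):
            return False
        if self.threshold_type == "max":
            return value > self.threshold
        return value < self.threshold


# Example values taken from Table 1.
rule = AlarmRule("mrs_test", "Major", "max", time(0, 0), time(23, 59), 85.0)
```

With the example values, a Host Memory Usage reading above 85 breaches the rule at any time of day, while a reading of exactly 85 does not.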

Masking Specified Alarms

Log in to FusionInsight Manager of the target MRS cluster. For details, see Accessing FusionInsight Manager (MRS 3.x or Later).
Choose O&M > Alarm > Masking.
In the list on the left of the displayed page, select the target service or module.
Click Mask in the Operation column of the alarm you want to mask. In the dialog box that is displayed, click OK to change the masking status of the alarm to Mask.

Figure 2 Masking an alarm
- You can search for specified alarms in the list.
- To cancel alarm masking, click Unmask in the row of the target alarm. In the dialog box that is displayed, click OK to change the alarm masking status to Display.
- If you need to perform operations on multiple alarms at a time, select the alarms and click Mask or Unmask on the top of the list.

FAQ

How Do I View Uncleared Alarms in a Cluster?
1. Log in to the MRS management console.
2. Click the name of the target cluster and click the Alarms tab.
3. Click Advanced Search, set Alarm Status to Uncleared, and click Search.
4. Uncleared alarms of the current cluster are displayed.
How Do I Clear a Cluster Alarm?
You can handle the alarms by referring to the alarm help. To view the help document, perform the following steps:
- Console: Log in to the MRS management console, click the name of the target cluster, click the Alarms tab, and click View Help in the Operation column of the alarm list. Then, clear the alarm by referring to the alarm handling procedure.
- Manager: Log in to FusionInsight Manager, choose O&M > Alarm > Alarms, and click View Help in the Operation column. Then, clear the alarm by referring to the alarm handling procedure.

Monitoring Metric Reference

FusionInsight Manager monitoring metrics are classified as node information metrics and cluster service metrics. Table 2 lists the metrics whose thresholds can be configured for a node, and Table 3 lists the metrics whose thresholds can be configured for a component.

**Table 2** Node monitoring metrics
Metric Group	Metric	ID	Alarm Name	Impact on System	Default Threshold
CPU	Host CPU Usage	12016	CPU Usage Exceeds the Threshold	Service processes respond slowly or become unavailable.	90.0%
Disk	Disk Usage	12017	Insufficient Disk Capacity	Service processes become unavailable.	90.0%
Disk	Disk Inode Usage	12051	Disk Inode Usage Exceeds the Threshold	Data cannot be properly written to the file system.	80.0%
Memory	Host Memory Usage	12018	Memory Usage Exceeds the Threshold	Service processes respond slowly or become unavailable.	90.0%
Host Status	Host File Handle Usage	12053	Host File Handle Usage Exceeds the Threshold	I/O operations, such as opening a file or connecting to a network, cannot be performed, and programs run abnormally.	80.0%
Host Status	Host PID Usage	12027	Host PID Usage Exceeds the Threshold	No PID is available for new processes and service processes are unavailable.	90%
Network Status	TCP Temporary Port Usage	12052	TCP Temporary Port Usage Exceeds the Threshold	Services on the host fail to establish external connections, and services are interrupted.	80.0%
Network Reading	Read Packet Error Rate	12047	Read Packet Error Rate Exceeds the Threshold	The communication is intermittently interrupted, and services time out.	0.5%
	Read Packet Dropped Rate	12045	Read Packet Dropped Rate Exceeds the Threshold	The service performance deteriorates or some services time out.	0.5%
	Read Throughput Rate	12049	Read Throughput Rate Exceeds the Threshold	The service system runs abnormally or is unavailable.	80%
Network Writing	Write Packet Error Rate	12048	Write Packet Error Rate Exceeds the Threshold	The communication is intermittently interrupted, and services time out.	0.5%
	Write Packet Dropped Rate	12046	Write Packet Dropped Rate Exceeds the Threshold	The service performance deteriorates or some services time out.	0.5%
	Write Throughput Rate	12050	Write Throughput Rate Exceeds the Threshold	The service system runs abnormally or is unavailable.	80%
Process	Total Number of Processes in D and Z States	12028	Number of Processes in the D State and Z State on a Host Exceeds the Threshold	Excessive system resources are used and service processes respond slowly.	0
Process	omm Process Usage	12061	Process Usage Exceeds the Threshold	Switching to user omm fails, and new omm processes cannot be created.	90

**Table 3** Cluster monitoring metrics
Service	Metric	ID	Alarm Name	Impact on System	Default Threshold
DBService	Usage of the Number of Database Connections	27005	Database Connection Usage Exceeds the Threshold	Upper-layer services may fail to connect to the DBService database, affecting services.	90%
DBService	Disk Space Usage of the Data Directory	27006	Disk Space Usage of the Data Directory Exceeds the Threshold	Service processes become unavailable. When the disk space usage of the data directory exceeds 90%, the database enters the read-only mode and the alarm Database Enters the Read-Only Mode is generated. As a result, service data is lost.	80%
Flume	Heap Memory Resource Percentage	24006	Heap Memory Usage of Flume Server Exceeds the Threshold	Heap memory overflow may cause service breakdown.	95.0%
	Direct Memory Usage Statistics	24007	Flume Server Direct Memory Usage Exceeds the Threshold	Direct memory overflow may cause service breakdown.	80.0%
	Non-heap Memory Usage	24008	Flume Server Non-Heap Memory Usage Exceeds the Threshold	Non-heap memory overflow may cause service breakdown.	80.0%
	Total GC Duration	24009	Flume Server GC Duration Exceeds the Threshold	Flume data transmission efficiency decreases.	12000ms
HBase	GC Duration of Old Generation	19007	HBase GC Duration Exceeds the Threshold	If the old generation GC duration exceeds the threshold, HBase data read and write are affected.	5000ms
	RegionServer Direct Memory Usage Statistics	19009	Direct Memory Usage of the HBase Process Exceeds the Threshold	If the available HBase direct memory is insufficient, a memory overflow occurs and the service breaks down.	90%
	RegionServer Heap Memory Usage Statistics	19008	Heap Memory Usage of the HBase Process Exceeds the Threshold	If the available HBase memory is insufficient, a memory overflow occurs and the service breaks down.	90%
	HMaster Direct Memory Usage	19009	Direct Memory Usage of the HBase Process Exceeds the Threshold	If the available HBase direct memory is insufficient, a memory overflow occurs and the service breaks down.	90%
	HMaster Heap Memory Usage Statistics	19008	Heap Memory Usage of the HBase Process Exceeds the Threshold	If the available HBase memory is insufficient, a memory overflow occurs and the service breaks down.	90%
	Number of Online Regions of a RegionServer	19011	Number of RegionServer Regions Exceeds the Threshold	The data read/write performance of HBase is affected when the number of regions on a RegionServer exceeds the threshold.	2000
	Region in RIT State That Reaches the Threshold Duration	19013	Duration of Regions in RIT State Exceeds the Threshold	Some data in the table is lost or becomes unavailable.	1
	Handler Usage of RegionServer	19021	Number of Active Handlers of RegionServer Exceeds the Threshold	RegionServers or HBase cannot provide services properly.	90%
	Synchronization Failures in Disaster Recovery	19006	HBase Replication Sync Failed	HBase data in a cluster fails to be synchronized to the standby cluster, causing data inconsistency between active and standby clusters.	1
	Number of Log Files to Be Synchronized in the Active Cluster	19020	Number of HBase WAL Files to Be Synchronized Exceeds the Threshold	If the number of WAL files to be synchronized by a RegionServer exceeds the threshold, the number of ZNodes used by HBase exceeds the threshold, affecting the HBase service status.	128
	Number of HFiles to Be Synchronized in the Active Cluster	19019	Number of HFiles to Be Synchronized Exceeds the Threshold	If the number of HFiles to be synchronized by a RegionServer exceeds the threshold, the number of ZNodes used by HBase exceeds the threshold, affecting the HBase service status.	128
	Compaction Queue Size	19018	HBase Compaction Queue Size Exceeds the Threshold	The cluster performance may deteriorate, affecting data read and write.	100
HDFS	Lost Blocks	14003	Number of Lost HDFS Blocks Exceeds the Threshold	Data stored in HDFS is lost. HDFS may enter the security mode and cannot provide write services. Lost block data cannot be restored.	0
	Blocks Under Replicated	14028	Number of Blocks to Be Supplemented Exceeds the Threshold	Data stored in HDFS is lost. HDFS may enter the security mode and cannot provide write services. Lost block data cannot be restored.	1000
	Average Time of Active NameNode RPC Processing	14021	Average NameNode RPC Processing Time Exceeds the Threshold	NameNode cannot process the RPC requests from HDFS clients, upper-layer services that depend on HDFS, and DataNode in a timely manner. Specifically, the services that access HDFS run slowly or the HDFS service is unavailable.	100ms
	Average Time of Active NameNode RPC Queuing	14022	Average NameNode RPC Queuing Time Exceeds the Threshold	NameNode cannot process the RPC requests from HDFS clients, upper-layer services that depend on HDFS, and DataNode in a timely manner. Specifically, the services that access HDFS run slowly or the HDFS service is unavailable.	200ms
	HDFS Disk Usage	14001	HDFS Disk Usage Exceeds the Threshold	The performance of writing data to HDFS is affected.	80%
	DataNode Disk Usage	14002	DataNode Disk Usage Exceeds the Threshold	Insufficient disk space will impact data write to HDFS.	80%
	Percentage of Reserved Space for Replicas of Unused Space	14023	Percentage of Total Reserved Disk Space for Replicas Exceeds the Threshold	The performance of writing data to HDFS is affected. If all unused DataNode space is reserved for replicas, writing HDFS data fails.	90%
	Total Faulty DataNodes	14009	Number of Dead DataNodes Exceeds the Threshold	Faulty DataNodes cannot provide HDFS services.	3
	NameNode Non-Heap Memory Usage Statistics	14018	NameNode Non-Heap Memory Usage Exceeds the Threshold	If the non-heap memory usage of the HDFS NameNode is too high, data read/write performance of HDFS will be affected.	90%
	NameNode Direct Memory Usage Statistics	14017	NameNode Direct Memory Usage Exceeds the Threshold	If the available direct memory of NameNode instances is insufficient, a memory overflow may occur and the service breaks down.	90%
	NameNode Heap Memory Usage Statistics	14007	NameNode Heap Memory Usage Exceeds the Threshold	If the heap memory usage of the HDFS NameNode is too high, data read/write performance of HDFS will be affected.	95%
	DataNode Direct Memory Usage Statistics	14016	DataNode Direct Memory Usage Exceeds the Threshold	If the available direct memory of DataNode instances is insufficient, a memory overflow may occur and the service breaks down.	90%
	DataNode Heap Memory Usage Statistics	14008	DataNode Heap Memory Usage Exceeds the Threshold	The HDFS DataNode heap memory usage is too high, which affects the data read/write performance of the HDFS.	95%
	DataNode Non-Heap Memory Usage Statistics	14019	DataNode Non-Heap Memory Usage Exceeds the Threshold	If the non-heap memory usage of the HDFS DataNode is too high, data read/write performance of HDFS will be affected.	90%
	NameNode GC Duration Statistics	14014	NameNode GC Duration Exceeds the Threshold	A long GC duration of the NameNode process may interrupt the services.	12000ms
	DataNode GC Duration Statistics	14015	DataNode GC Duration Exceeds the Threshold	A long GC duration of the DataNode process may interrupt the services.	12000ms
Hive	Hive SQL Execution Success Rate (Percentage)	16002	Hive SQL Execution Success Rate Is Lower Than the Threshold	The system configuration and performance cannot meet service processing requirements.	90.0%
	Background Thread Usage	16003	Background Thread Usage Exceeds the Threshold	There are too many background threads, so the newly submitted task cannot run in time.	90%
	Total GC Duration of MetaStore	16007	Hive GC Duration Exceeds the Threshold	If the GC duration exceeds the threshold, Hive data read and write are affected.	12000ms
	Total GC Duration of HiveServer	16007	Hive GC Duration Exceeds the Threshold	If the GC duration exceeds the threshold, Hive data read and write are affected.	12000ms
	Percentage of HDFS Space Used by Hive to the Available Space	16001	Hive Warehouse Space Usage Exceeds the Threshold	The system fails to write data, which causes data loss.	85.0%
	MetaStore Direct Memory Usage Statistics	16006	Direct Memory Usage of the Hive Process Exceeds the Threshold	When the direct memory usage of Hive is too high, the performance of Hive tasks is affected. In addition, a memory overflow may occur, making the Hive service unavailable.	95%
	MetaStore Non-Heap Memory Usage Statistics	16008	Non-heap Memory Usage of the Hive Service Exceeds the Threshold	When the non-heap memory usage of Hive is too high, the performance of Hive tasks is affected. In addition, a memory overflow may occur, making the Hive service unavailable.	95%
	MetaStore Heap Memory Usage Statistics	16005	Heap Memory Usage of the Hive Process Exceeds the Threshold	When the heap memory usage of Hive is too high, the performance of Hive tasks is affected. In addition, a memory overflow may occur, making the Hive service unavailable.	95%
	HiveServer Direct Memory Usage Statistics	16006	Direct Memory Usage of the Hive Process Exceeds the Threshold	When the direct memory usage of Hive is too high, the performance of Hive tasks is affected. In addition, a memory overflow may occur, making the Hive service unavailable.	95%
	HiveServer Non-Heap Memory Usage Statistics	16008	Non-heap Memory Usage of the Hive Service Exceeds the Threshold	When the non-heap memory usage of Hive is too high, the performance of Hive tasks is affected. In addition, a memory overflow may occur, making the Hive service unavailable.	95%
	HiveServer Heap Memory Usage Statistics	16005	Heap Memory Usage of the Hive Process Exceeds the Threshold	When the heap memory usage of Hive is too high, the performance of Hive tasks is affected. In addition, a memory overflow may occur, making the Hive service unavailable.	95%
	Percentage of Sessions Connected to the HiveServer to Maximum Number of Sessions Allowed by the HiveServer	16000	Percentage of Sessions Connected to the HiveServer to Maximum Number Allowed Exceeds the Threshold	If a connection alarm is generated, too many sessions are connected to the HiveServer and new connections cannot be created.	90.0%
Kafka	Percentage of Partitions That Are Not Completely Synchronized	38006	Percentage of Kafka Partitions That Are Not Completely Synchronized Exceeds the Threshold	Too many Kafka partitions that are not completely synchronized affect service reliability. In addition, data may be lost when leaders are switched.	50%
	User Connection Usage on Broker	38011	User Connection Usage on Broker Exceeds the Threshold	If the number of connections of a user is excessive, the user cannot create new connections to the Broker.	80%
	Broker Disk Usage	38001	Insufficient Kafka Disk Capacity	Kafka data write operations fail.	80.0%
	Disk I/O Rate of a Broker	38009	Busy Broker Disk I/Os	The disk partition has frequent I/Os. Data may fail to be written to the Kafka topic for which the alarm is generated.	80%
	Broker GC Duration per Minute	38005	GC Duration of the Broker Process Exceeds the Threshold	A long GC duration of the Broker process may interrupt the services.	12000ms
	Heap Memory Usage of Kafka	38002	Kafka Heap Memory Usage Exceeds the Threshold	If the available Kafka heap memory is insufficient, a memory overflow occurs and the service breaks down.	95%
	Kafka Direct Memory Usage	38004	Kafka Direct Memory Usage Exceeds the Threshold	If the available direct memory of the Kafka service is insufficient, a memory overflow occurs and the service breaks down.	95%
Loader	Heap Memory Usage	23004	Loader Heap Memory Usage Exceeds the Threshold	Heap memory overflow may cause service breakdown.	95%
	Direct Memory Usage Statistics	23006	Loader Direct Memory Usage Exceeds the Threshold	Direct memory overflow may cause service breakdown.	80.0%
	Non-heap Memory Usage	23005	Loader Non-Heap Memory Usage Exceeds the Threshold	Non-heap memory overflow may cause service breakdown.	80%
	Total GC Duration	23007	GC Duration of the Loader Process Exceeds the Threshold	Loader service response is slow.	12000ms
MapReduce	GC Duration Statistics	18012	JobHistoryServer GC Duration Exceeds the Threshold	A long GC duration of the JobHistoryServer process may interrupt the services.	12000ms
	JobHistoryServer Direct Memory Usage Statistics	18015	JobHistoryServer Direct Memory Usage Exceeds the Threshold	If the available direct memory of the MapReduce service is insufficient, a memory overflow occurs and the service breaks down.	90%
	JobHistoryServer Non-Heap Memory Usage Statistics	18019	Non-Heap Memory Usage of JobHistoryServer Exceeds the Threshold	When the non-heap memory usage of MapReduce JobHistoryServer is too high, the performance of MapReduce task submission and operation is affected. In addition, a memory overflow may occur, making the MapReduce service unavailable.	90%
	JobHistoryServer Heap Memory Usage Statistics	18009	Heap Memory Usage of JobHistoryServer Exceeds the Threshold	When the heap memory usage of MapReduce JobHistoryServer is too high, the performance of MapReduce log archiving is affected. In addition, a memory overflow may occur, making the Yarn service unavailable.	95%
Oozie	Heap Memory Usage	17004	Oozie Heap Memory Usage Exceeds the Threshold	Heap memory overflow may cause service breakdown.	95.0%
	Direct Memory Usage	17006	Oozie Direct Memory Usage Exceeds the Threshold	Direct memory overflow may cause service breakdown.	80.0%
	Non-heap Memory Usage	17005	Oozie Non-Heap Memory Usage Exceeds the Threshold	Non-heap memory overflow may cause service breakdown.	80%
	Total GC Duration	17007	GC Duration of the Oozie Process Exceeds the Threshold	Oozie responds slowly when it is used to submit tasks.	12000ms
Spark2x	JDBCServer2x Heap Memory Usage Statistics	43010	Heap Memory Usage of the JDBCServer2x Process Exceeds the Threshold	If the available JDBCServer2x process heap memory is insufficient, a memory overflow occurs and the service breaks down.	95%
	JDBCServer2x Direct Memory Usage Statistics	43012	Direct Heap Memory Usage of the JDBCServer2x Process Exceeds the Threshold	If the available JDBCServer2x process direct heap memory is insufficient, a memory overflow occurs and the service breaks down.	95%
	JDBCServer2x Non-Heap Memory Usage Statistics	43011	Non-Heap Memory Usage of the JDBCServer2x Process Exceeds the Threshold	If the available JDBCServer2x process non-heap memory is insufficient, a memory overflow occurs and the service breaks down.	95%
	JobHistory2x Direct Memory Usage Statistics	43008	Direct Memory Usage of the JobHistory2x Process Exceeds the Threshold	If the available JobHistory2x process direct memory is insufficient, a memory overflow occurs and the service breaks down.	95%
	JobHistory2x Non-Heap Memory Usage Statistics	43007	Non-Heap Memory Usage of the JobHistory2x Process Exceeds the Threshold	If the available JobHistory2x process non-heap memory is insufficient, a memory overflow occurs and the service breaks down.	95%
	JobHistory2x Heap Memory Usage Statistics	43006	Heap Memory Usage of the JobHistory2x Process Exceeds the Threshold	If the available JobHistory2x process heap memory is insufficient, a memory overflow occurs and the service breaks down.	95%
	IndexServer2x Direct Memory Usage Statistics	43021	Direct Memory Usage of the IndexServer2x Process Exceeds the Threshold	If the available IndexServer2x process direct memory is insufficient, a memory overflow occurs and the service breaks down.	95%
	IndexServer2x Heap Memory Usage Statistics	43019	Heap Memory Usage of the IndexServer2x Process Exceeds the Threshold	If the available IndexServer2x process heap memory is insufficient, a memory overflow occurs and the service breaks down.	95%
	IndexServer2x Non-Heap Memory Usage Statistics	43020	Non-Heap Memory Usage of the IndexServer2x Process Exceeds the Threshold	If the available IndexServer2x process non-heap memory is insufficient, a memory overflow occurs and the service breaks down.	95%
	Full GC Number of JDBCServer2x	43017	JDBCServer2x Process Full GC Number Exceeds the Threshold	The performance of the JDBCServer2x process is affected, or even the JDBCServer2x process is unavailable.	12
	Full GC Number of JobHistory2x	43018	JobHistory2x Process Full GC Number Exceeds the Threshold	The performance of the JobHistory2x process is affected, or even the JobHistory2x process is unavailable.	12
	Full GC Number of IndexServer2x	43023	IndexServer2x Process Full GC Number Exceeds the Threshold	If the number of full GCs exceeds the threshold, IndexServer2x may run with low performance or even become unavailable.	12
	Total GC Duration (in Milliseconds) of JDBCServer2x	43013	JDBCServer2x Process GC Duration Exceeds the Threshold	If the GC duration exceeds the threshold, JDBCServer2x may run with low performance.	12000ms
	Total GC Duration (in Milliseconds) of JobHistory2x	43009	JobHistory2x Process GC Duration Exceeds the Threshold	If the GC duration exceeds the threshold, JobHistory2x may run with low performance.	12000ms
	Total GC Duration (in Milliseconds) of IndexServer2x	43022	IndexServer2x Process GC Duration Exceeds the Threshold	If the GC duration exceeds the threshold, IndexServer2x may run with low performance or even become unavailable.	12000ms
Storm	Number of Available Supervisors	26052	Number of Available Supervisors of the Storm Service Is Less Than the Threshold	Existing tasks in the cluster cannot be performed. The cluster can receive new Storm tasks, but cannot perform these tasks.	1
	Slot Usage	26053	Storm Slot Usage Exceeds the Threshold	New Storm tasks cannot be performed.	80.0%
	Nimbus Heap Memory Usage	26054	Nimbus Heap Memory Usage Exceeds the Threshold	When the heap memory usage of Storm Nimbus is too high, frequent GCs occur. In addition, a memory overflow may occur, making the Yarn service unavailable.	80%
Yarn	NodeManager Direct Memory Usage Statistics	18014	NodeManager Direct Memory Usage Exceeds the Threshold	If the available direct memory of NodeManager is insufficient, a memory overflow occurs and the service breaks down.	90%
	NodeManager Heap Memory Usage Statistics	18018	NodeManager Heap Memory Usage Exceeds the Threshold	When the heap memory usage of Yarn NodeManager is too high, the performance of Yarn task submission and operation is affected. In addition, a memory overflow may occur, making the Yarn service unavailable.	95%
	NodeManager Non-Heap Memory Usage Statistics	18017	NodeManager Non-heap Memory Usage Exceeds the Threshold	When the non-heap memory usage of Yarn NodeManager is too high, the performance of Yarn task submission and operation is affected. In addition, a memory overflow may occur, making the Yarn service unavailable.	90%
	ResourceManager Direct Memory Usage Statistics	18013	ResourceManager Direct Memory Usage Exceeds the Threshold	If the available direct memory of ResourceManager is insufficient, a memory overflow occurs and the service breaks down.	90%
	ResourceManager Heap Memory Usage Statistics	18008	ResourceManager Heap Memory Usage Exceeds the Threshold	When the heap memory usage of Yarn ResourceManager is too high, the performance of Yarn task submission and operation is affected. In addition, a memory overflow may occur, making the Yarn service unavailable.	95%
	ResourceManager Non-Heap Memory Usage Statistics	18016	ResourceManager Non-Heap Memory Usage Exceeds the Threshold	When the non-heap memory usage of Yarn ResourceManager is too high, the performance of Yarn task submission and operation is affected. In addition, a memory overflow may occur, making the Yarn service unavailable.	90%
	NodeManager GC Duration Statistics	18011	NodeManager GC Duration Exceeds the Threshold	A long GC duration of the NodeManager process may interrupt the services.	12000ms
	ResourceManager GC Duration Statistics	18010	ResourceManager GC Duration Exceeds the Threshold	A long GC duration of the ResourceManager process may interrupt the services.	12000ms
	Number of Failed Tasks in the Root Queue	18026	Number of Failed Yarn Tasks Exceeds the Threshold	A large number of application tasks fail to be executed. Failed tasks need to be submitted again.	50
	Terminated Applications of the Root Queue	18025	Number of Terminated Yarn Tasks Exceeds the Threshold	A large number of application tasks are forcibly stopped.	50
	Pending Memory	18024	Pending Yarn Memory Usage Exceeds the Threshold	It takes a long time to end an application. A new application cannot run after submission.	83886080MB
	Pending Tasks	18023	Number of Pending Yarn Tasks Exceeds the Threshold	It takes a long time to end an application. A new application cannot run for a long time after submission.	60
ZooKeeper	ZooKeeper Connections Usage	13001	Available ZooKeeper Connections Are Insufficient	Available ZooKeeper connections are insufficient. When the connection usage reaches 100%, external connections cannot be handled.	80%
	ZooKeeper Heap Memory Usage	13004	ZooKeeper Heap Memory Usage Exceeds the Threshold	If the available ZooKeeper memory is insufficient, a memory overflow occurs and the service breaks down.	95%
	ZooKeeper Direct Memory Usage	13002	ZooKeeper Direct Memory Usage Exceeds the Threshold	If the available ZooKeeper memory is insufficient, a memory overflow occurs and the service breaks down.	80%
	ZooKeeper GC Duration per Minute	13003	GC Duration of the ZooKeeper Process Exceeds the Threshold	A long GC duration of the ZooKeeper process may interrupt the services.	12000ms
Ranger	UserSync GC Duration	45284	UserSync GC Duration Exceeds the Threshold	UserSync responds slowly.	12000ms
	PolicySync GC Duration	45292	PolicySync GC Duration Exceeds the Threshold	PolicySync responds slowly.	12000ms
	RangerAdmin GC Duration	45280	RangerAdmin GC Duration Exceeds the Threshold	RangerAdmin responds slowly.	12000ms
	TagSync GC Duration	45288	TagSync GC Duration Exceeds the Threshold	TagSync responds slowly.	12000ms
	UserSync Non-Heap Memory Usage	45283	UserSync Non-Heap Memory Usage Exceeds the Threshold	Non-heap memory overflow may cause service breakdown.	80.0%
	UserSync Direct Memory Usage	45282	UserSync Direct Memory Usage Exceeds the Threshold	Direct memory overflow may cause service breakdown.	80.0%
	UserSync Heap Memory Usage	45281	UserSync Heap Memory Usage Exceeds the Threshold	Heap memory overflow may cause service breakdown.	95.0%
	PolicySync Direct Memory Usage	45290	PolicySync Direct Memory Usage Exceeds the Threshold	Direct memory overflow may cause service breakdown.	80.0%
	PolicySync Heap Memory Usage	45289	PolicySync Heap Memory Usage Exceeds the Threshold	Heap memory overflow may cause service breakdown.	95.0%
	PolicySync Non-Heap Memory Usage	45291	PolicySync Non-Heap Memory Usage Exceeds the Threshold	Non-heap memory overflow may cause service breakdown.	80.0%
	RangerAdmin Non-Heap Memory Usage	45279	RangerAdmin Non-Heap Memory Usage Exceeds the Threshold	Non-heap memory overflow may cause service breakdown.	80.0%
	RangerAdmin Heap Memory Usage	45277	RangerAdmin Heap Memory Usage Exceeds the Threshold	Heap memory overflow may cause service breakdown.	95.0%
	RangerAdmin Direct Memory Usage	45278	RangerAdmin Direct Memory Usage Exceeds the Threshold	Direct memory overflow may cause service breakdown.	80.0%
	TagSync Direct Memory Usage	45286	TagSync Direct Memory Usage Exceeds the Threshold	Direct memory overflow may cause service breakdown.	80.0%
	TagSync Non-Heap Memory Usage	45287	TagSync Non-Heap Memory Usage Exceeds the Threshold	Non-heap memory overflow may cause service breakdown.	80.0%
	TagSync Heap Memory Usage	45285	TagSync Heap Memory Usage Exceeds the Threshold	Heap memory overflow may cause service breakdown.	95.0%
ClickHouse	ClickHouse Service Quantity Quota Usage in ZooKeeper	45426	ClickHouse Service Quantity Quota Usage in ZooKeeper Exceeds the Threshold	After the ZooKeeper quantity quota of the ClickHouse service exceeds the threshold, you cannot perform cluster operations on the ClickHouse service on FusionInsight Manager. As a result, the ClickHouse service cannot be used.	90%
ClickHouse	ClickHouse Service Capacity Quota Usage in ZooKeeper	45427	ClickHouse Service Capacity Quota Usage in ZooKeeper Exceeds the Threshold	After the ZooKeeper capacity quota of the ClickHouse service exceeds the threshold, you cannot perform cluster operations on the ClickHouse service on FusionInsight Manager. As a result, the ClickHouse service cannot be used.	90%
IoTDB	Maximum Merge (Intra-Space Merge) Latency	45594	IoTDBServer Intra-Space Merge Duration Exceeds the Threshold	Data write is blocked and the write operation performance is affected.	300000ms
	Maximum Merge (Flush) Latency	45593	IoTDBServer Flush Execution Duration Exceeds the Threshold	Data write is blocked and the write operation performance is affected.	300000ms
	Maximum Merge (Cross-Space Merge) Latency	45595	IoTDBServer Cross-Space Merge Duration Exceeds the Threshold	Data write is blocked and the write operation performance is affected.	300000ms
	Maximum RPC (executeStatement) Latency	45592	IoTDBServer RPC Execution Duration Exceeds the Threshold	Running performance of the IoTDBServer process is affected.	10000s
	Total GC Duration of IoTDBServer	45587	IoTDBServer GC Duration Exceeds the Threshold	A long GC duration of the IoTDBServer process may interrupt the services.	12000ms
	Total GC Duration of ConfigNode	45590	ConfigNode GC Duration Exceeds the Threshold	A long GC duration of the ConfigNode process may interrupt services.	12000ms
	IoTDBServer Heap Memory Usage	45586	IoTDBServer Heap Memory Usage Exceeds the Threshold	If the available IoTDBServer process heap memory is insufficient, a memory overflow occurs and the service breaks down.	90%
	IoTDBServer Direct Memory Usage	45588	IoTDBServer Direct Memory Usage Exceeds the Threshold	Direct memory overflow may cause service breakdown.	90%
	ConfigNode Heap Memory Usage	45589	ConfigNode Heap Memory Usage Exceeds the Threshold	If the available ConfigNode process heap memory is insufficient, a memory overflow occurs and the service breaks down.	90%
	ConfigNode Direct Memory Usage	45591	ConfigNode Direct Memory Usage Exceeds the Threshold	Direct memory overflow may cause the IoTDB instance to be unavailable.	90%