Configuring Thresholds for Alarms
MRS clusters provide easy-to-use alarming functions with intuitive monitoring metric views. You can quickly view statistics on the key performance indicators (KPIs) of a cluster and evaluate the cluster health status. MRS allows you to configure metric thresholds to stay informed of cluster health status. If a metric breaches its threshold, the system generates an alarm and displays it on the metric dashboard.
If you have verified that the impact of certain alarms on services can be ignored, or if alarm thresholds need to be adjusted, you can customize the metric thresholds or mask the alarms as required.
You can set thresholds for alarms of node information metrics and cluster service metrics. For details about these metrics, their impacts on the system, and default thresholds, see Monitoring Metric Reference.
The conditions reported by these alarms may affect cluster functions or running jobs. Before you mask alarms or modify alarm rules, evaluate the operation risks.
Modifying Rules for Alarms with Custom Thresholds
- Log in to FusionInsight Manager of the target MRS cluster by referring to Accessing Log in the FusionInsight Manager (MRS 3.x or Later).
- Choose O&M > Alarm > Thresholds.
- Select a metric for a host or service in the cluster. For example, select Host Memory Usage.
Figure 1 Viewing an alarm threshold
  - Switch: If this switch is turned on, an alarm will be triggered when the metric breaches this threshold.
  - Trigger Count: Manager periodically checks whether the metric breaches the threshold. If the metric breaches the threshold in a number of consecutive checks equal to Trigger Count, an alarm is generated. The value can be customized. If an alarm is reported too frequently, you can set Trigger Count to a larger value to reduce the alarming frequency.
  - Check Period (s): Interval, in seconds, between two consecutive checks.
  - The rules to trigger alarms are listed on the page.
- Modify an alarm rule.
  - Add a new rule.
    - Click Create Rule to add a rule that defines how an alarm will be triggered. For details, see Table 1.
    - Click OK to save the rule.
    - Locate the row that contains a rule that is in use, and click Cancel in the Operation column. If no rule is in use, skip this step.
    - Locate the row that contains the new rule, and click Apply in the Operation column. The value of Effective for this rule changes to Yes.
  - Modify an existing rule.
    - Click Modify in the Operation column of the row that contains the target rule.
    - Modify rule parameters by referring to Table 1.
    - Click OK.
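The interaction between Trigger Count and Check Period described in the steps above can be illustrated with a minimal sketch. It is not Manager's implementation: the class name `ConsecutiveTrigger`, the reset-on-pass behavior, and the sample values are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ConsecutiveTrigger:
    """Report an alarm after `trigger_count` consecutive breaching checks."""

    threshold: float
    trigger_count: int
    breaches: int = 0  # consecutive breaching checks seen so far

    def check(self, value: float) -> bool:
        """Record one periodic check and return True if an alarm is due."""
        if value > self.threshold:
            self.breaches += 1
        else:
            # Assumption: a passing check resets the consecutive streak.
            self.breaches = 0
        return self.breaches >= self.trigger_count


# Hypothetical Host Memory Usage samples (%), one per Check Period.
samples = [91.0, 92.0, 88.0, 93.0, 94.0, 95.0]
trigger = ConsecutiveTrigger(threshold=90.0, trigger_count=3)
results = [trigger.check(value) for value in samples]
print(results)  # only the last check completes three consecutive breaches
```

With a larger `trigger_count`, short spikes such as the first two samples never produce an alarm, which is why raising Trigger Count reduces the alarming frequency.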
The following table lists the rule parameters you need to set for triggering an alarm of Host Memory Usage.
Table 1 Alarm rule parameters

Parameter | Description | Example Value |
---|---|---|
Rule Name | Rule name | mrs_test |
Severity | Alarm severity. The options are Critical, Major, Minor, and Warning. | Major |
Threshold Type | Maximum or minimum value of a metric. Max value: An alarm will be generated when the metric value is greater than this value. Min value: An alarm will be generated when the metric value is less than this value. | Max. Value |
Date | How often the rule takes effect. The options are Daily, Weekly, and Others. | Daily |
Add Date | Date when the rule takes effect. This parameter is available only when Date is set to Others. You can set multiple dates. | - |
Thresholds | Start and End Time: Period when the rule takes effect. | 00:00 - 23:59 |
Thresholds | Threshold: Alarm threshold value | 85 |
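As a rough illustration of how the parameters in Table 1 combine, the following sketch models a single rule and evaluates one metric sample against it. The `AlarmRule` class and its field names are hypothetical and do not correspond to any Manager API; the Date and Add Date parameters are left out for brevity.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class AlarmRule:
    """Hypothetical model of one alarm rule from Table 1."""

    name: str
    severity: str        # Critical, Major, Minor, or Warning
    threshold_type: str  # "max" or "min"
    start: time          # start of the period when the rule takes effect
    end: time            # end of the period when the rule takes effect
    threshold: float

    def breached(self, value: float, now: time) -> bool:
        """Return True if `value` breaches the threshold while the rule is in effect."""
        if not (self.start <= now <= self.end):
            return False
        if self.threshold_type == "max":
            return value > self.threshold  # greater than the max value
        if self.threshold_type == "min":
            return value < self.threshold  # less than the min value
        raise ValueError(f"unknown threshold type: {self.threshold_type!r}")


# Example values taken from Table 1.
rule = AlarmRule("mrs_test", "Major", "max", time(0, 0), time(23, 59), 85.0)
print(rule.breached(86.0, time(12, 30)))
print(rule.breached(85.0, time(12, 30)))
```

A value equal to the threshold does not count as a breach here, because Table 1 describes the conditions as "greater than" and "less than".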
Masking Specified Alarms
- Log in to FusionInsight Manager of the target MRS cluster by referring to Accessing Log in the FusionInsight Manager (MRS 3.x or Later).
- Choose O&M > Alarm > Masking.
- In the list on the left of the displayed page, select the target service or module.
- Click Mask in the Operation column of the alarm you want to mask. In the dialog box that is displayed, click OK to change the masking status of the alarm to Mask.
Figure 2 Masking an alarm
- You can search for specified alarms in the list.
- To cancel alarm masking, click Unmask in the row of the target alarm. In the dialog box that is displayed, click OK to change the alarm masking status to Display.
- To perform operations on multiple alarms at a time, select the alarms and click Mask or Unmask at the top of the list.
FAQ
- How Do I View Uncleared Alarms in a Cluster?
  - Log in to the MRS management console.
  - Click the name of the target cluster and click the Alarms tab.
  - Click Advanced Search, set Alarm Status to Uncleared, and click Search.
  - Uncleared alarms of the current cluster are displayed.
- How Do I Clear a Cluster Alarm?
  You can handle the alarms by referring to the alarm help. To view the help document, perform the following steps:
  - Console: Log in to the MRS management console, click the name of the target cluster, click the Alarms tab, and click View Help in the Operation column of the alarm list. Then, clear the alarm by referring to the alarm handling procedure.
  - Manager: Log in to FusionInsight Manager, choose O&M > Alarm > Alarms, and click View Help in the Operation column. Then, clear the alarm by referring to the alarm handling procedure.
Monitoring Metric Reference
FusionInsight Manager monitoring metrics are classified as node information metrics and cluster service metrics. Table 2 lists the metrics whose thresholds can be configured for a node, and Table 3 lists the metrics whose thresholds can be configured for a component.
Table 2 Node information metrics with configurable thresholds

Metric Group | Metric | ID | Alarm | Impact on System | Default Threshold |
---|---|---|---|---|---|
CPU | Host CPU Usage | 12016 | CPU Usage Exceeds the Threshold | Service processes respond slowly or become unavailable. | 90.0% |
Disk | Disk Usage | 12017 | Insufficient Disk Capacity | Service processes become unavailable. | 90.0% |
Disk | Disk Inode Usage | 12051 | Disk Inode Usage Exceeds the Threshold | Data cannot be properly written to the file system. | 80.0% |
Memory | Host Memory Usage | 12018 | Memory Usage Exceeds the Threshold | Service processes respond slowly or become unavailable. | 90.0% |
Host Status | Host File Handle Usage | 12053 | Host File Handle Usage Exceeds the Threshold | I/O operations, such as opening a file or connecting to a network, cannot be performed and programs are abnormal. | 80.0% |
Host Status | Host PID Usage | 12027 | Host PID Usage Exceeds the Threshold | No PID is available for new processes and service processes are unavailable. | 90% |
Network Status | TCP Temporary Port Usage | 12052 | TCP Temporary Port Usage Exceeds the Threshold | Services on the host fail to establish external connections and are interrupted. | 80.0% |
Network Reading | Read Packet Error Rate | 12047 | Read Packet Error Rate Exceeds the Threshold | The communication is intermittently interrupted, and services time out. | 0.5% |
Network Reading | Read Packet Dropped Rate | 12045 | Read Packet Dropped Rate Exceeds the Threshold | The service performance deteriorates or some services time out. | 0.5% |
Network Reading | Read Throughput Rate | 12049 | Read Throughput Rate Exceeds the Threshold | The service system runs abnormally or is unavailable. | 80% |
Network Writing | Write Packet Error Rate | 12048 | Write Packet Error Rate Exceeds the Threshold | The communication is intermittently interrupted, and services time out. | 0.5% |
Network Writing | Write Packet Dropped Rate | 12046 | Write Packet Dropped Rate Exceeds the Threshold | The service performance deteriorates or some services time out. | 0.5% |
Network Writing | Write Throughput Rate | 12050 | Write Throughput Rate Exceeds the Threshold | The service system runs abnormally or is unavailable. | 80% |
Process | Total Number of Processes in D and Z States | 12028 | Number of Processes in the D State and Z State on a Host Exceeds the Threshold | Excessive system resources are used and service processes respond slowly. | 0 |
Process | omm Process Usage | 12061 | Process Usage Exceeds the Threshold | Switching to user omm fails, and new omm processes cannot be created. | 90 |
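To show how the default thresholds in the node metric table above might be applied, here is a small sketch that compares hypothetical host readings with a few of the listed defaults. The dictionary copies four rows from the table; the function name `breached_alarms` and the sample readings are assumptions, and the sketch ignores Trigger Count and custom rules.

```python
# Default thresholds (in %) copied from four rows of the node metric table:
# metric name -> (alarm ID, default threshold)
DEFAULT_NODE_THRESHOLDS = {
    "Host CPU Usage": (12016, 90.0),
    "Disk Usage": (12017, 90.0),
    "Disk Inode Usage": (12051, 80.0),
    "Host Memory Usage": (12018, 90.0),
}


def breached_alarms(readings: dict[str, float]) -> list[int]:
    """Return the IDs of alarms whose metric reading exceeds its default threshold."""
    alarm_ids = []
    for metric, value in readings.items():
        alarm_id, threshold = DEFAULT_NODE_THRESHOLDS[metric]
        if value > threshold:
            alarm_ids.append(alarm_id)
    return sorted(alarm_ids)


# Hypothetical readings for one host.
readings = {"Host CPU Usage": 95.2, "Disk Usage": 71.0, "Host Memory Usage": 90.0}
print(breached_alarms(readings))
```

Only the CPU reading is above its default here; a memory usage of exactly 90.0 does not exceed the 90.0% default.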
Table 3 Cluster service metrics with configurable thresholds

Service | Metric | ID | Alarm Name | Impact on System | Default Threshold |
---|---|---|---|---|---|
DBService | Usage of the Number of Database Connections | 27005 | Database Connection Usage Exceeds the Threshold | Upper-layer services may fail to connect to the DBService database, affecting services. | 90% |
DBService | Disk Space Usage of the Data Directory | 27006 | Disk Space Usage of the Data Directory Exceeds the Threshold | Service processes become unavailable. When the disk space usage of the data directory exceeds 90%, the database enters the read-only mode and the Database Enters the Read-Only Mode alarm is generated. As a result, service data is lost. | 80% |
Flume | Heap Memory Resource Percentage | 24006 | Heap Memory Usage of Flume Server Exceeds the Threshold | Heap memory overflow may cause service breakdown. | 95.0% |
Flume | Direct Memory Usage Statistics | 24007 | Flume Server Direct Memory Usage Exceeds the Threshold | Direct memory overflow may cause service breakdown. | 80.0% |
Flume | Non-heap Memory Usage | 24008 | Flume Server Non-Heap Memory Usage Exceeds the Threshold | Non-heap memory overflow may cause service breakdown. | 80.0% |
Flume | Total GC Duration | 24009 | Flume Server GC Duration Exceeds the Threshold | Flume data transmission efficiency decreases. | 12000ms |
HBase | GC Duration of Old Generation | 19007 | HBase GC Duration Exceeds the Threshold | If the old generation GC duration exceeds the threshold, HBase data read and write are affected. | 5000ms |
HBase | RegionServer Direct Memory Usage Statistics | 19009 | Direct Memory Usage of the HBase Process Exceeds the Threshold | If the available HBase direct memory is insufficient, a memory overflow occurs and the service breaks down. | 90% |
HBase | RegionServer Heap Memory Usage Statistics | 19008 | Heap Memory Usage of the HBase Process Exceeds the Threshold | If the available HBase memory is insufficient, a memory overflow occurs and the service breaks down. | 90% |
HBase | HMaster Direct Memory Usage | 19009 | Direct Memory Usage of the HBase Process Exceeds the Threshold | If the available HBase direct memory is insufficient, a memory overflow occurs and the service breaks down. | 90% |
HBase | HMaster Heap Memory Usage Statistics | 19008 | Heap Memory Usage of the HBase Process Exceeds the Threshold | If the available HBase memory is insufficient, a memory overflow occurs and the service breaks down. | 90% |
HBase | Number of Online Regions of a RegionServer | 19011 | Number of RegionServer Regions Exceeds the Threshold | The data read/write performance of HBase is affected when the number of regions on a RegionServer exceeds the threshold. | 2000 |
HBase | Region in RIT State That Reaches the Threshold Duration | 19013 | Duration of Regions in RIT State Exceeds the Threshold | Some data in the table is lost or becomes unavailable. | 1 |
HBase | Handler Usage of RegionServer | 19021 | Number of Active Handlers of RegionServer Exceeds the Threshold | RegionServers or HBase cannot provide services properly. | 90% |
HBase | Synchronization Failures in Disaster Recovery | 19006 | HBase Replication Sync Failed | HBase data in a cluster fails to be synchronized to the standby cluster, causing data inconsistency between active and standby clusters. | 1 |
HBase | Number of Log Files to Be Synchronized in the Active Cluster | 19020 | Number of HBase WAL Files to Be Synchronized Exceeds the Threshold | If the number of WAL files to be synchronized by a RegionServer exceeds the threshold, the number of ZNodes used by HBase exceeds the threshold, affecting the HBase service status. | 128 |
HBase | Number of HFiles to Be Synchronized in the Active Cluster | 19019 | Number of HFiles to Be Synchronized Exceeds the Threshold | If the number of HFiles to be synchronized by a RegionServer exceeds the threshold, the number of ZNodes used by HBase exceeds the threshold, affecting the HBase service status. | 128 |
HBase | Compaction Queue Size | 19018 | HBase Compaction Queue Size Exceeds the Threshold | The cluster performance may deteriorate, affecting data read and write. | 100 |
HDFS | Lost Blocks | 14003 | Number of Lost HDFS Blocks Exceeds the Threshold | Data stored in HDFS is lost. HDFS may enter safe mode and cannot provide write services. Lost block data cannot be restored. | 0 |
HDFS | Blocks Under Replicated | 14028 | Number of Blocks to Be Supplemented Exceeds the Threshold | Data stored in HDFS is lost. HDFS may enter safe mode and cannot provide write services. Lost block data cannot be restored. | 1000 |
HDFS | Average Time of Active NameNode RPC Processing | 14021 | Average NameNode RPC Processing Time Exceeds the Threshold | NameNode cannot process the RPC requests from HDFS clients, upper-layer services that depend on HDFS, and DataNode in a timely manner. Specifically, the services that access HDFS run slowly or the HDFS service is unavailable. | 100ms |
HDFS | Average Time of Active NameNode RPC Queuing | 14022 | Average NameNode RPC Queuing Time Exceeds the Threshold | NameNode cannot process the RPC requests from HDFS clients, upper-layer services that depend on HDFS, and DataNode in a timely manner. Specifically, the services that access HDFS run slowly or the HDFS service is unavailable. | 200ms |
HDFS | HDFS Disk Usage | 14001 | HDFS Disk Usage Exceeds the Threshold | The performance of writing data to HDFS is affected. | 80% |
HDFS | DataNode Disk Usage | 14002 | DataNode Disk Usage Exceeds the Threshold | Insufficient disk space will impact data write to HDFS. | 80% |
HDFS | Percentage of Reserved Space for Replicas of Unused Space | 14023 | Percentage of Total Reserved Disk Space for Replicas Exceeds the Threshold | The performance of writing data to HDFS is affected. If all unused DataNode space is reserved for replicas, writing HDFS data fails. | 90% |
HDFS | Total Faulty DataNodes | 14009 | Number of Dead DataNodes Exceeds the Threshold | Faulty DataNodes cannot provide HDFS services. | 3 |
HDFS | NameNode Non-Heap Memory Usage Statistics | 14018 | NameNode Non-Heap Memory Usage Exceeds the Threshold | If the non-heap memory usage of the HDFS NameNode is too high, data read/write performance of HDFS will be affected. | 90% |
HDFS | NameNode Direct Memory Usage Statistics | 14017 | NameNode Direct Memory Usage Exceeds the Threshold | If the available direct memory of NameNode instances is insufficient, a memory overflow may occur and the service breaks down. | 90% |
HDFS | NameNode Heap Memory Usage Statistics | 14007 | NameNode Heap Memory Usage Exceeds the Threshold | If the heap memory usage of the HDFS NameNode is too high, data read/write performance of HDFS will be affected. | 95% |
HDFS | DataNode Direct Memory Usage Statistics | 14016 | DataNode Direct Memory Usage Exceeds the Threshold | If the available direct memory of DataNode instances is insufficient, a memory overflow may occur and the service breaks down. | 90% |
HDFS | DataNode Heap Memory Usage Statistics | 14008 | DataNode Heap Memory Usage Exceeds the Threshold | If the heap memory usage of the HDFS DataNode is too high, data read/write performance of HDFS will be affected. | 95% |
HDFS | DataNode Non-Heap Memory Usage Statistics | 14019 | DataNode Non-Heap Memory Usage Exceeds the Threshold | If the non-heap memory usage of the HDFS DataNode is too high, data read/write performance of HDFS will be affected. | 90% |
HDFS | NameNode GC Duration Statistics | 14014 | NameNode GC Duration Exceeds the Threshold | A long GC duration of the NameNode process may interrupt the services. | 12000ms |
HDFS | DataNode GC Duration Statistics | 14015 | DataNode GC Duration Exceeds the Threshold | A long GC duration of the DataNode process may interrupt the services. | 12000ms |
Hive | Hive SQL Execution Success Rate (Percentage) | 16002 | Hive SQL Execution Success Rate Is Lower Than the Threshold | The system configuration and performance cannot meet service processing requirements. | 90.0% |
Hive | Background Thread Usage | 16003 | Background Thread Usage Exceeds the Threshold | There are too many background threads, so the newly submitted task cannot run in time. | 90% |
Hive | Total GC Duration of MetaStore | 16007 | Hive GC Duration Exceeds the Threshold | If the GC duration exceeds the threshold, Hive data read and write are affected. | 12000ms |
Hive | Total GC Duration of HiveServer | 16007 | Hive GC Duration Exceeds the Threshold | If the GC duration exceeds the threshold, Hive data read and write are affected. | 12000ms |
Hive | Percentage of HDFS Space Used by Hive to the Available Space | 16001 | Hive Warehouse Space Usage Exceeds the Threshold | The system fails to write data, which causes data loss. | 85.0% |
Hive | MetaStore Direct Memory Usage Statistics | 16006 | Direct Memory Usage of the Hive Process Exceeds the Threshold | When the direct memory usage of Hive is too high, the performance of Hive task operation is affected. In addition, a memory overflow may occur so that the Hive service is unavailable. | 95% |
Hive | MetaStore Non-Heap Memory Usage Statistics | 16008 | Non-heap Memory Usage of the Hive Service Exceeds the Threshold | When the non-heap memory usage of Hive is too high, the performance of Hive task operation is affected. In addition, a memory overflow may occur so that the Hive service is unavailable. | 95% |
Hive | MetaStore Heap Memory Usage Statistics | 16005 | Heap Memory Usage of the Hive Process Exceeds the Threshold | When the heap memory usage of Hive is too high, the performance of Hive task operation is affected. In addition, a memory overflow may occur so that the Hive service is unavailable. | 95% |
Hive | HiveServer Direct Memory Usage Statistics | 16006 | Direct Memory Usage of the Hive Process Exceeds the Threshold | When the direct memory usage of Hive is too high, the performance of Hive task operation is affected. In addition, a memory overflow may occur so that the Hive service is unavailable. | 95% |
Hive | HiveServer Non-Heap Memory Usage Statistics | 16008 | Non-heap Memory Usage of the Hive Service Exceeds the Threshold | When the non-heap memory usage of Hive is too high, the performance of Hive task operation is affected. In addition, a memory overflow may occur so that the Hive service is unavailable. | 95% |
Hive | HiveServer Heap Memory Usage Statistics | 16005 | Heap Memory Usage of the Hive Process Exceeds the Threshold | When the heap memory usage of Hive is too high, the performance of Hive task operation is affected. In addition, a memory overflow may occur so that the Hive service is unavailable. | 95% |
Hive | Percentage of Sessions Connected to the HiveServer to Maximum Number of Sessions Allowed by the HiveServer | 16000 | Percentage of Sessions Connected to the HiveServer to Maximum Number Allowed Exceeds the Threshold | If a connection alarm is generated, too many sessions are connected to the HiveServer and new connections cannot be created. | 90.0% |
Kafka | Percentage of Partitions That Are Not Completely Synchronized | 38006 | Percentage of Kafka Partitions That Are Not Completely Synchronized Exceeds the Threshold | Too many Kafka partitions that are not completely synchronized affect service reliability. In addition, data may be lost when leaders are switched. | 50% |
Kafka | User Connection Usage on Broker | 38011 | User Connection Usage on Broker Exceeds the Threshold | If the number of connections of a user is excessive, the user cannot create new connections to the Broker. | 80% |
Kafka | Broker Disk Usage | 38001 | Insufficient Kafka Disk Capacity | Kafka data write operations fail. | 80.0% |
Kafka | Disk I/O Rate of a Broker | 38009 | Busy Broker Disk I/Os | The disk partition has frequent I/Os. Data may fail to be written to the Kafka topic for which the alarm is generated. | 80% |
Kafka | Broker GC Duration per Minute | 38005 | GC Duration of the Broker Process Exceeds the Threshold | A long GC duration of the Broker process may interrupt the services. | 12000ms |
Kafka | Heap Memory Usage of Kafka | 38002 | Kafka Heap Memory Usage Exceeds the Threshold | If the available Kafka heap memory is insufficient, a memory overflow occurs and the service breaks down. | 95% |
Kafka | Kafka Direct Memory Usage | 38004 | Kafka Direct Memory Usage Exceeds the Threshold | If the available direct memory of the Kafka service is insufficient, a memory overflow occurs and the service breaks down. | 95% |
Loader | Heap Memory Usage | 23004 | Loader Heap Memory Usage Exceeds the Threshold | Heap memory overflow may cause service breakdown. | 95% |
Loader | Direct Memory Usage Statistics | 23006 | Loader Direct Memory Usage Exceeds the Threshold | Direct memory overflow may cause service breakdown. | 80.0% |
Loader | Non-heap Memory Usage | 23005 | Loader Non-Heap Memory Usage Exceeds the Threshold | Non-heap memory overflow may cause service breakdown. | 80% |
Loader | Total GC Duration | 23007 | GC Duration of the Loader Process Exceeds the Threshold | Loader service response is slow. | 12000ms |
MapReduce | GC Duration Statistics | 18012 | JobHistoryServer GC Duration Exceeds the Threshold | A long GC duration of the JobHistoryServer process may interrupt the services. | 12000ms |
MapReduce | JobHistoryServer Direct Memory Usage Statistics | 18015 | JobHistoryServer Direct Memory Usage Exceeds the Threshold | If the available direct memory of the MapReduce service is insufficient, a memory overflow occurs and the service breaks down. | 90% |
MapReduce | JobHistoryServer Non-Heap Memory Usage Statistics | 18019 | Non-Heap Memory Usage of JobHistoryServer Exceeds the Threshold | When the non-heap memory usage of MapReduce JobHistoryServer is too high, the performance of MapReduce task submission and operation is affected. In addition, a memory overflow may occur so that the MapReduce service is unavailable. | 90% |
MapReduce | JobHistoryServer Heap Memory Usage Statistics | 18009 | Heap Memory Usage of JobHistoryServer Exceeds the Threshold | When the heap memory usage of MapReduce JobHistoryServer is too high, the performance of MapReduce log archiving is affected. In addition, a memory overflow may occur, leading to unavailable YARN service. | 95% |
Oozie | Heap Memory Usage | 17004 | Oozie Heap Memory Usage Exceeds the Threshold | Heap memory overflow may cause service breakdown. | 95.0% |
Oozie | Direct Memory Usage | 17006 | Oozie Direct Memory Usage Exceeds the Threshold | Direct memory overflow may cause service breakdown. | 80.0% |
Oozie | Non-heap Memory Usage | 17005 | Oozie Non-Heap Memory Usage Exceeds the Threshold | Non-heap memory overflow may cause service breakdown. | 80% |
Oozie | Total GC Duration | 17007 | GC Duration of the Oozie Process Exceeds the Threshold | Oozie responds slowly when it is used to submit tasks. | 12000ms |
Spark2x | JDBCServer2x Heap Memory Usage Statistics | 43010 | Heap Memory Usage of the JDBCServer2x Process Exceeds the Threshold | If the available JDBCServer2x process heap memory is insufficient, a memory overflow occurs and the service breaks down. | 95% |
Spark2x | JDBCServer2x Direct Memory Usage Statistics | 43012 | Direct Heap Memory Usage of the JDBCServer2x Process Exceeds the Threshold | If the available JDBCServer2x process direct heap memory is insufficient, a memory overflow occurs and the service breaks down. | 95% |
Spark2x | JDBCServer2x Non-Heap Memory Usage Statistics | 43011 | Non-Heap Memory Usage of the JDBCServer2x Process Exceeds the Threshold | If the available JDBCServer2x process non-heap memory is insufficient, a memory overflow occurs and the service breaks down. | 95% |
Spark2x | JobHistory2x Direct Memory Usage Statistics | 43008 | Direct Memory Usage of the JobHistory2x Process Exceeds the Threshold | If the available JobHistory2x process direct memory is insufficient, a memory overflow occurs and the service breaks down. | 95% |
Spark2x | JobHistory2x Non-Heap Memory Usage Statistics | 43007 | Non-Heap Memory Usage of the JobHistory2x Process Exceeds the Threshold | If the available JobHistory2x process non-heap memory is insufficient, a memory overflow occurs and the service breaks down. | 95% |
Spark2x | JobHistory2x Heap Memory Usage Statistics | 43006 | Heap Memory Usage of the JobHistory2x Process Exceeds the Threshold | If the available JobHistory2x process heap memory is insufficient, a memory overflow occurs and the service breaks down. | 95% |
Spark2x | IndexServer2x Direct Memory Usage Statistics | 43021 | Direct Memory Usage of the IndexServer2x Process Exceeds the Threshold | If the available IndexServer2x process direct memory is insufficient, a memory overflow occurs and the service breaks down. | 95% |
Spark2x | IndexServer2x Heap Memory Usage Statistics | 43019 | Heap Memory Usage of the IndexServer2x Process Exceeds the Threshold | If the available IndexServer2x process heap memory is insufficient, a memory overflow occurs and the service breaks down. | 95% |
Spark2x | IndexServer2x Non-Heap Memory Usage Statistics | 43020 | Non-Heap Memory Usage of the IndexServer2x Process Exceeds the Threshold | If the available IndexServer2x process non-heap memory is insufficient, a memory overflow occurs and the service breaks down. | 95% |
Spark2x | Full GC Number of JDBCServer2x | 43017 | JDBCServer2x Process Full GC Number Exceeds the Threshold | The performance of the JDBCServer2x process is affected, or the JDBCServer2x process even becomes unavailable. | 12 |
Spark2x | Full GC Number of JobHistory2x | 43018 | JobHistory2x Process Full GC Number Exceeds the Threshold | The performance of the JobHistory2x process is affected, or the JobHistory2x process even becomes unavailable. | 12 |
Spark2x | Full GC Number of IndexServer2x | 43023 | IndexServer2x Process Full GC Number Exceeds the Threshold | If the GC number exceeds the threshold, IndexServer2x may run with poor performance or even become unavailable. | 12 |
Spark2x | Total GC Duration (in Milliseconds) of JDBCServer2x | 43013 | JDBCServer2x Process GC Duration Exceeds the Threshold | If the GC duration exceeds the threshold, JDBCServer2x may run with poor performance. | 12000ms |
Spark2x | Total GC Duration (in Milliseconds) of JobHistory2x | 43009 | JobHistory2x Process GC Duration Exceeds the Threshold | If the GC duration exceeds the threshold, JobHistory2x may run with poor performance. | 12000ms |
Spark2x | Total GC Duration (in Milliseconds) of IndexServer2x | 43022 | IndexServer2x Process GC Duration Exceeds the Threshold | If the GC duration exceeds the threshold, IndexServer2x may run with poor performance or even become unavailable. | 12000ms |
Storm | Number of Available Supervisors | 26052 | Number of Available Supervisors of the Storm Service Is Less Than the Threshold | Existing tasks in the cluster cannot be performed. The cluster can receive new Storm tasks, but cannot perform these tasks. | 1 |
Storm | Slot Usage | 26053 | Storm Slot Usage Exceeds the Threshold | New Storm tasks cannot be performed. | 80.0% |
Storm | Nimbus Heap Memory Usage | 26054 | Nimbus Heap Memory Usage Exceeds the Threshold | When the heap memory usage of Storm Nimbus is too high, frequent GCs occur. In addition, a memory overflow may occur so that the Yarn service is unavailable. | 80% |
Yarn | NodeManager Direct Memory Usage Statistics | 18014 | NodeManager Direct Memory Usage Exceeds the Threshold | If the available direct memory of NodeManager is insufficient, a memory overflow occurs and the service breaks down. | 90% |
Yarn | NodeManager Heap Memory Usage Statistics | 18018 | NodeManager Heap Memory Usage Exceeds the Threshold | When the heap memory usage of Yarn NodeManager is too high, the performance of Yarn task submission and operation is affected. In addition, a memory overflow may occur so that the Yarn service is unavailable. | 95% |
Yarn | NodeManager Non-Heap Memory Usage Statistics | 18017 | NodeManager Non-heap Memory Usage Exceeds the Threshold | When the non-heap memory usage of Yarn NodeManager is too high, the performance of Yarn task submission and operation is affected. In addition, a memory overflow may occur so that the Yarn service is unavailable. | 90% |
Yarn | ResourceManager Direct Memory Usage Statistics | 18013 | ResourceManager Direct Memory Usage Exceeds the Threshold | If the available direct memory of ResourceManager is insufficient, a memory overflow occurs and the service breaks down. | 90% |
Yarn | ResourceManager Heap Memory Usage Statistics | 18008 | ResourceManager Heap Memory Usage Exceeds the Threshold | When the heap memory usage of Yarn ResourceManager is too high, the performance of Yarn task submission and operation is affected. In addition, a memory overflow may occur so that the Yarn service is unavailable. | 95% |
Yarn | ResourceManager Non-Heap Memory Usage Statistics | 18016 | ResourceManager Non-Heap Memory Usage Exceeds the Threshold | When the non-heap memory usage of Yarn ResourceManager is too high, the performance of Yarn task submission and operation is affected. In addition, a memory overflow may occur so that the Yarn service is unavailable. | 90% |
Yarn | NodeManager GC Duration Statistics | 18011 | NodeManager GC Duration Exceeds the Threshold | A long GC duration of the NodeManager process may interrupt the services. | 12000ms |
Yarn | ResourceManager GC Duration Statistics | 18010 | ResourceManager GC Duration Exceeds the Threshold | A long GC duration of the ResourceManager process may interrupt the services. | 12000ms |
Yarn | Number of Failed Tasks in the Root Queue | 18026 | Number of Failed Yarn Tasks Exceeds the Threshold | A large number of application tasks fail to be executed. Failed tasks need to be submitted again. | 50 |
Yarn | Terminated Applications of the Root Queue | 18025 | Number of Terminated Yarn Tasks Exceeds the Threshold | A large number of application tasks are forcibly stopped. | 50 |
Yarn | Pending Memory | 18024 | Pending Yarn Memory Usage Exceeds the Threshold | It takes a long time to end an application. A new application cannot run after submission. | 83886080MB |
Yarn | Pending Tasks | 18023 | Number of Pending Yarn Tasks Exceeds the Threshold | It takes a long time to end an application. A new application cannot run for a long time after submission. | 60 |
ZooKeeper | ZooKeeper Connections Usage | 13001 | Available ZooKeeper Connections Are Insufficient | Available ZooKeeper connections are insufficient. When the connection usage reaches 100%, external connections cannot be handled. | 80% |
ZooKeeper | ZooKeeper Heap Memory Usage | 13004 | ZooKeeper Heap Memory Usage Exceeds the Threshold | If the available ZooKeeper memory is insufficient, a memory overflow occurs and the service breaks down. | 95% |
ZooKeeper | ZooKeeper Direct Memory Usage | 13002 | ZooKeeper Direct Memory Usage Exceeds the Threshold | If the available ZooKeeper memory is insufficient, a memory overflow occurs and the service breaks down. | 80% |
ZooKeeper | ZooKeeper GC Duration per Minute | 13003 | GC Duration of the ZooKeeper Process Exceeds the Threshold | A long GC duration of the ZooKeeper process may interrupt the services. | 12000ms |
Ranger | UserSync GC Duration | 45284 | UserSync GC Duration Exceeds the Threshold | UserSync responds slowly. | 12000ms |
Ranger | PolicySync GC Duration | 45292 | PolicySync GC Duration Exceeds the Threshold | PolicySync responds slowly. | 12000ms |
Ranger | RangerAdmin GC Duration | 45280 | RangerAdmin GC Duration Exceeds the Threshold | RangerAdmin responds slowly. | 12000ms |
Ranger | TagSync GC Duration | 45288 | TagSync GC Duration Exceeds the Threshold | TagSync responds slowly. | 12000ms |
Ranger | UserSync Non-Heap Memory Usage | 45283 | UserSync Non-Heap Memory Usage Exceeds the Threshold | Non-heap memory overflow may cause service breakdown. | 80.0% |
Ranger | UserSync Direct Memory Usage | 45282 | UserSync Direct Memory Usage Exceeds the Threshold | Direct memory overflow may cause service breakdown. | 80.0% |
Ranger | UserSync Heap Memory Usage | 45281 | UserSync Heap Memory Usage Exceeds the Threshold | Heap memory overflow may cause service breakdown. | 95.0% |
Ranger | PolicySync Direct Memory Usage | 45290 | PolicySync Direct Memory Usage Exceeds the Threshold | Direct memory overflow may cause service breakdown. | 80.0% |
Ranger | PolicySync Heap Memory Usage | 45289 | PolicySync Heap Memory Usage Exceeds the Threshold | Heap memory overflow may cause service breakdown. | 95.0% |
Ranger | PolicySync Non-Heap Memory Usage | 45291 | PolicySync Non-Heap Memory Usage Exceeds the Threshold | Non-heap memory overflow may cause service breakdown. | 80.0% |
Ranger | RangerAdmin Non-Heap Memory Usage | 45279 | RangerAdmin Non-Heap Memory Usage Exceeds the Threshold | Non-heap memory overflow may cause service breakdown. | 80.0% |
Ranger | RangerAdmin Heap Memory Usage | 45277 | RangerAdmin Heap Memory Usage Exceeds the Threshold | Heap memory overflow may cause service breakdown. | 95.0% |
Ranger | RangerAdmin Direct Memory Usage | 45278 | RangerAdmin Direct Memory Usage Exceeds the Threshold | Direct memory overflow may cause service breakdown. | 80.0% |
Ranger | TagSync Direct Memory Usage | 45286 | TagSync Direct Memory Usage Exceeds the Threshold | Direct memory overflow may cause service breakdown. | 80.0% |
Ranger | TagSync Non-Heap Memory Usage | 45287 | TagSync Non-Heap Memory Usage Exceeds the Threshold | Non-heap memory overflow may cause service breakdown. | 80.0% |
Ranger | TagSync Heap Memory Usage | 45285 | TagSync Heap Memory Usage Exceeds the Threshold | Heap memory overflow may cause service breakdown. | 95.0% |
ClickHouse | ClickHouse Service Quantity Quota Usage in ZooKeeper | 45426 | ClickHouse Service Quantity Quota Usage in ZooKeeper Exceeds the Threshold | After the ZooKeeper quantity quota of the ClickHouse service exceeds the threshold, you cannot perform cluster operations on the ClickHouse service on FusionInsight Manager. As a result, the ClickHouse service cannot be used. | 90% |
ClickHouse | ClickHouse Service Capacity Quota Usage in ZooKeeper | 45427 | ClickHouse Service Capacity Quota Usage in ZooKeeper Exceeds the Threshold | After the ZooKeeper capacity quota of the ClickHouse service exceeds the threshold, you cannot perform cluster operations on the ClickHouse service on FusionInsight Manager. As a result, the ClickHouse service cannot be used. | 90% |
IoTDB | Maximum Merge (Intra-Space Merge) Latency | 45594 | IoTDBServer Intra-Space Merge Duration Exceeds the Threshold | Data write is blocked and the write operation performance is affected. | 300000ms |
IoTDB | Maximum Merge (Flush) Latency | 45593 | IoTDBServer Flush Execution Duration Exceeds the Threshold | Data write is blocked and the write operation performance is affected. | 300000ms |
IoTDB | Maximum Merge (Cross-Space Merge) Latency | 45595 | IoTDBServer Cross-Space Merge Duration Exceeds the Threshold | Data write is blocked and the write operation performance is affected. | 300000ms |
IoTDB | Maximum RPC (executeStatement) Latency | 45592 | IoTDBServer RPC Execution Duration Exceeds the Threshold | Running performance of the IoTDBServer process is affected. | 10000s |
IoTDB | Total GC Duration of IoTDBServer | 45587 | IoTDBServer GC Duration Exceeds the Threshold | A long GC duration of the IoTDBServer process may interrupt the services. | 12000ms |
IoTDB | Total GC Duration of ConfigNode | 45590 | ConfigNode GC Duration Exceeds the Threshold | A long GC duration of the ConfigNode process may interrupt services. | 12000ms |
IoTDB | IoTDBServer Heap Memory Usage | 45586 | IoTDBServer Heap Memory Usage Exceeds the Threshold | If the available IoTDBServer process heap memory is insufficient, a memory overflow occurs and the service breaks down. | 90% |
IoTDB | IoTDBServer Direct Memory Usage | 45588 | IoTDBServer Direct Memory Usage Exceeds the Threshold | Direct memory overflow may cause service breakdown. | 90% |
IoTDB | ConfigNode Heap Memory Usage | 45589 | ConfigNode Heap Memory Usage Exceeds the Threshold | If the available ConfigNode process heap memory is insufficient, a memory overflow occurs and the service breaks down. | 90% |
IoTDB | ConfigNode Direct Memory Usage | 45591 | ConfigNode Direct Memory Usage Exceeds the Threshold | Direct memory overflow may cause the IoTDB instance to be unavailable. | 90% |
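The Default Threshold column above mixes percentages, durations, sizes, and plain counts. If you copy these values into your own tooling, a small parser along the following lines may help keep the units explicit. This is only a sketch: the function name `parse_threshold` and the `"count"` label for unitless values are assumptions, and the unit list covers only the suffixes that appear in the table.

```python
import re

# Number followed by an optional unit suffix seen in the table (%, ms, s, MB).
_THRESHOLD_PATTERN = re.compile(r"^(\d+(?:\.\d+)?)\s*(%|ms|s|MB)?$")


def parse_threshold(text: str) -> tuple[float, str]:
    """Split a default-threshold string such as '12000ms' into (value, unit)."""
    match = _THRESHOLD_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized threshold: {text!r}")
    value, unit = match.groups()
    # Assumption: a value without a suffix is treated as a plain count.
    return float(value), unit or "count"


for raw in ["90.0%", "12000ms", "83886080MB", "10000s", "128"]:
    print(raw, "->", parse_threshold(raw))
```

Keeping the unit next to the value avoids comparing, for example, a GC duration in milliseconds against a usage percentage by mistake.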