Configuring the Threshold
Scenarios
You can configure monitoring indicator thresholds to monitor the health status of indicators on FusionInsight Manager. If abnormal data occurs and the preset conditions are met, the system triggers an alarm and displays the alarm information on the alarm page.
Procedure
- Log in to FusionInsight Manager.
- Choose O&M > Alarm > Thresholds.
- Select a monitoring indicator for a specified host or service in the cluster.
Figure 1 Configuring indicator thresholds
For example, after you select Host Memory Usage, the information about this indicator threshold is displayed.
- If the alarm sending switch is turned on, an alarm is triggered when the alarm threshold is reached.
- The alarm ID and alarm name identify the alarm that is triggered by the threshold.
- Trigger Count: FusionInsight Manager checks whether the value of each monitored indicator reaches the threshold. If the threshold is reached in a number of consecutive checks equal to the value of Trigger Count, the system sends an alarm. The value can be customized.
- Check Period (s) indicates the interval at which the system checks monitoring indicators.
- The rule list shows the rules for triggering an alarm.
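The interaction between Trigger Count and the periodic checks can be sketched as follows. This is only an illustration of the consecutive-check idea, not FusionInsight Manager code: the names `ConsecutiveTrigger` and `first_alarm_index` are hypothetical, and the sketch assumes a Max value rule with a strict "greater than" comparison and that a single normal check resets the count.

```python
from dataclasses import dataclass


@dataclass
class ConsecutiveTrigger:
    """Counts consecutive checks in which the value is above the threshold."""
    threshold: float
    trigger_count: int
    consecutive_hits: int = 0

    def check(self, value: float) -> bool:
        """Record one check; return True when an alarm should be sent."""
        if value > self.threshold:
            self.consecutive_hits += 1
        else:
            # Assumption: one normal check resets the streak.
            self.consecutive_hits = 0
        return self.consecutive_hits >= self.trigger_count


def first_alarm_index(samples, threshold, trigger_count):
    """Return the index of the check that first raises an alarm, or None."""
    trigger = ConsecutiveTrigger(threshold, trigger_count)
    for index, value in enumerate(samples):
        if trigger.check(value):
            return index
    return None
```

Each element of `samples` stands for one check, so consecutive elements are one Check Period (s) apart. A larger Trigger Count therefore delays the alarm but filters out short spikes.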
- Click Create Rule to add rules used for monitoring indicators.
Table 1 Monitoring indicator rule parameters

| Parameter | Value | Description |
|---|---|---|
| Rule Name | CPU_MAX (example value) | Name of the rule. |
| Alarm Severity | Critical, Major, Minor, Warning | Severity of the alarm. |
| Threshold Type | Max value, Min value | Whether the threshold is the maximum or minimum value of the indicator. If this parameter is set to Max value, the system generates an alarm when the actual value of the indicator is greater than the threshold. If it is set to Min value, the system generates an alarm when the actual value of the indicator is less than the threshold. |
| Date | Daily, Weekly, Others | Date on which the rule takes effect. |
| Add Date | 09-30 | Available only when Date is set to Others. Sets the date on which the rule takes effect. Multiple dates can be added. |
| Thresholds | Start and End Time: 00:00 to 08:30 | Time range in which the rule takes effect. |
| Thresholds | Threshold: 10 | Threshold of the indicator monitored by the rule. |
For the last parameter in the table (Thresholds), you can use the add and delete icons to add or remove start and end times and their alarm thresholds.
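The rule parameters described above can be modeled as a small data structure. This is a minimal sketch for reasoning about how Threshold Type and the start and end times interact, not the product's implementation: the class names `ThresholdType`, `TimeWindow`, and `Rule` are hypothetical, the Date setting is left out (the rule is assumed to apply daily), and the end time is assumed to be inclusive.

```python
from dataclasses import dataclass
from datetime import time
from enum import Enum


class ThresholdType(Enum):
    MAX = "Max value"  # alarm when the value is greater than the threshold
    MIN = "Min value"  # alarm when the value is less than the threshold


@dataclass(frozen=True)
class TimeWindow:
    """One start and end time with its threshold."""
    start: time
    end: time
    threshold: float

    def contains(self, moment: time) -> bool:
        # Assumption: both ends of the window are inclusive.
        return self.start <= moment <= self.end


@dataclass(frozen=True)
class Rule:
    name: str
    severity: str
    threshold_type: ThresholdType
    windows: tuple  # one or more TimeWindow entries

    def violates(self, value: float, moment: time) -> bool:
        """Return True if value breaks the threshold active at moment."""
        for window in self.windows:
            if window.contains(moment):
                if self.threshold_type is ThresholdType.MAX:
                    return value > window.threshold
                return value < window.threshold
        return False  # no window covers this time, so the rule is inactive


# Example values taken from the parameter table: CPU_MAX, 00:00 to 08:30, threshold 10.
rule = Rule("CPU_MAX", "Major", ThresholdType.MAX,
            (TimeWindow(time(0, 0), time(8, 30), 10.0),))
```

With this rule, a value of 12 at 03:00 violates the threshold, while the same value at 09:00 does not, because no time range covers 09:00.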
- Click OK to save the rules.
- Locate the row that contains an added rule, and click Apply in the Operation column. The value of Effective for this rule changes to Yes.
A new rule can be applied only after you click Cancel for the rule that is currently applied.
Monitoring Indicator Reference
FusionInsight Manager alarm monitoring indicators are categorized into node information indicators and cluster service indicators. Table 2 describes the indicators whose thresholds can be configured on nodes. For the complete list of monitoring indicators, see the M.
| Monitoring Indicator Group Name | Indicator Name | Description | Default Threshold |
|---|---|---|---|
| CPU | Host CPU Usage | This indicator reflects the computing and control capabilities of the current cluster in a measurement period. By observing the indicator value, you can better understand the overall resource usage of the cluster. | 90.0% |
| Disk | Disk Usage | Indicates the disk usage of a host. | 90.0% |
| Disk | Disk Inode Usage | Indicates the disk inode usage in a measurement period. | 80.0% |
| Memory | Host Memory Usage | Indicates the average memory usage at the current time. | 90.0% |
| Host Status | Host File Handle Usage | Indicates the usage of file handles of the host in a measurement period. | 80.0% |
| Host Status | Host PID Usage | Indicates the PID usage of a host. | 90% |
| Network Status | TCP Ephemeral Port Usage | Indicates the usage of temporary TCP ports of the host in a measurement period. | 80.0% |
| Network Reading | Read Packet Error Rate | Indicates the read packet error rate of the network interface on the host in a measurement period. | 0.5% |
| Network Reading | Read Packet Dropped Rate | Indicates the read packet dropped rate of the network interface on the host in a measurement period. | 0.5% |
| Network Reading | Read Throughput Rate | Indicates the average read throughput (at MAC layer) of the network interface in a measurement period. | 80% |
| Network Writing | Write Packet Error Rate | Indicates the write packet error rate of the network interface on the host in a measurement period. | 0.5% |
| Network Writing | Write Packet Dropped Rate | Indicates the write packet dropped rate of the network interface on the host in a measurement period. | 0.5% |
| Network Writing | Write Throughput Rate | Indicates the average write throughput (at MAC layer) of the network interface in a measurement period. | 80% |
| Process | Uninterruptible Sleep Process | Indicates the number of D state processes on the host in a measurement period. | 0 |
| Process | omm Process Usage | Indicates the usage of the omm process within a measurement period. | 90 |
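As an illustration of how the node defaults above might be used outside the console, for example in a quick offline check of collected samples, the following sketch copies the percentage-based defaults into a lookup table. The names `NODE_DEFAULT_THRESHOLDS` and `exceeded` are hypothetical, and the comparison assumes Max value semantics (a sample violates the default when it is strictly greater than the threshold).

```python
# Default thresholds (in percent) for node indicators, copied from the table above.
NODE_DEFAULT_THRESHOLDS = {
    "Host CPU Usage": 90.0,
    "Disk Usage": 90.0,
    "Disk Inode Usage": 80.0,
    "Host Memory Usage": 90.0,
    "Host File Handle Usage": 80.0,
    "Host PID Usage": 90.0,
    "TCP Ephemeral Port Usage": 80.0,
    "Read Packet Error Rate": 0.5,
    "Read Packet Dropped Rate": 0.5,
    "Read Throughput Rate": 80.0,
    "Write Packet Error Rate": 0.5,
    "Write Packet Dropped Rate": 0.5,
    "Write Throughput Rate": 80.0,
}


def exceeded(samples):
    """Return the sorted names of indicators whose sample is above its default.

    samples maps an indicator name to its current value in percent.
    Indicators that are not in the lookup table are ignored.
    """
    return sorted(
        name
        for name, value in samples.items()
        if name in NODE_DEFAULT_THRESHOLDS and value > NODE_DEFAULT_THRESHOLDS[name]
    )
```

For instance, `exceeded({"Host CPU Usage": 95.0, "Disk Usage": 42.0})` reports only the CPU indicator.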
| Service | Monitoring Indicator Group Name | Indicator Name | Description | Default Threshold |
|---|---|---|---|---|
| DBService | Database | Database Connections Usage | Indicates the usage of the number of database connections. | 90% |
| DBService | Database | Disk Space Usage of the Data Directory | Indicates the disk space usage of the data directory. | 80% |
| Flume | Agent | Heap Memory Usage Calculate | Indicates the Flume heap memory usage. | 95.0% |
| Flume | Agent | Flume Direct Memory Usage Statistics | Indicates the Flume direct memory usage. | 80.0% |
| Flume | Agent | Flume Non-heap Memory Usage | Indicates the Flume non-heap memory usage. | 80.0% |
| Flume | Agent | Total GC duration of Flume process | Indicates the Flume total GC time. | 12000ms |
| HBase | GC | GC time for old generation | Indicates the total GC time of RegionServer. | 5000ms |
| HBase | GC | GC time for old generation | Indicates the total GC time of HMaster. | 5000ms |
| HBase | CPU and Memory | RegionServer Direct Memory Usage Statistics | Indicates the RegionServer direct memory usage. | 90% |
| HBase | CPU and Memory | RegionServer Heap Memory Usage Statistics | Indicates the RegionServer heap memory usage. | 90% |
| HBase | CPU and Memory | HMaster Direct Memory Usage | Indicates the HMaster direct memory usage. | 90% |
| HBase | CPU and Memory | HMaster Heap Memory Usage Statistics | Indicates the HMaster heap memory usage. | 90% |
| HBase | Service | Regions | Indicates the number of regions of a RegionServer. | 2000 |
| HBase | Service | Region in transaction count over threshold | Indicates the number of regions that are in the RIT state and reach the threshold duration. | 1 |
| HBase | Replication | Replication sync failed times | Indicates the number of times that DR data fails to be synchronized. | 1 |
| HBase | Queue | Compaction Queue Size | Indicates the compaction queue size. | 100 |
| HDFS | File and Block | Lost Blocks | Indicates the number of missing blocks in the HDFS file system. | 0 |
| HDFS | File and Block | Blocks Under Replicated | Indicates the total number of blocks that need to be replicated by the NameNode. | 1000 |
| HDFS | RPC | Average Time of Active NameNode RPC Processing | Indicates the average RPC processing time. | 100ms |
| HDFS | RPC | Average Time of Active NameNode RPC Queuing | Indicates the average RPC queuing time. | 200ms |
| HDFS | Disk | Disk Usage | Indicates the HDFS disk usage. | 80% |
| HDFS | Disk | Percentage of DataNode Capacity | Indicates the disk usage of DataNodes in the HDFS. | 80% |
| HDFS | Disk | Percentage of Reserved Space for Replicas of Unused Space | Indicates the percentage of the disk space reserved for all replicas to the total unused disk space of DataNodes. | 90% |
| HDFS | Resource | Faulty DataNodes | Indicates the number of faulty DataNodes. | 3 |
| HDFS | Resource | NameNode Non Heap Memory Usage Statistics | Indicates the percentage of NameNode non-heap memory usage. | 90% |
| HDFS | Resource | NameNode Direct Memory Usage Statistics | Indicates the percentage of direct memory used by NameNodes. | 90% |
| HDFS | Resource | NameNode Heap Memory Usage Statistics | Indicates the percentage of NameNode heap memory usage. | 95% |
| HDFS | Resource | DataNode Non Heap Memory Usage Statistics | Indicates the percentage of DataNode non-heap memory usage. | 90% |
| HDFS | Resource | DataNode Direct Memory Usage Statistics | Indicates the percentage of direct memory used by DataNodes. | 90% |
| HDFS | Resource | DataNode Heap Memory Usage Statistics | Indicates the percentage of DataNode heap memory usage. | 95% |
| HDFS | Garbage Collection | GC Time | Indicates the garbage collection (GC) duration of NameNodes per minute. | 12000ms |
| HDFS | Garbage Collection | GC Time | Indicates the GC duration of DataNodes per minute. | 12000ms |
| Hive | HQL | Percentage of HQL Statements That Are Executed Successfully by Hive | Indicates the percentage of HQL statements that are executed successfully by Hive. | 90.0% |
| Hive | Background | Background Thread Usage | Indicates the percentage of Background thread usage. | 90% |
| Hive | GC | Total GC Time in Milliseconds | Indicates the total GC time of MetaStore. | 12000ms |
| Hive | GC | Total GC Time in Milliseconds | Indicates the total GC time of HiveServer. | 12000ms |
| Hive | Capacity | Percentage of HDFS Space Used by Hive to the Available Space | Indicates the percentage of HDFS space used by Hive to the available space. | 85.0% |
| Hive | CPU and Memory | MetaStore Direct Memory Usage Statistics | Indicates the MetaStore direct memory usage. | 95% |
| Hive | CPU and Memory | MetaStore Non-Heap Memory Usage Statistics | Indicates the MetaStore non-heap memory usage. | 95% |
| Hive | CPU and Memory | MetaStore Heap Memory Usage Statistics | Indicates the MetaStore heap memory usage. | 95% |
| Hive | CPU and Memory | HiveServer Direct Memory Usage Statistics | Indicates the HiveServer direct memory usage. | 95% |
| Hive | CPU and Memory | HiveServer Non-Heap Memory Usage Statistics | Indicates the HiveServer non-heap memory usage. | 95% |
| Hive | CPU and Memory | HiveServer Heap Memory Usage Statistics | Indicates the HiveServer heap memory usage. | 95% |
| Hive | Session | Percentage of Sessions Connected to the HiveServer to Maximum Number of Sessions Allowed by the HiveServer | Indicates the percentage of the number of sessions connected to the HiveServer to the maximum number of sessions allowed by the HiveServer. | 90.0% |
| Kafka | Partition | Percentage of Partitions That Are Not Completely Synchronized | Indicates the percentage of partitions that are not completely synchronized to total partitions. | 50% |
| Kafka | Other | Unavailable Partition Percentage | Indicates the percentage of partitions that are unavailable. | 40% |
| Kafka | Other | User Connection Usage on Broker | Indicates the user connection usage on the Broker. | 80% |
| Kafka | Disk | Broker Disk Usage | Indicates the disk usage of the disk where the Broker data directory is located. | 80% |
| Kafka | Process | Broker GC Duration per Minute | Indicates the GC duration of the Broker process per minute. | 12000ms |
| Kafka | Process | Heap Memory Usage of Kafka | Indicates the Kafka heap memory usage. | 95% |
| Kafka | Process | Kafka Direct Memory Usage | Indicates the Kafka direct memory usage. | 95% |
| Loader | Memory | Heap Memory Usage Calculate | Indicates the Loader heap memory usage. | 95% |
| Loader | Memory | Loader Direct Memory Usage Statistics | Indicates the Loader direct memory usage. | 80.0% |
| Loader | Memory | Non heap Memory Usage Calculate | Indicates the Loader non-heap memory usage. | 80% |
| Loader | GC | Total GC time in milliseconds | Indicates the total GC time of Loader. | 12000ms |
| MapReduce | Garbage Collection | GC Time | Indicates the GC time. | 12000ms |
| MapReduce | Resource | JobHistoryServer Direct Memory Usage Statistics | Indicates the JobHistoryServer direct memory usage. | 90% |
| MapReduce | Resource | JobHistoryServer Non Heap Memory Usage Statistics | Indicates the JobHistoryServer non-heap memory usage. | 90% |
| MapReduce | Resource | JobHistoryServer Heap Memory Usage Statistics | Indicates the JobHistoryServer heap memory usage. | 95% |
| Oozie | Memory | Heap Memory Usage Calculate | Indicates the Oozie heap memory usage. | 95.0% |
| Oozie | Memory | Oozie Direct Buffer Resource Percentage | Indicates the Oozie direct memory usage. | 80.0% |
| Oozie | Memory | Non Heap Memory Usage Calculate | Indicates the Oozie non-heap memory usage. | 80% |
| Oozie | GC | Total GC duration of Oozie process | Indicates the Oozie total GC time. | 12000ms |
| Spark2x | Memory | JDBCServer2x Heap Memory Usage Statistics | Indicates the JDBCServer2x heap memory usage. | 95% |
| Spark2x | Memory | JDBCServer2x Direct Memory Usage Statistics | Indicates the JDBCServer2x direct memory usage. | 95% |
| Spark2x | Memory | JDBCServer2x Non-Heap Memory Usage Statistics | Indicates the JDBCServer2x non-heap memory usage. | 95% |
| Spark2x | Memory | JobHistory2x Direct Memory Usage Statistics | Indicates the JobHistory2x direct memory usage. | 95% |
| Spark2x | Memory | JobHistory2x Non-Heap Memory Usage Statistics | Indicates the JobHistory2x non-heap memory usage. | 95% |
| Spark2x | Memory | JobHistory2x Heap Memory Usage Statistics | Indicates the JobHistory2x heap memory usage. | 95% |
| Spark2x | Memory | IndexServer2x Direct Memory Usage Statistics | Indicates the IndexServer2x direct memory usage. | 95% |
| Spark2x | Memory | IndexServer2x Heap Memory Usage Statistics | Indicates the IndexServer2x heap memory usage. | 95% |
| Spark2x | Memory | IndexServer2x Non-Heap Memory Usage Statistics | Indicates the IndexServer2x non-heap memory usage. | 95% |
| Spark2x | GC number | Full GC Number of JDBCServer2x | Indicates the total GC number of JDBCServer2x. | 12 |
| Spark2x | GC number | Full GC Number of JobHistory2x | Indicates the total GC number of JobHistory2x. | 12 |
| Spark2x | GC number | Full GC Number of IndexServer2x | Indicates the total GC number of IndexServer2x. | 12 |
| Spark2x | GC Time | Total GC time in milliseconds | Indicates the total GC time of JDBCServer2x. | 12000ms |
| Spark2x | GC Time | Total GC time in milliseconds | Indicates the total GC time of JobHistory2x. | 12000ms |
| Spark2x | GC Time | Total GC time in milliseconds | Indicates the total GC time of IndexServer2x. | 12000ms |
| Storm | Cluster | Number of Available Supervisors | Indicates the number of available Supervisor processes in the cluster in a measurement period. | 1 |
| Storm | Cluster | Slot Usage | Indicates the slot usage in the cluster in a measurement period. | 80.0% |
| Storm | Nimbus | Heap Memory Usage Calculate | Indicates the Nimbus heap memory usage. | 80% |
| Yarn | Resource | NodeManager Direct Memory Usage Statistics | Indicates the percentage of direct memory used by NodeManagers. | 90% |
| Yarn | Resource | NodeManager Heap Memory Usage Statistics | Indicates the percentage of NodeManager heap memory usage. | 95% |
| Yarn | Resource | NodeManager Non Heap Memory Usage Statistics | Indicates the percentage of NodeManager non-heap memory usage. | 90% |
| Yarn | Resource | ResourceManager Direct Memory Usage Statistics | Indicates the ResourceManager direct memory usage. | 90% |
| Yarn | Resource | ResourceManager Heap Memory Usage Statistics | Indicates the ResourceManager heap memory usage. | 95% |
| Yarn | Resource | ResourceManager Non Heap Memory Usage Statistics | Indicates the ResourceManager non-heap memory usage. | 90% |
| Yarn | CPU and Memory | Pending Memory | Indicates the pending memory capacity. | 83886080MB |
| Yarn | Other | Failed Applications of root queue | Indicates the number of failed tasks in the root queue. | 50 |
| Yarn | Other | Terminated Applications of root queue | Indicates the number of killed tasks in the root queue. | 50 |
| Yarn | Garbage collection | GC Time | Indicates the GC duration of NodeManager per minute. | 12000ms |
| Yarn | Garbage collection | GC Time | Indicates the GC duration of ResourceManager per minute. | 12000ms |
| Yarn | Application | Pending Applications | Indicates the number of pending tasks. | 60 |
| ZooKeeper | Connection | ZooKeeper Connections Usage | Indicates the percentage of the used connections to the total connections of ZooKeeper. | 80% |
| ZooKeeper | CPU and Memory | Heap Memory Usage Calculate | Indicates the ZooKeeper heap memory usage. | 95% |
| ZooKeeper | CPU and Memory | Direct Memory Usage Calculate | Indicates the ZooKeeper direct memory usage. | 80% |
| ZooKeeper | GC | ZooKeeper GC Duration per Minute | Indicates the GC time of ZooKeeper every minute. | 12000ms |
| meta | OBS Meta data Operations | Average Time for Calling the OBS Metadata API | Average time for calling the OBS metadata APIs. | 500ms |
| meta | OBS Meta data Operations | Success Rate for Calling the OBS Metadata API | Success rate of calling the OBS metadata APIs. | 99.0% |
| meta | OBS data write operation | Success Rate for Calling the OBS Write API | Success rate of calling the OBS data write APIs. | 99.0% |
| meta | OBS data read operation | Success Rate for Calling the OBS Data Read API | Success rate of calling the OBS data read operation APIs. | 99.0% |
| Ranger | GC | UserSync GC Duration | UserSync garbage collection (GC) duration. | 12000ms |
| Ranger | GC | RangerAdmin GC Duration | RangerAdmin garbage collection (GC) duration. | 12000ms |
| Ranger | GC | TagSync GC Duration | TagSync garbage collection (GC) duration. | 12000ms |
| Ranger | CPU and Memory | UserSync Non-Heap Memory Usage | UserSync non-heap memory usage in percentage. | 80.0% |
| Ranger | CPU and Memory | UserSync Direct Memory Usage | UserSync direct memory usage in percentage. | 80.0% |
| Ranger | CPU and Memory | UserSync Heap Memory Usage | UserSync heap memory usage in percentage. | 95.0% |
| Ranger | CPU and Memory | RangerAdmin Non-Heap Memory Usage | RangerAdmin non-heap memory usage in percentage. | 80.0% |
| Ranger | CPU and Memory | RangerAdmin Heap Memory Usage | RangerAdmin heap memory usage in percentage. | 95.0% |
| Ranger | CPU and Memory | RangerAdmin Direct Memory Usage | RangerAdmin direct memory usage in percentage. | 80.0% |
| Ranger | CPU and Memory | TagSync Direct Memory Usage | TagSync direct memory usage in percentage. | 80.0% |
| Ranger | CPU and Memory | TagSync Non-Heap Memory Usage | TagSync non-heap memory usage in percentage. | 80.0% |
| Ranger | CPU and Memory | TagSync Heap Memory Usage | TagSync heap memory usage in percentage. | 95.0% |
| ClickHouse | Cluster Quota | Clickhouse service quantity quota usage in ZooKeeper | Quota of the ZooKeeper nodes used by the ClickHouse service. | 90% |
| ClickHouse | Cluster Quota | Capacity quota usage of the Clickhouse service in ZooKeeper | Capacity quota of the ZooKeeper directory used by the ClickHouse service. | 90% |
| IoTDB | GC | IoTDBServer GC Duration | IoTDBServer garbage collection (GC) duration. | 12000ms |
| IoTDB | CPU and Memory | IoTDBServer Heap Memory Usage | IoTDBServer heap memory usage in percentage. | 90% |
| IoTDB | CPU and Memory | IoTDBServer Direct Memory Usage | IoTDBServer direct memory usage in percentage. | 90% |
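The Default Threshold column mixes several units: percentages (90%), durations (12000ms), sizes (83886080MB), and plain counts (1000). When these values are copied into scripts, it can help to split the number from its unit first. The following is a minimal sketch under that assumption. The function name `parse_threshold` and the label `"count"` for unitless values are hypothetical, and only the units that appear in the tables on this page are recognized.

```python
import re

# A number (integer or decimal) followed by an optional unit from this page.
_THRESHOLD_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(%|ms|MB)?\s*$")


def parse_threshold(text):
    """Split a threshold string such as '12000ms' into (value, unit).

    Values without a unit are labeled 'count'.
    Raises ValueError for strings that do not look like a threshold.
    """
    match = _THRESHOLD_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized threshold: {text!r}")
    return float(match.group(1)), match.group(2) or "count"
```

Keeping the unit next to the value avoids comparing, say, a GC duration in milliseconds against a memory usage percentage by mistake.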