Configuring Alarm Thresholds
Scenario
On FusionInsight Manager, you can configure thresholds for monitoring metrics to track their health status. If a metric value becomes abnormal and the preset conditions are met, the system triggers an alarm and displays it on the alarm page.
Procedure
- Log in to FusionInsight Manager.
- Choose O&M > Alarm > Thresholds.
- Select a monitoring metric for a host or service in the cluster.
Figure 1 Configuring the threshold for a metric
For example, after you select Host Memory Usage, the threshold information for this metric is displayed.
- When Switch is turned on, an alarm is triggered once the threshold condition is met.
- When Alarm Severity is turned on, hierarchical alarms are enabled. The system dynamically reports alarms of the corresponding severity based on the real-time metric value and the thresholds set for each severity.
- Alarm ID and Alarm Name: the alarm that is triggered when the threshold is reached.
- Trigger Count: FusionInsight Manager checks whether the value of the monitoring metric reaches the threshold. If the number of consecutive checks that breach the threshold reaches the value of Trigger Count, an alarm is generated (see the sketch after this procedure). Trigger Count is configurable.
- Check Period (s): interval, in seconds, at which the system checks the monitoring metric.
- The rules in the rule list are used to trigger alarms.
- Click Create Rule to add a rule for the monitoring metric. Table 1 describes the rule parameters.
Table 1 Monitoring metric rule parameters

Parameter | Description | Example Value
---|---|---
Rule Name | Name of the rule. | CPU_MAX
Severity | Alarm severity of the rule. If Alarm Severity is turned on, you configure the severity under Thresholds instead. | Critical, Major, Minor, Warning
Threshold Type | Whether the maximum or minimum value of the metric triggers the alarm. If Threshold Type is set to Max value, an alarm is generated when the metric value is greater than the threshold. If Threshold Type is set to Min value, an alarm is generated when the metric value is less than the threshold. | Max value, Min value
Date | Date on which the rule takes effect. If Alarm Severity is turned on, only Daily is supported. | Daily, Weekly, Others
Add Date | Available only when Date is set to Others. Specifies the dates on which the rule takes effect; multiple dates can be selected. | 09-30
Thresholds | Time range during which the rule takes effect and the thresholds of the monitored metric. If Alarm Severity is turned on, the start time and end time cannot be changed; they default to 00:00-23:59. With Alarm Severity turned on, different alarm severities (Alarm Severity and Threshold) can be set based on different thresholds. You can click the add icon to set multiple time ranges for the threshold, or click the delete icon to remove one. | Start and End Time: 00:00–08:30
- Click OK to save the rules.
- Locate the row that contains an added rule, and click Apply in the Operation column. The value of Effective for this rule changes to Yes.
A new rule can be applied only after you click Cancel for an existing rule.
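The way Check Period, Trigger Count, Threshold Type, and hierarchical thresholds interact can be illustrated with a small sketch. This is a simplified model of the behavior described above, not FusionInsight Manager's actual implementation; the rule structure and the read_metric and raise_alarm callbacks are hypothetical.

```python
import time

# Hypothetical rule mirroring the parameters in Table 1.
# Thresholds are ordered from most to least severe, e.g. "95% (critical), 90% (major)".
RULE = {
    "rule_name": "CPU_MAX",
    "threshold_type": "Max value",            # or "Min value"
    "check_period_s": 30,                     # Check Period (s)
    "trigger_count": 3,                       # consecutive breaches before alarming
    "thresholds": [("critical", 95.0), ("major", 90.0)],
}

def breached_severity(value, rule):
    """Return the most severe level whose threshold is crossed, or None."""
    for severity, threshold in rule["thresholds"]:
        if rule["threshold_type"] == "Max value" and value > threshold:
            return severity
        if rule["threshold_type"] == "Min value" and value < threshold:
            return severity
    return None

def monitor(read_metric, raise_alarm, rule):
    """Check the metric every Check Period seconds and raise an alarm after
    Trigger Count consecutive checks breach a threshold."""
    consecutive = 0
    while True:
        severity = breached_severity(read_metric(), rule)
        if severity is None:
            consecutive = 0                    # streak broken, start counting again
        else:
            consecutive += 1
            if consecutive >= rule["trigger_count"]:
                raise_alarm(rule["rule_name"], severity)
                consecutive = 0
        time.sleep(rule["check_period_s"])
```

With hierarchical alarms enabled, the reported severity follows the most severe threshold the value crosses, which is how the paired defaults in the tables below (for example, 95% critical and 90% major) are meant to be read.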
Monitoring Metric Reference
FusionInsight Manager alarm monitoring metrics are classified as node metrics and cluster service metrics. Table 2 describes the node metrics for which you can configure thresholds, and Table 3 describes the service metrics.
Table 2 Node monitoring metrics

Metric Group | Metric | Description | Default Threshold
---|---|---|---
CPU | Host CPU Usage | This metric reflects the computing and control capabilities of the current cluster in a measurement period. By observing its value, you can better understand the overall resource usage of the cluster. | 90.0%
Disk | Disk Usage | Indicates the disk usage of a host. | 95% (critical), 85% (major)
Disk | Disk Inode Usage | Indicates the disk inode usage in a measurement period. | 95% (critical), 80% (major)
Memory | Host Memory Usage | Indicates the average memory usage at the current time. | 95% (critical), 90% (major)
Host Status | Host File Handle Usage | Indicates the usage of file handles of the host in a measurement period. | 95% (critical), 80% (major)
Host Status | Host PID Usage | Indicates the PID usage of a host. | 95% (critical), 90% (major)
Network Status | TCP Ephemeral Port Usage | Indicates the usage of temporary TCP ports of the host in a measurement period. | 95% (critical), 80% (major)
Network Reading | Read Packet Error Rate | Indicates the read packet error rate of the network interface on the host in a measurement period. | 5% (critical), 0.5% (major)
Network Reading | Read Packet Dropped Rate | Indicates the read packet dropped rate of the network interface on the host in a measurement period. | 5% (critical), 0.5% (major)
Network Reading | Read Throughput Rate | Indicates the average read throughput (at the MAC layer) of the network interface in a measurement period. | 80%
Network Writing | Write Packet Error Rate | Indicates the write packet error rate of the network interface on the host in a measurement period. | 5% (critical), 0.5% (major)
Network Writing | Write Packet Dropped Rate | Indicates the write packet dropped rate of the network interface on the host in a measurement period. | 5% (critical), 0.5% (major)
Network Writing | Write Throughput Rate | Indicates the average write throughput (at the MAC layer) of the network interface in a measurement period. | 80%
Process | Uninterruptible Sleep Process | Number of D-state and Z-state processes on the host in a measurement period | 0
Process | omm Process Usage | omm process usage in a measurement period | 95% (critical), 90% (major)
Table 3 Service monitoring metrics

Service | Metric Group | Metric | Description | Default Threshold
---|---|---|---|---
DBService | Database | Usage of the Number of Database Connections | Indicates the usage of the number of database connections. | 95% (critical), 90% (major)
DBService | Database | Disk Space Usage of the Data Directory | Disk space usage of the data directory | 85% (critical), 80% (major)
MOTService | Database | MOT Connections Usage | Usage of MOTService database connections | 90%
MOTService | Database | MOT Disk Space Usage of the Data Directory | Disk space usage of the MOTService data directory | 80%
MOTService | Database | MOT Used Memory Percentage | MOTService memory usage | 85%
MOTService | Database | MOT Used CPU Percentage | MOTService CPU usage | 80%
Elasticsearch | Disk | Data Directory Usage | Elasticsearch data directory usage | 80%
Elasticsearch | Garbage Collection | GC Time | Garbage collection duration of the Elasticsearch instance process | 30000 ms
Elasticsearch | Memory | Heap Memory Usage | Elasticsearch heap memory usage | 90%
Elasticsearch | Shard | Elasticsearch Shard Document Number | Number of documents in Elasticsearch shards | 100000000
Elasticsearch | Shard | Elasticsearch Shard Data Volume | Size of Elasticsearch shards | 41943040
Elasticsearch | Shard | Number of Instance Shards | Total number of Elasticsearch instance shards | 400
Elasticsearch | Replica Quantity Statistics | Total shard number | Number of primary shards whose Elasticsearch status is down | 70000
Flume | Agent | Flume Heap Memory Usage Calculate | Indicates the Flume heap memory usage. | 95.0% (critical), 90.0% (major)
Flume | Agent | Flume Direct Memory Usage Statistics | Indicates the Flume direct memory usage. | 90.0% (critical), 80.0% (major)
Flume | Agent | Flume Non-heap Memory Usage | Indicates the Flume non-heap memory usage. | 80.0%
Flume | Agent | Total GC duration of Flume process | Indicates the Flume total GC time. | 12000 ms
FTP-Server | Process | FTP-Server Heap Memory Usage Calculate | Indicates the FTP-Server heap memory usage. | 95.0%
FTP-Server | Process | FTP-Server Direct Buffer Usage Statistics | Indicates the FTP-Server direct memory usage. | 80.0%
FTP-Server | Process | FTP-Server Non-Heap Memory Usage | Indicates the FTP-Server non-heap memory usage. | 80.0%
FTP-Server | Process | Total GC duration of FTP-Server process | Indicates the total GC time of FTP-Server. | 12000 ms
HBase | GC | GC time for old generation | Total GC time of RegionServer | 5000 ms
HBase | GC | GC time for old generation | Total GC time of HMaster | 5000 ms
HBase | CPU & memory | RegionServer Direct Memory Usage Statistics | RegionServer direct memory usage | 90%
HBase | CPU & memory | RegionServer Heap Memory Usage Statistics | RegionServer heap memory usage | 90%
HBase | CPU & memory | HMaster Direct Memory Usage | HMaster direct memory usage | 90%
HBase | CPU & memory | HMaster Heap Memory Usage Statistics | HMaster heap memory usage | 90%
HBase | Service | Number of Online Regions of a RegionServer | Number of regions of a RegionServer | 5000 (critical), 2000 (major)
HBase | Service | Region in transaction count over threshold | Number of regions that are in the RIT state and reach the threshold duration | 1
HBase | Handler | RegionServer Handler Usage | Handler usage of RegionServer | 100% (critical), 90% (major)
HBase | Replication | Replication sync failed times (RegionServer) | Number of times that DR data fails to be synchronized | 1
HBase | Replication | Number of Log Files to Be Synchronized in the Active Cluster | Number of log files to be synchronized in the active cluster | 128
HBase | Replication | Number of HFiles to Be Synchronized in the Active Cluster | Number of HFiles to be synchronized in the active cluster | 128
HBase | RPC | Number of RegionServer Opened Connections | Number of open RegionServer RPC connections | 200 (critical), 100 (major)
HBase | RPC | 99th Percentile of the RegionServer RPC Request Response Time | 99th percentile of the RegionServer RPC request response time | 10000 ms (critical), 5000 ms (major)
HBase | RPC | 99th Percentile of the RegionServer RPC Request Processing Time | 99th percentile of the RegionServer RPC request processing time | 10000 ms (critical), 5000 ms (major)
HBase | Operation statistics | Number of Timed-Out WAL Writes in RegionServers | Number of timed-out WAL writes in RegionServers | 500 (critical), 300 (major)
HBase | Queue | Number of Tasks in RegionServer RPC Write Queues | Number of tasks in RegionServer RPC write queues | 2000 (critical), 1600 (major)
HBase | Queue | Number of Tasks in RegionServer RPC Read Queues | Number of tasks in RegionServer RPC read queues | 2000 (critical), 1600 (major)
HBase | Queue | RegionServer Call Queue Size | RegionServer call queue size | 838860800 (critical), 629145600 (major)
HBase | Queue | Compaction Queue Size | Size of the Compaction queue | 100
HDFS | File and Block | Lost Blocks | Number of backup blocks that the HDFS file system lacks | 0
HDFS | File and Block | Blocks Under Replicated | Total number of blocks that need to be replicated by the NameNode | 1000
HDFS | RPC | Average Time of Active NameNode RPC Processing | Average NameNode RPC processing time | 200 ms (critical), 100 ms (major)
HDFS | RPC | Average Time of Active NameNode RPC Queuing | Average NameNode RPC queuing time | 300 ms (critical), 200 ms (major)
HDFS | Disk | HDFS Disk Usage | HDFS disk usage | 90% (critical), 80% (major)
HDFS | Disk | DataNode Disk Usage | Disk usage of DataNodes in the HDFS | 80%
HDFS | Disk | Percentage of Reserved Space for Replicas of Unused Space | Percentage of the reserved disk space of all the copies to the total unused disk space of DataNodes | 90%
HDFS | Resource | Faulty DataNodes | Number of faulty DataNodes | 3
HDFS | Resource | NameNode Non-Heap Memory Usage Statistics | Percentage of NameNode non-heap memory usage | 90%
HDFS | Resource | NameNode Direct Memory Usage Statistics | Percentage of direct memory used by NameNodes | 90%
HDFS | Resource | NameNode Heap Memory Usage Statistics | Percentage of NameNode heap memory usage | 95%
HDFS | Resource | DataNode Direct Memory Usage Statistics | Percentage of direct memory used by DataNodes | 90%
HDFS | Resource | DataNode Heap Memory Usage Statistics | DataNode heap memory usage | 95%
HDFS | Resource | DataNode Non-Heap Memory Usage Statistics | Percentage of DataNode non-heap memory usage | 90%
HDFS | Garbage Collection | GC Time (NameNode) | Garbage collection (GC) duration of NameNodes per minute | 15000 ms (critical), 10000 ms (major)
HDFS | Garbage Collection | GC Time (DataNode) | GC duration of DataNodes per minute | 20000 ms (critical), 12000 ms (major)
Hive | HQL | Percentage of HQL Statements That Are Executed Successfully by Hive | Percentage of HQL statements that are executed successfully by Hive | 90% (critical), 80% (major)
Hive | Connections | Percentage of Number of Sessions Connected to the MetaStore to the Maximum Allowed (MetaStore) | Percentage of the number of sessions connected to MetaStore to the maximum number of sessions allowed by MetaStore | 90% (critical), 80% (major)
Hive | Background | Background Thread Usage | Background thread usage | 90% (critical), 80% (major)
Hive | GC | Total GC time of MetaStore | Total GC time of MetaStore | 12000 ms
Hive | GC | HiveServer Total GC Time in Milliseconds | Total GC time of HiveServer | 12000 ms
Hive | Capacity | Percentage of HDFS Space Used by Hive to the Available Space | Percentage of HDFS space used by Hive to the available space | 95% (critical), 85% (major)
Hive | CPU & memory | MetaStore Direct Memory Usage Statistics | MetaStore direct memory usage | 95% (critical), 85% (major)
Hive | CPU & memory | MetaStore Non-Heap Memory Usage Statistics | MetaStore non-heap memory usage | 95% (critical), 85% (major)
Hive | CPU & memory | MetaStore Heap Memory Usage Statistics | MetaStore heap memory usage | 95% (critical), 85% (major)
Hive | CPU & memory | HiveServer Direct Memory Usage Statistics | HiveServer direct memory usage | 95% (critical), 85% (major)
Hive | CPU & memory | HiveServer Non-Heap Memory Usage Statistics | HiveServer non-heap memory usage | 95% (critical), 85% (major)
Hive | CPU & memory | HiveServer Heap Memory Usage Statistics | HiveServer heap memory usage | 95% (critical), 85% (major)
Hive | Session | Percentage of Sessions Connected to the HiveServer to Maximum Number of Sessions Allowed by the HiveServer | Percentage of the number of sessions connected to the HiveServer to the maximum number of sessions allowed by the HiveServer | 90% (critical), 80% (major)
Kafka | Partition | Percentage of Partitions That Are Not Completely Synchronized | Indicates the percentage of partitions that are not completely synchronized to total partitions. | 60% (critical), 50% (major)
Kafka | Disk | Broker Disk Usage | Indicates the usage of the disk where the Broker data directory is located. | 90% (critical), 85% (major)
Kafka | Disk | Disk I/O Rate of a Broker | I/O usage of the disk where the Broker data directory is located | 80%
Kafka | Process | Broker GC Duration per Minute | Indicates the GC duration of the Broker process per minute. | 12000 ms
Kafka | Process | Heap Memory Usage of Kafka | Indicates the Kafka heap memory usage. | 95%
Kafka | Process | Kafka Direct Memory Usage | Indicates the Kafka direct memory usage. | 100% (critical), 95% (major)
Kafka | Others | User Connection Usage on Broker | Usage of user connections on Broker | 90% (critical), 85% (major)
Loader | Memory | Heap Memory Usage Calculate | Indicates the Loader heap memory usage. | 95% (critical), 80% (major)
Loader | Memory | Direct Memory Usage of Loader | Indicates the Loader direct memory usage. | 95% (critical), 80% (major)
Loader | Memory | Non-heap Memory Usage of Loader | Indicates the Loader non-heap memory usage. | 95% (critical), 80% (major)
Loader | GC | Total GC time of Loader | Indicates the total GC time of Loader. | 20000 ms (critical), 12000 ms (major)
MapReduce | Garbage Collection | GC Time | Indicates the GC time. | 20000 ms (critical), 12000 ms (major)
MapReduce | Resource | JobHistoryServer Direct Memory Usage Statistics | Indicates the JobHistoryServer direct memory usage. | 95% (critical), 90% (major)
MapReduce | Resource | JobHistoryServer Non-Heap Memory Usage Statistics | Indicates the JobHistoryServer non-heap memory usage. | 95% (critical), 90% (major)
MapReduce | Resource | JobHistoryServer Heap Memory Usage Statistics | Indicates the JobHistoryServer heap memory usage. | 95% (critical), 90% (major)
Metadata | Others | Heap Memory Usage Calculate | Indicates the Metadata heap memory usage. | 95%
Metadata | Others | Metadata Direct Memory Usage Statistics | Indicates the Metadata direct memory usage. | 80.0%
Metadata | Others | Metadata Non-heap Memory Usage | Indicates the Metadata non-heap memory usage. | 80.0%
Metadata | Others | Total GC time of Metadata | Indicates the Metadata total GC time. | 20000 ms (critical), 12000 ms (major)
Oozie | Memory | Oozie Heap Memory Usage Calculate | Indicates the Oozie heap memory usage. | 95%
Oozie | Memory | Oozie Direct Memory Usage | Indicates the Oozie direct memory usage. | 90%
Oozie | Memory | Oozie Non-heap Memory Usage | Indicates the Oozie non-heap memory usage. | 90%
Oozie | GC | Total GC duration of Oozie | Indicates the Oozie total GC time. | 20000 ms (critical), 12000 ms (major)
Solr | Replica Quantity Statistics | Bad Replica Number | Number of bad replicas of a Solr instance | 0
Solr | Garbage Collection | GC Time | Garbage collection duration of the Solr instance process | 12000 ms
Solr | Memory | Heap Memory Usage | Indicates the heap memory usage. | 99% (critical), 95% (major)
Solr | Shard | Solr Shard Data Volume | Data volume of Solr shards | 83886080 (critical), 41943040 (major)
Solr | Shard | Solr Shard Document Number | Number of Solr shard documents | 400000000
Spark | Memory | JDBCServer Heap Memory Usage Statistics | JDBCServer heap memory usage | 95% (critical), 85% (major)
Spark | Memory | JDBCServer Direct Memory Usage Statistics | JDBCServer direct memory usage | 95% (critical), 85% (major)
Spark | Memory | JDBCServer Non-Heap Memory Usage Statistics | JDBCServer non-heap memory usage | 95% (critical), 85% (major)
Spark | Memory | JobHistory Direct Memory Usage Statistics | JobHistory direct memory usage | 95% (major), 85% (minor)
Spark | Memory | JobHistory Non-Heap Memory Usage Statistics | JobHistory non-heap memory usage | 95% (major), 85% (minor)
Spark | Memory | JobHistory Heap Memory Usage Statistics | JobHistory heap memory usage | 95% (major), 85% (minor)
Spark | Memory | IndexServer Direct Memory Usage Statistics | IndexServer direct memory usage | 95% (critical), 85% (major)
Spark | Memory | IndexServer Heap Memory Usage Statistics | IndexServer heap memory usage | 95% (critical), 85% (major)
Spark | Memory | IndexServer Non-Heap Memory Usage Statistics | IndexServer non-heap memory usage | 95% (critical), 85% (major)
Spark | GC Count | Full GC Number of JDBCServer | Full GC times of JDBCServer | 12 (critical), 9 (major)
Spark | GC Count | Full GC Number of JobHistory | Full GC times of JobHistory | 12 (critical), 9 (major)
Spark | GC Count | Full GC Number of IndexServer | Full GC times of IndexServer | 12 (critical), 9 (major)
Spark | GC Time | JDBCServer Total GC Time in Milliseconds | Total GC time of JDBCServer | 12000 ms (critical), 9600 ms (major)
Spark | GC Time | JobHistory Total GC Time in Milliseconds | Total GC time of JobHistory | 12000 ms (major), 9600 ms (minor)
Spark | GC Time | IndexServer Total GC Time in Milliseconds | Total GC time of IndexServer | 12000 ms (critical), 9600 ms (major)
Yarn | Resources | NodeManager Direct Memory Usage Statistics | Indicates the percentage of direct memory used by NodeManagers. | 90%
Yarn | Resources | NodeManager Heap Memory Usage Statistics | Indicates the percentage of NodeManager heap memory usage. | 95%
Yarn | Resources | NodeManager Non-Heap Memory Usage Statistics | Indicates the percentage of NodeManager non-heap memory usage. | 90%
Yarn | Resources | ResourceManager Direct Memory Usage Statistics | Indicates the ResourceManager direct memory usage. | 90%
Yarn | Resources | ResourceManager Heap Memory Usage Statistics | Indicates the ResourceManager heap memory usage. | 95%
Yarn | Resources | ResourceManager Non-Heap Memory Usage Statistics | Indicates the ResourceManager non-heap memory usage. | 90%
Yarn | Garbage collection | GC Time | Indicates the GC duration of NodeManager per minute. | 20000 ms (critical), 12000 ms (major)
Yarn | Garbage collection | GC Time | Indicates the GC duration of ResourceManager per minute. | 15000 ms (critical), 10000 ms (major)
Yarn | Others | Failed Applications of root queue | Number of failed tasks in the root queue | 50
Yarn | Others | Terminated Applications of root queue | Number of killed tasks in the root queue | 50
Yarn | CPU & memory | Pending Memory | Pending memory capacity | 83886080 MB
Yarn | Application | Pending Applications | Pending tasks | 60
ZooKeeper | Connection | ZooKeeper Connections Usage | Indicates the percentage of the used connections to the total connections of ZooKeeper. | 90% (critical), 80% (major)
ZooKeeper | CPU & memory | ZooKeeper Heap Memory Usage | Indicates the ZooKeeper heap memory usage. | 95%
ZooKeeper | CPU & memory | ZooKeeper Direct Memory Usage | Indicates the ZooKeeper direct memory usage. | 80%
ZooKeeper | GC | ZooKeeper GC Duration per Minute | Indicates the GC duration of ZooKeeper per minute. | 10000 ms (critical), 5000 ms (major)
meta | OBS data write operation | Total Number of Failed OBS Write API Calls | Total number of failed OBS write API calls | 10
meta | OBS exception | Total Number of OBSFileConflictException Errors | Total number of OBSFileConflictException errors | 5
meta | OBS exception | Total Number of OBS AccessControlExceptions Errors | Total number of OBS AccessControlExceptions errors | 5
meta | OBS exception | Total Number of OBS EOFException Errors | Total number of OBS EOFException errors | 5
meta | OBS exception | Total Number of OBSMethodNotAllowedException Errors | Total number of OBSMethodNotAllowedException errors | 5
meta | OBS exception | Total Number of OBSIOException Errors | Total number of OBSIOException errors | 5
meta | OBS exception | Total Number of OBS FileNotFoundException Errors | Total number of OBS FileNotFoundException errors | 5
meta | OBS exception | Total Number of Throttled OBS Operations | Total number of throttled OBS operations | 5
meta | OBS exception | Total Number of OBSIllegalArgumentExceptions Errors | Total number of OBSIllegalArgumentExceptions errors | 5
meta | OBS exception | Total Number of Other OBS Exceptions | Total number of other OBS exceptions reported by all nodes | 5
meta | OBS data read operation | Total Number of Failed OBS Read API Calls | Total number of failed OBS read API calls | 10
meta | OBS data read operation | Total Number of Failed OBS readFully API Calls | Total number of failed OBS readFully API calls | 10
Ranger | GC | UserSync GC Duration | UserSync garbage collection (GC) duration | 20000 ms (critical), 12000 ms (major)
Ranger | GC | PolicySync GC Duration | PolicySync GC duration | 20000 ms (critical), 12000 ms (major)
Ranger | GC | RangerAdmin GC Duration | RangerAdmin GC duration | 20000 ms (critical), 12000 ms (major)
Ranger | GC | TagSync GC Duration | TagSync GC duration | 20000 ms (critical), 12000 ms (major)
Ranger | CPU & memory | UserSync Non-Heap Memory Usage | UserSync non-heap memory usage | 80.0%
Ranger | CPU & memory | UserSync Direct Memory Usage | UserSync direct memory usage | 80.0%
Ranger | CPU & memory | UserSync Heap Memory Usage | UserSync heap memory usage | 95.0%
Ranger | CPU & memory | PolicySync Direct Memory Usage | Percentage of PolicySync direct memory usage | 80.0%
Ranger | CPU & memory | PolicySync Heap Memory Usage | Percentage of PolicySync heap memory usage | 95.0%
Ranger | CPU & memory | PolicySync Non-Heap Memory Usage | Percentage of PolicySync non-heap memory usage | 80.0%
Ranger | CPU & memory | RangerAdmin Non-Heap Memory Usage | RangerAdmin non-heap memory usage | 80.0%
Ranger | CPU & memory | RangerAdmin Heap Memory Usage | RangerAdmin heap memory usage | 95.0%
Ranger | CPU & memory | RangerAdmin Direct Memory Usage | RangerAdmin direct memory usage | 80.0%
Ranger | CPU & memory | TagSync Direct Memory Usage | TagSync direct memory usage | 80.0%
Ranger | CPU & memory | TagSync Non-Heap Memory Usage | TagSync non-heap memory usage | 80.0%
Ranger | CPU & memory | TagSync Heap Memory Usage | TagSync heap memory usage | 95.0%
ClickHouse | Cluster Quota | Clickhouse service quantity quota usage in ZooKeeper | Quota of the ZooKeeper nodes used by a ClickHouse service | 95% (critical), 90% (major)
ClickHouse | Cluster Quota | Capacity quota usage of the Clickhouse service in ZooKeeper | Capacity quota of the ZooKeeper directory used by the ClickHouse service | 95% (critical), 90% (major)
ClickHouse | Concurrencies | Concurrency Number (ClickHouseServer) | Actual number of concurrent SQL statements of the ClickHouse service | 90
IoTDB | Merge | Maximum Task Merge (Intra-Space Merge) Latency | Maximum latency of IoTDBServer intra-space merge | 300000 ms
IoTDB | Merge | Maximum Merge Task (Flush) Latency | Maximum latency of IoTDBServer flush execution | 300000 ms
IoTDB | Merge | Maximum Task Merge (Cross-Space Merge) Latency | Maximum latency of IoTDBServer cross-space merge | 300000 ms
IoTDB | RPC | Maximum RPC (executeStatement) Latency | Maximum latency of IoTDBServer RPC execution | 10000 s
IoTDB | GC | Total GC duration of IoTDBServer | Total time used for IoTDBServer garbage collection (GC) | 30000 ms (critical), 12000 ms (major)
IoTDB | GC | Total GC Duration of ConfigNode | Total time used for ConfigNode garbage collection (GC) | 30000 ms (critical), 12000 ms (major)
IoTDB | Memory | IoTDBServer Heap Memory Usage | IoTDBServer heap memory usage | 100% (critical), 90% (major)
IoTDB | Memory | IoTDBServer Direct Memory Usage | IoTDBServer direct memory usage | 100% (critical), 90% (major)
IoTDB | Memory | ConfigNode Heap Memory Usage | Percentage of the ConfigNode heap memory usage | 100% (critical), 90% (major)
IoTDB | Memory | ConfigNode Direct Memory Usage | Percentage of the ConfigNode direct memory usage | 100% (critical), 90% (major)
Containers | Others | Metaspace Usage | WebContainer metaspace usage | 75.0%
Containers | Others | Non-Heap Memory Usage | WebContainer non-heap memory usage | 75.0%
Containers | Others | Heap Memory Usage | WebContainer heap memory usage | 95.0%
Containers | Others | Failure Rate of Application Service Calling | Failure rate of application service calling (SGP) | 10.0
Containers | Others | Application Service Calling Latency | Application service calling latency (SGP) | 10000.0
Containers | Others | Maximum Number of Concurrent Application Services | Maximum number of concurrent application services (SGP) | 120
Containers | Others | BLU Health Status | BLU health status statistics | 50.0%
LdapServer | Others | Process Connections of a Single SlapdServer Instance | Number of SlapdServer process connections | 1000
LdapServer | Others | CPU Usage of a Single SlapdServer Instance | SlapdServer CPU usage | 1200%
Guardian | GC | TokenServer GC Duration | TokenServer GC duration | 12000 ms
Guardian | CPU & memory | TokenServer Heap Memory Usage | Percentage of the heap memory used by the TokenServer process | 95.0%
Guardian | CPU & memory | TokenServer Non-Heap Memory Usage | Percentage of the non-heap memory used by the TokenServer process | 80.0%
Guardian | CPU & memory | TokenServer Direct Memory Usage | Percentage of the TokenServer direct memory usage | 80.0%
Doris | JVM | Accumulated Old-Generation GC Duration | Accumulated old-generation GC duration of the FE process | 3000 ms
Doris | Connection | Ratio of the Number of MySQL Port Connections (FE) | Proportion of connections to the MySQL port of the FE node | 95%
Doris | Disk | BE Data Disk Usage | BE data disk usage | 95%
Doris | Disk | Disk Status of a Specified Data Directory | Statistics on abnormal disk status of a specified data directory on the BE | 1
Doris | Performance | Maximum Compaction Score of All BE Nodes | Maximum compaction score of all BE nodes | 10
Doris | Performance | Maximum Duration of RPC Requests Received by Each Method of the FE Thrift Interface | Maximum duration of RPC requests received by each method of the FE Thrift interface | 5000 ms
Doris | Queue | Queue Length of BE Periodic Report Tasks on the FE | Queue length of BE periodic report tasks on the FE node | 10
Doris | Queue | Number of FE Tasks Queuing in the Thread Pool Interacting with the BE | Number of FE tasks queuing in the thread pool interacting with the BE node | 10
Doris | Queue | Number of FE Tasks Queuing in the Task Processing Thread Pool | Number of FE tasks queuing in the task processing thread pool on the FE node | 10
Doris | Queue | Queue Length of Query Execution Thread Pool | Queue length of the query execution thread pool | 20
Doris | Exception | Failed Metadata Image Generation | Failed metadata image generation on the FE node | 1
Doris | Exception | Failed Historical Metadata Image Clearing | Failed historical metadata image clearing on the FE node | 1
Doris | Exception | Status of the Doris FE instance (FE) | Process status statistics of the Doris FE instance | 0
Doris | Exception | Status of the Doris BE instance (BE) | Process status statistics of the Doris BE instance | 0
Doris | Exception | Error Rate of TCP Packet Receiving (BE) | Error rate of TCP packet receiving on the BE | 5%
Doris | Exception | Whether the Number of Task Failures of a Certain Type Increases (BE) | Whether the number of failures of a certain type of tasks executed on the BE increases | 1
Doris | CPU and Memory | FE CPU Usage | CPU usage statistics on FE nodes | 95% (critical), 90% (major)
Doris | CPU and Memory | FE Memory Usage | Memory usage statistics on FE nodes | 90% (critical), 85% (major)
Doris | CPU and Memory | FE Memory Usage | Memory usage of FE nodes | 95%
Doris | CPU and Memory | FE Heap Memory Usage Rate | Heap memory usage of FE nodes | 95%
Doris | CPU and Memory | BE Memory Usage Rate | Memory usage statistics on BE nodes | 90% (critical), 85% (major)
Doris | CPU and Memory | Maximum BE Memory and Remaining Machine Memory on the BE | Whether the maximum memory required by the BE is greater than the remaining available memory | 1
Doris | CPU and Memory | BE CPU Usage | CPU usage statistics on BE nodes | 95% (critical), 90% (major)