Configuring the Threshold

Scenarios

You can configure monitoring indicator thresholds to monitor the health status of indicators on FusionInsight Manager. If abnormal data occurs and the preset conditions are met, the system triggers an alarm and displays the alarm information on the alarm page.

Procedure

Log in to FusionInsight Manager.
Choose O&M > Alarm > Thresholds.
Select a monitoring indicator for a specified host or service in the cluster.

Figure 1 Configuring indicator thresholds
For example, after you select Host Memory Usage, information about the threshold of this indicator is displayed.
- If the alarm sending switch is turned on, an alarm is triggered when the alarm threshold is reached.
- The alarm ID and alarm name identify the alarm that is triggered by the threshold.
- FusionInsight Manager checks whether the value of each monitored indicator reaches the threshold. If the threshold is reached in a number of consecutive checks equal to the value of Trigger Count, the system sends an alarm. The value of Trigger Count can be customized.
- Check Period (s) indicates the interval at which the system checks monitoring indicators.
- The rule list shows the rules for triggering an alarm.
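
The interaction between Trigger Count and the periodic checks can be sketched in a few lines of Python. This is an illustrative model only, not FusionInsight Manager code: the function name `should_alarm` is hypothetical, and the sketch assumes a Max value rule in which an alarm is sent once the indicator has exceeded the threshold in `trigger_count` consecutive checks.

```python
from typing import Iterable


def should_alarm(samples: Iterable[float], threshold: float, trigger_count: int) -> bool:
    """Return True once `trigger_count` consecutive samples exceed `threshold`.

    Each element of `samples` stands for one check, taken once per Check Period.
    """
    consecutive = 0
    for value in samples:
        if value > threshold:
            consecutive += 1
            if consecutive >= trigger_count:
                return True
        else:
            # A check that does not breach the threshold resets the streak.
            consecutive = 0
    return False
```

With a threshold of 90.0 and a Trigger Count of 3, the samples 91, 92, 93 produce an alarm, whereas 91, 80, 92, 93 do not, because the second check resets the count.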

Click Create Rule to add rules used for monitoring indicators.

**Table 1** Monitoring indicator rule parameters
Parameter	Value	Description
Rule Name	CPU_MAX (example value)	Name of a rule.
Alarm Severity	Critical, Major, Minor, Warning	Severity of the alarm. The value can be Critical, Major, Minor, or Warning.
Threshold Type	Max value, Min value	You can select the maximum or minimum value of an indicator. If this parameter is set to Max value, the system generates an alarm when the actual value of the indicator is greater than the threshold. If this parameter is set to Min value, the system generates an alarm when the actual value of the indicator is less than the threshold.
Date	Daily, Weekly, Others	This parameter is used to set the date when the rule takes effect.
Add Date	09-30	This parameter is available only when Date is set to Others. You can set the date when the rule takes effect. Multiple dates can be selected.
Thresholds	Start and End Time: 00:00 to 08:30	This parameter is used to set the time range when the rule takes effect.
Thresholds	Threshold: 10	Specifies the threshold of the rule monitoring indicator.

For the last parameter in the table, you can click the add or delete icon to add or delete multiple start and end times or alarm indicator thresholds.
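
To make the rule parameters concrete, the following Python sketch models a rule with a threshold type and one or more time ranges, each with its own threshold. It is a simplified illustration, not the product's implementation: the class names `Rule` and `ThresholdWindow` and the method `violated` are hypothetical, the Date setting (Daily, Weekly, Others) is omitted for brevity, and treating both ends of the time range as inclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import List


@dataclass
class ThresholdWindow:
    start: time        # start of the time range in which the threshold applies
    end: time          # end of the time range (assumed inclusive)
    threshold: float   # threshold value for this time range


@dataclass
class Rule:
    name: str
    severity: str           # "Critical", "Major", "Minor", or "Warning"
    threshold_type: str     # "Max value" or "Min value"
    windows: List[ThresholdWindow]

    def violated(self, value: float, at: datetime) -> bool:
        """Return True if `value` breaks this rule at time `at`."""
        for window in self.windows:
            if window.start <= at.time() <= window.end:
                if self.threshold_type == "Max value":
                    return value > window.threshold
                return value < window.threshold
        # Outside every configured time range, the rule does not apply.
        return False


# Example values taken from Table 1: CPU_MAX, 00:00 to 08:30, threshold 10.
cpu_max = Rule(
    name="CPU_MAX",
    severity="Major",
    threshold_type="Max value",
    windows=[ThresholdWindow(time(0, 0), time(8, 30), 10.0)],
)
```

Here `cpu_max.violated(12.0, datetime(2024, 9, 30, 7, 0))` returns `True`, while the same value at 09:00 returns `False` because it falls outside the configured time range.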

Click OK to save the rules.
Locate the row that contains an added rule, and click Apply in the Operation column. The value of Effective for this rule changes to Yes.

To apply a new rule when another rule is already in effect, first click Cancel for the rule that is in effect.

Monitoring Indicator Reference

FusionInsight Manager alarm monitoring indicators are categorized into node information indicators and cluster service indicators. Table 2 describes the indicators whose thresholds can be configured on nodes, and Table 3 describes those for cluster services. For the complete list of monitoring indicators, see the M.

**Table 2** Monitoring indicators on each node
Monitoring Indicator Group Name	Indicator Name	Description	Default Threshold
CPU	Host CPU Usage	This indicator reflects the computing and control capabilities of the current cluster in a measurement period. By observing the indicator value, you can better understand the overall resource usage of the cluster.	90.0%
Disk	Disk Usage	Indicates the disk usage of a host.	90.0%
Disk	Disk Inode Usage	Indicates the disk inode usage in a measurement period.	80.0%
Memory	Host Memory Usage	Indicates the average memory usage at the current time.	90.0%
Host Status	Host File Handle Usage	Indicates the usage of file handles of the host in a measurement period.	80.0%
Host Status	Host PID Usage	Indicates the PID usage of a host.	90%
Network Status	TCP Ephemeral Port Usage	Indicates the usage of temporary TCP ports of the host in a measurement period.	80.0%
Network Reading	Read Packet Error Rate	Indicates the read packet error rate of the network interface on the host in a measurement period.	0.5%
	Read Packet Dropped Rate	Indicates the read packet dropped rate of the network interface on the host in a measurement period.	0.5%
	Read Throughput Rate	Indicates the average read throughput (at MAC layer) of the network interface in a measurement period.	80%
Network Writing	Write Packet Error Rate	Indicates the write packet error rate of the network interface on the host in a measurement period.	0.5%
	Write Packet Dropped Rate	Indicates the write packet dropped rate of the network interface on the host in a measurement period.	0.5%
	Write Throughput Rate	Indicates the average write throughput (at MAC layer) of the network interface in a measurement period.	80%
Process	Uninterruptible Sleep Process	Indicates the number of D state processes on the host in a measurement period.	0
Process	omm Process Usage	Indicates the usage of the omm process within a measurement period.	90
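
As a rough illustration of how the default thresholds in Table 2 might be applied to a set of collected host metrics, the following Python sketch compares sample values against a few of the defaults. The dictionary `DEFAULT_THRESHOLDS` and the function `breached_indicators` are hypothetical names, only a subset of the percentage-based indicators is included, and each default is assumed to act as a Max value threshold.

```python
# Default thresholds (percent) copied from Table 2 for a few host indicators.
DEFAULT_THRESHOLDS = {
    "Host CPU Usage": 90.0,
    "Disk Usage": 90.0,
    "Disk Inode Usage": 80.0,
    "Host Memory Usage": 90.0,
    "Host File Handle Usage": 80.0,
}


def breached_indicators(sample):
    """Return the names of indicators in `sample` above their default threshold."""
    return [
        name
        for name, value in sample.items()
        if name in DEFAULT_THRESHOLDS and value > DEFAULT_THRESHOLDS[name]
    ]
```

For a sample such as `{"Host CPU Usage": 95.0, "Disk Usage": 50.0}`, only Host CPU Usage is reported, because it is the only value above its default threshold.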

**Table 3** Cluster service indicators
Service	Monitoring Indicator Group Name	Indicator Name	Description	Default Threshold
DBService	Database	Database Connections Usage	Indicates the usage of the number of database connections.	90%
DBService	Database	Disk Space Usage of the Data Directory	Disk space usage of the data directory.	80%
Flume	Agent	Heap Memory Usage Calculate	Indicates the Flume heap memory usage.	95.0%
		Flume Direct Memory Usage Statistics	Indicates the Flume direct memory usage.	80.0%
		Flume Non-heap Memory Usage	Indicates the Flume non-heap memory usage.	80.0%
		Total GC duration of Flume process	Indicates the Flume total GC time.	12000ms
HBase	GC	GC time for old generation	Indicates the total GC time of RegionServer.	5000ms
	GC	GC time for old generation	Indicates the total GC time of HMaster.	5000ms
	CPU and Memory	RegionServer Direct Memory Usage Statistics	Indicates the RegionServer direct memory usage.	90%
		RegionServer Heap Memory Usage Statistics	Indicates the RegionServer heap memory usage.	90%
		HMaster Direct Memory Usage	Indicates the HMaster direct memory usage.	90%
		HMaster Heap Memory Usage Statistics	Indicates the HMaster heap memory usage.	90%
	Service	Regions	Indicates the number of regions of a RegionServer.	2000
	Service	Region in transaction count over threshold	Number of regions that are in the RIT state and reach the threshold duration.	1
	Replication	Replication sync failed times	Indicates the number of times that DR data fails to be synchronized.	1
	Queue	Compaction Queue Size	Compaction queue size.	100
HDFS	File and Block	Lost Blocks	Number of missing copy blocks in the HDFS file system.	0
	File and Block	Blocks Under Replicated	Total number of blocks that need to be replicated by the NameNode.	1000
	RPC	Average Time of Active NameNode RPC Processing	Indicates the average RPC processing time.	100ms
	RPC	Average Time of Active NameNode RPC Queuing	Indicates the average RPC queuing time.	200ms
	Disk	Disk Usage	Indicates the HDFS disk usage.	80%
		Percentage of DataNode Capacity	Indicates the disk usage of DataNodes in the HDFS.	80%
		Percentage of Reserved Space for Replicas of Unused Space	Indicates the percentage of the reserved disk space of all the copies to the total unused disk space of DataNodes.	90%
	Resource	Faulty DataNodes	Indicates the number of faulty DataNodes.	3
		NameNode Non Heap Memory Usage Statistics	Indicates the percentage of NameNode non-heap memory usage.	90%
		NameNode Direct Memory Usage Statistics	Indicates the percentage of direct memory used by NameNodes.	90%
		NameNode Heap Memory Usage Statistics	Indicates the percentage of NameNode heap memory usage.	95%
		DataNode Non Heap Memory Usage Statistics	Indicates the percentage of DataNode non-heap memory usage.	90%
		DataNode Direct Memory Usage Statistics	Indicates the percentage of direct memory used by DataNodes.	90%
		DataNode Heap Memory Usage Statistics	Indicates the percentage of DataNode heap memory usage.	95%
	Garbage Collection	GC Time	Indicates the Garbage collection (GC) duration of NameNodes per minute.	12000ms
	Garbage Collection	GC Time	Indicates the GC duration of DataNodes per minute.	12000ms
Hive	HQL	Percentage of HQL Statements That Are Executed Successfully by Hive	Indicates the percentage of HQL statements that are executed successfully by Hive.	90.0%
	Background	Background Thread Usage	Indicates the percentage of Background thread usage.	90%
	GC	Total GC Time in Milliseconds	Indicates the total GC time of MetaStore.	12000ms
	GC	Total GC Time in Milliseconds	Indicates the total GC time of HiveServer.	12000ms
	Capacity	Percentage of HDFS Space Used by Hive to the Available Space	Indicates the percentage of HDFS space used by Hive to the available space.	85.0%
	CPU and Memory	MetaStore Direct Memory Usage Statistics	Indicates the MetaStore direct memory usage.	95%
		MetaStore Non-Heap Memory Usage Statistics	Indicates the MetaStore non-heap memory usage.	95%
		MetaStore Heap Memory Usage Statistics	Indicates the MetaStore heap memory usage.	95%
		HiveServer Direct Memory Usage Statistics	Indicates the HiveServer direct memory usage.	95%
		HiveServer Non-Heap Memory Usage Statistics	Indicates the HiveServer non-heap memory usage.	95%
		HiveServer Heap Memory Usage Statistics	Indicates the HiveServer heap memory usage.	95%
	Session	Percentage of Sessions Connected to the HiveServer to Maximum Number of Sessions Allowed by the HiveServer	Indicates the percentage of the number of sessions connected to the HiveServer to the maximum number of sessions allowed by the HiveServer.	90.0%
Kafka	Partition	Percentage of Partitions That Are Not Completely Synchronized	Indicates the percentage of partitions that are not completely synchronized to total partitions.	50%
	Other	Unavailable Partition Percentage	Indicates the percentage of partitions that are unavailable.	40%
	Other	User Connection Usage on Broker	User connection usage on the broker.	80%
	Disk	Broker Disk Usage	Indicates the disk usage of the disk where the Broker data directory is located.	80%
	Process	Broker GC Duration per Minute	Indicates the GC duration of the Broker process per minute.	12000ms
		Heap Memory Usage of Kafka	Indicates the Kafka heap memory usage.	95%
		Kafka Direct Memory Usage	Indicates the Kafka direct memory usage.	95%
Loader	Memory	Heap Memory Usage Calculate	Indicates the Loader heap memory usage.	95%
		Loader Direct Memory Usage Statistics	Indicates the Loader direct memory usage.	80.0%
		Non heap Memory Usage Calculate	Indicates the Loader non-heap memory usage.	80%
	GC	Total GC time in milliseconds	Indicates the total GC time of Loader.	12000ms
MapReduce	Garbage Collection	GC Time	Indicates the GC time.	12000ms
	Resource	JobHistoryServer Direct Memory Usage Statistics	Indicates the JobHistoryServer direct memory usage.	90%
		JobHistoryServer Non Heap Memory Usage Statistics	Indicates the JobHistoryServer non-heap memory usage.	90%
		JobHistoryServer Heap Memory Usage Statistics	Indicates the JobHistoryServer heap memory usage.	95%
Oozie	Memory	Heap Memory Usage Calculate	Indicates the Oozie heap memory usage.	95.0%
		Oozie Direct Buffer Resource Percentage	Indicates the Oozie direct memory usage.	80.0%
		Non Heap Memory Usage Calculate	Indicates the Oozie non-heap memory usage.	80%
	GC	Total GC duration of Oozie process	Indicates the Oozie total GC time.	12000ms
Spark2x	Memory	JDBCServer2x Heap Memory Usage Statistics	Indicates the JDBCServer2x heap memory usage.	95%
		JDBCServer2x Direct Memory Usage Statistics	Indicates the JDBCServer2x direct memory usage.	95%
		JDBCServer2x Non-Heap Memory Usage Statistics	Indicates the JDBCServer2x non-heap memory usage.	95%
		JobHistory2x Direct Memory Usage Statistics	Indicates the JobHistory2x direct memory usage.	95%
		JobHistory2x Non-Heap Memory Usage Statistics	Indicates the JobHistory2x non-heap memory usage.	95%
		JobHistory2x Heap Memory Usage Statistics	Indicates the JobHistory2x heap memory usage.	95%
		IndexServer2x Direct Memory Usage Statistics	Indicates the IndexServer2x direct memory usage.	95%
		IndexServer2x Heap Memory Usage Statistics	Indicates the IndexServer2x heap memory usage.	95%
		IndexServer2x Non-Heap Memory Usage Statistics	Indicates the IndexServer2x non-heap memory usage.	95%
	GC number	Full GC Number of JDBCServer2x	Indicates the total GC number of JDBCServer2x.	12
		Full GC Number of JobHistory2x	Indicates the total GC number of JobHistory2x.	12
		Full GC Number of IndexServer2x	Indicates the total GC number of IndexServer2x.	12
	GC Time	Total GC time in milliseconds	Indicates the total GC time of JDBCServer2x.	12000ms
		Total GC time in milliseconds	Indicates the total GC time of JobHistory2x.	12000ms
		Total GC time in milliseconds	Indicates the total GC time of IndexServer2x.	12000ms
Storm	Cluster	Number of Available Supervisors	Indicates the number of available Supervisor processes in the cluster in a measurement period.	1
	Cluster	Slot Usage	Indicates the slot usage in the cluster in a measurement period.	80.0%
	Nimbus	Heap Memory Usage Calculate	Indicates the Nimbus heap memory usage.	80%
Yarn	Resource	NodeManager Direct Memory Usage Statistics	Indicates the percentage of direct memory used by NodeManagers.	90%
		NodeManager Heap Memory Usage Statistics	Indicates the percentage of NodeManager heap memory usage.	95%
		NodeManager Non Heap Memory Usage Statistics	Indicates the percentage of NodeManager non-heap memory usage.	90%
		ResourceManager Direct Memory Usage Statistics	Indicates the ResourceManager direct memory usage.	90%
		ResourceManager Heap Memory Usage Statistics	Indicates the ResourceManager heap memory usage.	95%
		ResourceManager Non Heap Memory Usage Statistics	Indicates the ResourceManager non-heap memory usage.	90%
	CPU and Memory	Pending Memory	Pending memory capacity.	83886080MB
	Other	Failed Applications of root queue	Number of failed tasks in the root queue.	50
	Other	Terminated Applications of root queue	Number of killed tasks in the root queue.	50
	Garbage collection	GC Time	Indicates the GC duration of NodeManager per minute.	12000ms
	Garbage collection	GC Time	Indicates the GC duration of ResourceManager per minute.	12000ms
	Application	Pending Applications	Pending tasks.	60
ZooKeeper	Connection	ZooKeeper Connections Usage	Indicates the percentage of the used connections to the total connections of ZooKeeper.	80%
	CPU and Memory	Heap Memory Usage Calculate	Indicates the ZooKeeper heap memory usage.	95%
	CPU and Memory	Direct Memory Usage Calculate	Indicates the ZooKeeper direct memory usage.	80%
	GC	ZooKeeper GC Duration per Minute	Indicates the GC time of ZooKeeper every minute.	12000ms
meta	OBS Meta data Operations	Average Time for Calling the OBS Metadata API	Average time for calling the OBS metadata APIs.	500ms
	OBS Meta data Operations	Success Rate for Calling the OBS Metadata API	Success rate of calling the OBS metadata APIs.	99.0%
	OBS data write operation	Success Rate for Calling the OBS Write API	Success rate of calling the OBS data write APIs.	99.0%
	OBS data read operation	Success Rate for Calling the OBS Data Read API	Success rate of calling the OBS data read operation APIs.	99.0%
Ranger	GC	UserSync GC Duration	UserSync garbage collection (GC) duration.	12000ms
		RangerAdmin GC Duration	RangerAdmin garbage collection (GC) duration.	12000ms
		TagSync GC Duration	TagSync garbage collection (GC) duration.	12000ms
	CPU and Memory	UserSync Non-Heap Memory Usage	UserSync non-heap memory usage in percentage.	80.0%
		UserSync Direct Memory Usage	UserSync direct memory usage in percentage.	80.0%
		UserSync Heap Memory Usage	UserSync heap memory usage in percentage.	95.0%
		RangerAdmin Non-Heap Memory Usage	RangerAdmin non-heap memory usage.	80.0%
		RangerAdmin Heap Memory Usage	RangerAdmin heap memory usage in percentage.	95.0%
		RangerAdmin Direct Memory Usage	RangerAdmin direct memory usage.	80.0%
		TagSync Direct Memory Usage	TagSync direct memory usage in percentage.	80.0%
		TagSync Non-Heap Memory Usage	TagSync non-heap memory usage in percentage.	80.0%
		TagSync Heap Memory Usage	TagSync heap memory usage in percentage.	95.0%
ClickHouse	Cluster Quota	Clickhouse service quantity quota usage in ZooKeeper	Quota of the ZooKeeper nodes used by the ClickHouse service.	90%
ClickHouse	Cluster Quota	Capacity quota usage of the Clickhouse service in ZooKeeper	Capacity quota of ZooKeeper directory used by the ClickHouse service.	90%
IoTDB	GC	IoTDBServer GC Duration	IoTDBServer garbage collection (GC) duration.	12000ms
	CPU and Memory	IoTDBServer Heap Memory Usage	IoTDBServer heap memory usage in percentage.	90%
	CPU and Memory	IoTDBServer Direct Memory Usage	IoTDBServer direct memory usage in percentage.	90%