HBase Cluster Monitoring Metrics

Description

Monitoring is critical to ensuring the reliability, availability, and performance of CloudTable. You can monitor the running status of CloudTable servers.

This section describes the metrics that can be monitored by Cloud Eye as well as their namespaces and dimensions.

Namespace

SYS.CloudTable

CloudTable HBase HMaster Instance Monitoring Metrics

**Table 1** CloudTable HBase HMaster instance monitoring metrics
Metric ID	Metric Name	Description	Value Range	Unit	Conversion Rule	Monitored Object (Dimension)	Monitoring Interval (Raw Data)
cmdForTotalMemory	Total Memory	Total memory size of the monitored object	> 0	Byte	1024 (IEC)	CloudTable instance node	1 min
cmdProcessCPU	CPU Utilization	CPU utilization of the monitored object	0~100	%	N/A	CloudTable instance node	1 min
cmdProcessMem	Memory Utilization	Memory utilization of the monitored object	0~100	%	N/A	CloudTable instance node	1 min
hm_deadregionservernum	Faulty RegionServers	Number of faulty RegionServers in the cluster	≥ 0	Count	N/A	CloudTable instance node	1 min
hm_regionservernum	Normal RegionServers	Number of normal RegionServers in the cluster	≥ 0	Count	N/A	CloudTable instance node	1 min
hm_ritCount	RIT Count	Number of regions in the Region In Transition (RIT) state in the cluster where the monitored object is located	≥ 0	Count	N/A	CloudTable instance node	1 min
hm_ritCountOverThreshold	RIT Count Over Threshold	Number of regions that have stayed in the RIT state longer than the threshold in the cluster where the monitored object is located	≥ 0	Count	N/A	CloudTable instance node	1 min
rs_queuecalltime_max	RPC Queue Call Time (Max)	Maximum RPC queue call time	≥ 0	ms	N/A	CloudTable instance node	1 min
rs_queuecalltime_mean	RPC Queue Call Time (Mean)	Mean RPC queue call time	≥ 0	ms	N/A	CloudTable instance node	1 min
nn_percentallused	Disk Utilization Rate	Disk space usage of the cluster	0~100	%	N/A	CloudTable instance node	1 min
nn_capacityremaining	Disk capacity remaining of cluster	Remaining disk space of the cluster	Depends on the cluster disk capacity.	GB	N/A	CloudTable instance node	1 min
nn_capacityused	Disk capacity used of cluster	Disk space used in the cluster	Depends on the cluster disk capacity.	GB	N/A	CloudTable instance node	1 min
cmdForUsedStorageRate	Ratio of Used Storage Space	Ratio of the used storage space to the total storage space in the cluster	0~100	%	N/A	CloudTable instance node	1 min
network_throughput_inbound_rate	Inbound Throughput	Inbound data volume over network of each node per second	≥ 0	KB/s	N/A	CloudTable instance node	1 min
network_throughput_outgoing_rate	Outbound Throughput	Outbound data volume over network of each node per second	≥ 0	KB/s	N/A	CloudTable instance node	1 min
disk_throughput_read_rate	Disk Read Throughput	Disk read throughput	≥ 0	Byte/s	1024 (IEC)	CloudTable instance node	1 min
disk_throughput_write_rate	Disk Write Throughput	Disk write throughput	≥ 0	Byte/s	1024 (IEC)	CloudTable instance node	1 min
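
The conversion rule 1024 (IEC) in the table means that each larger unit is 1024 times the previous one. A minimal sketch of that arithmetic follows. The function name `to_iec` and the unit labels are illustrative assumptions, not names taken from Cloud Eye.

```python
# Unit labels are an assumption for illustration; a console may show KB/MB/GB instead.
IEC_UNITS = ["Byte", "KiB", "MiB", "GiB", "TiB"]


def to_iec(value_bytes):
    """Scale a byte value by repeated division by 1024 (IEC rule)."""
    value = float(value_bytes)
    for unit in IEC_UNITS:
        # Stop once the value fits the current unit or no larger unit is left.
        if value < 1024 or unit == IEC_UNITS[-1]:
            return value, unit
        value /= 1024


if __name__ == "__main__":
    # Example: a cmdForTotalMemory reading of 8 * 1024**3 bytes.
    print(to_iec(8 * 1024**3))
```

The same scaling applies to Byte/s metrics such as disk_throughput_read_rate: only the unit suffix changes.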

HMaster instances include an active instance (hmaster-active) and a standby instance (hmaster-standby). If hmaster-active becomes faulty, hmaster-standby takes over as the active instance and continues to provide services.

In an HBase cluster, 10% of the disk space is reserved by default. Therefore, the disk alarm value is not equivalent to the raw disk usage percentage.
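
The text above does not state how the 10% reservation enters the calculation. The sketch below shows one possible reading, purely as an assumption: the reserved space is excluded from the usable capacity, so usage measured against usable capacity is higher than the raw percentage. The function names are hypothetical and the real formula used by the service may differ.

```python
RESERVED_FRACTION = 0.10  # default reservation stated in the text


def raw_usage_percent(used_gb, total_gb):
    """Usage measured against the full disk capacity."""
    return 100.0 * used_gb / total_gb


def usable_usage_percent(used_gb, total_gb, reserved_fraction=RESERVED_FRACTION):
    """Usage measured against capacity minus the reserved share.

    ASSUMPTION: the reservation is subtracted from total capacity.
    This is an illustration, not the documented formula.
    """
    usable_gb = total_gb * (1 - reserved_fraction)
    return 100.0 * used_gb / usable_gb


if __name__ == "__main__":
    used, total = 450, 1000
    print(raw_usage_percent(used, total))
    print(usable_usage_percent(used, total))
```

Under this reading, the two percentages diverge more as the disk fills, which is why an alarm threshold should not be compared directly with a raw usage figure.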

CloudTable HBase RegionServer Instance Monitoring Metrics

Table 2 lists the monitoring metrics supported by CloudTable HBase RegionServer instances.

**Table 2** CloudTable HBase RegionServer instance monitoring metrics
Metric ID	Metric Name	Description	Value Range	Unit	Conversion Rule	Monitored Object (Dimension)	Monitoring Interval (Raw Data)
cmdProcessCPU	CPU Utilization	CPU utilization of the monitored object	0~100	%	N/A	CloudTable instance node	1 min
cmdForTotalMemory	Total Memory	Total memory size of the monitored object	> 0	Byte	1024 (IEC)	CloudTable instance node	1 min
cmdProcessMem	Memory Utilization	Memory utilization of the monitored object	0~100	%	N/A	CloudTable instance node	1 min
disk_throughput_write_rate	Disks Write Rate	Volume of data written to the monitored object per second	≥ 0	Byte/s	1024 (IEC)	CloudTable instance node	1 min
disk_throughput_read_rate	Disks Read Rate	Volume of data read from the monitored object per second	≥ 0	Byte/s	1024 (IEC)	CloudTable instance node	1 min
hm_regionservernum	Normal RegionServers	Number of normal RegionServers	≥ 0	Count	N/A	CloudTable instance node	1 min
hm_deadregionservernum	Faulty RegionServers	Number of faulty RegionServers	≥ 0	Count	N/A	CloudTable instance node	1 min
hm_ritCountOverThreshold	RIT Count Over Threshold	Number of regions that have stayed in the Region In Transition (RIT) state longer than the threshold	≥ 0	Count	N/A	CloudTable instance node	1 min
hm_ritCount	RIT Count	Number of regions in the RIT state	≥ 0	Count	N/A	CloudTable instance node	1 min
rs_requests	Requests Per Second	Number of requests of a RegionServer per second	≥ 0	requests/s	N/A	CloudTable instance node	1 min
rs_regions	Regions	Number of regions of a RegionServer	≥ 0	Count	N/A	CloudTable instance node	1 min
rs_writerequestscount	Write Requests	Number of write requests of a RegionServer	≥ 0	Count	N/A	CloudTable instance node	1 min
rs_readrequestscount	Read Requests	Number of read requests of a RegionServer	≥ 0	Count	N/A	CloudTable instance node	1 min
rs_blockcachehitcachingratio	Hit Cache Block Caching Ratio	Block cache hit caching ratio	0~100	%	N/A	CloudTable instance node	1 min
rs_blockCacheCountHitPercent	Hit Cache Block Ratio	Block cache hit ratio	0~100	%	N/A	CloudTable instance node	1 min
rs_getavgtime	Get Delay (Avg)	Average Get operation delay of the RegionServer per unit time	≥ 0	ms	N/A	CloudTable instance node	1 min
rs_putavgtime	Put Delay (Avg)	Average Put operation delay of the RegionServer per unit time	≥ 0	ms	N/A	CloudTable instance node	1 min
rs_deleteavgtime	Delete Delay (Avg)	Average Delete operation delay of the RegionServer per unit time	≥ 0	ms	N/A	CloudTable instance node	1 min
rs_getnumops	Get Operations	Number of Get operations of the RegionServer per unit time	≥ 0	Count	N/A	CloudTable instance node	1 min
rs_putnumops	Put Operations	Number of Put operations of the RegionServer per unit time	≥ 0	Count	N/A	CloudTable instance node	1 min
rs_deletenumops	Delete Operations	Number of Delete operations of the RegionServer per unit time	≥ 0	Count	N/A	CloudTable instance node	1 min
rs_queuecalltime_max	RPC Queue Call Time (Max)	Maximum RPC queue call time	≥ 0	ms	N/A	CloudTable instance node	1 min
rs_queuecalltime_mean	RPC Queue Call Time (Mean)	Mean RPC queue call time	≥ 0	ms	N/A	CloudTable instance node	1 min
rs_flushtime_mean	Flush Time(Mean)	Mean flush time	≥ 0	ms	N/A	CloudTable instance node	1 min
rs_compactionqueuesize	Compaction Queue Size	Current length of the compaction queue, that is, the number of Stores waiting for compaction on the RegionServer	≥ 0	Count	N/A	CloudTable instance node	1 min
rs_flushqueuesize	Flush Queue Size	Current length of the flush queue	≥ 0	Count	N/A	CloudTable instance node	1 min
rs_compactionscompletedcount	Compaction Count	Number of completed compactions	≥ 0	Count	N/A	CloudTable instance node	1 min
rs_flushtimeops_num	Flush Operation Count	Number of flush operations. NOTE: Flush Operation Count is a counter. When its value reaches the upper limit, it wraps around and restarts from zero. It is also cleared and recounted after the cluster is restarted.	≥ 0	Count	N/A	CloudTable instance node	1 min
rs_blockcacheevictedcount	Discarded Cache Blocks	Number of blocks evicted from the block cache	≥ 0	Count	N/A	CloudTable instance node	1 min
rs_syncTime_max	Sync WAL Time(Max)	Maximum time it took to sync the WAL	≥ 0	ms	N/A	CloudTable instance node	1 min
rs_syncTime_mean	Sync WAL Time(Mean)	Mean time it took to sync the WAL	≥ 0	ms	N/A	CloudTable instance node	1 min
dn_byteswritten_speed	Bytes written per second	Number of bytes written by the node per second	≥ 0	Byte/s	1024 (IEC)	CloudTable instance node	1 min
dn_bytesread_speed	Bytes read per second	Number of bytes read by the node per second	≥ 0	Byte/s	1024 (IEC)	CloudTable instance node	1 min
rs_numActiveHandler	Number of RegionServer Active Handlers	Number of active RegionServer handlers (total number of handlers for processing user table requests, meta table requests, and replication requests)	≥ 0	Count	N/A	CloudTable instance node	1 min
rs_numActiveGeneralHandler	Number of RegionServer Active Handlers for Processing User Table Requests	Number of active RegionServer handlers for processing user table requests	≥ 0	Count	N/A	CloudTable instance node	1 min
rs_scanTime_p999	99.9th Percentile of the Scan Operation Delay	99.9th percentile of the RegionServer Scan operation delay	≥ 0	ms	N/A	CloudTable instance node	1 min
rs_syncTime_p999	99.9th Percentile of the WAL Sync Operation Delay	99.9th percentile of the RegionServer WAL Sync operation delay	≥ 0	ms	N/A	CloudTable instance node	1 min
rs_Get_99th_percentile	99th Percentile of the Get Operation Delay	99th percentile of the RegionServer Get operation delay	≥ 0	ms	N/A	CloudTable instance node	1 min
rs_Put_99th_percentile	99th Percentile of the Put Operation Delay	99th percentile of the RegionServer Put operation delay	≥ 0	ms	N/A	CloudTable instance node	1 min
rs_Delete_99th_percentile	99th Percentile of the Delete Operation Delay	99th percentile of the RegionServer Delete operation delay	≥ 0	ms	N/A	CloudTable instance node	1 min
rs_Get_999th_percentile	99.9th Percentile of the Get Operation Delay	99.9th percentile of the RegionServer Get operation delay	≥ 0	ms	N/A	CloudTable instance node	1 min
rs_Put_999th_percentile	99.9th Percentile of the Put Operation Delay	99.9th percentile of the RegionServer Put operation delay	≥ 0	ms	N/A	CloudTable instance node	1 min
rs_Delete_999th_percentile	99.9th Percentile of the Delete Operation Delay	99.9th percentile of the RegionServer Delete operation delay	≥ 0	ms	N/A	CloudTable instance node	1 min
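
The NOTE on Flush Operation Count in Table 2 says the counter wraps around at its upper limit and is cleared when the cluster restarts. When deriving per-interval increases from such a counter, a drop between two samples can be treated as a restart from zero. The sketch below illustrates that common approach. The function name `counter_deltas` is hypothetical, and because the upper limit is not given in the source, increments that occurred before a wrap are not recovered.

```python
def counter_deltas(samples):
    """Return per-interval increases for a counter that may wrap or reset.

    ASSUMPTION: any decrease between consecutive samples means the counter
    restarted from zero, so the new value itself is taken as the increase.
    """
    deltas = []
    for prev, curr in zip(samples, samples[1:]):
        if curr >= prev:
            deltas.append(curr - prev)
        else:
            # Counter wrapped around or the cluster restarted.
            deltas.append(curr)
    return deltas


if __name__ == "__main__":
    # Hypothetical 1-minute samples of rs_flushtimeops_num.
    print(counter_deltas([10, 15, 22, 3, 8]))
```

Working with deltas instead of raw values avoids false alarms caused by a sudden drop of the counter to a small number.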

Dimension

Key	Value
cluster_id	CloudTable cluster ID. To obtain the value, go to the cluster management page, click the cluster name, access its details page, and obtain the cluster ID in the cluster information area.
instance_name	Name of a CloudTable cluster node. To obtain the value, go to the cluster management page, click the cluster name, access its details page, and obtain the value of instance_name.
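
The namespace and the two dimension keys above identify a metric series of one node. The sketch below assembles them into a query string. The parameter names (`dim.0`, `dim.1`, `from`, `to`, `period`, `filter`) are an assumption about the Cloud Eye metric data query interface and should be checked against the Cloud Eye API reference. The function name and the sample IDs are hypothetical, and nothing is sent over the network.

```python
from urllib.parse import urlencode

NAMESPACE = "SYS.CloudTable"  # namespace given in this section


def build_metric_query(metric_name, cluster_id, instance_name,
                       start_ms, end_ms, period, stat="average"):
    """Build a query string for one metric of one CloudTable node.

    ASSUMPTION: parameter names follow the Cloud Eye metric data query;
    verify them before use.
    """
    params = {
        "namespace": NAMESPACE,
        "metric_name": metric_name,
        # Each dimension is passed as "key,value".
        "dim.0": f"cluster_id,{cluster_id}",
        "dim.1": f"instance_name,{instance_name}",
        "from": start_ms,   # start time, milliseconds since the epoch
        "to": end_ms,       # end time, milliseconds since the epoch
        "period": period,   # aggregation granularity
        "filter": stat,     # aggregation method
    }
    return urlencode(params)


if __name__ == "__main__":
    # Placeholder IDs; replace them with values from the cluster details page.
    print(build_metric_query("rs_requests", "example-cluster-id",
                             "example-node-name", 1700000000000,
                             1700000600000, 300))
```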
