HBase Cluster Supported Metrics

Description

Monitoring is critical to ensuring the reliability, availability, and performance of CloudTable. You can monitor the running status of CloudTable servers.

This section describes the CloudTable metrics that Cloud Eye (CES) can monitor, as well as their namespaces and dimensions. You can use the Cloud Eye management console or APIs to query the metrics of monitored objects and the alarms generated for CloudTable.

Namespace

SYS.CloudTable

CloudTable HBase HMaster Instance Monitoring Metrics

**Table 1** CloudTable HBase HMaster instance monitoring metrics
Metric ID	Name	Meaning	Value Range	Monitoring Interval (Raw Data)
disk_throughput_write_rate	Disks Write Rate	Volume of data written to the monitored object per second	≥ 0 bytes/s	1 min
disk_throughput_read_rate	Disks Read Rate	Volume of data read from the monitored object per second	≥ 0 bytes/s	1 min
cmdForTotalMemory	Total Memory	Total memory size of the monitored object	> 0 bytes	1 min
cmdProcessCPU	CPU Usage	CPU usage of the monitored object	0%–100%	1 min
cmdProcessMem	Memory Usage	Memory usage of the monitored object	0%–100%	1 min
hm_deadregionservernum	Faulty RegionServers	Number of faulty RegionServers in the cluster	≥ 0	1 min
hm_regionservernum	Normal RegionServers	Number of normal RegionServers in the cluster	≥ 0	1 min
hm_ritCount	RIT Count	Number of regions in the Region In Transition (RIT) state in the cluster where the monitored object is located	≥ 0	1 min
hm_ritCountOverThreshold	RIT Count Over Threshold	Number of regions that have been in the RIT state longer than the threshold in the cluster where the monitored object is located	≥ 0	1 min
rs_queuecalltime_max	RPC Queue Call Time (Max)	Maximum RPC queue call time	≥ 0 ms	1 min
rs_queuecalltime_mean	RPC Queue Call Time (Mean)	Mean RPC queue call time	≥ 0 ms	1 min
nn_percentallused	Disk Utilization Rate	Disk space usage of the cluster	0%–100%	1 min
nn_capacityremaining	Disk capacity remaining of cluster	Remaining disk space of the cluster	Depends on the cluster disk capacity.	1 min
nn_capacityused	Disk capacity used of cluster	Disk space used in the cluster	Depends on the cluster disk capacity.	1 min

HMaster instances include hmaster-active (active) and hmaster-standby (standby). If hmaster-active becomes faulty, hmaster-standby takes over as the active instance and continues to provide services.
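The HMaster metrics in Table 1 can be combined into a simple health check. The following is a minimal sketch, not part of CloudTable or Cloud Eye: the `HMasterSample` class, the `evaluate_hmaster` function, the rules, and the 80% disk threshold are all hypothetical names and values chosen for illustration. It assumes you have already retrieved one raw-data sample of each metric.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HMasterSample:
    """One raw-data sample of selected Table 1 metrics (hypothetical container)."""
    hm_deadregionservernum: int      # Faulty RegionServers
    hm_regionservernum: int          # Normal RegionServers
    hm_ritCountOverThreshold: int    # RIT Count Over Threshold
    nn_percentallused: float         # Disk Utilization Rate, in percent (0-100)


def evaluate_hmaster(sample: HMasterSample, disk_warn_percent: float = 80.0) -> List[str]:
    """Return a list of warnings; an empty list means no rule fired.

    The rules and the default 80% threshold are illustrative assumptions,
    not recommendations from the CloudTable documentation.
    """
    warnings = []
    if sample.hm_deadregionservernum > 0:
        warnings.append(f"{sample.hm_deadregionservernum} faulty RegionServer(s)")
    if sample.hm_regionservernum == 0:
        warnings.append("no normal RegionServers reported")
    if sample.hm_ritCountOverThreshold > 0:
        warnings.append(
            f"{sample.hm_ritCountOverThreshold} region(s) in RIT over threshold"
        )
    if sample.nn_percentallused >= disk_warn_percent:
        warnings.append(
            f"disk utilization {sample.nn_percentallused:.1f}% "
            f"is at or above {disk_warn_percent:.1f}%"
        )
    return warnings


if __name__ == "__main__":
    sample = HMasterSample(
        hm_deadregionservernum=1,
        hm_regionservernum=2,
        hm_ritCountOverThreshold=0,
        nn_percentallused=85.0,
    )
    for line in evaluate_hmaster(sample):
        print(line)
```

In practice, rules of this kind would more likely be configured as Cloud Eye alarm rules; the sketch only shows how the metric IDs relate to each other.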

CloudTable HBase RegionServer Instance Monitoring Metrics

Table 2 lists the monitoring metrics supported by CloudTable HBase RegionServer instances.

**Table 2** CloudTable HBase RegionServer instance monitoring metrics
Metric ID	Name	Meaning	Value Range	Monitoring Interval (Raw Data)
cmdProcessCPU	CPU Usage	CPU usage of the monitored object. Unit: %	0%–100%	1 minute
cmdForTotalMemory	Total Memory	Total memory size of the monitored object. Unit: byte	> 0 bytes	1 minute
cmdProcessMem	Memory Usage	Memory usage of the monitored object. Unit: %	0%–100%	1 minute
disk_throughput_write_rate	Disks Write Rate	Volume of data written to the monitored object per second. Unit: byte/s	≥ 0 bytes/s	1 minute
disk_throughput_read_rate	Disks Read Rate	Volume of data read from the monitored object per second. Unit: byte/s	≥ 0 bytes/s	1 minute
hm_regionservernum	Normal RegionServers	Number of normal RegionServers	≥ 0	1 minute
hm_deadregionservernum	Faulty RegionServers	Number of faulty RegionServers	≥ 0	1 minute
hm_ritCountOverThreshold	RIT Count Over Threshold	Number of regions that have been in the Region In Transition (RIT) state longer than the threshold	≥ 0	1 minute
hm_ritCount	RIT Count	Number of regions in the Region In Transition (RIT) state	≥ 0	1 minute
rs_requests	Requests Per Second	Number of requests of a RegionServer per second. Unit: requests/s	≥ 0 requests/s	1 minute
rs_regions	Regions	Number of regions of a RegionServer	≥ 0	1 minute
rs_writerequestscount	Write Requests	Number of write requests of a RegionServer	≥ 0	1 minute
rs_readrequestscount	Read Requests	Number of read requests of a RegionServer	≥ 0	1 minute
rs_blockcachehitcachingratio	Hit Cache Block Caching Ratio	Block cache hit caching ratio. Unit: %	0%–100%	1 minute
rs_blockCacheCountHitPercent	Hit Cache Block Ratio	Block cache hit ratio. Unit: %	0%–100%	1 minute
rs_getavgtime	Get Delay (Avg)	Average Get operation delay of the RegionServer per unit time. Unit: millisecond	≥ 0 ms	1 minute
rs_putavgtime	Put Delay (Avg)	Average Put operation delay of the RegionServer per unit time. Unit: millisecond	≥ 0 ms	1 minute
rs_deleteavgtime	Delete Delay (Avg)	Average Delete operation delay of the RegionServer per unit time. Unit: millisecond	≥ 0 ms	1 minute
rs_getnumops	Get Operations	Number of Get operations of the RegionServer per unit time	≥ 0	1 minute
rs_putnumops	Put Operations	Number of Put operations of the RegionServer per unit time	≥ 0	1 minute
rs_deletenumops	Delete Operations	Number of Delete operations of the RegionServer per unit time	≥ 0	1 minute
rs_queuecalltime_max	RPC Queue Call Time (Max)	Maximum RPC queue call time. Unit: millisecond	≥ 0 ms	1 minute
rs_queuecalltime_mean	RPC Queue Call Time (Mean)	Mean RPC queue call time. Unit: millisecond	≥ 0 ms	1 minute
rs_flushtime_mean	Flush Time (Mean)	Mean flush time. Unit: millisecond	≥ 0 ms	1 minute
rs_compactionqueuesize	Compaction Queue Size	Current length of the compaction queue, that is, the number of Stores waiting for compaction on the RegionServer	≥ 0	1 minute
rs_flushqueuesize	Flush Queue Size	Length of the flush queue	≥ 0	1 minute
rs_compactionscompletedcount	Compaction Count	Number of completed compactions	≥ 0	1 minute
rs_flushtimeops_num	Flush Operation Count	Number of flush operations	≥ 0	1 minute
rs_blockcacheevictedcount	Discarded Cache Blocks	Number of blocks evicted from the block cache	≥ 0	1 minute
rs_syncTime_max	Sync WAL Time (Max)	Maximum time taken to sync the WAL to HDFS. Unit: millisecond	≥ 0 ms	1 minute
rs_syncTime_mean	Sync WAL Time (Mean)	Mean time taken to sync the WAL to HDFS. Unit: millisecond	≥ 0 ms	1 minute
dn_byteswritten_speed	Bytes written per second	Number of bytes written by the node per second	≥ 0 bytes/s	1 minute
dn_bytesread_speed	Bytes read per second	Number of bytes read by the node per second	≥ 0 bytes/s	1 minute
rs_numActiveHandler	Number of RegionServer Active Handlers	Number of active RegionServer handlers (total number of handlers for processing user table requests, meta table requests, and replication requests)	≥ 0	1 minute
rs_numActiveGeneralHandler	Number of RegionServer Active Handlers for Processing User Table Requests	Number of active RegionServer handlers for processing user table requests	≥ 0	1 minute
rs_scanTime_p999	99.9th Percentile of the Scan Operation Delay	99.9th percentile of the RegionServer Scan operation delay	≥ 0 ms	1 minute
rs_syncTime_p999	99.9th Percentile of the WAL Sync Operation Delay	99.9th percentile of the RegionServer WAL Sync operation delay	≥ 0 ms	1 minute
rs_Get_99th_percentile	99th Percentile of the Get Operation Delay	99th percentile of the RegionServer Get operation delay	≥ 0 ms	1 minute
rs_Put_99th_percentile	99th Percentile of the Put Operation Delay	99th percentile of the RegionServer Put operation delay	≥ 0 ms	1 minute
rs_Delete_99th_percentile	99th Percentile of the Delete Operation Delay	99th percentile of the RegionServer Delete operation delay	≥ 0 ms	1 minute
rs_Get_999th_percentile	99.9th Percentile of the Get Operation Delay	99.9th percentile of the RegionServer Get operation delay	≥ 0 ms	1 minute
rs_Put_999th_percentile	99.9th Percentile of the Put Operation Delay	99.9th percentile of the RegionServer Put operation delay	≥ 0 ms	1 minute
rs_Delete_999th_percentile	99.9th Percentile of the Delete Operation Delay	99.9th percentile of the RegionServer Delete operation delay	≥ 0 ms	1 minute
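
The value ranges in Table 2 can serve as a sanity check on collected samples before they are charted or used in alarms. The following is a minimal sketch under stated assumptions: the function names `in_documented_range` and `out_of_range` are hypothetical, and the input is assumed to be a plain mapping from metric ID to the latest raw value that you have already obtained from Cloud Eye.

```python
# Metric IDs whose documented value range in Table 2 is 0%-100%.
PERCENT_METRICS = {
    "cmdProcessCPU",
    "cmdProcessMem",
    "rs_blockcachehitcachingratio",
    "rs_blockCacheCountHitPercent",
}


def in_documented_range(metric_id: str, value: float) -> bool:
    """Check a sample against the value range documented in Table 2."""
    if metric_id == "cmdForTotalMemory":
        # Total memory is documented as strictly greater than 0 bytes.
        return value > 0
    if metric_id in PERCENT_METRICS:
        return 0 <= value <= 100
    # All other Table 2 metrics are documented as non-negative.
    return value >= 0


def out_of_range(samples: dict) -> dict:
    """Return only the samples that fall outside their documented range."""
    return {
        metric_id: value
        for metric_id, value in samples.items()
        if not in_documented_range(metric_id, value)
    }


if __name__ == "__main__":
    samples = {"cmdProcessCPU": 120.0, "rs_regions": 12, "rs_getavgtime": 3.5}
    print(out_of_range(samples))
```

A value outside its documented range usually points to a collection or parsing problem rather than to the state of the RegionServer itself.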