Updated on 2024-07-23 GMT+08:00

HBase Cluster Supported Metrics

Description

Monitoring is critical to the reliability, availability, and performance of CloudTable. With monitoring, you can track the running status of CloudTable servers.

This section describes the CloudTable metrics that can be monitored by Cloud Eye (CES), together with their namespace and dimensions. You can use the Cloud Eye management console or APIs to query the metrics of monitored objects and the alarms generated for CloudTable.

Namespace

SYS.CloudTable

CloudTable HBase HMaster Instance Monitoring Metrics

Table 1 CloudTable HBase HMaster instance monitoring metrics

| Metric ID | Name | Meaning | Value Range | Monitoring Interval (Raw Data) |
| --- | --- | --- | --- | --- |
| cmdForIORead | Disk Read Rate | Volume of data read from the monitored object per second | ≥ 0 bytes/s | 1 min |
| cmdForIOWrite | Disk Write Rate | Volume of data written to the monitored object per second | ≥ 0 bytes/s | 1 min |
| cmdForTotalMemory | Total Memory | Total memory size of the monitored object | > 0 bytes | 1 min |
| cmdProcessCPU | CPU Usage | CPU usage of the monitored object | 0%–100% | 1 min |
| cmdProcessMem | Memory Usage | Memory usage of the monitored object | 0%–100% | 1 min |
| hm_deadregionservernum | Faulty RegionServers | Number of faulty RegionServers in the cluster | ≥ 0 | 1 min |
| hm_regionservernum | Normal RegionServers | Number of normal RegionServers in the cluster | ≥ 0 | 1 min |
| hm_ritCount | RIT Count | Number of regions in transition (RIT) in the cluster where the monitored object is located | ≥ 0 | 1 min |
| hm_ritCountOverThreshold | RIT Count Over Threshold | Number of regions that have been in transition longer than the configured threshold time in the cluster where the monitored object is located | ≥ 0 | 1 min |
| rs_queuecalltime_max | RPC Queue Call Time (Max) | Maximum time an RPC call waits in the queue | ≥ 0 ms | 1 min |
| rs_queuecalltime_mean | RPC Queue Call Time (Mean) | Mean time an RPC call waits in the queue | ≥ 0 ms | 1 min |
| nn_percentallused | Disk Utilization Rate | Disk space usage of the cluster | 0%–100% | 1 min |
| nn_capacityremaining | Remaining Cluster Disk Capacity | Remaining disk space of the cluster | Depends on the cluster disk capacity | 1 min |
| nn_capacityused | Used Cluster Disk Capacity | Disk space used in the cluster | Depends on the cluster disk capacity | 1 min |

HMaster instances include hmaster-active (active) and hmaster-standby (standby). If hmaster-active becomes faulty, hmaster-standby becomes active and continues to provide services.
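The following Python sketch shows one way the HMaster metrics in Table 1 might be combined into a simple health check on sampled values. The `HMasterSample` type, the `health_issues` function, and the 80% disk-usage threshold are hypothetical choices for illustration; they are not part of CloudTable or Cloud Eye, and in practice alarm rules would be configured in Cloud Eye itself.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical alarm threshold for illustration only; tune it for your cluster.
DISK_USAGE_WARN_PERCENT = 80.0


@dataclass
class HMasterSample:
    """One sample of Table 1 metrics; field names mirror the metric IDs."""
    hm_regionservernum: int        # normal RegionServers
    hm_deadregionservernum: int    # faulty RegionServers
    hm_ritCountOverThreshold: int  # regions in transition longer than the threshold
    nn_percentallused: float       # cluster disk utilization, 0-100 (%)


def health_issues(sample: HMasterSample) -> list[str]:
    """Return human-readable warnings for one sample (empty list if healthy)."""
    issues: list[str] = []
    if sample.hm_regionservernum == 0:
        issues.append("no normal RegionServers")
    if sample.hm_deadregionservernum > 0:
        issues.append(f"{sample.hm_deadregionservernum} faulty RegionServer(s)")
    if sample.hm_ritCountOverThreshold > 0:
        issues.append(
            f"{sample.hm_ritCountOverThreshold} region(s) in transition over threshold"
        )
    if sample.nn_percentallused >= DISK_USAGE_WARN_PERCENT:
        issues.append(
            f"disk usage {sample.nn_percentallused:.1f}% >= {DISK_USAGE_WARN_PERCENT}%"
        )
    return issues


if __name__ == "__main__":
    # Example: one faulty RegionServer and high disk usage.
    print(health_issues(HMasterSample(3, 1, 0, 85.0)))
```

Treating any nonzero `hm_deadregionservernum` or `hm_ritCountOverThreshold` as a warning is a deliberately strict assumption; a real rule might require the condition to persist for several monitoring intervals before alerting.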

CloudTable HBase RegionServer Instance Monitoring Metrics

Table 2 lists the monitoring metrics supported by CloudTable HBase RegionServer instances.

Table 2 CloudTable HBase RegionServer instance monitoring metrics

| Metric ID | Name | Meaning | Value Range | Monitoring Interval (Raw Data) |
| --- | --- | --- | --- | --- |
| cmdProcessCPU | CPU Usage | CPU usage of the monitored object | 0%–100% | 1 min |
| cmdForTotalMemory | Total Memory | Total memory size of the monitored object | > 0 bytes | 1 min |
| cmdProcessMem | Memory Usage | Memory usage of the monitored object | 0%–100% | 1 min |
| cmdForIOWrite | Disk Write Rate | Volume of data written to the monitored object per second | ≥ 0 bytes/s | 1 min |
| cmdForIORead | Disk Read Rate | Volume of data read from the monitored object per second | ≥ 0 bytes/s | 1 min |
| hm_regionservernum | Normal RegionServers | Number of normal RegionServers in the cluster | ≥ 0 | 1 min |
| hm_deadregionservernum | Faulty RegionServers | Number of faulty RegionServers in the cluster | ≥ 0 | 1 min |
| hm_ritCountOverThreshold | RIT Count Over Threshold | Number of regions that have been in transition longer than the configured threshold time | ≥ 0 | 1 min |
| hm_ritCount | RIT Count | Number of regions in transition (RIT) | ≥ 0 | 1 min |
| rs_requests | Requests Per Second | Number of requests processed by the RegionServer per second | ≥ 0 requests/s | 1 min |
| rs_regions | Regions | Number of regions hosted by the RegionServer | ≥ 0 | 1 min |
| rs_writerequestscount | Write Requests | Number of write requests to the RegionServer | ≥ 0 | 1 min |
| rs_readrequestscount | Read Requests | Number of read requests to the RegionServer | ≥ 0 | 1 min |
| rs_blockcachehitcachingratio | Hit Cache Block Caching Ratio | Percentage of block cache requests with caching enabled that hit the cache | 0%–100% | 1 min |
| rs_blockCacheCountHitPercent | Hit Cache Block Ratio | Percentage of block cache requests that hit the cache | 0%–100% | 1 min |
| rs_getavgtime | Get Delay (Avg) | Average delay of Get operations on the RegionServer per unit time | ≥ 0 ms | 1 min |
| rs_putavgtime | Put Delay (Avg) | Average delay of Put operations on the RegionServer per unit time | ≥ 0 ms | 1 min |
| rs_deleteavgtime | Delete Delay (Avg) | Average delay of Delete operations on the RegionServer per unit time | ≥ 0 ms | 1 min |
| rs_getnumops | Get Operations | Number of Get operations on the RegionServer per unit time | ≥ 0 | 1 min |
| rs_putnumops | Put Operations | Number of Put operations on the RegionServer per unit time | ≥ 0 | 1 min |
| rs_deletenumops | Delete Operations | Number of Delete operations on the RegionServer per unit time | ≥ 0 | 1 min |
| rs_queuecalltime_max | RPC Queue Call Time (Max) | Maximum time an RPC call waits in the queue | ≥ 0 ms | 1 min |
| rs_queuecalltime_mean | RPC Queue Call Time (Mean) | Mean time an RPC call waits in the queue | ≥ 0 ms | 1 min |
| rs_flushtime_mean | Flush Time (Mean) | Mean time taken by a flush operation | ≥ 0 ms | 1 min |
| rs_compactionqueuesize | Compaction Queue Size | Current length of the compaction queue, that is, the number of Stores queued for compaction on the RegionServer | ≥ 0 | 1 min |
| rs_flushqueuesize | Flush Queue Size | Current length of the flush queue on the RegionServer | ≥ 0 | 1 min |
| rs_compactionscompletedcount | Compaction Count | Number of completed compactions | ≥ 0 | 1 min |
| rs_flushtimeops_num | Flush Operation Count | Number of flush operations | ≥ 0 | 1 min |
| rs_blockcacheevictedcount | Discarded Cache Blocks | Number of blocks evicted from the block cache | ≥ 0 | 1 min |
| rs_syncTime_max | Sync WAL Time (Max) | Maximum time taken to sync the WAL to HDFS | ≥ 0 ms | 1 min |
| rs_syncTime_mean | Sync WAL Time (Mean) | Mean time taken to sync the WAL to HDFS | ≥ 0 ms | 1 min |
| dn_byteswritten_speed | Bytes Written per Second | Number of bytes written to the node per second | ≥ 0 bytes/s | 1 min |
| dn_bytesread_speed | Bytes Read per Second | Number of bytes read from the node per second | ≥ 0 bytes/s | 1 min |
| rs_numActiveHandler | Number of RegionServer Active Handlers | Number of active RegionServer handlers (total of the handlers processing user table requests, meta table requests, and replication requests) | ≥ 0 | 1 min |
| rs_numActiveGeneralHandler | Number of RegionServer Active Handlers for Processing User Table Requests | Number of active RegionServer handlers processing user table requests | ≥ 0 | 1 min |
| rs_scanTime_p999 | 99.9th Percentile of the Scan Operation Delay | 99.9th percentile of the RegionServer Scan operation delay | ≥ 0 ms | 1 min |
| rs_syncTime_p999 | 99.9th Percentile of the WAL Sync Operation Delay | 99.9th percentile of the RegionServer WAL sync operation delay | ≥ 0 ms | 1 min |
| rs_Get_99th_percentile | 99th Percentile of the Get Operation Delay | 99th percentile of the RegionServer Get operation delay | ≥ 0 ms | 1 min |
| rs_Put_99th_percentile | 99th Percentile of the Put Operation Delay | 99th percentile of the RegionServer Put operation delay | ≥ 0 ms | 1 min |
| rs_Delete_99th_percentile | 99th Percentile of the Delete Operation Delay | 99th percentile of the RegionServer Delete operation delay | ≥ 0 ms | 1 min |
| rs_Get_999th_percentile | 99.9th Percentile of the Get Operation Delay | 99.9th percentile of the RegionServer Get operation delay | ≥ 0 ms | 1 min |
| rs_Put_999th_percentile | 99.9th Percentile of the Put Operation Delay | 99.9th percentile of the RegionServer Put operation delay | ≥ 0 ms | 1 min |
| rs_Delete_999th_percentile | 99.9th Percentile of the Delete Operation Delay | 99.9th percentile of the RegionServer Delete operation delay | ≥ 0 ms | 1 min |
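Two kinds of derived values in Table 2 can be illustrated with small calculations: a hit ratio expressed as a percentage (as in rs_blockCacheCountHitPercent) and a tail-latency percentile (as in the 99th and 99.9th percentile delay metrics). The sketch below assumes a hit ratio of hits / (hits + misses) and uses the nearest-rank percentile definition. HBase computes its reported percentiles from its own histograms, so CloudTable values may differ from this exact calculation; the function names are hypothetical.

```python
import math
from decimal import Decimal
from typing import Sequence


def hit_ratio_percent(hits: int, misses: int) -> float:
    """Cache hit ratio as a percentage: hits / (hits + misses) * 100.

    Returns 0.0 when there were no requests, to avoid division by zero.
    """
    total = hits + misses
    if total == 0:
        return 0.0
    return hits * 100.0 / total


def nearest_rank_percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile of a non-empty sample, with 0 < p <= 100.

    Returns the smallest sample such that at least p% of all samples
    are less than or equal to it.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Decimal keeps ranks such as 99.9% of 1000 exact (999), avoiding float
    # rounding that could push the rank up to the next integer.
    rank = math.ceil(Decimal(str(p)) * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


if __name__ == "__main__":
    latencies_ms = [float(i) for i in range(1, 1001)]  # 1..1000 ms
    print(nearest_rank_percentile(latencies_ms, 99))    # 990.0
    print(nearest_rank_percentile(latencies_ms, 99.9))  # 999.0
    print(hit_ratio_percent(hits=900, misses=100))      # 90.0
```

The example shows why the 99.9th percentile metrics are useful alongside the averages: with 1,000 samples, the p99.9 value is determined by the single second-slowest operation, so it exposes rare slow requests that an average would hide.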

Dimension

| Key | Value |
| --- | --- |
| cluster_id | CloudTable cluster ID |
| instance_name | Name of a CloudTable cluster node |
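As a sketch of querying these metrics through the Cloud Eye API, the function below builds the request path for a metric-data query that uses the SYS.CloudTable namespace and the two dimensions above. It assumes the Cloud Eye V1.0 interface `GET /V1.0/{project_id}/metric-data` with `dim.N=key,value` dimension parameters, millisecond `from`/`to` timestamps, and `period`/`filter` aggregation parameters. Check the Cloud Eye API reference for the exact parameters, the allowed `period` values, and whether a given metric needs both dimensions. The endpoint host and authentication are not shown, and `build_metric_data_query` is a hypothetical helper.

```python
from urllib.parse import urlencode

NAMESPACE = "SYS.CloudTable"


def build_metric_data_query(
    project_id: str,
    metric_name: str,
    cluster_id: str,
    instance_name: str,
    start_ms: int,
    end_ms: int,
    period: int = 300,
    stat: str = "average",
) -> str:
    """Build the path and query string for a Cloud Eye metric-data request.

    Hypothetical helper. Assumes the Cloud Eye V1.0 interface
    GET /V1.0/{project_id}/metric-data with dim.N=key,value dimensions;
    verify parameter names and allowed values in the Cloud Eye API reference.
    """
    if start_ms >= end_ms:
        raise ValueError("start_ms must be earlier than end_ms")
    params = [
        ("namespace", NAMESPACE),
        ("metric_name", metric_name),        # a Metric ID from Table 1 or Table 2
        ("dim.0", f"cluster_id,{cluster_id}"),
        ("dim.1", f"instance_name,{instance_name}"),
        ("from", str(start_ms)),             # start time, Unix epoch milliseconds
        ("to", str(end_ms)),                 # end time, Unix epoch milliseconds
        ("period", str(period)),             # aggregation period in seconds (assumed)
        ("filter", stat),                    # aggregation method, e.g. average or max
    ]
    # Keep the comma in "key,value" dimension strings unescaped.
    return f"/V1.0/{project_id}/metric-data?" + urlencode(params, safe=",")
```

For example, passing `"hm_deadregionservernum"` as `metric_name` with `stat="max"` would ask for the highest number of faulty RegionServers reported in each period of the time range, which suits an alerting check better than an average.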