Updated on 2023-03-30 GMT+08:00

Monitored Metrics (with Agent Installed)

Description

This section describes monitoring metrics reported by BMS to Cloud Eye as well as their namespaces and dimensions. You can use the management console or APIs provided by Cloud Eye to query the metrics of the monitored objects and alarms generated for BMS.

After installing the Agent on a BMS, you can view its OS monitoring metrics. Monitoring data is collected at an interval of 1 minute.

Namespace

SERVICE.BMS

Metrics

Supported BMS OS monitoring metrics include CPU metrics listed in Table 1, CPU load metrics listed in Table 2, memory metrics listed in Table 3, disk metrics listed in Table 4, disk I/O metrics listed in Table 5, file system metrics listed in Table 6, NIC metrics listed in Table 7, software RAID metrics listed in Table 8, and process metrics listed in Table 9.

To monitor software RAID metrics, Agent 1.0.5 or later is required.

Currently, BMSs running the Windows OS cannot be monitored.

Table 1 CPU metrics

Metric ID

Metric

Description

Value Range

Monitored Object

Monitoring Interval (Raw Data)

cpu_usage_idle

(Agent) Idle CPU Usage

Percentage of time that CPU is idle

Check the metric value changes in the /proc/stat file in a collection period.

Run the top command to check the %Cpu(s) id value.

Unit: percent

0-100%

BMS

1 minute

cpu_usage_other

(Agent) Other Process CPU Usage

Percentage of time that the CPU is used by other processes

Formula:

Other Process CPU Usage = 1 - Idle CPU Usage - Kernel Space CPU Usage - User Space CPU Usage

Unit: percent

0-100%

BMS

1 minute

cpu_usage_system

(Agent) Kernel Space CPU Usage

Percentage of time that the CPU is used by kernel space

Check the metric value changes in the /proc/stat file in a collection period.

Run the top command to check the %Cpu(s) sy value.

Unit: percent

0-100%

BMS

1 minute

cpu_usage_user

(Agent) User Space CPU Usage

Percentage of time that the CPU is used by user space

Check the metric value changes in the /proc/stat file in a collection period.

Run the top command to check the %Cpu(s) us value.

Unit: percent

0-100%

BMS

1 minute

cpu_usage

(Agent) CPU Usage

CPU usage of the monitored object

Check the metric value changes in the /proc/stat file in a collection period.

Run the top command to check the %Cpu(s) value.

Unit: percent

0-100%

BMS

1 minute

cpu_usage_nice

(Agent) Nice Process CPU Usage

Percentage of time that the CPU is used by niced (priority-adjusted) processes

Check the metric value changes in the /proc/stat file in a collection period.

Run the top command to check the %Cpu(s) ni value.

Unit: percent

0-100%

BMS

1 minute

cpu_usage_iowait

(Agent) iowait Process CPU Usage

Percentage of time during which the CPU is waiting for I/O operations to complete

Check the metric value changes in the /proc/stat file in a collection period.

Run the top command to check the %Cpu(s) wa value.

Unit: percent

0-100%

BMS

1 minute

cpu_usage_irq

(Agent) CPU Interrupt Time

Percentage of time that the CPU is servicing interrupts

Check the metric value changes in the /proc/stat file in a collection period.

Run the top command to check the %Cpu(s) hi value.

Unit: percent

0-100%

BMS

1 minute

cpu_usage_softirq

(Agent) CPU Software Interrupt Time

Percentage of time that the CPU is servicing software interrupts

Check the metric value changes in the /proc/stat file in a collection period.

Run the top command to check the %Cpu(s) si value.

Unit: percent

0-100%

BMS

1 minute
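
The CPU percentages in Table 1 can be reproduced from two samples of the aggregate cpu line in /proc/stat. The following is a minimal sketch, not the Agent's implementation: the function names are hypothetical, the field order follows the proc(5) manual page, and treating overall CPU usage as the non-idle share is an assumption.

```python
# Field order of the aggregate "cpu" line in /proc/stat (see proc(5)).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text):
    """Return the jiffy counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percentages(before, after):
    """Convert the counter deltas between two samples into percentages."""
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("samples are identical or out of order")
    pct = {name: 100.0 * delta / total for name, delta in deltas.items()}
    # Table 1 defines "other" as the remainder after idle, kernel, and user time.
    pct["other"] = 100.0 - pct["idle"] - pct["system"] - pct["user"]
    # Assumption: overall CPU usage is everything that is not idle time.
    pct["usage"] = 100.0 - pct["idle"]
    return pct
```

On a Linux host, read /proc/stat twice, one collection period apart, and pass both parsed samples to cpu_percentages.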

Table 2 CPU load metrics

Metric ID

Metric

Description

Value Range

Monitored Object

Monitoring Interval (Raw Data)

load_average1

(Agent) 1-Minute Load Average

CPU load averaged over the last 1 minute

Obtain its value by dividing the load1 value in /proc/loadavg by the number of logical CPUs.

Run the top command to check the load1 value.

≥ 0

BMS

1 minute

load_average5

(Agent) 5-Minute Load Average

CPU load averaged over the last 5 minutes

Obtain its value by dividing the load5 value in /proc/loadavg by the number of logical CPUs.

Run the top command to check the load5 value.

≥ 0

BMS

1 minute

load_average15

(Agent) 15-Minute Load Average

CPU load averaged over the last 15 minutes

Obtain its value by dividing the load15 value in /proc/loadavg by the number of logical CPUs.

Run the top command to check the load15 value.

≥ 0

BMS

1 minute
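
The load metrics in Table 2 are the raw load averages divided by the number of logical CPUs. The sketch below illustrates that division; the function names are hypothetical, and using os.cpu_count() as the logical CPU count is an assumption.

```python
import os


def normalized_load(loadavg_text, logical_cpus):
    """Divide the 1-, 5-, and 15-minute load averages by the logical CPU count."""
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    # The first three fields of /proc/loadavg are load1, load5, and load15.
    load1, load5, load15 = (float(v) for v in loadavg_text.split()[:3])
    return {
        "load_average1": load1 / logical_cpus,
        "load_average5": load5 / logical_cpus,
        "load_average15": load15 / logical_cpus,
    }


def read_normalized_load(path="/proc/loadavg"):
    """Read the live values on a Linux host (assumes os.cpu_count() is accurate)."""
    with open(path) as f:
        return normalized_load(f.read(), os.cpu_count() or 1)
```

For example, a load1 value of 4.00 on a host with 4 logical CPUs yields a load_average1 value of 1.0.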

Table 3 Memory metrics

Metric ID

Metric

Description

Value Range

Monitored Object

Monitoring Interval (Raw Data)

mem_available

(Agent) Available Memory

Available memory size of the monitored object

Obtain the MemAvailable value by checking the file /proc/meminfo. If MemAvailable is not displayed in the file, it is calculated as follows:

MemAvailable = MemFree + Buffers + Cached

Unit: GB

≥ 0 GB

BMS

1 minute

mem_usedPercent

(Agent) Memory Usage

Memory usage of the monitored object

Obtain its value by checking the file /proc/meminfo.

Memory Usage = (MemTotal - MemAvailable)/MemTotal

Unit: percent

0-100%

BMS

1 minute

mem_free

(Agent) Idle Memory

Amount of memory that is not being used

Obtain its value by checking the file /proc/meminfo.

Unit: GB

≥ 0 GB

BMS

1 minute

mem_buffers

(Agent) Buffer

Memory that is being used for buffers

Obtain its value by checking the file /proc/meminfo.

Run the top command to check the KiB Mem:buffers value.

Unit: GB

≥ 0 GB

BMS

1 minute

mem_cached

(Agent) Cache

Memory that is being used for file caches

Obtain its value by checking the file /proc/meminfo.

Run the top command to check the KiB Swap:cached Mem value.

Unit: GB

≥ 0 GB

BMS

1 minute
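
The memory formulas in Table 3 can be sketched as follows. The function names are hypothetical, and converting kB to GB with a factor of 1024 × 1024 is an assumption about how the Agent scales the values.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo key to its numeric value (kB for most keys)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def memory_metrics(info):
    """Derive the Table 3 metrics from parsed /proc/meminfo values."""
    available = info.get("MemAvailable")
    if available is None:
        # Fallback from the table for kernels that do not report MemAvailable.
        available = info["MemFree"] + info["Buffers"] + info["Cached"]
    kb_per_gb = 1024 * 1024  # assumption: 1 GB = 1024 * 1024 kB
    return {
        "mem_available": available / kb_per_gb,
        "mem_free": info["MemFree"] / kb_per_gb,
        "mem_buffers": info["Buffers"] / kb_per_gb,
        "mem_cached": info["Cached"] / kb_per_gb,
        "mem_usedPercent": 100.0 * (info["MemTotal"] - available) / info["MemTotal"],
    }
```

On a Linux host, pass the contents of /proc/meminfo to parse_meminfo and the result to memory_metrics.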

Table 4 Disk metrics

Metric ID

Metric

Description

Value Range

Monitored Object

Monitoring Interval (Raw Data)

mountPointPrefix_disk_free

(Agent) Available Disk Space

Available disk space of the monitored object

Run the df -h command to check the data in the Avail column.

The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), dots (.), and swung dashes (~).

Unit: GB

≥ 0 GB

BMS

1 minute

mountPointPrefix_disk_total

(Agent) Disk Storage Capacity

Disk storage capacity of the monitored object

Run the df -h command to check the data in the Size column.

The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), dots (.), and swung dashes (~).

Unit: GB

≥ 0 GB

BMS

1 minute

mountPointPrefix_disk_used

(Agent) Used Disk Space

Used disk space of the monitored object

Run the df -h command to check the data in the Used column.

The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), dots (.), and swung dashes (~).

Unit: GB

≥ 0 GB

BMS

1 minute

mountPointPrefix_disk_usedPercent

(Agent) Disk Usage

Disk usage of the monitored object. It is calculated as follows: Disk Usage = Used Disk Space/Disk Storage Capacity.

The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), dots (.), and swung dashes (~).

Unit: percent

0-100%

BMS

1 minute
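
The disk metrics in Table 4 can be approximated without parsing df output. The sketch below uses shutil.disk_usage from the Python standard library; the function names are hypothetical, and the GB scaling factor is an assumption. Note that the Use% column printed by df can differ slightly from this value, because df calculates it against the used space plus the available space rather than the full capacity.

```python
import shutil

GIB = 1024 ** 3  # assumption: the GB values are binary gigabytes


def disk_metrics(total_bytes, used_bytes, free_bytes):
    """Apply the Table 4 formula: Disk Usage = Used Disk Space/Disk Storage Capacity."""
    used_percent = 100.0 * used_bytes / total_bytes if total_bytes else 0.0
    return {
        "disk_total": total_bytes / GIB,
        "disk_used": used_bytes / GIB,
        "disk_free": free_bytes / GIB,
        "disk_usedPercent": used_percent,
    }


def disk_metrics_for(mount_point):
    """Collect the values for one mount point on the local host."""
    usage = shutil.disk_usage(mount_point)  # named tuple: total, used, free
    return disk_metrics(usage.total, usage.used, usage.free)
```

For example, disk_metrics_for("/") returns the four values for the root file system.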

Table 5 Disk I/O metrics

Metric ID

Metric

Description

Value Range

Monitored Object

Monitoring Interval (Raw Data)

mountPointPrefix_disk_agt_read_bytes_rate

(Agent) Disks Read Rate

Volume of data read from the monitored object per second

The disk read rate is calculated by checking data changes in the sixth column of the corresponding device in the /proc/diskstats file in a collection period.

The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), dots (.), and swung dashes (~).

Unit: byte/s

≥ 0 bytes/s

BMS

1 minute

mountPointPrefix_disk_agt_read_requests_rate

(Agent) Disks Read Requests

Number of read requests sent to the monitored object per second

The disk read requests are calculated by checking data changes in the fourth column of the corresponding device in the /proc/diskstats file in a collection period.

The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), dots (.), and swung dashes (~).

Unit: request/s

≥ 0

BMS

1 minute

mountPointPrefix_disk_agt_write_bytes_rate

(Agent) Disks Write Rate

Volume of data written to the monitored object per second

The disk write rate is calculated by checking data changes in the tenth column of the corresponding device in the /proc/diskstats file in a collection period.

The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), dots (.), and swung dashes (~).

Unit: byte/s

≥ 0 bytes/s

BMS

1 minute

mountPointPrefix_disk_agt_write_requests_rate

(Agent) Disks Write Requests

Number of write requests sent to the monitored object per second

The disk write requests are calculated by checking data changes in the eighth column of the corresponding device in the /proc/diskstats file in a collection period.

The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), dots (.), and swung dashes (~).

Unit: request/s

≥ 0

BMS

1 minute

disk_readTime

(Agent) Average Read Request Time

Average amount of time that read requests have waited on the disks

The average read request time is calculated by checking data changes in the seventh column of the corresponding device in the /proc/diskstats file in a collection period.

The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), dots (.), and swung dashes (~).

Unit: ms/count

≥ 0 ms/count

BMS

1 minute

disk_writeTime

(Agent) Average Write Request Time

Average amount of time that write requests have waited on the disks

The average write request time is calculated by checking data changes in the eleventh column of the corresponding device in the /proc/diskstats file in a collection period.

The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), dots (.), and swung dashes (~).

Unit: ms/count

≥ 0 ms/count

BMS

1 minute

disk_ioUtils

(Agent) Disk I/O Usage

Disk I/O usage of the monitored object

Check the data changes in the thirteenth column of the corresponding device in the /proc/diskstats file in a collection period.

The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), dots (.), and swung dashes (~).

Unit: percent

0-100%

BMS

1 minute

disk_queue_length

(Agent) Disk Queue Length

Average number of read or write requests to be processed for the monitored disk in the monitoring period

The average disk queue length is calculated by checking data changes in the fourteenth column of the corresponding device in the /proc/diskstats file in a collection period.

The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), dots (.), and swung dashes (~).

Unit: count

≥ 0

BMS

1 minute

disk_write_bytes_per_operation

(Agent) Average Disk Write Size

Average number of bytes in an I/O write for the monitored disk in the monitoring period

The average disk write size is calculated by dividing the data changes in the tenth column of the corresponding device by that of the eighth column in the /proc/diskstats file in a collection period.

The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), dots (.), and swung dashes (~).

Unit: KB/op

≥ 0 KB/op

BMS

1 minute

disk_read_bytes_per_operation

(Agent) Average Disk Read Size

Average number of bytes in an I/O read for the monitored disk in the monitoring period

The average disk read size is calculated by dividing the data changes in the sixth column of the corresponding device by that of the fourth column in the /proc/diskstats file in a collection period.

The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), dots (.), and swung dashes (~).

Unit: KB/op

≥ 0 KB/op

BMS

1 minute

disk_io_svctm

(Agent) Disk I/O Service Time

Average time taken by an I/O read or write operation on the monitored disk in the monitoring period

The average disk I/O service time is calculated by dividing the data changes in the thirteenth column of the corresponding device by the sum of data changes in the fourth and eighth columns in the /proc/diskstats file in a collection period.

The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), dots (.), and swung dashes (~).

Unit: ms/op

≥ 0 ms/op

BMS

1 minute
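
The disk I/O metrics in Table 5 are all derived from changes in /proc/diskstats columns between two samples. The sketch below shows one plausible way to combine them. It is not the Agent's implementation: the function names are hypothetical, the 512-byte sector size is an assumption, and the divisors used for the average request times and the queue length follow the convention used by iostat, which the table does not confirm.

```python
SECTOR_BYTES = 512  # assumption: /proc/diskstats counts 512-byte sectors


def parse_diskstats(text, device):
    """Return {column_number: value} for columns 4 to 14 of one device."""
    for line in text.splitlines():
        cols = line.split()
        if len(cols) >= 14 and cols[2] == device:
            return {n: int(cols[n - 1]) for n in range(4, 15)}
    raise ValueError(f"device {device!r} not found")


def disk_io_metrics(before, after, interval_s):
    """Derive the Table 5 metrics from two samples taken interval_s seconds apart."""
    d = {col: after[col] - before[col] for col in before}
    reads, writes = d[4], d[8]
    ops = reads + writes
    interval_ms = interval_s * 1000.0
    return {
        "read_bytes_rate": d[6] * SECTOR_BYTES / interval_s,
        "read_requests_rate": reads / interval_s,
        "write_bytes_rate": d[10] * SECTOR_BYTES / interval_s,
        "write_requests_rate": writes / interval_s,
        # Assumption: time spent (ms) divided by completed requests.
        "readTime": d[7] / reads if reads else 0.0,
        "writeTime": d[11] / writes if writes else 0.0,
        "ioUtils": 100.0 * d[13] / interval_ms,
        # Assumption: weighted I/O time divided by elapsed time.
        "queue_length": d[14] / interval_ms,
        "read_bytes_per_operation": d[6] * SECTOR_BYTES / 1024 / reads if reads else 0.0,
        "write_bytes_per_operation": d[10] * SECTOR_BYTES / 1024 / writes if writes else 0.0,
        "io_svctm": d[13] / ops if ops else 0.0,
    }
```

On a Linux host, read /proc/diskstats twice, one collection period apart, parse both samples for the same device, and pass them to disk_io_metrics together with the elapsed time in seconds.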

Table 6 File system metrics

Metric ID

Metric

Description

Value Range

Monitored Object

Monitoring Interval (Raw Data)

disk_fs_rwstate

(Agent) File System Read/Write Status

Read and write status of the mounted file system of the monitored object. Possible values are 0 (read and write) and 1 (read only).

Check file system information in the fourth column in the /proc/mounts file.

0 and 1

BMS

1 minute

disk_inodesTotal

(Agent) Disk inode Total

Total number of index nodes on the disk

Run the df -i command to check information in the Inodes column.

The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), dots (.), and swung dashes (~).

≥ 0

BMS

1 minute

disk_inodesUsed

(Agent) Total inode Used

Number of used index nodes on the disk

Run the df -i command to check data in the IUsed column.

The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), dots (.), and swung dashes (~).

≥ 0

BMS

1 minute

disk_inodesUsedPercent

(Agent) Percentage of Total inode Used

Percentage of used index nodes on the disk

Run the df -i command to check data in the IUse% column.

The path of the mount point prefix cannot exceed 64 characters. It must start with a letter, and contain only digits, letters, hyphens (-), dots (.), and swung dashes (~).

Unit: percent

0-100%

BMS

1 minute
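
The file system metrics in Table 6 can be sketched as follows. The read/write status comes from the mount options in the fourth column of /proc/mounts, and the inode counts correspond to the f_files and f_ffree fields returned by os.statvfs. The function names are hypothetical, and using the last matching entry for a mount point is an assumption.

```python
import os


def fs_rwstate(mounts_text, mount_point):
    """Return 0 if the mount point is read-write, or 1 if it is read-only."""
    state = None
    for line in mounts_text.splitlines():
        cols = line.split()
        if len(cols) >= 4 and cols[1] == mount_point:
            options = cols[3].split(",")
            # Assumption: if a mount point is listed twice, the last entry wins.
            state = 1 if "ro" in options else 0
    if state is None:
        raise ValueError(f"mount point {mount_point!r} not found")
    return state


def inode_metrics(total_inodes, free_inodes):
    """Mirror the Inodes, IUsed, and IUse% columns of df -i."""
    used = total_inodes - free_inodes
    percent = 100.0 * used / total_inodes if total_inodes else 0.0
    return {
        "disk_inodesTotal": total_inodes,
        "disk_inodesUsed": used,
        "disk_inodesUsedPercent": percent,
    }


def inode_metrics_for(path):
    """Collect live inode counts on a Unix host."""
    st = os.statvfs(path)
    return inode_metrics(st.f_files, st.f_ffree)
```

Some file systems report zero total inodes, in which case this sketch returns a usage of 0.0 instead of dividing by zero.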

Table 7 NIC metrics

Metric ID

Metric

Description

Value Range

Monitored Object

Monitoring Interval (Raw Data)

net_bitRecv

(Agent) Inbound Bandwidth

Number of bits received by this NIC per second

Check metric value changes in the /proc/net/dev file in a collection period.

Unit: bit/s

≥ 0 bit/s

BMS

1 minute

net_bitSent

(Agent) Outbound Bandwidth

Number of bits sent by this NIC per second

Check metric value changes in the /proc/net/dev file in a collection period.

Unit: bit/s

≥ 0 bit/s

BMS

1 minute

net_packetRecv

(Agent) NIC Packet Receive Rate

Number of packets received by this NIC per second

Check metric value changes in the /proc/net/dev file in a collection period.

Unit: count/s

≥ 0 counts/s

BMS

1 minute

net_packetSent

(Agent) NIC Packet Send Rate

Number of packets sent by this NIC per second

Check metric value changes in the /proc/net/dev file in a collection period.

Unit: count/s

≥ 0 counts/s

BMS

1 minute

net_errin

(Agent) Receive Error Rate

Percentage of receive errors detected by this NIC per second

Unit: percent

0-100%

BMS

1 minute

net_errout

(Agent) Transmit Error Rate

Percentage of transmit errors detected by this NIC per second

Check metric value changes in the /proc/net/dev file in a collection period.

Unit: percent

0-100%

BMS

1 minute

net_dropin

(Agent) Received Packet Drop Rate

Percentage of packets received by this NIC that were dropped per second

Check metric value changes in the /proc/net/dev file in a collection period.

Unit: percent

0-100%

BMS

1 minute

net_dropout

(Agent) Transmitted Packet Drop Rate

Percentage of packets transmitted by this NIC which were dropped per second

Check metric value changes in the /proc/net/dev file in a collection period.

Unit: percent

0-100%

BMS

1 minute
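
The NIC metrics in Table 7 are derived from counter changes in /proc/net/dev between two samples. The sketch below is illustrative only: the function names are hypothetical, and using the packet count of the same direction as the denominator for the error and drop rates is an assumption that the table confirms only for the received packet drop rate.

```python
def parse_net_dev(text, iface):
    """Return the receive and transmit counters of one interface."""
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == iface:
            v = [int(x) for x in rest.split()]
            # Receive counters occupy fields 0-7, transmit counters fields 8-15.
            return {
                "rx_bytes": v[0], "rx_packets": v[1], "rx_errs": v[2], "rx_drop": v[3],
                "tx_bytes": v[8], "tx_packets": v[9], "tx_errs": v[10], "tx_drop": v[11],
            }
    raise ValueError(f"interface {iface!r} not found")


def _percent(part, whole):
    return 100.0 * part / whole if whole else 0.0


def nic_metrics(before, after, interval_s):
    """Derive the Table 7 metrics from two samples taken interval_s seconds apart."""
    d = {key: after[key] - before[key] for key in before}
    return {
        "net_bitRecv": d["rx_bytes"] * 8 / interval_s,
        "net_bitSent": d["tx_bytes"] * 8 / interval_s,
        "net_packetRecv": d["rx_packets"] / interval_s,
        "net_packetSent": d["tx_packets"] / interval_s,
        # Assumption: rates are relative to the packets seen in the same direction.
        "net_errin": _percent(d["rx_errs"], d["rx_packets"]),
        "net_errout": _percent(d["tx_errs"], d["tx_packets"]),
        "net_dropin": _percent(d["rx_drop"], d["rx_packets"]),
        "net_dropout": _percent(d["tx_drop"], d["tx_packets"]),
    }
```

The two header lines of /proc/net/dev contain no colon, so parse_net_dev skips them without special handling.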

Table 8 Software RAID metrics

Metric ID

Metric

Description

Value Range

Monitored Object

Monitoring Interval (Raw Data)

md1_status_device:1

(Agent) Status

Software RAID status of the monitored object. Its value is 0 if the RAID is abnormal.

Run the plug-in script /usr/local/telescope/plugins/raid-monitor.sh in a collection period. Obtain its value by checking data changes in the /proc/mdstat file and running mdadm -D /dev/md0 (md0 indicates the RAID name).

0 and 1

BMS

1 minute

md1_active_device:2

(Agent) Active Disks

Number of active disks in software RAID of the monitored object. Its value is -1 if the RAID is abnormal.

Run the plug-in script /usr/local/telescope/plugins/raid-monitor.sh in a collection period. Obtain its value by checking data changes in the /proc/mdstat file and running mdadm -D /dev/md0 (md0 indicates the RAID name).

≥ 0, or -1

BMS

1 minute

md1_working_device:2

(Agent) Working Disks

Number of working disks in software RAID of the monitored object. Its value is -1 if the RAID is abnormal.

Run the plug-in script /usr/local/telescope/plugins/raid-monitor.sh in a collection period. Obtain its value by checking data changes in the /proc/mdstat file and running mdadm -D /dev/md0 (md0 indicates the RAID name).

≥ 0, or -1

BMS

1 minute

md1_failed_device:0

(Agent) Failed Disks

Number of failed disks in software RAID of the monitored object. Its value is -1 if the RAID is abnormal.

Run the plug-in script /usr/local/telescope/plugins/raid-monitor.sh in a collection period. Obtain its value by checking data changes in the /proc/mdstat file and running mdadm -D /dev/md0 (md0 indicates the RAID name).

≥ 0, or -1

BMS

1 minute

md1_spare_device:0

(Agent) Spare Disks

Number of spare disks in software RAID of the monitored object. Its value is -1 if the RAID is abnormal.

Run the plug-in script /usr/local/telescope/plugins/raid-monitor.sh in a collection period. Obtain its value by checking data changes in the /proc/mdstat file and running mdadm -D /dev/md0 (md0 indicates the RAID name).

≥ 0, or -1

BMS

1 minute
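
The disk counts in Table 8 correspond to fields in the output of mdadm -D. The sketch below parses those fields from captured output. It does not reproduce the raid-monitor.sh plug-in, whose logic is not described here: the function name is hypothetical, and returning -1 for a field that cannot be found is an assumption modeled on the -1 value that the table uses for an abnormal RAID.

```python
import re

# Matches lines such as "    Active Devices : 2" in the output of mdadm -D.
_DEVICE_LINE = re.compile(
    r"^\s*(Active|Working|Failed|Spare) Devices\s*:\s*(\d+)\s*$", re.MULTILINE
)


def raid_device_counts(detail_text):
    """Extract the active, working, failed, and spare disk counts."""
    # Assumption: -1 marks a count that could not be read.
    counts = {"active": -1, "working": -1, "failed": -1, "spare": -1}
    for label, value in _DEVICE_LINE.findall(detail_text):
        counts[label.lower()] = int(value)
    return counts
```

To use it on a live host, capture the output of mdadm -D for the RAID device (for example, with subprocess.run) and pass the text to raid_device_counts.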

Table 9 Process metrics

Metric ID

Metric

Description

Value Range

Monitored Object

Monitoring Interval (Raw Data)

proc_pHashId_cpu

CPU Usage

CPU consumed by a process. pHashId is the MD5 hash of the process name and process ID.

Check the metric value changes in the /proc/pid/stat file.

Unit: percent

0-100%

BMS

1 minute

proc_pHashId_mem

Memory Usage

Memory consumed by a process. pHashId is the MD5 hash of the process name and process ID.

Memory Usage = RSS × PAGESIZE/MemTotal

  • Obtain the RSS value by checking the second column of the file /proc/pid/statm.
  • Obtain the PAGESIZE value by running the getconf PAGESIZE command.
  • Obtain the MemTotal value by checking the file /proc/meminfo.

Unit: percent

0-100%

BMS

1 minute

proc_pHashId_file

Opened Files

Number of files opened by a process. pHashId is the MD5 hash of the process name and process ID.

Run the ls -l /proc/pid/fd command to view the number of opened files.

≥ 0

BMS

1 minute

proc_running_count

(Agent) Running Processes

Number of running processes

You can obtain the state of each process by checking the State value in the /proc/pid/status file, and then count the processes in each state.

≥ 0

BMS

1 minute

proc_idle_count

(Agent) Idle Processes

Number of idle processes

You can obtain the state of each process by checking the State value in the /proc/pid/status file, and then count the processes in each state.

≥ 0

BMS

1 minute

proc_zombie_count

(Agent) Zombie Processes

Number of zombie processes

You can obtain the state of each process by checking the State value in the /proc/pid/status file, and then count the processes in each state.

≥ 0

BMS

1 minute

proc_blocked_count

(Agent) Blocked Processes

Number of blocked processes

You can obtain the state of each process by checking the State value in the /proc/pid/status file, and then count the processes in each state.

≥ 0

BMS

1 minute

proc_sleeping_count

(Agent) Sleeping Processes

Number of sleeping processes

You can obtain the state of each process by checking the State value in the /proc/pid/status file, and then count the processes in each state.

≥ 0

BMS

1 minute

proc_total_count

(Agent) Total Processes

Total number of processes on the monitored object

You can obtain the state of each process by checking the State value in the /proc/pid/status file, and then count the processes in each state.

≥ 0

BMS

1 minute
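
Two of the calculations in Table 9 can be sketched as follows: the per-process memory usage formula and the per-state process counts. The function names are hypothetical, and the mapping from state letters in /proc/pid/status to metric IDs is an assumption, because the table does not say which kernel state feeds which counter.

```python
def process_mem_percent(statm_text, page_size_bytes, mem_total_kb):
    """Memory Usage = RSS x PAGESIZE/MemTotal, expressed as a percentage."""
    rss_pages = int(statm_text.split()[1])  # second column of /proc/pid/statm
    return 100.0 * rss_pages * page_size_bytes / (mem_total_kb * 1024)


# Assumption: how state letters map to the Table 9 metric IDs.
STATE_TO_METRIC = {
    "R": "proc_running_count",
    "S": "proc_sleeping_count",
    "D": "proc_blocked_count",
    "Z": "proc_zombie_count",
    "I": "proc_idle_count",
}


def count_process_states(status_texts):
    """Count processes per state from the contents of /proc/pid/status files."""
    counts = {metric: 0 for metric in STATE_TO_METRIC.values()}
    counts["proc_total_count"] = 0
    for text in status_texts:
        counts["proc_total_count"] += 1
        for line in text.splitlines():
            if line.startswith("State:"):
                metric = STATE_TO_METRIC.get(line.split()[1])
                if metric:
                    counts[metric] += 1
                break
    return counts
```

On a Linux host, the page size is available from os.sysconf("SC_PAGE_SIZE"), which returns the same value as the getconf PAGESIZE command.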