Updated on 2023-10-18 GMT+08:00

Alarms

The IoT platform generates an alarm when it detects that the alarm triggering condition set in a rule is met or that the device message reporting rate exceeds the threshold preset on the platform. Monitor alarms closely and handle them promptly to keep devices running normally.

Alarms are classified into rule alarms, system alarms (including device flow control alarms), and custom metric alarms.
  • Rule alarms: If you set the action Report alarms when configuring a device linkage rule and define the alarm properties and severity, the platform reports an alarm when the trigger condition is met. For example, if a smart water meter does not report data for three consecutive days, the platform generates an alarm to notify maintenance personnel of the water meter fault. Maintenance personnel then locate the faulty water meter based on the alarm information and repair it promptly.
  • System alarms: When a user resource, for example, the number of devices, reaches the upper limit of the user quota, the IoTDA platform reports a system alarm to Application Operations Management (AOM). This type of alarm is triggered automatically by the IoTDA platform, but you need to configure notification rules. Table 1 lists the system alarms.
    Table 1 System alarms

    | Alarm | Description |
    | --- | --- |
    | Device Upstream Message Flow Control Alarm | When the total reporting rate of all device messages in a resource space exceeds the threshold (120 messages per second by default), the platform starts flow control on the NB-IoT devices and generates this alarm. (By default, flow control is disabled.) |
    | CoAP Message Flow Control for a Single Device | When the number of messages received by the platform from an NB-IoT device in a measurement period exceeds the threshold (300 messages per minute by default), the platform starts flow control on the NB-IoT device and generates this alarm. |
    | High CoAP Message Sending Rate of a Single Device | When the number of requests sent by an NB-IoT device in a measurement period exceeds the threshold (5 messages per second by default), the platform generates this alarm and rejects subsequent requests from the device. |
    | MQTT Message Flow Control for a Single Device | When the volume of data sent by an MQTT device per second exceeds the threshold (3 KB/s by default), the platform starts flow control on the MQTT device and generates this alarm. |
    | Device Upstream Messages Exceeding the Tenant Flow Control Alarm | This alarm is generated when the sum of the upstream message rate and link setup rate exceeds the threshold. (PUBLISH indicates an upstream message, and CONNECT indicates link setup.) In the basic edition, the default rate of upstream messages is 500 per second and the default rate of link setup is 100 per second. For details about the standard and enterprise editions, see Specifications. If the rate exceeds the default value, flow control is performed and an alarm is generated. |
    | Number of User Devices Reaching the Threshold | This alarm is generated when the number of registered user devices reaches 80% or 100% of the instance threshold (50,000 for the basic edition; 20 times the number of online devices for the standard or enterprise edition). For details, see Specifications. |
    | Number of Online User Devices Reaching the Threshold | This alarm is generated when the number of online user devices reaches 80% or 100% of the threshold. (The threshold depends on the number of purchased units. For the standard or enterprise edition, see Specifications.) When the number of online user devices exceeds the threshold, device access is rejected. The alarm is triggered once an hour. |
    | Number of Child Devices Under a Gateway Reaching the Threshold | This alarm is generated when the number of child devices under a gateway reaches 80% or 100% of the threshold. |
    | Linkage Rule Triggering Concurrency Threshold | This alarm is generated when the number of linkage rules triggered per second exceeds the threshold (10/s for the basic or standard edition and 100/s for the enterprise edition). Flow control is triggered on the excess part. This alarm is triggered only once a day. |
    | Number of API Calls from a Tenant Reaching the Flow Control Threshold | This alarm is generated when the TPS of API calls made by a tenant exceeds the threshold. (Unless otherwise specified, the default limit of an API is 50/s. The maximum number of API calls made by an account is 100/s for the basic and standard editions.) Flow control is triggered on the excess part. This alarm is triggered only once a day. |
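As a rough illustration of the 80% and 100% levels used by the device-count alarms in Table 1, the following Python sketch classifies a device count against a threshold. The function name quota_alarm_level is hypothetical and is not part of any IoTDA API; the platform evaluates these thresholds itself.

```python
from typing import Optional

# Alarm levels from Table 1: 80% and 100% of the threshold.
# Checked from highest to lowest so the most severe level wins.
ALARM_PERCENTS = (100, 80)


def quota_alarm_level(current: int, threshold: int) -> Optional[int]:
    """Return 100 or 80 when usage reaches that percentage, else None.

    Hypothetical helper for illustration only.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    for percent in ALARM_PERCENTS:
        # Integer arithmetic avoids floating-point rounding at the boundary.
        if current * 100 >= threshold * percent:
            return percent
    return None


# Example with the basic edition threshold of 50,000 registered devices.
level = quota_alarm_level(40_000, 50_000)
```

With the basic edition threshold of 50,000 devices, a count of 40,000 reaches the 80% level and a count of 50,000 or more reaches the 100% level.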

  • Custom metric alarms: You can log in to the AOM console to configure custom metric alarms. Currently, the following metrics are supported.
    Table 2 Custom alarm metrics

    | Metric | Metric Name |
    | --- | --- |
    | Total number of devices | iotda_device_status_totalCount |
    | Number of online devices | iotda_device_status_onlineCount |
    | Number of offline devices | iotda_device_status_offlineCount |
    | Number of abnormal devices | iotda_device_status_abnormalCount |
    | Number of inactive devices | iotda_device_status_inactiveCount |
    | Total number of reported NB-IoT data records | iotda_south_dataReport_totalCount |
    | Number of NB-IoT data reporting failures | iotda_south_dataReport_failedCount |
    | Total number of MQTT event reporting times | iotda_south_eventUp_totalCount |
    | Number of MQTT event reporting successes | iotda_south_eventUp_successCount |
    | Number of MQTT event reporting failures | iotda_south_eventUp_failedCount |
    | Total number of MQTT property reporting times | iotda_south_propertiesReport_totalCount |
    | Number of MQTT property reporting successes | iotda_south_propertiesReport_successCount |
    | Number of MQTT property reporting failures | iotda_south_propertiesReport_failedCount |
    | Total number of MQTT message reporting times | iotda_south_messageUp_totalCount |
    | Number of MQTT message reporting successes | iotda_south_messageUp_successCount |
    | Number of MQTT message reporting failures | iotda_south_messageUp_failedCount |
    | AMQP transfers | iotda_amqp_forwarding_totalCount |
    | Number of AMQP transfer successes | iotda_amqp_forwarding_successCount |
    | Number of AMQP transfer failures | iotda_amqp_forwarding_failedCount |
    | FunctionGraph transfers | iotda_functionGraph_forwarding_totalCount |
    | Number of FunctionGraph transfer successes | iotda_functionGraph_forwarding_successCount |
    | Number of FunctionGraph transfer failures | iotda_functionGraph_forwarding_failedCount |
    | MRS Kafka transfers | iotda_mrsKafka_forwarding_totalCount |
    | Number of MRS Kafka transfer successes | iotda_mrsKafka_forwarding_successCount |
    | Number of MRS Kafka transfer failures | iotda_mrsKafka_forwarding_failedCount |
    | MQTT transfers | iotda_mqtt_forwarding_totalCount |
    | Number of MQTT transfer successes | iotda_mqtt_forwarding_successCount |
    | Number of MQTT transfer failures | iotda_mqtt_forwarding_failedCount |
    | MySQL transfers | iotda_mysql_forwarding_totalCount |
    | Number of MySQL transfer successes | iotda_mysql_forwarding_successCount |
    | Number of MySQL transfer failures | iotda_mysql_forwarding_failedCount |
    | InfluxDB transfers | iotda_influxDB_forwarding_totalCount |
    | Number of InfluxDB transfer successes | iotda_influxDB_forwarding_successCount |
    | Number of InfluxDB transfer failures | iotda_influxDB_forwarding_failedCount |
    | HTTP message pushes | iotda_http_forwarding_totalCount |
    | Number of HTTP message push transfer successes | iotda_http_forwarding_successCount |
    | Number of HTTP message push transfer failures | iotda_http_forwarding_failedCount |
    | OBS transfers | iotda_obs_forwarding_totalCount |
    | Number of OBS transfer successes | iotda_obs_forwarding_successCount |
    | Number of OBS transfer failures | iotda_obs_forwarding_failedCount |
    | DMS Kafka transfers | iotda_dmsKafka_forwarding_totalCount |
    | Number of DMS Kafka transfer successes | iotda_dmsKafka_forwarding_successCount |
    | Number of DMS Kafka transfer failures | iotda_dmsKafka_forwarding_failedCount |
    | DIS transfers | iotda_dis_forwarding_totalCount |
    | Number of DIS transfer successes | iotda_dis_forwarding_successCount |
    | Number of DIS transfer failures | iotda_dis_forwarding_failedCount |
    | ROMA transfers | iotda_roma_forwarding_totalCount |
    | Number of ROMA Connect transfer successes | iotda_roma_forwarding_successCount |
    | Number of ROMA Connect transfer failures | iotda_roma_forwarding_failedCount |
    | LTS transfers | iotda_lts_forwarding_totalCount |
    | Number of LTS transfer successes | iotda_lts_forwarding_successCount |
    | Number of LTS transfer failures | iotda_lts_forwarding_failedCount |

    Procedure

    1. Log in to the AOM console. In the navigation pane, choose Alarm Center > Alarm Action Rules. Click Create and configure parameters.
      Figure 1 Creating an alarm action rule
    2. In the navigation pane, choose Alarm Center > Alarm Rules. Click Create Alarm Rule in the upper right corner.
    3. Set a threshold alarm rule.
      1. Set basic information such as the rule name and description.
        Figure 2 Setting basic information
      2. Set details about the rule.
        1. Set Rule Type to Threshold alarm.
        2. Set Monitored Object to Command input and enter the corresponding command.
          Figure 3 Setting the object to be monitored

          Enter Prometheus commands. For details about Prometheus commands, hover over the icon next to the search box and click Learn more.

          For example, to query the number of DMS Kafka transfer failures in instance A, enter the following command:

          sum(label_replace(sum_over_time(iotda_dmsKafka_forwarding_failedCount{instance="ID of instance A"}[59999ms]),"__name__","iotda_dmsKafka_forwarding_failedCount","",""))by(__name__,instance)

          iotda_dmsKafka_forwarding_failedCount indicates the metric name, which can be obtained from Table 2.

        3. Set Alarm Condition to Custom. In the Trigger Condition area, set trigger condition parameters, such as the statistical period, consecutive period, and threshold condition. For details about the parameters, see Table 3.
          Figure 4 Setting an alarm condition

          In the example shown in the preceding figure, a minor alarm is generated when the total is greater than 10 in three consecutive statistical periods.

          Table 3 Alarm condition parameters

          | Category | Parameter | Description |
          | --- | --- | --- |
          | Trigger Condition | Statistical Period | Interval at which metric data is collected. By default, only one period is measured. A maximum of five periods can be measured. |
          | Trigger Condition | Consecutive Periods | When the metric value meets the threshold condition for a specified number of consecutive periods, a threshold alarm will be generated. |
          | Trigger Condition | Statistic | Method used to measure metrics. Options: Avg., Min., Max., Sum, and Samples. |
          | Trigger Condition | Threshold Condition | Trigger condition of a threshold alarm. A threshold condition consists of two parts: operators (≥, ≤, >, and <) and threshold value. For example, if Threshold Condition is set to > 85 and an actual metric value exceeds 85, a threshold alarm will be generated. |
          | Trigger Condition | Alarm Severity | Severity of a threshold alarm. Options: Critical, Major, Minor, and Warning. |
          | Advanced Configuration | Alarm Clearance | An alarm will be cleared if the monitored object does not meet the trigger condition within the monitoring period. By default, metrics in only one period are monitored. You can set up to five monitoring periods. |
          | Advanced Configuration | Action Taken for Insufficient Data | Action to be taken when no metric data is generated or metric data is insufficient within the monitoring period. You can configure this option based on your requirements. By default, metrics in only one period are monitored. You can set up to five monitoring periods. Options: Alarm, Insufficient data, Keep previous status, and Normal. |

      3. Configure alarm notifications.
        1. Set Alarm Mode to Direct Alarm Reporting.
        2. Enable Action Rule and set it to the action rule created in Step 1.
        3. Enable Notification.
          Figure 5 Configuring alarm notifications

          For details about how to use alarm noise reduction, see Alarm Noise Reduction.
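
The threshold rule configured in the procedure above can be sketched in Python. The first helper assembles the example Prometheus command for any metric name in Table 2, and the second mimics how a threshold condition is checked over consecutive periods. Both function names are hypothetical, returning no alarm when data is insufficient is an assumption, and AOM performs the real evaluation.

```python
import operator
from typing import Sequence

# Operators named in the Threshold Condition parameter (>=, <=, >, <).
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def build_sum_query(metric: str, instance_id: str, window_ms: int = 59999) -> str:
    """Build the example Prometheus command for a given metric name.

    Hypothetical helper: it only reproduces the string shown in the procedure.
    """
    return (
        f'sum(label_replace(sum_over_time({metric}{{instance="{instance_id}"}}'
        f'[{window_ms}ms]),"__name__","{metric}","",""))by(__name__,instance)'
    )


def should_alarm(
    period_values: Sequence[float],
    op: str,
    threshold: float,
    consecutive_periods: int = 1,
) -> bool:
    """Return True if the latest N period values all meet the condition.

    period_values holds one aggregated value per statistical period,
    oldest first. Assumption: too few values means no alarm.
    """
    if not 1 <= consecutive_periods <= 5:
        raise ValueError("consecutive_periods must be between 1 and 5")
    if len(period_values) < consecutive_periods:
        return False
    compare = OPERATORS[op]
    recent = period_values[-consecutive_periods:]
    return all(compare(value, threshold) for value in recent)


query = build_sum_query("iotda_dmsKafka_forwarding_failedCount", "ID of instance A")
alarm = should_alarm([12, 15, 11], ">", 10, consecutive_periods=3)
```

Here query reproduces the command from the example, and alarm is True because all three period values are greater than 10.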

  • Viewing alarm information
    You can use AOM to view alarms generated in the last 15 days.
    1. Access the IoTDA service page and click Access Console.
    2. In the navigation pane, choose O&M > Device Alarms. Click Application Operations Management (AOM) to access the AOM console and view alarms generated for IoTDA.
    3. Click an alarm to view the alarm details.
      Figure 6 Viewing alarm details
    4. Clear an alarm. After the fault is rectified, click the clear icon in the Operation column of the target alarm.

For details, see Viewing Alarms.