Updated on 2024-11-06 GMT+08:00

Alarms

The IoT platform generates an alarm when the trigger condition set in a rule is met or when the device message reporting rate exceeds the threshold preset on the platform. Monitor alarms closely and handle them promptly to keep devices running properly.

Alarms are classified into rule alarms, system alarms, and custom metric alarms.
  • Rule alarms: If you set the action Report alarms when configuring a device linkage rule and define the alarm properties and severity, the platform reports an alarm when the trigger condition is met. For example, if a smart water meter does not report data for three consecutive days, the platform generates an alarm to notify maintenance personnel of the water meter fault. Maintenance personnel then locate the faulty water meter based on the alarm information and repair it promptly.
  • System alarms: When a user resource, for example, the number of devices, reaches the upper limit of the user quota, the IoTDA platform reports a system alarm to Application Operations Management (AOM). The IoTDA platform triggers this type of alarm automatically, but you must configure notification rules to be notified. Table 1 lists the system alarms.
    Table 1 System alarms

    | Alarm | Description |
    | --- | --- |
    | MQTT Message Flow Control for a Single Device | When the volume of data sent by an MQTT device per second exceeds the threshold (3 KB/s by default), the platform starts flow control on the MQTT device and generates this alarm. |
    | Device Upstream Messages Exceeding the Tenant Flow Control | This alarm is generated when the sum of the upstream message rate and the connection setup rate exceeds the threshold. (PUBLISH indicates upstream message, CONNECT indicates connection setup, and BANDWIDTH indicates bandwidth.) By default, the basic edition allows 500 upstream messages per second and 100 connection setup messages per second. For details about the standard and enterprise editions, see Specifications. If a rate exceeds its default value, flow control is performed and an alarm is generated. |
    | Number of User Devices Reaching the Threshold | This alarm is generated when the number of registered user devices reaches 80% or 100% of the instance threshold (50,000 for the basic edition, and 20 times the number of online devices for the standard or enterprise edition; for details, see Specifications). |
    | Number of Online User Devices Reaching the Threshold | This alarm is generated when the number of online user devices reaches 80% or 100% of the threshold. (The threshold depends on the number of purchased units. For the standard or enterprise edition, see Specifications.) When the number of online user devices exceeds the threshold, device access is rejected. The alarm is triggered once an hour. |
    | Number of Child Devices Under a Gateway Reaching the Threshold | This alarm is generated when the number of child devices under a gateway reaches 80% or 100% of the threshold. |
    | Linkage Rule Triggering Concurrency Threshold | This alarm is generated when the number of linkage rules triggered per second exceeds the threshold (10/s for the basic or standard edition and 100/s for the enterprise edition). Flow control is triggered on the excess part. This alarm is triggered only once a day. |
    | Number of API Calls from a Tenant Reaching the Flow Control Threshold | This alarm is generated when the TPS of API calls made by a tenant exceeds the threshold. (Unless otherwise specified, the default limit of an API is 50/s. Maximum number of API calls made by an account per second: 100/s for the basic and standard editions.) Flow control is triggered on the excess part. This alarm is triggered only once a day. |
    | Data Forwarding Target Added to the Blacklist | This alarm is generated when the number of data forwarding failures reaches a specified value (10 by default) and the current forwarding target is added to the blacklist. |


  • Custom metric alarms: You can log in to the AOM 1.0 or AOM 2.0 console to configure custom metric alarms. For details, see Configuration Procedure for AOM 1.0. Currently, the following metrics are supported.
    Table 2 Custom alarm metrics

    | Metric | Name |
    | --- | --- |
    | Total number of devices | iotda_device_status_totalCount |
    | Number of online devices | iotda_device_status_onlineCount |
    | Number of offline devices | iotda_device_status_offlineCount |
    | Number of abnormal devices | iotda_device_status_abnormalCount |
    | Number of inactive devices | iotda_device_status_inactiveCount |
    | Number of activated devices | iotda_device_status_activeCount |
    | Number of online devices (accumulated) | iotda_device_status_dailyOnlineCount |
    | Total number of reported NB-IoT data records | iotda_south_dataReport_totalCount |
    | Number of NB-IoT data reporting failures | iotda_south_dataReport_failedCount |
    | Total number of MQTT event reporting times | iotda_south_eventUp_totalCount |
    | Number of MQTT event reporting successes | iotda_south_eventUp_successCount |
    | Number of MQTT event reporting failures | iotda_south_eventUp_failedCount |
    | Total number of MQTT property reporting times | iotda_south_propertiesReport_totalCount |
    | Number of MQTT property reporting successes | iotda_south_propertiesReport_successCount |
    | Number of MQTT property reporting failures | iotda_south_propertiesReport_failedCount |
    | Total number of MQTT message reporting times | iotda_south_messageUp_totalCount |
    | Number of MQTT message reporting successes | iotda_south_messageUp_successCount |
    | Number of MQTT message reporting failures | iotda_south_messageUp_failedCount |
    | AMQP transfers | iotda_amqp_forwarding_totalCount |
    | Number of AMQP transfer successes | iotda_amqp_forwarding_successCount |
    | Number of AMQP transfer failures | iotda_amqp_forwarding_failedCount |
    | FunctionGraph transfers | iotda_functionGraph_forwarding_totalCount |
    | Number of FunctionGraph transfer successes | iotda_functionGraph_forwarding_successCount |
    | Number of FunctionGraph transfer failures | iotda_functionGraph_forwarding_failedCount |
    | MRS Kafka transfers | iotda_mrsKafka_forwarding_totalCount |
    | Number of MRS Kafka transfer successes | iotda_mrsKafka_forwarding_successCount |
    | Number of MRS Kafka transfer failures | iotda_mrsKafka_forwarding_failedCount |
    | MQTT transfers | iotda_mqtt_forwarding_totalCount |
    | Number of MQTT transfer successes | iotda_mqtt_forwarding_successCount |
    | Number of MQTT transfer failures | iotda_mqtt_forwarding_failedCount |
    | MySQL transfers | iotda_mysql_forwarding_totalCount |
    | Number of MySQL transfer successes | iotda_mysql_forwarding_successCount |
    | Number of MySQL transfer failures | iotda_mysql_forwarding_failedCount |
    | InfluxDB transfers | iotda_influxDB_forwarding_totalCount |
    | Number of InfluxDB transfer successes | iotda_influxDB_forwarding_successCount |
    | Number of InfluxDB transfer failures | iotda_influxDB_forwarding_failedCount |
    | HTTP message pushes | iotda_http_forwarding_totalCount |
    | Number of HTTP message push transfer successes | iotda_http_forwarding_successCount |
    | Number of HTTP message push transfer failures | iotda_http_forwarding_failedCount |
    | OBS transfers | iotda_obs_forwarding_totalCount |
    | Number of OBS transfer successes | iotda_obs_forwarding_successCount |
    | Number of OBS transfer failures | iotda_obs_forwarding_failedCount |
    | DMS Kafka transfers | iotda_dmsKafka_forwarding_totalCount |
    | Number of DMS Kafka transfer successes | iotda_dmsKafka_forwarding_successCount |
    | Number of DMS Kafka transfer failures | iotda_dmsKafka_forwarding_failedCount |
    | DIS transfers | iotda_dis_forwarding_totalCount |
    | Number of DIS transfer successes | iotda_dis_forwarding_successCount |
    | Number of DIS transfer failures | iotda_dis_forwarding_failedCount |
    | ROMA transfers | iotda_roma_forwarding_totalCount |
    | Number of ROMA Connect transfer successes | iotda_roma_forwarding_successCount |
    | Number of ROMA Connect transfer failures | iotda_roma_forwarding_failedCount |
    | LTS transfers | iotda_lts_forwarding_totalCount |
    | Number of LTS transfer successes | iotda_lts_forwarding_successCount |
    | Number of LTS transfer failures | iotda_lts_forwarding_failedCount |
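
The forwarding metric names in Table 2 follow a regular pattern: iotda_<target>_forwarding_totalCount, with matching successCount and failedCount variants. The Python sketch below is an illustration only and is not part of IoTDA or AOM. It derives the three names for one forwarding target and computes a failure ratio from sampled values. The names FORWARDING_TARGETS, forwarding_metrics, and failure_ratio are hypothetical.

```python
# Forwarding targets as they appear in the metric names of Table 2.
FORWARDING_TARGETS = (
    "amqp", "functionGraph", "mrsKafka", "mqtt", "mysql", "influxDB",
    "http", "obs", "dmsKafka", "dis", "roma", "lts",
)


def forwarding_metrics(target: str) -> dict[str, str]:
    """Return the total/success/failed metric names for one forwarding target."""
    if target not in FORWARDING_TARGETS:
        raise ValueError(f"unknown forwarding target: {target!r}")
    return {
        kind: f"iotda_{target}_forwarding_{kind}Count"
        for kind in ("total", "success", "failed")
    }


def failure_ratio(total: int, failed: int) -> float:
    """Hypothetical helper: share of failed transfers, 0.0 when nothing was sent."""
    if total < 0 or failed < 0 or failed > total:
        raise ValueError("expected 0 <= failed <= total")
    return failed / total if total else 0.0


names = forwarding_metrics("dmsKafka")
print(names["failed"])        # iotda_dmsKafka_forwarding_failedCount
print(failure_ratio(200, 5))  # 0.025
```

A ratio like this could help when choosing a threshold for a failedCount metric, but the alarm itself is still configured in AOM as described in the following procedures.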

Configuration Procedure for AOM 1.0

  1. Log in to the AOM console. In the navigation pane, choose Alarm Center > Alarm Action Rules. Click Create and configure parameters.

    Figure 1 Creating an alarm action rule

  2. In the navigation pane, choose Alarm Center > Alarm Rules. Click Create Alarm Rule in the upper right corner.
  3. Set a threshold alarm rule.

    1. Set basic information such as the rule name and description.
      Figure 2 Setting basic alarm information
    2. Set details about the rule.
      1. Set Rule Type to Threshold alarm.
      2. Set Monitored Object to Command input and enter the corresponding command.
        Figure 3 Setting objects to be monitored

        Enter a Prometheus command. For details about Prometheus commands, move the cursor to the icon next to the search box and click Learn more.

        For example, to query the number of DMS Kafka transfer failures in instance A, run the following command:

        sum(label_replace(sum_over_time(iotda_dmsKafka_forwarding_failedCount{instance="ID of instance A"}[59999ms]),"__name__","iotda_dmsKafka_forwarding_failedCount","",""))by(__name__,instance)

        iotda_dmsKafka_forwarding_failedCount indicates the metric name, which can be obtained from Table 2.

      3. Set Alarm Condition to Custom. In the Trigger Condition area, set trigger condition parameters, such as the statistical period, consecutive period, and threshold condition. For details about the parameters, see Table 3.
        Figure 4 Setting alarm conditions

        In the preceding figure, for example, a minor alarm is generated when the total number is greater than 10 for three consecutive statistical periods.

        Table 3 Alarm condition parameters

        | Category | Parameter | Description |
        | --- | --- | --- |
        | Trigger Condition | Statistical Period | Interval at which metric data is collected. By default, only one period is measured. A maximum of five periods can be measured. |
        | Trigger Condition | Consecutive Periods | When the metric value meets the threshold condition for a specified number of consecutive periods, a threshold alarm will be generated. |
        | Trigger Condition | Statistic | Method used to measure metrics. Options: Avg., Min., Max., Sum, and Samples. |
        | Trigger Condition | Threshold Condition | Trigger condition of a threshold alarm. A threshold condition consists of two parts: operators (≥, ≤, >, and <) and threshold value. For example, if Threshold Condition is set to > 85 and an actual metric value exceeds 85, a threshold alarm will be generated. |
        | Trigger Condition | Alarm Severity | Severity of a threshold alarm. Options: Critical, Major, Minor, and Warning. |
        | Advanced Configuration | Alarm Clearance | An alarm will be cleared if the monitored object does not meet the trigger condition within the monitoring period. By default, metrics in only one period are monitored. You can set up to five monitoring periods. |
        | Advanced Configuration | Action Taken for Insufficient Data | Action to be taken when no metric data is generated or metric data is insufficient within the monitoring period. You can configure this option based on your requirements. By default, metrics in only one period are monitored. You can set up to five monitoring periods. Options: Alarm, Insufficient data, Keep previous status, and Normal. |

    3. Configure alarm notifications.
      1. Set Alarm Mode to Direct Alarm Reporting.
      2. Select the action rule created in step 1.
      3. Enable Notification.
        Figure 5 Setting alarm notifications

        For details about how to use alarm noise reduction, see Alarm Noise Reduction.
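
As a rough illustration of the monitored object and trigger condition settings in step 3, the Python sketch below rebuilds the example Prometheus command for any metric in Table 2 and then mimics a threshold check over consecutive periods. It is not AOM code. The names build_sum_query and threshold_alarm are hypothetical, and the rule that the most recent periods must all meet the condition is a simplifying assumption about how the check is evaluated.

```python
import operator

# ASCII spellings of the operators listed under Threshold Condition in Table 3.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def build_sum_query(metric: str, instance_id: str, window_ms: int = 59999) -> str:
    """Rebuild the example Prometheus command for a given metric and instance."""
    selector = f'{metric}{{instance="{instance_id}"}}[{window_ms}ms]'
    return (
        f'sum(label_replace(sum_over_time({selector}),'
        f'"__name__","{metric}","",""))by(__name__,instance)'
    )


def threshold_alarm(values, op, threshold, consecutive_periods=1):
    """Return True if the latest `consecutive_periods` values all meet the condition.

    Assumption: with fewer samples than periods, no alarm is raised.
    """
    if consecutive_periods < 1:
        raise ValueError("consecutive_periods must be at least 1")
    if len(values) < consecutive_periods:
        return False
    compare = OPERATORS[op]
    recent = values[-consecutive_periods:]
    return all(compare(value, threshold) for value in recent)


print(build_sum_query("iotda_dmsKafka_forwarding_failedCount", "ID of instance A"))

# One value per statistical period; the last three are all greater than 10.
print(threshold_alarm([4, 11, 12, 15], ">", 10, consecutive_periods=3))  # True
print(threshold_alarm([11, 12, 9], ">", 10, consecutive_periods=3))      # False
```

The second call returns False because the latest period drops to 9, which breaks the run of consecutive periods.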

Configuration Procedure for AOM 2.0

  1. Log in to the AOM console. In the navigation pane, choose Alarm Management > Alarm Action Rules. On the displayed page, click Create and configure parameters.

    Figure 6 Creating an alarm action rule

  2. In the navigation pane, choose Alarm Management > Alarm Rules. On the displayed page, click Create.
  3. Enter a rule name, select an enterprise project from the drop-down list, and enter the rule description as required.

    Figure 7 Creating an alarm rule

  4. Set details about the rule.

    1. Rule Type: Select Metric alarm rule.
    2. Configuration Mode: Select Select from all metrics.
    3. Prometheus Instance: Select the target instance.
    4. Alarm Rule Details: Select Multiple Metrics.
    5. Metric: Enter iotda in the Metric text box to list the related metrics. For details about the metrics, see Table 2.
    6. Conditions: Specify the dimension name, filter criteria, and dimension value.
    7. Rule: Enter the metric alarm threshold.
    8. Trigger Condition: Enter the consecutive periods for triggering the alarm.
    9. Alarm Severity: Select an alarm severity icon.
    Figure 8 Setting alarm rules

  5. Set alarm notification. Enable the alarm action rule and select a rule from the drop-down list. If no action rule is available, click the check icon on the right to go to the page for creating an alarm action rule.

    Figure 9 Setting alarm rules
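
Before filling in the console form, it can help to note the step 4 settings in one place. The Python sketch below is a hypothetical in-memory draft, not an AOM API: the class names, field names, and the choice of "=" and "!=" as filter criteria are all assumptions. It renders the metric and its conditions as a Prometheus-style selector for review.

```python
from dataclasses import dataclass, field


@dataclass
class Condition:
    """One Conditions entry: dimension name, filter criterion, dimension value."""
    dimension: str
    value: str
    matcher: str = "="  # assumption: Prometheus-style matcher, "=" or "!="


@dataclass
class MetricAlarmRuleDraft:
    """Hypothetical draft of the settings chosen in step 4."""
    name: str
    metric: str
    threshold: float
    consecutive_periods: int = 1
    conditions: list[Condition] = field(default_factory=list)

    def validate(self) -> None:
        """Reject drafts that cannot correspond to an IoTDA metric rule."""
        if not self.metric.startswith("iotda_"):
            raise ValueError("expected an IoTDA metric name from Table 2")
        if self.consecutive_periods < 1:
            raise ValueError("consecutive_periods must be at least 1")
        for cond in self.conditions:
            if cond.matcher not in ("=", "!="):
                raise ValueError(f"unsupported matcher: {cond.matcher!r}")

    def selector(self) -> str:
        """Render the metric and its conditions as a Prometheus-style selector."""
        self.validate()
        labels = ",".join(
            f'{c.dimension}{c.matcher}"{c.value}"' for c in self.conditions
        )
        return f"{self.metric}{{{labels}}}" if labels else self.metric


draft = MetricAlarmRuleDraft(
    name="offline-devices",
    metric="iotda_device_status_offlineCount",
    threshold=100,
    consecutive_periods=3,
    conditions=[Condition("instance", "ID of instance A")],
)
print(draft.selector())
# iotda_device_status_offlineCount{instance="ID of instance A"}
```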

Checking Alarm Information

You can use AOM to view alarms generated in the last 15 days. For details, see Viewing Alarms.
  1. Access the IoTDA service page and click Access Console. Click the target instance card.
  2. In the navigation pane, choose O&M > Device Alarms. Click Application Operations Management (AOM) to access the AOM console and view alarms generated for IoTDA.
  3. Click an alarm to check the alarm details.
    Figure 10 Viewing alarm details
  4. Clear an alarm. After the fault is rectified, click the icon in the Operation column of the target alarm.