Updated on 2022-08-11 GMT+08:00

What Can I Do If Resources Are Not Running Properly?

Resource statuses include Normal, Warning, Abnormal, Deleted, and Silent. Warning, Abnormal, and Silent indicate that a resource is not running properly. Analyze and rectify the fault according to the following instructions.

Warning

If a minor alarm or warning exists, the resource status is Warning.

Suggestion: Handle the alarm based on alarm details.

Abnormal

If a critical or major alarm exists, the resource status is Abnormal.

Suggestion: Handle the alarm based on alarm details.
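The Warning and Abnormal rules above can be summarized in a short Python sketch. It is illustrative only and is not part of AOM: the function name, the lowercase severity strings, the precedence of Abnormal over Warning when both kinds of alarm exist, and the Normal result when no alarm exists are all assumptions made for this example.

```python
from typing import Iterable

# Severity names follow the text above; the lowercase spelling is an assumption.
ABNORMAL_SEVERITIES = {"critical", "major"}
WARNING_SEVERITIES = {"minor", "warning"}


def status_from_alarms(severities: Iterable[str]) -> str:
    """Return the resource status implied by the active alarm severities."""
    active = {s.strip().lower() for s in severities}
    # Assumption: a critical or major alarm outranks minor alarms and warnings.
    if active & ABNORMAL_SEVERITIES:
        return "Abnormal"
    if active & WARNING_SEVERITIES:
        return "Warning"
    # Assumption: a resource with no active alarms is reported as Normal.
    return "Normal"
```

For example, status_from_alarms(["minor", "major"]) returns "Abnormal" under these assumptions, because the major alarm takes precedence.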

Silent

If the ICAgent fails to collect resource metrics, the resource status is Silent. The causes include but are not limited to:

  • Cause 1: The ICAgent is abnormal.

    Suggestion: In the navigation pane, choose Configuration Management > Agent Management. On the page that is displayed, check the ICAgent status. If the status is not Running, the ICAgent has been uninstalled or is abnormal. For details on how to solve the problem, see Table 1.

    Table 1 ICAgent troubleshooting suggestions

    Status               Suggestion
    -------------------  ------------------------------------------------------------
    Uninstalled          Install the ICAgent according to Installing the ICAgent.
    Installing           Wait for about 1 minute to complete the ICAgent installation.
    Installation failed  Uninstall the ICAgent according to Uninstalling the ICAgent
                         Through Logging In to the Server. Install the ICAgent again.
    Upgrading            Wait for about 1 minute to complete the ICAgent upgrade.
    Upgrade failed       Log in to the server to uninstall the ICAgent. Install the
                         ICAgent again.
    Offline              Ensure that the Access Key ID/Secret Access Key (AK/SK) or
                         Elastic Cloud Server (ECS) agency configuration is correct.
    Faulty               Contact technical support.

  • Cause 2: AOM cannot monitor the current resource.

    Suggestion: Check whether your resources can be monitored by AOM. AOM can monitor hosts, Kubernetes containers, and user processes, but does not monitor system processes.

  • Cause 3: The local time of the host is not synchronized with the NTP server time.

    NTP Sync Status: indicates whether the local time of the host is synchronized with the NTP server time. The value can be 0 or 1: 0 indicates that the time is synchronized, and 1 indicates that it is not synchronized.

    Suggestion: Choose Monitoring > Metric Monitoring and check the NTP Sync Status metric of the host. If the value of NTP Sync Status is 1, the local time of the host is not synchronized with the NTP server time. To solve the problem, synchronize the host time with the NTP server.

  • Cause 4: The resource is deleted or stopped.

    Suggestions:

    • On the ECS page, check whether the host has been restarted, stopped, or deleted.
    • On the Cloud Container Engine (CCE) page, check whether the component has been stopped or deleted.
    • If a discovery rule is stopped or deleted, the components discovered based on the rule are also stopped or deleted. On the AOM page, check whether the discovery rule has been stopped or deleted.
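The checks for Cause 1 and Cause 3 can be expressed as two small lookups. The Python sketch below is illustrative only and does not call any AOM or ICAgent API: the function names are hypothetical, the status strings are copied from Table 1, and the values are assumed to have been read manually from the Agent Management and Metric Monitoring pages.

```python
from typing import Optional

# Suggestions copied from Table 1, keyed by the ICAgent status shown in AOM.
ICAGENT_SUGGESTIONS = {
    "Uninstalled": "Install the ICAgent.",
    "Installing": "Wait for about 1 minute to complete the installation.",
    "Installation failed": "Uninstall the ICAgent on the server, then install it again.",
    "Upgrading": "Wait for about 1 minute to complete the upgrade.",
    "Upgrade failed": "Uninstall the ICAgent on the server, then install it again.",
    "Offline": "Check the AK/SK or ECS agency configuration.",
    "Faulty": "Contact technical support.",
}


def icagent_suggestion(status: str) -> Optional[str]:
    """Return the Table 1 suggestion, or None if the ICAgent is Running."""
    if status == "Running":
        return None
    if status not in ICAGENT_SUGGESTIONS:
        # Assumption: statuses outside Table 1 are treated as invalid input.
        raise ValueError(f"unknown ICAgent status: {status!r}")
    return ICAGENT_SUGGESTIONS[status]


def ntp_in_sync(ntp_sync_status: int) -> bool:
    """Interpret the NTP Sync Status metric: 0 is synchronized, 1 is not."""
    if ntp_sync_status not in (0, 1):
        raise ValueError("NTP Sync Status must be 0 or 1")
    return ntp_sync_status == 0
```

A Silent resource whose ICAgent status is Running and whose NTP Sync Status is 0 passes both checks, which points to Cause 2 or Cause 4 instead.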