What Should I Do If the Monitoring Is Periodically Interrupted or the Agent Status Keeps Changing?
Symptom
Monitoring interruptions and unstable Agent status may be caused by Agent overload. The Agent is overloaded if you see either of the following symptoms:
- On the Server Monitoring page of the Cloud Eye console, the Agent status frequently changes between Running and Faulty.
- The metric graphs on the dashboard are discontinuous, with gaps where no data was reported.
Constraints
The restoration method in this section applies only to the new Agent version. If your Agent is of an earlier version, you are advised to upgrade it to the new version.
Run the following command to check the current Agent version:
if [[ -f /usr/local/uniagent/extension/install/telescope/bin/telescope ]]; then /usr/local/uniagent/extension/install/telescope/bin/telescope -v; elif [[ -f /usr/local/telescope/bin/telescope ]]; then echo "old agent"; else echo 0; fi
- If old agent is displayed, the Agent version is old.
- If a version ID is returned, the Agent version is new.
- If 0 is returned, the Agent has not been installed.
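The one-line check above can also be written as a small function, which is easier to read and reuse in scripts. This is a minimal sketch: the function name agent_version_status and its optional root-prefix argument are hypothetical and exist only so the logic can be tried against a test directory. The two paths are the ones used in the command above.

```shell
#!/usr/bin/env bash
# Classify the installed Agent using the same two paths as the one-line check.
# $1 is an optional root prefix (hypothetical, for testing only); leave it
# empty to inspect the real file system.
agent_version_status() {
    local root="${1:-}"
    local new_bin="$root/usr/local/uniagent/extension/install/telescope/bin/telescope"
    local old_bin="$root/usr/local/telescope/bin/telescope"

    if [ -f "$new_bin" ]; then
        "$new_bin" -v        # new Agent: prints its version ID
    elif [ -f "$old_bin" ]; then
        echo "old agent"     # earlier Agent version
    else
        echo 0               # Agent not installed
    fi
}

agent_version_status
```

The output is interpreted exactly as in the list above: old agent, a version ID, or 0.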
Possible Causes
When its CPU and memory usage is too high, the Agent implements the circuit breaker pattern to prevent other services from being affected. The circuit breaker is triggered automatically when the Agent is overloaded, and no monitoring data is reported while it is in effect.
Circuit Breaker Principles
By default, the Agent detection mechanism is as follows:
The Agent resource usage is checked every minute. If the resource usage exceeds the tier-2 thresholds (30% CPU usage and 700 MB memory usage), the Agent exits. If the resource usage exceeds the tier-1 thresholds (10% CPU usage and 200 MB memory usage) three consecutive times, the Agent also exits and a record is generated.
After the Agent exits, the daemon process automatically starts the Agent process and checks the exit records. If there are three consecutive exit records, the Agent will hibernate for 20 minutes, during which monitoring data will not be collected.
When too many disks are attached to a server, the CPU or memory usage of the Agent process becomes high. You can configure the tier-1 and tier-2 thresholds as described in Procedure so that the circuit breaker is triggered according to the actual resource usage.
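The default per-minute check described above can be illustrated with a short simulation. This is a sketch under stated assumptions, not the Agent's actual implementation: it assumes a threshold counts as exceeded when either the CPU or the memory value is strictly above it, it uses integer CPU percentages, and the function name simulate_checks is hypothetical. It does not model the daemon restart or the 20-minute hibernation.

```shell
#!/usr/bin/env bash
# Default thresholds from this section (memory in bytes).
CPU_FIRST=10
MEM_FIRST=$((200 * 1024 * 1024))
CPU_SECOND=30
MEM_SECOND=$((700 * 1024 * 1024))

# Reads one "cpu_pct mem_bytes" sample per line from stdin (one sample per
# one-minute check) and prints when the Agent would exit, or "running".
# Assumption: exceeding EITHER metric counts as exceeding the tier.
simulate_checks() {
    local cpu mem n=0 streak=0
    while read -r cpu mem; do
        n=$((n + 1))
        # Tier 2: a single violation makes the Agent exit immediately.
        if [ "$cpu" -gt "$CPU_SECOND" ] || [ "$mem" -gt "$MEM_SECOND" ]; then
            echo "exit: tier-2 exceeded at check $n"
            return 0
        fi
        # Tier 1: count consecutive violations; a clean check resets the count.
        if [ "$cpu" -gt "$CPU_FIRST" ] || [ "$mem" -gt "$MEM_FIRST" ]; then
            streak=$((streak + 1))
        else
            streak=0
        fi
        if [ "$streak" -ge 3 ]; then
            echo "exit: tier-1 exceeded 3 times in a row at check $n"
            return 0
        fi
    done
    echo "running"
}

# Three consecutive checks slightly above the tier-1 CPU threshold.
printf '%s\n' "12 100000000" "15 100000000" "11 100000000" | simulate_checks
```

Feeding it samples taken from your own server gives a rough idea of whether the default thresholds would trip before you decide to raise them.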
Procedure
- Use the root account to log in to the ECS or BMS for which the Agent does not report data.
- Optional: Go to the Agent installation path:
For Windows, the path is C:\Program Files\uniagent\extension\install\telescope.
For Linux, the path is /usr/local/uniagent/extension/install/telescope/bin.
- Modify configuration file conf.json.
- Run the following command to open conf.json:
vi conf.json
- Add the following parameters to the conf.json file. For details about the parameters, see Table 1.
Table 1 Parameters
Parameter | Description
cpu_first_pct_threshold | Tier-1 threshold of CPU usage. The default value is 10 (%).
memory_first_threshold | Tier-1 threshold of memory usage, in bytes. The default value is 209715200 (200 MB).
cpu_second_pct_threshold | Tier-2 threshold of CPU usage. The default value is 30 (%).
memory_second_threshold | Tier-2 threshold of memory usage, in bytes. The default value is 734003200 (700 MB).
a. To choose suitable values, query the CPU usage and memory usage of the Agent (telescope) process on the server.
Parameters to add (replace xx and xxx with the values you choose):
{
    "cpu_first_pct_threshold": xx,
    "memory_first_threshold": xxx,
    "cpu_second_pct_threshold": xx,
    "memory_second_threshold": xxx
}
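Because the memory thresholds are given in bytes, a small helper for the conversion can help avoid arithmetic slips. The sketch below also shows one possible way to look at the Agent's current resource usage on Linux. Both function names (mb_to_bytes, show_usage) are hypothetical. The show_usage function assumes a procps-style ps whose rss column is in KiB and assumes the Agent's process name is telescope. Note that the pcpu value from ps is an average over the lifetime of the process, not an instantaneous reading.

```shell
#!/usr/bin/env bash
# Convert a size in MB to bytes, the unit used by the memory thresholds.
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Print PID, CPU percentage, and resident memory in bytes for each process
# whose command name is exactly $1 (default: telescope).
# Assumption: procps ps on Linux, where rss is reported in KiB.
show_usage() {
    ps -C "${1:-telescope}" -o pid=,pcpu=,rss= |
        awk '{ printf "pid=%s cpu=%s%% mem_bytes=%d\n", $1, $2, $3 * 1024 }'
}

mb_to_bytes 200   # 209715200, the default memory_first_threshold
mb_to_bytes 700   # 734003200, the default memory_second_threshold
```

On a server where the Agent is running, calling show_usage telescope prints one line per matching process, which can be compared against the values you plan to put in conf.json.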
- Run the following command to save and exit the conf.json file:
:wq
- Restart the Agent:
- Windows:
- In the directory where the Agent installation package is stored, double-click the shutdown.bat script to stop the Agent, and then execute the start.bat script to start the Agent.
- Linux:
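- Run the following command to check the PID of telescope:
ps -ef | grep telescope
- Run the following command to forcibly stop the process, replacing PID with the process ID obtained in the previous step:
kill -9 PID
- After the process is forcibly stopped, wait for 3 to 5 minutes for the Agent to automatically restart. Figure 1 shows an operation example.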
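The Linux restart steps can be wrapped in a few helper functions. This is a minimal sketch: the function names (find_pids, force_stop, wait_for_restart) and the 10-second polling interval are hypothetical choices. Because find_pids matches any command line containing the pattern, as ps -ef | grep telescope does, review its output before running force_stop so that unrelated processes are not killed.

```shell
#!/usr/bin/env bash
# Print the PIDs of processes whose command line contains the pattern,
# mirroring "ps -ef | grep telescope" but excluding the grep process itself.
find_pids() {
    ps -ef | grep -- "$1" | grep -v grep | awk '{ print $2 }'
}

# Forcibly stop every matching process and print each PID that was killed.
force_stop() {
    local pid
    for pid in $(find_pids "$1"); do
        kill -9 "$pid" && echo "killed $pid"
    done
}

# Poll every 10 seconds until a matching process exists again.
# Returns 0 when one is found, 1 if the timeout (seconds, default 300) expires.
wait_for_restart() {
    local pattern="$1" timeout="${2:-300}" waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if [ -n "$(find_pids "$pattern")" ]; then
            return 0
        fi
        sleep 10
        waited=$((waited + 10))
    done
    return 1
}

# Example (run as root on the affected server):
#   find_pids telescope                 # review the PIDs first
#   force_stop telescope
#   wait_for_restart telescope 300 && echo "Agent restarted"
```

The 300-second timeout corresponds to the upper end of the 3 to 5 minutes mentioned above.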