What Should I Do If the Monitoring Period Is Interrupted or the Agent Status Frequently Changes?
Symptom
The Agent is overloaded if you see either of the following symptoms:
- On the Server Monitoring page of the Cloud Eye console, the Agent status frequently toggles between Running and Faulty.
- The time period in the monitoring panel is discontinuous.
Possible Causes
To prevent other services from being affected, Cloud Eye uses a circuit-breaker mechanism that automatically stops the Agent process when it consumes too much CPU or memory on the server. While the Agent process is stopped, no monitoring data is reported.
Circuit-breaker Principles
By default, the system checks every minute whether the CPU usage of the Agent process exceeds 30% or its memory usage exceeds 700 MB (the tier-2 threshold). If the tier-2 threshold is exceeded, the Agent process exits. If the tier-2 threshold is not exceeded, the system then checks whether the CPU usage exceeds 10% or the memory usage exceeds 200 MB (the tier-1 threshold). If the tier-1 threshold is exceeded in three consecutive checks, the Agent process exits and the exit is logged.
After the Agent exits, the daemon process automatically starts the Agent process and checks the exit record. If there are three consecutive exit records, the Agent will hibernate for 20 minutes, during which monitoring data will not be collected.
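The check described above can be sketched roughly as follows. This is an illustrative Python model, not the Agent's actual implementation: the names `Thresholds`, `CircuitBreaker`, and `should_hibernate` are hypothetical, the default values are the ones stated above, and resetting the tier-1 counter after a check that stays below the tier-1 threshold is an assumed reading of "three consecutive times".

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class Thresholds:
    # Defaults taken from the description above.
    cpu_first_pct: float = 10.0       # tier-1 CPU threshold, in percent
    memory_first: int = 200 * MB      # tier-1 memory threshold, in bytes
    cpu_second_pct: float = 30.0      # tier-2 CPU threshold, in percent
    memory_second: int = 700 * MB     # tier-2 memory threshold, in bytes


class CircuitBreaker:
    """Hypothetical model of the once-per-minute check."""

    def __init__(self, thresholds=None, tier1_limit=3):
        self.t = thresholds or Thresholds()
        self.tier1_limit = tier1_limit
        self.tier1_hits = 0  # consecutive checks above the tier-1 threshold

    def check(self, cpu_pct, memory_bytes):
        """Return True if the Agent process should exit after this check."""
        t = self.t
        # Tier 2: exit immediately.
        if cpu_pct > t.cpu_second_pct or memory_bytes > t.memory_second:
            return True
        # Tier 1: exit only after three consecutive hits.
        if cpu_pct > t.cpu_first_pct or memory_bytes > t.memory_first:
            self.tier1_hits += 1
            if self.tier1_hits >= self.tier1_limit:
                self.tier1_hits = 0  # assumption: a restarted Agent counts afresh
                return True
        else:
            self.tier1_hits = 0  # assumption: the streak must be consecutive
        return False


def should_hibernate(consecutive_exit_records, limit=3):
    """Daemon side: hibernate (20 minutes per the text) after three exits in a row."""
    return consecutive_exit_records >= limit
```

With the default thresholds, a single check at 35% CPU makes `check` return `True`, whereas 15% CPU only does so on the third consecutive check.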
When too many disks are attached to a server, the CPU or memory usage of the Agent process becomes high. You can adjust the tier-1 and tier-2 thresholds as described in Procedure so that the circuit breaker triggers at levels that match the actual resource usage.
Procedure
- Use the root account to log in to the ECS or BMS for which the Agent does not report data.
- Go to the bin directory in the Agent installation path:
cd /usr/local/telescope/bin
In a Windows OS, the directory is telescope_windows_amd64\bin.
- Modify the conf.json configuration file.
- Open conf.json:
vi conf.json
- Add the parameters listed in Table 1 to the conf.json file.
Table 1 Parameter description

| Parameter | Description |
| --- | --- |
| cpu_first_pct_threshold | Specifies the tier-1 threshold for the CPU usage. If the CPU usage of the Agent process is about 20%, you are advised to set this parameter to 35. The unit is percent (%). |
| memory_first_threshold | Specifies the tier-1 threshold for the memory usage. If the Agent process uses about 100 MB of memory, you are advised to set this parameter to 314572800 (300 MB). The default unit is bytes. |
| cpu_second_pct_threshold | Specifies the tier-2 threshold for the CPU usage. If the CPU usage of the Agent process is about 20%, you are advised to set this parameter to 55. The unit is percent (%). |
| memory_second_threshold | Specifies the tier-2 threshold for the memory usage. If the Agent process uses about 100 MB of memory, you are advised to set this parameter to 734003200 (700 MB). The default unit is bytes. |
To choose suitable values, first check the current CPU usage and memory usage of the Agent process, for example with top in Linux or Task Manager in Windows.
After the parameters are added, conf.json is similar to the following:
{
    "InstanceId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "ProjectId": "b5b92ee0xxxxxxxxxxxxxxxxcab92396",
    "AccessKey": "QZ0XGJXFxxxxxxxxT65R",
    "SecretKey": "lEv2aXAGwxxxxxxxxxxxxxxxxxxxxF8t0Bf18Tn2",
    "RegionId": "cn-north-1",
    "ClientPort": 0,
    "PortNum": 200,
    "cpu_first_pct_threshold": 35,
    "memory_first_threshold": 314572800,
    "cpu_second_pct_threshold": 70,
    "memory_second_threshold": 734003200
}
- Save the conf.json file and exit:
:wq
- Restart the Agent:
/usr/local/telescope/telescoped restart
In a Windows OS, go to the directory where the Agent installation package is stored, double-click the shutdown.bat script to stop the Agent, and then run the start.bat script to start it.
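If the same change has to be made on many servers, the manual edit of conf.json could be scripted. The following Python sketch is one possible approach, not an official tool: the helper names `mb_to_bytes` and `add_thresholds` are hypothetical, and it assumes that conf.json is a plain JSON object like the example shown in the procedure. It converts the memory thresholds from MB to bytes and adds the four parameters while leaving the existing keys untouched.

```python
import json
from pathlib import Path

MB = 1024 * 1024


def mb_to_bytes(mb):
    """Convert megabytes to bytes, the unit used by the memory thresholds."""
    return mb * MB


def add_thresholds(conf_path, cpu_first_pct, memory_first_mb,
                   cpu_second_pct, memory_second_mb):
    """Add the four circuit-breaker parameters to conf.json and return the result."""
    path = Path(conf_path)
    conf = json.loads(path.read_text(encoding="utf-8"))
    # Existing keys (InstanceId, ProjectId, and so on) are kept as they are.
    conf.update({
        "cpu_first_pct_threshold": cpu_first_pct,
        "memory_first_threshold": mb_to_bytes(memory_first_mb),
        "cpu_second_pct_threshold": cpu_second_pct,
        "memory_second_threshold": mb_to_bytes(memory_second_mb),
    })
    path.write_text(json.dumps(conf, indent=4) + "\n", encoding="utf-8")
    return conf
```

For example, `add_thresholds("/usr/local/telescope/bin/conf.json", 35, 300, 55, 700)` would write the values suggested in Table 1. The Agent still has to be restarted afterwards for the new thresholds to take effect.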