
Updated on 2023-01-30 GMT+08:00


What Should I Do If the Monitoring Period Is Interrupted or the Agent Status Keeps Changing?

Symptoms

The Agent is overloaded if you see either of the following symptoms:

On the Server Monitoring page of the Cloud Eye console, the Agent status frequently toggles between Running and Faulty.
The time period in the monitoring panel is discontinuous.

Possible Causes

To prevent other services from being affected, Cloud Eye uses a circuit-breaker to automatically stop the Agent process if it is consuming too many CPU or memory resources on the server. After the Agent process is stopped, no monitoring data is reported.

Circuit-breaker Principles

By default, the system checks once per minute whether the CPU usage of the Agent process exceeds 30% or its memory usage exceeds 700 MB (the tier-2 threshold). If the tier-2 threshold is exceeded, the Agent process exits. Otherwise, Cloud Eye checks whether the CPU usage exceeds 10% or the memory usage exceeds 200 MB (the tier-1 threshold). If the tier-1 threshold is exceeded three consecutive times, the Agent process exits, and the exit is logged.

After the Agent exits, the daemon process automatically starts the Agent process and checks the exit record. If there are three consecutive exit records, the Agent will hibernate for 20 minutes, during which monitoring data will not be collected.

When too many disks are attached to a server, the CPU or memory usage of the Agent process becomes high. In this case, you can adjust the tier-1 and tier-2 thresholds as described in Procedure so that the circuit breaker is triggered at levels that match the actual resource usage.
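The checks described above can be sketched as a small shell simulation. This is only an illustration of the stated rules, not the Agent's actual implementation: the function names breaker_check and daemon_action are hypothetical, CPU usage is assumed to be an integer percentage, and resetting the tier-1 counter after a check that stays below the threshold is an interpretation of "three consecutive times".

```shell
# Default thresholds stated above; memory limits are converted to bytes.
CPU_TIER1=10
MEM_TIER1=$((200 * 1024 * 1024))
CPU_TIER2=30
MEM_TIER2=$((700 * 1024 * 1024))

# breaker_check CPU_PCT MEM_BYTES TIER1_STREAK
# Prints "exit" if the Agent process would be stopped by this check,
# otherwise "ok N", where N is the updated count of consecutive
# tier-1 violations.
breaker_check() {
    _cpu=$1
    _mem=$2
    _streak=$3
    # Tier 2: a single violation stops the Agent immediately.
    if [ "$_cpu" -gt "$CPU_TIER2" ] || [ "$_mem" -gt "$MEM_TIER2" ]; then
        echo "exit"
        return
    fi
    # Tier 1: three consecutive violations stop the Agent.
    if [ "$_cpu" -gt "$CPU_TIER1" ] || [ "$_mem" -gt "$MEM_TIER1" ]; then
        _streak=$((_streak + 1))
        if [ "$_streak" -ge 3 ]; then
            echo "exit"
            return
        fi
    else
        _streak=0   # assumption: a clean check resets the counter
    fi
    echo "ok $_streak"
}

# daemon_action CONSECUTIVE_EXITS
# Prints "hibernate 1200" (20 minutes, in seconds) after three
# consecutive exit records, otherwise "restart".
daemon_action() {
    if [ "$1" -ge 3 ]; then
        echo "hibernate 1200"
    else
        echo "restart"
    fi
}

# Three one-minute samples at 12%, 15%, and 11% CPU with negligible memory.
streak=0
for sample in 12 15 11; do
    result=$(breaker_check "$sample" 0 "$streak")
    echo "cpu=${sample}% -> $result"
    if [ "$result" = "exit" ]; then
        break
    fi
    streak=${result#ok }
done
```

In this run, each sample exceeds only the tier-1 CPU threshold, so the simulated Agent exits on the third check.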

Procedure

Use the root account to log in to the ECS or BMS for which the Agent does not report data.
Go to the bin directory in the Agent installation path:
cd /usr/local/telescope/bin

For the Agent of the new version, run the cd /usr/local/uniagent/extension/install/telescope/bin command.

In a Windows OS, the directory is telescope_windows_amd64\bin.

Modify configuration file conf.json.

Open conf.json:
vi conf.json

Add the parameters listed in Table 1 to the conf.json file.

**Table 1** Parameters
Parameter	Description
cpu_first_pct_threshold	Specifies the tier-1 threshold for the CPU usage. For example, if the CPU usage of the Agent process is about 20%, set this parameter to 35. Unit: percent (%)
memory_first_threshold	Specifies the tier-1 threshold for the memory usage. For example, if the Agent process uses about 100 MB of memory, set this parameter to 314572800 (300 MB). Unit: bytes
cpu_second_pct_threshold	Specifies the tier-2 threshold for the CPU usage. For example, if the CPU usage of the Agent process is about 20%, set this parameter to 55. Unit: percent (%)
memory_second_threshold	Specifies the tier-2 threshold for the memory usage. For example, if the Agent process uses about 100 MB of memory, set this parameter to 734003200 (700 MB). Unit: bytes
To query the CPU usage and memory usage of the Agent process, use the following method:
Linux: Run top -p PID, where PID is the process ID of telescope.
Windows: View the details about the Agent process in Task Manager.
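The byte values in Table 1 follow from 1 MB = 1024 * 1024 bytes. The following shell sketch converts megabyte figures to bytes and prints the four parameters as JSON key-value pairs. The helper names mb_to_bytes and print_thresholds are hypothetical. This page does not confirm that conf.json expects these parameters as plain numbers, so compare the output with the existing entries in your conf.json before copying it in.

```shell
# mb_to_bytes MB
# Convert megabytes to bytes using 1 MB = 1024 * 1024 bytes,
# which matches Table 1 (300 MB = 314572800 bytes).
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# print_thresholds CPU1_PCT MEM1_MB CPU2_PCT MEM2_MB
# Print the four threshold parameters as JSON key-value pairs.
# Assumption: conf.json takes the values as plain numbers.
print_thresholds() {
    printf '"cpu_first_pct_threshold": %s,\n' "$1"
    printf '"memory_first_threshold": %s,\n' "$(mb_to_bytes "$2")"
    printf '"cpu_second_pct_threshold": %s,\n' "$3"
    printf '"memory_second_threshold": %s\n' "$(mb_to_bytes "$4")"
}

# Example values taken from Table 1.
print_thresholds 35 300 55 700
```

Computing the byte values this way avoids arithmetic slips when you choose memory thresholds other than the 300 MB and 700 MB examples.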

Save the conf.json file and exit:
:wq

If the earlier version of the Agent is used, run the following command to restart the Agent:
/usr/local/telescope/telescoped restart

For Windows, in the directory where the Agent installation package is stored, double-click the shutdown.bat script to stop the Agent, and execute the start.bat script to start the Agent.

If the new version of the Agent is used, run the following command to check the PID of telescope:

ps -ef |grep telescope

Run the following command to forcibly stop the process, and then wait 3 to 5 minutes for the Agent to restart automatically. Figure 1 shows an operation example.

kill -9 PID

Figure 1 Restarting the Agent
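Instead of waiting a fixed 3 to 5 minutes, you can poll until the Agent process is running again. The following is a minimal sketch: wait_until is a hypothetical helper, not part of the Agent, and the example in the comment assumes that pgrep is installed and that the process is named telescope, as the ps output suggests.

```shell
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Run COMMAND every INTERVAL_SECONDS until it succeeds.
# Returns 0 as soon as COMMAND succeeds, or 1 once TIMEOUT_SECONDS
# have elapsed without success.
wait_until() {
    _timeout=$1
    _interval=$2
    shift 2
    _waited=0
    while :; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        if [ "$_waited" -ge "$_timeout" ]; then
            return 1
        fi
        sleep "$_interval"
        _waited=$((_waited + _interval))
    done
}

# Example (assumes pgrep exists and the process name is telescope):
#   wait_until 300 10 pgrep -x telescope && echo "Agent is running"
```

The example polls every 10 seconds for up to 5 minutes, which matches the upper end of the wait time stated above.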
