ALM-41007 RTDService Unavailable
Alarm Description
The system checks the status of RTDService every 60 seconds. This alarm is generated when all RTDService services are abnormal and RTDService is therefore unavailable.
This alarm is cleared when RTDService becomes normal.
Alarm Attributes
Alarm ID | Alarm Severity | Alarm Type | Service Type | Auto Cleared |
---|---|---|---|---|
41007 | Critical | Quality of service | RTDService | Yes |
Alarm Parameters
Type | Parameter | Description |
---|---|---|
Location Information | Source | Specifies the cluster or system for which the alarm is generated. |
Location Information | ServiceName | Specifies the service for which the alarm is generated. |
Location Information | RoleName | Specifies the role for which the alarm is generated. |
Location Information | Host Name | Specifies the name of the host for which the alarm is generated. |
Impact on the System
RTDService cannot provide services for external systems. The RTD console cannot be accessed, and functions such as modifying tenants and event sources are unavailable.
Possible Causes
- The disk or memory usage exceeds 90%.
- The RTDService process is faulty.
Handling Procedure
Check the disk and memory usage.
- On FusionInsight Manager, choose O&M > Alarm > RTDService Service Unavailable to view and record the host name reported in Location Info.
- Click Host, view the node corresponding to the host for which the alarm is generated, and log in to the faulty node as the root user.
- Run the df -h command to check whether the disk space usage exceeds 90%.
- Wait for 10 minutes and check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to step 5 (check the memory usage with the free -m command).
- Run the free -m command to check whether the memory usage exceeds 90%.
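The memory usage is calculated by dividing the actual memory in use by the total memory. In older free output, the actual usage is the value in the used column of the -/+ buffers/cache row. In the output format shown below, which has no such row, it is the used value in the Mem row.
[root@xxx FusionInsight_RTD_xxx]# free -m
              total        used        free      shared  buff/cache   available
Mem:          64263        7140       22633        5485       34490       46393
Swap:             0           0           0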
- Wait for 10 minutes and check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to step 7 (check the RTDService process).
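The disk and memory checks above can be sketched as two small shell helpers. This is an illustrative sketch, not part of the product: the function names disks_over_threshold and mem_usage_percent are hypothetical, the 90% threshold comes from the possible causes listed for this alarm, and the memory helper assumes the newer free output format (no -/+ buffers/cache row), where the used value in the Mem row already excludes buffers and cache.

```shell
#!/bin/sh
# Threshold (percent) taken from the "Possible Causes" of this alarm.
THRESHOLD=90

# Hypothetical helper: reads `df -P` output on stdin and prints every
# mount point whose usage exceeds the threshold. In the POSIX (-P) format
# the fifth column is the capacity and the sixth is the mount point.
disks_over_threshold() {
  awk -v t="$THRESHOLD" 'NR > 1 { gsub("%", "", $5); if ($5 + 0 > t) print $6, $5 "%" }'
}

# Hypothetical helper: reads `free -m` output on stdin and prints the
# memory usage as a percentage (used / total from the Mem row).
# Assumes the newer free format without a "-/+ buffers/cache" row.
mem_usage_percent() {
  awk '/^Mem:/ { printf "%.1f\n", $3 * 100 / $2 }'
}

# Usage on the faulty node:
#   df -P | disks_over_threshold
#   free -m | mem_usage_percent
```

With the sample free -m output shown above, the calculation is 7140 / 64263, or about 11.1%, which is well below the 90% threshold.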
Check the RTDService process.
- Log in to the node corresponding to the host for which the alarm is generated as the root user.
- Run the following command to check whether the RTDService process exists:
ps -aux | grep tomcat | grep RTDServer
- Wait for 10 minutes and check whether the alarm is cleared.
- Run the following command to check whether the process status is D (uninterruptible sleep). Replace pid with the process ID shown in the ps output:
cat /proc/pid/status | grep -i state
- Wait for 10 minutes and check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to step 12 (collect fault information).
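The process check above can be combined into one pass, as in the following sketch. The helper names rtd_pids and proc_state are hypothetical. The sketch assumes that the second column of the ps -aux output is the process ID and that the State line of /proc/pid/status starts with the one-letter state code.

```shell
#!/bin/sh
# Hypothetical helper: reads `ps -aux` output on stdin and prints the PID
# (second column) of every process whose line mentions both tomcat and
# RTDServer, mirroring the grep pipeline used in the procedure.
rtd_pids() {
  grep tomcat | grep RTDServer | awk '{ print $2 }'
}

# Hypothetical helper: reads the content of /proc/<pid>/status on stdin
# and prints the one-letter process state (for example R, S, or D).
proc_state() {
  awk '/^State:/ { print $2 }'
}

# Usage on the faulty node:
#   for pid in $(ps -aux | rtd_pids); do
#     echo "$pid $(proc_state < "/proc/$pid/status")"
#   done
```

If the loop prints nothing, no matching process was found. A state of D means that the process is in uninterruptible sleep, which usually indicates that it is blocked on I/O.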
Collect fault information.
- On FusionInsight Manager, choose O&M. In the navigation pane on the left, choose Log > Download.
- Select RTDService for Service and click OK.
- In the Hosts area, select the host where the role is located.
- Click the edit icon in the upper right corner, and set Start Date and End Date for log collection to 1 hour before and 1 hour after the alarm generation time, respectively. Then, click Download.
- Contact O&M personnel or technical support and provide the collected logs.
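To work out the Start Date and End Date values for the log download, the one-hour window around the alarm generation time can be computed as in the sketch below. The function name log_window is hypothetical, and the sketch assumes GNU date (the -d option). The alarm time is treated as a plain wall-clock value, so the arithmetic is done in UTC to avoid daylight saving jumps.

```shell
#!/bin/sh
# Hypothetical helper: takes the alarm generation time as
# "YYYY-MM-DD HH:MM:SS" and prints two lines: the time one hour before
# and the time one hour after. Assumes GNU date (-d and @epoch input).
log_window() {
  alarm_time=$1
  # Convert the wall-clock time to seconds since the epoch.
  epoch=$(date -u -d "$alarm_time" +%s) || return 1
  date -u -d "@$((epoch - 3600))" '+%Y-%m-%d %H:%M:%S'   # Start Date
  date -u -d "@$((epoch + 3600))" '+%Y-%m-%d %H:%M:%S'   # End Date
}

# Usage:
#   log_window '2024-05-01 10:30:00'
```

For an alarm generated at 2024-05-01 10:30:00, the helper prints 2024-05-01 09:30:00 and 2024-05-01 11:30:00.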
Alarm Clearance
This alarm is automatically cleared after the fault is rectified.
Related Information
None.