Updated on 2024-11-29 GMT+08:00

ALM-41007 RTDService Unavailable

Alarm Description

The system checks the RTDService status every 60 seconds. This alarm is generated when all RTDService instances are abnormal and the service is unavailable.

This alarm is cleared when the RTDService service becomes normal.

Alarm Attributes

Alarm ID    Alarm Severity    Alarm Type            Service Type    Auto Cleared
41007       Critical          Quality of service    RTDService      Yes

Alarm Parameters

Type                    Parameter      Description
Location Information    Source         Specifies the cluster or system for which the alarm is generated.
                        ServiceName    Specifies the service for which the alarm is generated.
                        RoleName       Specifies the role for which the alarm is generated.
                        Host Name      Specifies the name of the host for which the alarm is generated.

Impact on the System

RTDService cannot provide services for external systems. The RTD console cannot be accessed, and functions such as modifying tenants and event sources are unavailable.

Possible Causes

  • The disk or memory usage exceeds 90%.
  • The RTDService process is faulty.

Handling Procedure

Check the disk and memory usage.

  1. On FusionInsight Manager, choose O&M > Alarm > RTDService Service Unavailable to view and record the host name reported in Location Info.
  2. Click Host, view the node corresponding to the host for which the alarm is generated, and log in to the faulty node as the root user.
  3. Run the df -h command to check whether the disk space usage exceeds 90%.

    • If yes, free up disk space and go to 4.
    • If no, go to 5.

  4. Wait for 10 minutes and check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 5.

  5. Run the free -m command to check whether the memory usage exceeds 90%.

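    The memory usage is calculated as used divided by total, where used is the actual memory usage excluding buffers and cache. In older versions of free, this is the used value in the -/+ buffers/cache row. In versions that print a buff/cache column, as in the following example, it is the used value in the Mem row.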

    [root@xxx FusionInsight_RTD_xxx]# free -m
                  total        used        free      shared  buff/cache   available
    Mem:          64263        7140       22633        5485       34490       46393
    Swap:             0           0           0

    • If yes, expand the memory capacity and go to 6.
    • If no, go to 7.

  6. Wait for 10 minutes and check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 7.
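
The 90% threshold checks on disk and memory usage can be sketched as a small shell helper. This is a minimal illustration, not part of the product: the function names disks_over_threshold and mem_usage_pct and the THRESHOLD variable are hypothetical. It assumes df -P in place of df -h, so that each file system is printed on one line with the usage percentage in the fifth column, and a version of free that prints the buff/cache column as in the sample output above.

```shell
#!/usr/bin/env bash
# Sketch only: THRESHOLD and both function names are illustrative assumptions.

THRESHOLD=90

# Print "<mount point> <usage>%" for every file system above the threshold.
# Reads `df -P` output on stdin (usage is column 5, mount point is column 6).
# Mount points that contain spaces are not handled.
disks_over_threshold() {
    awk -v limit="$THRESHOLD" '
        NR > 1 {
            use = $5
            sub(/%/, "", use)          # strip the percent sign
            if (use + 0 > limit) print $6 " " use "%"
        }'
}

# Print memory usage as an integer percentage (fraction truncated).
# Reads `free -m` output on stdin and divides used by total in the Mem row.
mem_usage_pct() {
    awk '$1 == "Mem:" { printf "%d\n", $3 * 100 / $2 }'
}
```

For example, run df -P | disks_over_threshold and free -m | mem_usage_pct on the faulty node. With the sample output above, the memory usage is 7140 / 64263, or about 11%, which is well below the threshold.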

Check the RTDService process.

  7. Log in to the node corresponding to the host for which the alarm is generated as the root user.
  8. Run the following command to check whether the RTDService process exists:

    ps -aux | grep tomcat | grep RTDServer

    • If yes, record the PID and go to 10.
    • If no, log in to FusionInsight Manager and choose Cluster > Services > RTDService. On the page that is displayed, choose More > Restart Service to restart the RTDService service. Then, go to 9.

  9. Wait for 10 minutes and check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, run the command in 8 again to query the RTDService process. If the process exists, record the PID and go to 10. If the process still does not exist, go to 12.

  10. Run the following command to check whether the process status is D (uninterruptible sleep), replacing pid with the PID recorded in 8:

    cat /proc/pid/status | grep -i state

    • If yes, run the reboot command to restart the host. Then, go to 11.
    • If no, go to 12.

  11. Wait for 10 minutes and check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 12.
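
The process state check above can be sketched as a helper that reads the State line from a /proc status file. This is a minimal illustration: the function names proc_state and is_state_d are hypothetical, and the helper takes the path to the status file as its argument so that it can be tried on any file with the same layout.

```shell
#!/usr/bin/env bash
# Sketch only: function names are illustrative assumptions.

# Print the one-letter state code (for example R, S, or D) of a process.
# $1 is the path to a /proc/<pid>/status file.
proc_state() {
    awk '$1 == "State:" { print $2; exit }' "$1"
}

# Succeed if the process is in uninterruptible sleep (state D).
is_state_d() {
    [ "$(proc_state "$1")" = "D" ]
}
```

For example, after recording the PID from the ps output, run is_state_d /proc/pid/status with pid replaced by that PID. A zero exit status means that the process is in state D.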

Collect fault information.

  12. On FusionInsight Manager, choose O&M. In the navigation pane on the left, choose Log > Download.
  13. Select RTDService for Service and click OK.
  14. In the Hosts area, select the host where the role is located.
  15. Click the edit icon in the upper right corner. Set Start Date to 1 hour before the alarm generation time and End Date to 1 hour after it. Then, click Download.
  16. Contact O&M personnel or technical support and provide the collected logs.
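
The log collection window of 1 hour before and after the alarm generation time can be computed as follows. This is a minimal sketch that assumes GNU date (for the -d option). The function name log_window is hypothetical, and the time used in the usage example is an illustrative value, not taken from a real alarm.

```shell
#!/usr/bin/env bash
# Sketch only: log_window is an illustrative name; GNU date is assumed.

# Print the start and end of the log collection window, one per line.
# $1 is the alarm generation time in a format that date -d accepts,
# for example "2024-11-29 10:30:00" (interpreted in the local time zone).
log_window() {
    local alarm_epoch
    alarm_epoch=$(date -d "$1" +%s) || return 1
    date -d "@$((alarm_epoch - 3600))" '+%Y-%m-%d %H:%M:%S'   # Start Date
    date -d "@$((alarm_epoch + 3600))" '+%Y-%m-%d %H:%M:%S'   # End Date
}
```

For example, log_window "2024-11-29 10:30:00" prints 2024-11-29 09:30:00 and 2024-11-29 11:30:00 when no daylight saving time change falls inside the window.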

Alarm Clearance

This alarm is automatically cleared after the fault is rectified.

Related Information

None.