Updated on 2023-05-30 GMT+08:00

ALM-12085 Service Audit Log Dump Failure

Description

The system dumps service audit logs at 03:00 every day and stores them on the OMS node. This alarm is generated when the dump fails. This alarm is cleared when the next dump succeeds.

Attribute

  • Alarm ID: 12085
  • Alarm Severity: Minor
  • Auto Clear: Yes

Parameters

  • Source: Specifies the cluster or system for which the alarm is generated.
  • ServiceName: Specifies the service for which the alarm is generated.
  • RoleName: Specifies the role for which the alarm is generated.
  • HostName: Specifies the host for which the alarm is generated.

Impact on the System

The service audit logs may be lost.

Possible Causes

  • The service audit logs are oversized.
  • The OMS backup storage space is insufficient.
  • The storage space of a host where the service is located is insufficient.

Procedure

Check whether the service audit logs are oversized.

  1. In the alarm list on FusionInsight Manager, locate the row that contains the alarm, and view the additional information and the IP address of the host for which the alarm is generated.
  2. Log in to the host where the alarm is generated as user root.
  3. Run the vi ${BIGDATA_LOG_HOME}/controller/scriptlog/getLogs.log command to check whether the keyword "LOG SIZE is more than 5000MB" can be found.

    • If it can, go to 4.
    • If it cannot, go to 5.

  4. Check whether the oversized service audit logs are caused by exceptions.
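
The keyword check in step 3 can also be done without opening the file in an editor. The following is a minimal sketch, assuming grep is available on the host and BIGDATA_LOG_HOME is set in the login session; the function name log_has_keyword is hypothetical and not part of the product.

```shell
# Hypothetical helper (not part of the product): succeeds if the literal
# keyword $2 occurs anywhere in file $1. -F matches the keyword as a fixed
# string, so no character in it is treated as a regular expression.
log_has_keyword() {
    grep -qF -- "$2" "$1" 2>/dev/null
}

# Run the check only when BIGDATA_LOG_HOME is set in the current session.
if [ -n "${BIGDATA_LOG_HOME:-}" ]; then
    getlogs="${BIGDATA_LOG_HOME}/controller/scriptlog/getLogs.log"
    if log_has_keyword "$getlogs" "LOG SIZE is more than 5000MB"; then
        echo "keyword found: the service audit logs are oversized"
    else
        echo "keyword not found"
    fi
fi
```

A missing or unreadable log file is reported as "keyword not found", so confirm that the file exists before relying on a negative result.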

Check whether the OMS backup storage space is insufficient.

  5. Run the vi ${BIGDATA_LOG_HOME}/controller/scriptlog/getLogs.log command to check whether the keyword "Collect log failed, too many logs on" can be found.

    • If it can, obtain the host IP address following the keyword "Collect log failed, too many logs on", and go to 6.
    • If it cannot, go to 11.

  6. Log in to the host with the IP address obtained in 5 as user root.
  7. Run the vi ${BIGDATA_LOG_HOME}/nodeagent/scriptlog/collectLog.log command to check whether the keyword "log size exceeds" can be found.

    • If it can, go to 9.
    • If it cannot, go to 8.

  8. Check whether the alarm additional information contains the keyword "no enough space".

    • If yes, go to 9.
    • If no, go to 11.

  9. Perform one of the following operations to expand the disk capacity (only for MRS 3.1.2 and earlier versions) or reduce the maximum number of audit log backups:

    • Expand the capacity of the OMS node.
    • Run the following command to edit the file and decrease the value of MAX_NUM_BK_AUDITLOG.

      vi ${CONTROLLER_HOME}/etc/om/componentsauditlog.properties

  10. After the next scheduled dump at 03:00, check whether the alarm is cleared.

    • If it is, no further action is required.
    • If it is not, go to 11.
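
The host IP address that follows the keyword "Collect log failed, too many logs on" can be pulled out of getLogs.log with standard text tools. This is only a sketch: it assumes that the address is written in dotted-decimal IPv4 form on the same line as the keyword, which this page does not confirm, and the function name hosts_too_many_logs is hypothetical.

```shell
# Hypothetical helper (not part of the product): print each distinct IPv4
# address that appears after the keyword in file $1.
# Assumption: the address is on the same line as the keyword, in
# dotted-decimal form. Adjust the pattern if the real log differs.
hosts_too_many_logs() {
    sed -n 's/.*Collect log failed, too many logs on//p' "$1" 2>/dev/null \
        | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' \
        | sort -u
}

# Run only when BIGDATA_LOG_HOME is set in the current session.
if [ -n "${BIGDATA_LOG_HOME:-}" ]; then
    hosts_too_many_logs "${BIGDATA_LOG_HOME}/controller/scriptlog/getLogs.log"
fi
```

Empty output means either that the keyword is absent or that the address is not in the assumed form, so check the file manually before concluding that no host is affected.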

Check whether the space of the host where the service is located is insufficient.

  11. Run the vi ${BIGDATA_LOG_HOME}/controller/scriptlog/getLogs.log command to check whether the keyword "Collect log failed, no enough space on hostIp" can be found.

    • If it can, obtain the IP address of the abnormal host and go to 12.
    • If it cannot, go to 15.

  12. Log in to the host with the IP address obtained in 11 as user root, and run the df "$BIGDATA_HOME/tmp" -lP | tail -1 | awk '{print ($4/1024)}' command to obtain the remaining space of the host log directory. Check whether the value is less than 1000 MB.

    • If it is, go to 13.
    • If it is not, go to 15.

  13. Expand the capacity of the node.
  14. After the next scheduled dump at 03:00, check whether the alarm is cleared.

    • If it is, no further action is required.
    • If it is not, go to 15.
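
The free-space check above can be wrapped so that the comparison against 1000 MB is made by the shell rather than by eye. A minimal sketch, assuming a Linux host with df and awk; the function names free_mb and below_mb are hypothetical, and the -k option is an addition to the documented command that forces the Available column into 1 KB blocks.

```shell
# Hypothetical helpers, not part of the product.

# Print the free space, in MB, of the file system that holds directory $1.
# Same pipeline as the documented command, with -k added so that the
# Available column (field 4) is always reported in 1 KB blocks.
free_mb() {
    df -lPk "$1" | tail -1 | awk '{print ($4/1024)}'
}

# Succeed if the number $1 is strictly less than the limit $2.
# awk is used because the value may contain a fractional part.
below_mb() {
    awk -v v="$1" -v limit="$2" 'BEGIN { exit !(v + 0 < limit + 0) }'
}

# Run only when BIGDATA_HOME is set in the current session.
if [ -n "${BIGDATA_HOME:-}" ]; then
    free=$(free_mb "$BIGDATA_HOME/tmp")
    if below_mb "$free" 1000; then
        echo "free space ${free} MB is below 1000 MB"
    else
        echo "free space ${free} MB is not below 1000 MB"
    fi
fi
```

Keeping the comparison in a separate function makes the threshold easy to change if a different limit applies in a given environment.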

Collect fault information.

  15. On FusionInsight Manager, choose O&M > Log > Download.
  16. Select Controller for Service and click OK.
  17. Click the icon in the upper right corner. In the displayed dialog box, set Start Date and End Date to 10 minutes before and after the alarm generation time, respectively, and click OK. Then, click Download.
  18. Contact the O&M personnel and send the collected log information.
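
The Start Date and End Date for the download can be computed from the alarm generation time instead of being worked out by hand. A minimal sketch, assuming GNU date is available on the machine where it is run; the function name log_window and the timestamp format are assumptions for illustration, not something FusionInsight Manager provides.

```shell
# Hypothetical helper (not part of the product): print the start and the
# end of a window that extends 10 minutes (600 seconds) before and after
# the timestamp given in $1, one per line.
# Assumption: GNU date is available and $1 is in a form it can parse,
# for example "YYYY-MM-DD HH:MM:SS" in the local time zone.
log_window() {
    ts=$(date -d "$1" +%s) || return 1
    date -d "@$((ts - 600))" '+%Y-%m-%d %H:%M:%S'
    date -d "@$((ts + 600))" '+%Y-%m-%d %H:%M:%S'
}
```

Call it with the alarm generation time shown in the alarm list, for example log_window "2023-05-30 03:00:12", and enter the two printed values as Start Date and End Date.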

Alarm Clearing

This alarm will be automatically cleared after the fault is rectified.

Related Information

None