Updated on 2024-11-29 GMT+08:00

ALM-12085 Service Audit Log Dump Failure

Alarm Description

The system dumps service audit logs at 03:00 every day and stores them on the OMS node. This alarm is generated when the dump fails. This alarm is cleared when the next dump succeeds.

Alarm Attributes

  • Alarm ID: 12085
  • Alarm Severity: Minor
  • Alarm Type: Quality of service
  • Service Type: FusionInsight Manager
  • Auto Cleared: Yes

Alarm Parameters

  • Location Information

    • Source: Specifies the cluster or system for which the alarm is generated.
    • ServiceName: Specifies the service for which the alarm is generated.
    • RoleName: Specifies the role for which the alarm is generated.
    • HostName: Specifies the host for which the alarm is generated.

  • Additional Information

    • Detail: Provides details about the alarm.

Impact on the System

If the audit logs of a component fail to be dumped, they cannot be retrieved once they have aged out locally. This affects service analysis and troubleshooting of the component.

Possible Causes

  • The service audit logs are oversized.
  • The OMS backup storage space is insufficient.
  • The storage space of the host where the service is located is insufficient.

Handling Procedure

Check whether the service audit logs are oversized.

  1. In the alarm list on FusionInsight Manager, locate the row that contains the alarm, and view the additional information and the IP address of the host for which the alarm is generated.
  2. Log in to the host for which the alarm is generated as user root.
  3. Run the vi ${BIGDATA_LOG_HOME}/controller/scriptlog/getLogs.log command and search for the keyword "LOG SIZE is more than 5000MB".

    • If the keyword is found, go to 4.
    • If it is not found, go to 5.

  4. Check whether the oversized service audit logs are caused by exceptions.
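
The keyword check in step 3 can also be done without opening the file in an editor. The following is a minimal sketch, assuming that grep is available on the host and that BIGDATA_LOG_HOME is set in the login environment, as the vi command above implies. The helper name has_keyword is hypothetical and not part of the product.

```shell
# has_keyword FILE KEYWORD
# Hypothetical helper: prints every line of FILE that contains KEYWORD as a
# fixed string, prefixed with its line number. Returns 0 if at least one line
# matches, 1 if none match, and 2 if FILE cannot be read (grep's own codes).
has_keyword() {
    grep -n -F -- "$2" "$1"
}

# Example (run on the host for which the alarm is generated):
#   has_keyword "${BIGDATA_LOG_HOME}/controller/scriptlog/getLogs.log" \
#       "LOG SIZE is more than 5000MB"
```

An exit status of 0 corresponds to the case where the keyword is found in step 3. An exit status of 1 corresponds to the case where it is not found.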

Check whether the OMS backup storage space is insufficient.

  5. Run the vi ${BIGDATA_LOG_HOME}/controller/scriptlog/getLogs.log command and search for the keyword "Collect log failed, too many logs on".

    • If the keyword is found, obtain the host IP address that follows it, and go to 6.
    • If it is not found, go to 11.

  6. Log in to the host with the IP address obtained in 5 as user root.
  7. Run the vi ${BIGDATA_LOG_HOME}/nodeagent/scriptlog/collectLog.log command and search for the keyword "log size exceeds".

    • If the keyword is found, go to 9.
    • If it is not found, go to 8.

  8. Check whether the alarm additional information contains the keyword "no enough space".

    • If yes, go to 9.
    • If no, go to 11.

  9. Perform the following operations to expand the disk capacity or reduce the maximum number of audit log backups:

    • Expand the capacity of the OMS node.
    • Run the following command to edit the file and decrease the value of MAX_NUM_BK_AUDITLOG.

      vi ${CONTROLLER_HOME}/etc/om/componentsauditlog.properties

  10. In the next execution period (03:00), check whether the alarm is cleared.

    • If it is, no further action is required.
    • If it is not, go to 11.
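
The host IP address that follows the keyword "Collect log failed, too many logs on" can be pulled out of getLogs.log with standard tools. The following is a minimal sketch. It assumes that the address is an IPv4 address on the same line as the keyword, which this document does not confirm, and the helper name hosts_after_keyword is hypothetical.

```shell
# hosts_after_keyword FILE KEYWORD
# Hypothetical helper: prints the unique IPv4-looking strings that appear
# after KEYWORD on matching lines of FILE. Assumes the address follows the
# keyword on the same line; the real log layout may differ.
hosts_after_keyword() {
    grep -F -- "$2" "$1" \
        | awk -v kw="$2" '{ i = index($0, kw); print substr($0, i + length(kw)) }' \
        | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' \
        | sort -u
}

# Example:
#   hosts_after_keyword "${BIGDATA_LOG_HOME}/controller/scriptlog/getLogs.log" \
#       "Collect log failed, too many logs on"
#
# To view the current MAX_NUM_BK_AUDITLOG setting before editing the file:
#   grep -n 'MAX_NUM_BK_AUDITLOG' "${CONTROLLER_HOME}/etc/om/componentsauditlog.properties"
```

The pattern only checks the dotted-quad shape, not that each number is in the range 0 to 255, so compare the result with the host list on FusionInsight Manager before logging in.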

Check whether the space of the host where the service is located is insufficient.

  11. Run the vi ${BIGDATA_LOG_HOME}/controller/scriptlog/getLogs.log command and search for the keyword "Collect log failed, no enough space on hostIp".

    • If the keyword is found, obtain the IP address of the abnormal host and go to 12.
    • If it is not found, go to 15.

  12. Log in to the host with the IP address obtained in 11 as user root, and run the df "$BIGDATA_HOME/tmp" -lP | tail -1 | awk '{print ($4/1024)}' command to obtain the remaining space (in MB) of the host log directory. Check whether the value is less than 1000 MB.

    • If it is, go to 13.
    • If it is not, go to 15.

  13. Expand the capacity of the node.
  14. In the next execution period (03:00), check whether the alarm is cleared.

    • If it is, no further action is required.
    • If it is not, go to 15.
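
The free-space check in this section can be wrapped so that the 1000 MB comparison is done by the shell instead of by eye. The following is a minimal sketch built on the same df and awk pipeline. The helper names free_mb and below_threshold are hypothetical, and the conversion assumes that df -P reports 1024-byte blocks, as GNU df does by default.

```shell
# free_mb DIR
# Prints the free space, in MB, of the local file system that holds DIR.
# Assumes df -P reports 1024-byte blocks.
free_mb() {
    df -lP "$1" | tail -1 | awk '{ print ($4 / 1024) }'
}

# below_threshold VALUE_MB [LIMIT_MB]
# Returns 0 if VALUE_MB is less than LIMIT_MB (default: 1000), 1 otherwise,
# and 2 if VALUE_MB is empty (for example, because df failed).
below_threshold() {
    [ -n "$1" ] || return 2
    awk -v v="$1" -v limit="${2:-1000}" 'BEGIN { exit !(v + 0 < limit + 0) }'
}

# Example:
#   if below_threshold "$(free_mb "$BIGDATA_HOME/tmp")"; then
#       echo "Less than 1000 MB free: expand the capacity of the node"
#   fi
```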

Collect fault information.

  15. On FusionInsight Manager, choose O&M > Log > Download.
  16. Select Controller for Service and click OK.
  17. Click the icon in the upper right corner. In the displayed dialog box, set Start Date and End Date to 10 minutes before and after the alarm generation time, respectively, and click OK. Then, click Download.
  18. Contact the O&M engineers and send the collected log information.
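
The Start Date and End Date for the download dialog box can be computed from the alarm generation time instead of being worked out by hand. The following is a minimal sketch, assuming a date command that accepts -d (GNU coreutils or BusyBox). The helper name shift_time and the sample timestamp are hypothetical, and timestamps are treated as UTC for simplicity.

```shell
# shift_time "YYYY-MM-DD hh:mm:ss" OFFSET_SECONDS
# Hypothetical helper: prints the timestamp shifted by OFFSET_SECONDS.
# The input is interpreted as UTC and the output is printed in UTC; adjust
# for the cluster time zone as needed.
shift_time() {
    epoch=$(date -u -d "$1" +%s) || return 1
    date -u -d "@$((epoch + $2))" '+%Y-%m-%d %H:%M:%S'
}

# Example for an alarm generated at 2024-11-29 03:00:00 (sample value):
#   shift_time "2024-11-29 03:00:00" -600   # Start Date: 10 minutes before
#   shift_time "2024-11-29 03:00:00" 600    # End Date: 10 minutes after
```

Going through epoch seconds avoids relying on relative-time phrases in the date string, which date implementations parse differently.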

Alarm Clearance

This alarm will be automatically cleared after the fault is rectified.

Related Information

None.