ALM-12085 Service Audit Log Dump Failure

Description

The system dumps service audit logs at 03:00 every day and stores them on the OMS node. This alarm is generated when the dump fails. This alarm is cleared when the next dump succeeds.

Attribute

Alarm ID	Alarm Severity	Auto Clear
12085	Minor	Yes

Parameters

Name	Meaning
Source	Specifies the cluster or system for which the alarm is generated.
ServiceName	Specifies the service for which the alarm is generated.
RoleName	Specifies the role for which the alarm is generated.
HostName	Specifies the host for which the alarm is generated.

Impact on the System

The service audit logs may be lost.

Possible Causes

The service audit logs are oversized.
The OMS backup storage space is insufficient.
The storage space of a host where the service is located is insufficient.

Procedure

Check whether the service audit logs are oversized.

1. In the alarm list on FusionInsight Manager, locate the row that contains the alarm and note the IP address of the host for which the alarm is generated.
2. Log in as user root to the host for which the alarm is generated.
3. Run the vi ${BIGDATA_LOG_HOME}/controller/scriptlog/getLogs.log command and check whether the log contains the keyword "LOG SIZE is more than 5000MB".
   - If it does, go to 4.
   - If it does not, go to 5.
4. Check whether the oversized service audit logs are caused by exceptions.
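
The keyword check on getLogs.log can also be done without opening the file in vi. The following is a minimal sketch, assuming grep is available on the host and BIGDATA_LOG_HOME is set in the login environment. The function name has_keyword is illustrative and is not part of the product.

```shell
# has_keyword FILE KEYWORD
# Returns 0 if FILE contains KEYWORD as a literal string, non-zero otherwise
# (including when FILE cannot be read).
# Hypothetical helper; the name is chosen for this example only.
has_keyword() {
    grep -F -q -- "$2" "$1" 2>/dev/null
}

# Log path taken from the procedure; BIGDATA_LOG_HOME must be set on the host.
LOG_FILE="${BIGDATA_LOG_HOME}/controller/scriptlog/getLogs.log"

if has_keyword "$LOG_FILE" "LOG SIZE is more than 5000MB"; then
    echo "Keyword found: oversized service audit logs reported"
else
    echo "Keyword not found"
fi
```

grep -F treats the keyword as fixed text, so punctuation in the message is not interpreted as a pattern.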

Check whether the OMS backup storage space is insufficient.

5. Run the vi ${BIGDATA_LOG_HOME}/controller/scriptlog/getLogs.log command and check whether the log contains the keyword "Collect log failed, too many logs on".
   - If it does, obtain the host IP address that follows the keyword "Collect log failed, too many logs on", and go to 6.
   - If it does not, go to 10.
6. Log in as user root to the host with the IP address obtained in the previous step.
7. Run the vi ${BIGDATA_LOG_HOME}/nodeagent/scriptlog/collectLog.log command and check whether the log contains the keyword "log size exceeds".
   - If it does, go to 8.
   - If it does not, go to 10.
8. Expand the capacity of the OMS node.
9. After the next scheduled dump at 03:00, check whether the alarm is cleared.
   - If it is, no further action is required.
   - If it is not, go to 10.
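
The host IP address that follows the keyword "Collect log failed, too many logs on" can be pulled out of getLogs.log with grep and sed. This is a sketch under one assumption: the IPv4 address appears after the keyword on the same line. The exact log layout is not described here, so verify the result against the file. The function name failed_host_ip is illustrative.

```shell
# failed_host_ip FILE
# Prints the first IPv4 address that follows the keyword on the most recent
# matching line of FILE. Prints nothing if the keyword is absent.
# Assumption: the address is on the same line as the keyword.
failed_host_ip() {
    grep -F -- "Collect log failed, too many logs on" "$1" 2>/dev/null |
        tail -n 1 |
        sed -n 's/.*Collect log failed, too many logs on[^0-9]*\([0-9]\{1,3\}\(\.[0-9]\{1,3\}\)\{3\}\).*/\1/p'
}

# Log path taken from the procedure; BIGDATA_LOG_HOME must be set on the host.
ip=$(failed_host_ip "${BIGDATA_LOG_HOME}/controller/scriptlog/getLogs.log")
if [ -n "$ip" ]; then
    echo "Host reported with too many logs: $ip"
else
    echo "Keyword not found"
fi
```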

Check whether the space of the host where the service is located is insufficient.

10. Run the vi ${BIGDATA_LOG_HOME}/controller/scriptlog/getLogs.log command and check whether the log contains the keyword "Collect log failed, no enough space on hostIp".
   - If it does, obtain the IP address of the abnormal host and go to 11.
   - If it does not, go to 14.
11. Log in as user root to the host with the IP address obtained in the previous step, and run the df "$BIGDATA_HOME/tmp" -lP | tail -1 | awk '{print ($4/1024)}' command to obtain the remaining space, in MB, of the host log directory. Check whether the value is less than 1000 MB.
   - If it is, go to 12.
   - If it is not, go to 14.
12. Expand the capacity of the node.
13. After the next scheduled dump at 03:00, check whether the alarm is cleared.
   - If it is, no further action is required.
   - If it is not, go to 14.
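
The free-space check can be scripted so that the comparison with 1000 MB is not done by eye. The sketch below reuses the df pipeline from the procedure and assumes GNU df, where -P reports 1024-byte blocks. The function names free_mb and below_threshold are illustrative.

```shell
# free_mb DIR
# Prints the available space of the file system that holds DIR, in MB.
# Same pipeline as in the procedure: column 4 of "df -P" is the available
# space in 1024-byte blocks (GNU df assumed).
free_mb() {
    df -lP "$1" | tail -n 1 | awk '{ print ($4 / 1024) }'
}

# below_threshold VALUE_MB LIMIT_MB
# Returns 0 if VALUE_MB is less than LIMIT_MB. The comparison is done in awk
# because the value may have a fractional part.
below_threshold() {
    awk -v v="$1" -v limit="$2" 'BEGIN { exit !(v < limit) }'
}

# Directory taken from the procedure; BIGDATA_HOME must be set on the host.
dir="${BIGDATA_HOME}/tmp"
free=$(free_mb "$dir" 2>/dev/null)
if [ -z "$free" ]; then
    echo "Could not read free space for $dir"
elif below_threshold "$free" 1000; then
    echo "Free space ${free} MB is below 1000 MB"
else
    echo "Free space ${free} MB is not below 1000 MB"
fi
```

The empty-value guard matters: if df fails, an empty string would otherwise be compared as text and could be reported as below the threshold.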

Collect fault information.

14. On FusionInsight Manager, choose O&M > Log > Download.
15. Select Controller for Service and click OK.
16. Click the icon in the upper right corner. In the displayed dialog box, set Start Date and End Date to 10 minutes before and 10 minutes after the alarm generation time, respectively, and click OK. Then, click Download.
17. Contact the O&M personnel and send the collected log information.
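
The Start Date and End Date values can be worked out from the alarm generation time on any Linux host. This is a minimal sketch that assumes GNU date (the -d option); the function name alarm_window and the sample timestamp are made up for illustration.

```shell
# alarm_window "YYYY-MM-DD HH:MM:SS"
# Prints two lines: the time 10 minutes before and the time 10 minutes after
# the given alarm generation time, in the local time zone.
# Relies on GNU date (-d); not portable to BSD date.
alarm_window() {
    epoch=$(date -d "$1" +%s) || return 1
    date -d "@$((epoch - 600))" '+%Y-%m-%d %H:%M:%S'
    date -d "@$((epoch + 600))" '+%Y-%m-%d %H:%M:%S'
}

# Sample alarm generation time (hypothetical).
alarm_window "2024-01-01 03:05:00"
```

Converting to epoch seconds first avoids the ambiguity of relative expressions such as "- 10 minutes" appended to a timestamp.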