ALM-12085 Service Audit Log Dump Failure

Alarm Description

The system dumps service audit logs at 03:00 every day and stores them on the OMS node. This alarm is generated when the dump fails. This alarm is cleared when the next dump succeeds.

Alarm Attributes

Alarm ID	Alarm Severity	Alarm Type	Service Type	Auto Cleared
12085	Minor	Quality of service	FusionInsight Manager	Yes

Alarm Parameters

Type	Parameter	Description
Location Information	Source	Specifies the cluster or system for which the alarm is generated.
	ServiceName	Specifies the service for which the alarm is generated.
	RoleName	Specifies the role for which the alarm is generated.
	HostName	Specifies the host for which the alarm is generated.
Additional Information	Detail	Specifies the details for which the alarm is generated.

Impact on the System

If the audit logs of a component fail to be dumped and are then aged out locally, they can no longer be retrieved. This affects service analysis and troubleshooting of the component.

Possible Causes

The service audit logs are oversized.
The OMS backup storage space is insufficient.
The storage space of the host where the service is located is insufficient.

Handling Procedure

Check whether the service audit logs are oversized.

1. In the alarm list on FusionInsight Manager, locate the row that contains the alarm, and view the IP address of the host for which the alarm is generated and the additional information of the alarm.
2. Log in to the host for which the alarm is generated as user root.
3. Run the vi ${BIGDATA_LOG_HOME}/controller/scriptlog/getLogs.log command and search for the keyword "LOG SIZE is more than 5000MB".
- If the keyword is found, go to 4.
- If it is not found, go to 5.
4. Check whether the oversized service audit logs are caused by exceptions.
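
The keyword check above can also be done without opening the file in vi. The following is a minimal sketch, not part of the product: the function name oversize_next_step is hypothetical, and it only assumes that the keyword appears verbatim in getLogs.log, as the procedure states.

```shell
# Sketch only: a non-interactive version of the keyword check described above.

# Prints "4" if the oversize keyword is present in the log, otherwise "5",
# mirroring the "go to 4" / "go to 5" branches of the procedure.
# Returns 2 (and prints nothing) if the log file cannot be read.
oversize_next_step() {
    local log_file="$1"
    [ -r "$log_file" ] || return 2
    # -F: match the keyword as a fixed string; -q: report via exit status only.
    if grep -qF -- "LOG SIZE is more than 5000MB" "$log_file"; then
        echo 4
    else
        echo 5
    fi
}
```

Example call on the host: oversize_next_step "${BIGDATA_LOG_HOME}/controller/scriptlog/getLogs.log"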

Check whether the OMS backup storage space is insufficient.

5. Run the vi ${BIGDATA_LOG_HOME}/controller/scriptlog/getLogs.log command and search for the keyword "Collect log failed, too many logs on".
- If the keyword is found, obtain the host IP address that follows it, and go to 6.
- If it is not found, go to 11.
6. Log in to the host with the IP address obtained in 5 as user root.
7. Run the vi ${BIGDATA_LOG_HOME}/nodeagent/scriptlog/collectLog.log command and search for the keyword "log size exceeds".
- If the keyword is found, go to 9.
- If it is not found, go to 8.
8. Check whether the alarm additional information contains the keyword "no enough space".
- If yes, go to 9.
- If no, go to 11.
9. Perform the following operations to expand the disk capacity or reduce the maximum number of audit log backups:
- Expand the capacity of the OMS node.
- Run the following command to edit the file and decrease the value of MAX_NUM_BK_AUDITLOG.
  vi ${CONTROLLER_HOME}/etc/om/componentsauditlog.properties
10. After the next scheduled dump at 03:00, check whether the alarm is cleared.
- If it is, no further action is required.
- If it is not, go to 11.
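
As an aid for the steps above, the host IP address and the current backup limit can be read from the command line. This is a hedged sketch: the function names hosts_after_keyword and current_max_backups are hypothetical, and it assumes that the IP address is an IPv4 address on the same line as the keyword and that MAX_NUM_BK_AUDITLOG starts a line in the properties file. Neither assumption is confirmed by this document.

```shell
# Prints each distinct IPv4 address that follows the given keyword in a log file.
# Assumption: the address appears on the same line, after the keyword.
hosts_after_keyword() {
    local log_file="$1" keyword="$2"
    awk -v kw="$keyword" '
        {
            i = index($0, kw)            # position of the keyword, 0 if absent
            if (i == 0) next
            rest = substr($0, i + length(kw))
            if (match(rest, /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/))
                print substr(rest, RSTART, RLENGTH)
        }' "$log_file" | sort -u
}

# Prints the line(s) that set MAX_NUM_BK_AUDITLOG, so the current value
# can be noted before it is decreased in the editor.
current_max_backups() {
    grep -E '^[[:space:]]*MAX_NUM_BK_AUDITLOG' "$1"
}
```

Example calls: hosts_after_keyword "${BIGDATA_LOG_HOME}/controller/scriptlog/getLogs.log" "Collect log failed, too many logs on" and current_max_backups "${CONTROLLER_HOME}/etc/om/componentsauditlog.properties"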

Check whether the space of the host where the service is located is insufficient.

11. Run the vi ${BIGDATA_LOG_HOME}/controller/scriptlog/getLogs.log command and search for the keyword "Collect log failed, no enough space on hostIp".
- If the keyword is found, obtain the IP address of the abnormal host and go to 12.
- If it is not found, go to 15.
12. Log in to the host with the obtained IP address as user root, and run the df "$BIGDATA_HOME/tmp" -lP | tail -1 | awk '{print ($4/1024)}' command to obtain the remaining space (in MB) of the host log directory. Check whether the value is less than 1000 MB.
- If it is, go to 13.
- If it is not, go to 15.
13. Expand the capacity of the node.
14. After the next scheduled dump at 03:00, check whether the alarm is cleared.
- If it is, no further action is required.
- If it is not, go to 15.
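
The free-space check above can be wrapped so that the comparison against 1000 MB is done by the shell rather than by eye. This is a sketch under stated assumptions: the function names free_mb and space_is_low are hypothetical, and the df options are -Pk instead of the -lP shown in the procedure, so that the block size is fixed at 1024 bytes on any POSIX df. Add -l back if only local file systems should be reported.

```shell
# Prints the free space, in whole MB, of the file system that holds the given directory.
free_mb() {
    # -P: one output line per file system; -k: 1024-byte blocks,
    # so column 4 (available blocks) divided by 1024 is MB.
    df -Pk "$1" | tail -1 | awk '{ printf "%d\n", $4 / 1024 }'
}

# Succeeds (exit status 0) if free space is below the threshold in MB.
# The default threshold of 1000 MB is the value used in the procedure.
space_is_low() {
    local dir="$1" threshold_mb="${2:-1000}"
    [ "$(free_mb "$dir")" -lt "$threshold_mb" ]
}
```

Example call: space_is_low "$BIGDATA_HOME/tmp" && echo "less than 1000 MB free, expand the node capacity"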

Collect fault information.

15. On FusionInsight Manager, choose O&M > Log > Download.
16. Select Controller for Service and click OK.
17. Click the icon in the upper right corner. In the displayed dialog box, set Start Date and End Date to 10 minutes before and 10 minutes after the alarm generation time, respectively, and click OK. Then, click Download.
18. Contact the O&M engineers and send the collected log information.
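
The Start Date and End Date values for the download dialog box can be worked out from the alarm generation time as follows. This is a minimal sketch: the function name log_window is hypothetical, the timestamp format is an assumption, and the -d option requires GNU date.

```shell
# Prints two lines: the time 10 minutes before and the time 10 minutes after
# the given alarm generation time.
# Assumptions: GNU date is available, and the timestamp is given as
# "YYYY-MM-DD HH:MM:SS" in the local time zone of the shell.
log_window() {
    local alarm_time="$1" margin=600 epoch
    # Convert to seconds since the epoch so that the offset is plain arithmetic.
    epoch=$(date -d "$alarm_time" +%s) || return 1
    date -d "@$((epoch - margin))" '+%Y-%m-%d %H:%M:%S'   # Start Date
    date -d "@$((epoch + margin))" '+%Y-%m-%d %H:%M:%S'   # End Date
}
```

Example call: log_window "2024-05-01 03:00:12" prints the start time on the first line and the end time on the second.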