ALM-12110 Failed to get ECS temporary AK/SK

Alarm Description

The meta component calls the ECS API every 5 minutes to obtain the temporary AK/SK and caches the result. Before the AK/SK expires, the component calls the API again to refresh it. This alarm is generated when the component fails to call the API three consecutive times.

This alarm is cleared when the meta component successfully calls the ECS API.
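
The trigger and clear conditions above can be sketched as a small counter. This is only an illustration of the described behavior, not the meta component's actual code; the function and variable names are hypothetical.

```shell
# Hypothetical sketch of the alarm logic described above.
# Not taken from the meta component's source code.
FAILURE_THRESHOLD=3      # consecutive failed calls before the alarm is raised

consecutive_failures=0
alarm_active=0

# record_call_result RESULT
# RESULT is "ok" or "fail" for one call to the ECS API.
record_call_result() {
    if [ "$1" = "ok" ]; then
        # A single successful call resets the counter and clears the alarm.
        consecutive_failures=0
        alarm_active=0
    else
        consecutive_failures=$((consecutive_failures + 1))
        if [ "$consecutive_failures" -ge "$FAILURE_THRESHOLD" ]; then
            alarm_active=1
        fi
    fi
}
```

With this model, two failed calls followed by a success never raise the alarm, because the counter only tracks consecutive failures.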

Alarm Attributes

Alarm ID	Alarm Severity	Auto Cleared
12110	Major	Yes

Alarm Parameters

Parameter	Description
Source	Specifies the cluster for which the alarm was generated.
ServiceName	Specifies the service for which the alarm was generated.
RoleName	Specifies the role for which the alarm was generated.
HostName	Specifies the host for which the alarm was generated.

Impact on the System

The cluster cannot obtain the latest temporary AK/SK. In a storage-compute decoupled deployment, access to OBS files may then fail, so upper-layer component services cannot process data.

Possible Causes

The meta role of the MRS cluster is abnormal.
The cluster was bound to an agency and accessed OBS, but was later unbound from the agency.

Handling Procedure

Check the status of the meta role.

1. On FusionInsight Manager of the cluster, choose O&M > Alarm > Alarms. On the page that is displayed, click the icon in the row containing the alarm, and note the IP address of the host for which the alarm was generated.
2. Choose Cluster > Services > meta. On the page that is displayed, click the Instances tab, and check whether the meta role on the host for which the alarm was generated is normal.
   - If yes, go to Step 5.
   - If no, go to Step 3.
3. Select the abnormal role, click More, and select Restart Instance to restart the abnormal meta role.

   Services may be affected or interrupted during the restart. You are advised to perform the restart during off-peak hours.
4. After the role is restarted, check whether the alarm is cleared.
   - If yes, no further action is required.
   - If no, go to Step 5.
5. Log in as user root to the host whose IP address you obtained in Step 1, and check whether the /var/log/Bigdata/meta/mrs-meta.log file contains error information. If it does, rectify the fault based on the log information.
   ```
   cat /var/log/Bigdata/meta/mrs-meta.log
   ```
6. Check whether the alarm is cleared.
   - If yes, no further action is required.
   - If no, go to Step 7.
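
Because the log file can be long, it may be easier to filter it than to read the full cat output. The helper below is a minimal sketch: the function name is hypothetical, and the keywords error and exception are an assumption about how failures appear in mrs-meta.log, so adjust the pattern to what the log actually contains.

```shell
# recent_errors LOG_FILE [COUNT]
# Print the last COUNT lines (default 50) of LOG_FILE that mention
# "error" or "exception", matched case-insensitively.
# The keyword pattern is an assumption about the log format.
recent_errors() {
    grep -iE 'error|exception' "$1" | tail -n "${2:-50}"
}

# Example (run as root on the host identified in Step 1):
#   recent_errors /var/log/Bigdata/meta/mrs-meta.log 20
```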

Rebind an IAM agency to the cluster.

7. Log in to the MRS management console.
8. In the navigation pane on the left, choose Active Clusters. On the page that is displayed, click the cluster name to go to its overview page. Then, in the O&M management area, check whether the cluster is bound to an IAM agency.
   - If yes, go to Step 10.
   - If no, go to Step 9.
9. Click Select Agency. On the page that is displayed, bind to the cluster an IAM agency that has permission to access OBS. A few minutes later, check whether the alarm is cleared.
   - If yes, no further action is required.
   - If no, go to Step 10.
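
After rebinding the agency, you can optionally confirm from a cluster node that temporary credentials are being issued. The sketch below rests on assumptions that are not stated in this document: that the credentials are served by the ECS metadata endpoint shown in SECURITYKEY_URL, and that the JSON response contains fields named access and secret. Verify both against the ECS documentation for your environment. The function names are hypothetical.

```shell
# Assumption: the ECS metadata service exposes temporary credentials at this URL.
# Override SECURITYKEY_URL if your environment differs.
SECURITYKEY_URL="${SECURITYKEY_URL:-http://169.254.169.254/openstack/latest/securitykey}"

# response_has_credentials JSON_TEXT
# Succeed if the text contains both an "access" and a "secret" field
# (assumed field names). The values themselves are never printed.
response_has_credentials() {
    printf '%s' "$1" | grep -q '"access"' &&
    printf '%s' "$1" | grep -q '"secret"'
}

# check_agency_credentials
# Query the endpoint and report whether credentials were returned.
check_agency_credentials() {
    local body
    body=$(curl -s --max-time 5 "$SECURITYKEY_URL") || {
        echo "request failed"
        return 1
    }
    if response_has_credentials "$body"; then
        echo "temporary AK/SK returned"
    else
        echo "no credentials in response"
        return 1
    fi
}
```

The check only reports whether credentials are present and does not echo them, so the output is safe to paste into a ticket.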

Collect fault information.

10. On FusionInsight Manager of the active cluster, choose O&M. In the navigation pane on the left, choose Log > Download.
11. Expand the Service drop-down list, select meta for the target cluster, and click OK.
12. Click the icon in the upper right corner, and set the time range to start 10 minutes before and end 10 minutes after the time the alarm was generated. Then, click Download to collect the logs.
13. Contact O&M personnel and provide the collected logs.
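
To work out the log collection window before entering it in the download dialog, the start and end times can be computed from the alarm generation time. This is a minimal sketch that assumes GNU date (the -d option) is available; the function name is hypothetical.

```shell
# log_window ALARM_TIME
# Print the start time and end time of the log collection window,
# 10 minutes (600 seconds) before and after ALARM_TIME, one per line.
# Requires GNU date. Times are interpreted in the local time zone.
log_window() {
    local epoch
    # Convert to epoch seconds first so the offset arithmetic is unambiguous.
    epoch=$(date -d "$1" +%s) || return 1
    date -d "@$((epoch - 600))" '+%Y-%m-%d %H:%M:%S'
    date -d "@$((epoch + 600))" '+%Y-%m-%d %H:%M:%S'
}

# Example:
#   log_window '2024-05-01 12:00:00'
```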