
Failed to Scale Out an MRS Cluster

Issue

The MRS console is accessible and functions properly, but the MRS cluster fails to be scaled out.

Symptom

The MRS console works properly, and no alarm or error message is displayed on MRS Manager. However, an error message is displayed during cluster scale-out, indicating that the MRS cluster contains nodes that are not running.

Cause Analysis

An MRS cluster can be scaled in or out only when it is running properly. According to the error message, the possible cause is that the cluster status recorded in the database is abnormal or has not been updated. As a result, the nodes in the cluster are not recognized as running.

Procedure

  1. Log in to the MRS console and click the cluster name to go to the cluster details page. Check that the cluster is in the Running state.
  2. Click Nodes to view the status of all nodes. Ensure that all nodes are in the Running state.
  3. Log in to the podMaster node in the cluster, switch to the MRS deployer node, and view the api-gateway.log file.

    1. Run the kubectl get pod -n mrs command to view the pod of the MRS deployer node.
    2. Run the kubectl exec -ti ${Pod of the deployer node} -n mrs /bin/bash command to log in to the pod. For example, run the kubectl exec -ti mrsdeployer-78bc8c76cf-mn9ss -n mrs /bin/bash command to access the deployer container of MRS.
    3. In the /opt/cloud/logs/apigateway directory, open the latest api-gateway.log file and search it for keywords such as ERROR, scaling, clusterScaling, HostState, state-check, or the cluster ID to determine the error type. (A combined command sketch is provided at the end of this step.)
    4. Rectify the fault based on the error information and perform the scale-out again.
      • If the scale-out is successful, no further action is required.
      • If the scale-out fails, go to 4.
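
    A minimal command sequence for this step might look like the following. The pod name is the illustrative one from the example above, and the log keywords are placeholders; on newer kubectl releases, separate the command from the pod name with --.

      # List the pods in the mrs namespace and locate the MRS deployer pod.
      kubectl get pod -n mrs | grep mrsdeployer

      # Open a shell in the deployer container (pod name is illustrative).
      kubectl exec -ti mrsdeployer-78bc8c76cf-mn9ss -n mrs -- /bin/bash

      # Inside the container, search the newest API gateway log for scale-out errors.
      cd /opt/cloud/logs/apigateway
      grep -nE "ERROR|scaling|clusterScaling|HostState|state-check" $(ls -t api-gateway*.log | head -1)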

  4. Run the /opt/cloud/mysql -u${Username} -P${Port} -h${Address} -p${Password} command to log in to the database.
  5. Run the select cluster_state from cluster_detail where cluster_id="Cluster ID"; command to check the value of cluster_state. (A SQL sketch is provided at the end of this step.)

    • If the value of cluster_state is 2, the cluster status is normal. Go to 6.
    • If the value of cluster_state is not 2, the cluster status in the database is abnormal. You can run the update cluster_detail set cluster_state=2 where cluster_id="Cluster ID"; command to update the cluster status and then check the value of cluster_state.
      • If the value of cluster_state is 2, the cluster status is normal. Go to 6.
      • If the value of cluster_state is not 2, contact technical support.
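
    After logging in to the database with the command in step 4, a minimal SQL sketch for this step might look like the following; the cluster ID is a placeholder, and 2 is the value the steps above treat as normal.

      -- Check the recorded cluster status; 2 indicates a normal cluster.
      select cluster_state from cluster_detail where cluster_id="Cluster ID";

      -- If the value is not 2, reset it and then verify the change.
      update cluster_detail set cluster_state=2 where cluster_id="Cluster ID";
      select cluster_state from cluster_detail where cluster_id="Cluster ID";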

  6. Run the select host_status from host where cluster_id="Cluster ID"; command to query the cluster host status. (A SQL sketch is provided at the end of this step.)

    • If the host is in the started state, no further action is required.
    • If the host is not in the started state, run the update host set host_status='started' where cluster_id="Cluster ID"; command to update the host status in the database.
      • If the host is in the started state, no further action is required.
      • If the host is not in the started state, contact technical support.
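
    A similar SQL sketch for this step is shown below; the cluster ID is a placeholder, and started is the expected host status.

      -- Check the host status recorded for the cluster.
      select host_status from host where cluster_id="Cluster ID";

      -- If any host is not started, update the status and verify the change.
      update host set host_status='started' where cluster_id="Cluster ID";
      select host_status from host where cluster_id="Cluster ID";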