
Scale-Out Failure

Issue

An MRS cluster fails to be scaled out even though the console page is normal.

Symptom

The MRS console is normal, and no alarm or error message is displayed on MRS Manager. However, an error message is displayed during cluster scale-out, indicating that the MRS cluster contains nodes that are not running and asking you to try again later.

Cause Analysis

MRS cluster scale-in and scale-out operations can be performed only when the cluster is running properly, so the cluster status needs to be checked first. In this case, the error message indicates that some nodes in the cluster are not running, yet the console and MRS Manager pages are normal. The likely cause is that the cluster status recorded in the database is abnormal or has not been updated, so the nodes are not considered to be in the normal state and the scale-out fails.

Procedure

  1. Log in to the MRS management console and click the cluster name to go to the cluster details page. Check the cluster status and ensure that the cluster is in the Running state.
  2. Click Nodes to view the status of all nodes. Ensure that all nodes are in the Running state.
  3. Log in to the podMaster node in the cluster, switch to the deployer node of MRS, and view the api-gateway.log file.

    1. Run the kubectl get pod -n mrs command to view the pod of the deployer node corresponding to MRS.
    2. Run the kubectl exec -ti ${pod of the deployer node} -n mrs /bin/bash command to log in to the corresponding pod. For example, run the kubectl exec -ti mrsdeployer-78bc8c76cf-mn9ss -n mrs /bin/bash command to access the deployer container of MRS.
    3. In the /opt/cloud/logs/apigateway directory, view the latest api-gateway.log file and search for key information (such as ERROR, scaling, clusterScaling, HostState, state-check, or the cluster ID) in the file to determine the error type. Example log-search commands are provided at the end of this procedure.
    4. Rectify the fault based on the error information and perform the scale-out again.
      • If the scale-out is successful, no further action is required.
      • If the scale-out fails, go to 4.

  4. Run the /opt/cloud/mysql -u${Username} -P${Port} -h${Address} -p${Password} command to log in to the database. Example SQL statements for 4 to 6 are provided at the end of this procedure.
  5. Run the select cluster_state from cluster_detail where cluster_id="Cluster ID"; command to check the value of cluster_state.

    • If the value of cluster_state is 2, the cluster status is normal. Go to 6.
    • If the value of cluster_state is not 2, the cluster status in the database is abnormal. You can run the update cluster_detail set cluster_state=2 where cluster_id="Cluster ID"; command to refresh the cluster status and check the value of cluster_state.
      • If the value of cluster_state is 2, the cluster status is normal. Go to 6.
      • If the value of cluster_state is not 2, contact technical support.

  6. Run the select host_status from host where cluster_id="Cluster ID"; command to query the cluster host status.

    • If the host is in the started state, no further action is required.
      • If the host is not in the started state, run the update host set host_status='started' where cluster_id="Cluster ID"; command to update the host status in the database, and then query the host status again.
      • If the host is in the started state, no further action is required.
      • If the host is not in the started state, contact technical support.
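
  The following commands are a minimal sketch of the log check described in 3. The pod name is only the example used in this scenario; use the name returned by kubectl get pod in your environment, and extend the grep pattern (for example, with your cluster ID) as needed.

    # List the pods in the mrs namespace and find the MRS deployer pod.
    kubectl get pod -n mrs
    # Open a shell in the deployer pod (replace the pod name with the one returned above).
    kubectl exec -ti mrsdeployer-78bc8c76cf-mn9ss -n mrs /bin/bash
    # Inside the pod, check the latest API gateway log for scale-out errors.
    cd /opt/cloud/logs/apigateway
    # Search for the key information mentioned in 3.3; add the cluster ID to the pattern if needed.
    grep -nE "ERROR|scaling|clusterScaling|HostState|state-check" api-gateway.log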
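
  The following statements are a minimal sketch of the database check in 4 to 6. The connection parameters come from your environment, "Cluster ID" is a placeholder for the actual cluster ID, and the update statements are needed only when the preceding query returns an abnormal value.

    # Log in to the MRS management database (fill in the connection information for your environment).
    /opt/cloud/mysql -u${Username} -P${Port} -h${Address} -p${Password}

    -- Check the cluster status; 2 indicates that the cluster is normal.
    select cluster_state from cluster_detail where cluster_id="Cluster ID";
    -- If the value is not 2, refresh the status and query again.
    update cluster_detail set cluster_state=2 where cluster_id="Cluster ID";

    -- Check the host status; all hosts should be in the started state.
    select host_status from host where cluster_id="Cluster ID";
    -- If a host is not started, update its status and query again.
    update host set host_status='started' where cluster_id="Cluster ID";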