Updated on 2025-08-22 GMT+08:00

A Reset Node Cannot Be Used

Symptom

If the resource pool of a ModelArts Lite CCE cluster contains only one node and Volcano is set as the default scheduler, the node becomes unusable after it is reset on ModelArts. As a result, pods cannot be scheduled to the node.

Possible Causes

After a node is reset on ModelArts, modelarts-os adds an admission taint to the node for node admission. Volcano does not tolerate this taint, and because the cluster has only one node, there is no other node on which Volcano can run, so Volcano cannot be started. With the default scheduler unavailable, the maos-node-agent container of modelarts-os, which manages the taints on the node, cannot be started either. As a result, the taint cannot be cleared automatically.
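
The taint check behind this deadlock can be illustrated with a minimal sketch of the standard Kubernetes taint and toleration matching rules. This is not the actual modelarts-os or Volcano code. The taint effect (NoSchedule) and both toleration lists are assumptions made for illustration; only the taint name A200008 comes from this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str  # "NoSchedule", "PreferNoSchedule", or "NoExecute"
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key with operator "Exists" matches every key
    operator: str = "Equal"  # "Equal" (default) or "Exists"
    value: str = ""
    effect: str = ""         # empty effect matches every effect


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Return True if the toleration matches the taint."""
    if toleration.effect and toleration.effect != taint.effect:
        return False
    if toleration.operator == "Exists":
        return toleration.key == "" or toleration.key == taint.key
    # Operator "Equal": both key and value must match.
    return toleration.key == taint.key and toleration.value == taint.value


def schedulable(taints, tolerations) -> bool:
    """A pod fits a node only if every NoSchedule/NoExecute taint is tolerated."""
    blocking = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in blocking)


# Assumption: the admission taint uses key A200008 with effect NoSchedule.
admission = Taint(key="A200008", effect="NoSchedule")

no_tolerations = []                                # assumed: like the Volcano pod
tolerate_all = [Toleration(operator="Exists")]     # hypothetical tolerant pod

print(schedulable([admission], no_tolerations))  # False
print(schedulable([admission], tolerate_all))    # True
```

Even a pod that tolerates the taint still needs a running scheduler to place it, which is why a single-node cluster with Volcano as the default scheduler cannot recover on its own.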

Solution

  • (Recommended) Solution 1 (kube-scheduler as the default scheduler, Volcano specified per workload as required):
    1. Change the default scheduler to kube-scheduler on the CCE console.
    2. Delete the maos-node-agent pod so that it is re-created (this restarts the pod).
    3. Delete taint A200008 from the node on the CCE console.
    4. Reset the node on the ModelArts console.

    Disadvantage: To run a workload with Volcano, you must manually specify Volcano as the scheduler when creating the workload. For details, see the user guide.

  • Solution 2 (Volcano remains the default scheduler):
    1. Change the default scheduler to kube-scheduler in the configuration center on the CCE console.
    2. Delete the maos-node-agent pod so that it is re-created (this restarts the pod).
    3. Delete taint A200008 from the node on the CCE console.
    4. Reset the node on the ModelArts console.
    5. Change the default scheduler to volcano in the configuration center on the CCE console.

    Disadvantage: If you later perform operations on the node on ModelArts, such as resetting the node or upgrading its driver, the node may fail to be started.
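
With Solution 1, each workload that needs Volcano must name it explicitly. The following is a minimal sketch of how that might look, assuming the workload is a plain pod and that the Volcano scheduler is registered under the name volcano. The pod name, container name, and image are hypothetical placeholders. The schedulerName field is the standard Kubernetes pod spec field for selecting a scheduler.

```python
import json


def with_scheduler(pod_spec: dict, scheduler_name: str = "volcano") -> dict:
    """Return a copy of a pod spec that names its scheduler explicitly."""
    spec = dict(pod_spec)
    spec["schedulerName"] = scheduler_name  # standard Kubernetes pod spec field
    return spec


# Hypothetical workload: the names and image below are placeholders.
manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "example-job"},
    "spec": with_scheduler({
        "containers": [{"name": "main", "image": "example-image:latest"}],
    }),
}

# kubectl accepts JSON manifests as well as YAML.
print(json.dumps(manifest, indent=2))
```

For workload types that wrap a pod template, such as a Deployment, the same field belongs in the template's pod spec (spec.template.spec.schedulerName) instead.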