How Do I Drain NPU Nodes After Upgrading the CCE AI Suite (Ascend NPU) Add-on?
Symptom
If NPU virtualization workloads are running on an NPU node, upgrading the CCE AI Suite (Ascend NPU) add-on may cause the upgrade of the flexnpu-server component to fail. To keep the add-on working properly, drain the NPU node to clear its NPU virtualization workloads. Use a rolling drainage policy: drain only one or a few NPU nodes at a time so that services are not disrupted on a large scale.
Solution
Before draining an NPU node, make sure that the other nodes have enough NPU resources available for the evicted pods to be rescheduled. Otherwise, the pods may fail to be scheduled due to insufficient resources, and services may be interrupted.
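The capacity check described above can be sketched in a few lines. This is a minimal illustration, not part of the add-on: the function name `can_drain`, the node names, and the per-node NPU counts are all hypothetical, and the counts are assumed to have been collected beforehand (for example, from the console). It only compares totals, so a pod that requests more NPU resources than any single remaining node has free may still fail to be scheduled.

```python
from typing import Dict


def can_drain(node: str, free_npus: Dict[str, int], used_npus: Dict[str, int]) -> bool:
    """Return True if the other nodes have enough free NPU units in total
    to absorb the workloads currently running on `node`.

    free_npus / used_npus map node names to unit counts (hypothetical inputs).
    """
    needed = used_npus.get(node, 0)
    # Only capacity on the *other* nodes counts; the drained node is unschedulable.
    available = sum(free for name, free in free_npus.items() if name != node)
    return available >= needed


if __name__ == "__main__":
    # Hypothetical cluster state for illustration only.
    free = {"npu-node-1": 1, "npu-node-2": 3, "npu-node-3": 2}
    used = {"npu-node-1": 4, "npu-node-2": 6, "npu-node-3": 2}
    for name in free:
        print(name, "can be drained:", can_drain(name, free, used))
```

If the check fails for a node, free up or add NPU capacity elsewhere before draining it.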
- Log in to the CCE console and click the cluster name to access the cluster Overview page.
- In the navigation pane, choose Cluster > Nodes. In the right pane, click the Nodes tab, locate the row containing the target NPU node, and choose More > Drain Node in the Operation column.
  Figure 1 Draining a node
- In the Drain Node dialog box displayed, click OK. If Drained is displayed in the node status column, the NPU virtualization workloads have been evicted from the NPU node.
  If pods with emptyDir volumes mounted or pods that are not managed by controllers are running on the node, forcible drainage will cause data loss. Back up the data before enabling forcible drainage.
  Figure 2 Configuring node drainage
- In the node list, locate the row containing the NPU node and choose More > Pods in the Operation column. In the pod list, check the flexnpu-server-xxx pod. If its creation time is close to the current time and its status is Running, the upgrade on this node is complete.
- In the node list, locate the row containing the NPU node and choose More > Enable Scheduling in the Operation column. Repeat the preceding steps for each remaining NPU node until all NPU nodes have been drained.
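For readers who prefer the command line, the console steps above correspond roughly to a drain, verify, and uncordon cycle with kubectl. The following is a minimal sketch under stated assumptions: kubectl is assumed to be configured for the cluster, the node names are hypothetical, and the console's Drain Node operation may not behave identically to `kubectl drain`. The helpers only build the commands and check pod data passed to them; they do not contact the cluster.

```python
from datetime import datetime, timezone
from typing import Iterator, List, Sequence


def batches(nodes: Sequence[str], size: int = 1) -> Iterator[List[str]]:
    """Split the NPU node list into small batches for rolling drainage."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(nodes), size):
        yield list(nodes[start:start + size])


def drain_command(node: str, force: bool = False) -> List[str]:
    """Build a kubectl drain command. With force=True, pods that use emptyDir
    volumes or are not managed by a controller are evicted too (data loss risk)."""
    cmd = ["kubectl", "drain", node, "--ignore-daemonsets"]
    if force:
        cmd += ["--delete-emptydir-data", "--force"]
    return cmd


def uncordon_command(node: str) -> List[str]:
    """Build the command that makes the node schedulable again."""
    return ["kubectl", "uncordon", node]


def server_pod_recreated(name: str, phase: str, created: datetime,
                         drain_started: datetime) -> bool:
    """Check the condition from the verification step: a flexnpu-server pod
    that was created after the drain started and is now Running."""
    return (name.startswith("flexnpu-server-")
            and phase == "Running"
            and created >= drain_started)


if __name__ == "__main__":
    # Hypothetical node names; print the commands instead of running them.
    for batch in batches(["npu-node-1", "npu-node-2", "npu-node-3"], size=1):
        for node in batch:
            print(" ".join(drain_command(node)))
            print(" ".join(uncordon_command(node)))
```

Keeping the batch size at one or two nodes mirrors the rolling drainage policy. Run the uncordon command for a node only after the flexnpu-server check passes for that node.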