High-Risk Operations
When you operate on ModelArts Lite Cluster resources from the CCE, ECS, or BMS console, certain resource pool functions may become abnormal. The table below lists common risky operations.
Risky operations fall into three levels:
- High: Such operations may cause service failures, data loss, system maintenance failures, or system resource exhaustion.
- Medium: Such operations may cause security risks and reduce service reliability.
- Low: Risky operations that do not fall into the high or medium level.
Object | Operation | Risk | Severity | Solution |
---|---|---|---|---|
Cluster | Upgrade, modify, hibernate, or delete clusters. | These operations may impact basic ModelArts functions, including resource pool management, node management, scaling, and driver upgrades. | High | These operations cannot be undone. |
Node | Unsubscribe from, remove, or shut down nodes, manage taints, or switch or reinstall the OS. | These operations may impact basic ModelArts functions, including node management, scaling, and driver upgrades, and may cause data loss on local disks. | High | These operations cannot be undone. |
Node | Modify a network security group. | This operation may impact basic ModelArts functions, including node management, scaling, and driver upgrades. | Medium | If needed, revert to the original settings. |
Network | Modify or delete the CIDR block associated with a cluster. | These operations impact basic ModelArts functions, including node management, scaling, and driver upgrades. | High | These operations cannot be undone. |
Plug-in | Upgrade or uninstall the gpu-beta plug-in. | The GPU driver may be abnormal. | Medium | Roll back the version and reinstall the plug-in. |
Plug-in | Upgrade or uninstall the huawei-npu plug-in. | The NPU driver may be abnormal. | Medium | Roll back the version and reinstall the plug-in. |
Plug-in | Upgrade or uninstall the volcano plug-in. | Job scheduling may be abnormal. | Medium | Roll back the version and reinstall the plug-in. |
Plug-in | Uninstall the ICAgent plug-in. | Logging and monitoring may be abnormal. | Medium | Roll back the version and reinstall the plug-in. |
Helm | Upgrade, roll back, or uninstall os-node-agent. | Driver upgrades, fault detection, metric collection, and node O&M become abnormal. | High | Contact Huawei Cloud technical support to reinstall os-node-agent. |
Helm | Upgrade, roll back, or uninstall rdma-sriov-dev-plugin. | The use of RDMA NICs in containers may be affected. | High | Contact Huawei Cloud technical support to reinstall rdma-sriov-dev-plugin. |
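If you script changes against these resources, it can help to check a planned operation against the table before running it. The following is a minimal sketch in Python, assuming you maintain the table by hand in your own tooling. The names `RiskEntry`, `RISK_TABLE`, and `lookup`, and the short `recovery` labels, are hypothetical and are not part of any Huawei Cloud API or SDK.

```python
from typing import NamedTuple, Optional


class RiskEntry(NamedTuple):
    obj: str               # table "Object" column, lowercased
    target: str            # specific resource named in the row, "" if none
    operations: frozenset  # operations listed in the row, lowercased
    severity: str          # "High" or "Medium", as listed in the table
    recovery: str          # hypothetical short label for the "Solution" column


# Hand-written encoding of the table above (illustrative, not an official list).
RISK_TABLE = [
    RiskEntry("cluster", "", frozenset({"upgrade", "modify", "hibernate", "delete"}),
              "High", "none"),
    RiskEntry("node", "", frozenset({"unsubscribe", "remove", "shut down",
                                     "manage taints", "switch os", "reinstall os"}),
              "High", "none"),
    RiskEntry("node", "network security group", frozenset({"modify"}),
              "Medium", "revert"),
    RiskEntry("network", "cidr block", frozenset({"modify", "delete"}),
              "High", "none"),
    RiskEntry("plug-in", "gpu-beta", frozenset({"upgrade", "uninstall"}),
              "Medium", "rollback-reinstall"),
    RiskEntry("plug-in", "huawei-npu", frozenset({"upgrade", "uninstall"}),
              "Medium", "rollback-reinstall"),
    RiskEntry("plug-in", "volcano", frozenset({"upgrade", "uninstall"}),
              "Medium", "rollback-reinstall"),
    RiskEntry("plug-in", "icagent", frozenset({"uninstall"}),
              "Medium", "rollback-reinstall"),
    RiskEntry("helm", "os-node-agent", frozenset({"upgrade", "roll back", "uninstall"}),
              "High", "contact-support"),
    RiskEntry("helm", "rdma-sriov-dev-plugin", frozenset({"upgrade", "roll back", "uninstall"}),
              "High", "contact-support"),
]


def lookup(obj: str, operation: str, target: str = "") -> Optional[RiskEntry]:
    """Return the matching table row, or None if the operation is not listed."""
    obj, operation, target = obj.lower(), operation.lower(), target.lower()
    for entry in RISK_TABLE:
        if entry.obj == obj and entry.target == target and operation in entry.operations:
            return entry
    return None


def requires_confirmation(obj: str, operation: str, target: str = "") -> bool:
    """Treat listed operations with no recovery path as needing explicit sign-off."""
    entry = lookup(obj, operation, target)
    return entry is not None and entry.recovery == "none"
```

For example, `lookup("Plug-in", "upgrade", "volcano")` returns the Medium-severity volcano row, while `requires_confirmation("Cluster", "delete")` is true because the table marks cluster deletion as irreversible. An operation that is absent from the table returns `None`, which only means it is not listed here, not that it is safe.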