High-Risk Operations

To avoid adverse impacts on ModelArts Lite Server, follow the operation guides whenever you perform high-risk operations during routine O&M.

Risky operations fall into three levels:

High: Such operations may cause service failures, data loss, system maintenance failures, or system resource exhaustion.
Medium: Such operations may cause security risks and reduce service reliability.
Low: All other risky operations that are not classified as high or medium.

**Table 1** High-risk operations
Object	Operation	Risk	Severity	Solution
OS	Upgrade or modify the OS kernel.	The driver and kernel versions may become incompatible. As a result, the OS cannot start or basic functions become unavailable. High-risk commands include apt-get upgrade, which upgrades all software on the system, including the kernel. To view the current kernel version, run uname -a.	High	To perform an upgrade or modification, contact Huawei Cloud technical support.
	Switch or reset the OS.	The EVS system ID is changed. As a result, the EVS system disk cannot be scaled out, and the message "The order is expired. The capacity cannot be expanded. Renew the order." is displayed.	Low	Mount an EVS or SFS disk for capacity expansion after you switch or reset the OS.
	Delete the NIC route in the system or perform disruptive network operations on the NIC, such as running ifconfig down and ifconfig up, while the cloud server is running properly.	The network service is restarted and DHCP is triggered to obtain the IP address and route again. As a result, the NIC route may be lost and the node may become unavailable.	High	Reset the OS. Ensure that your data has been backed up.
	Modify kernel parameters such as net.ipv4.ip_forward.	The route forwarding function of the ECS may be affected, causing network disconnection.	Medium	Set net.ipv4.ip_forward to 1.
	Enable the system firewall.	The performance of HCCL, NCCL, and multi-node multi-PU training tasks may be affected.	Low	Disable the firewall.
	Change the time zone.	The node time changes, which will affect services.	Medium	Restore the time zone.
Driver and firmware	Upgrade the NPU driver or firmware.	The driver and firmware versions may not match, making the server unavailable and affecting services.	Medium	Reset the OS. Ensure that your data has been backed up.
	Change the GPU driver.	The driver and firmware versions may not match, making the server unavailable and affecting services.	Medium	Reset the OS. Ensure that your data has been backed up.
	Change the SDI PU driver.	The NIC may become unavailable, making the server unavailable and affecting services.	Medium	Reset the OS. Ensure that your data has been backed up.
Network	Change the NIC MAC address or IP address.	A misoperation interrupts VM communication and services, and may affect other services.	High	Roll back the modification. If the rollback fails, reset the OS. Ensure that your data has been backed up.
Network	Add, delete, or edit iptables rules, or restart the iptables service.	Service access requests are rejected.	High	Roll back the modification. If the rollback fails, reset the OS. Ensure that your data has been backed up.
Built-in OS software	Upgrade, downgrade, or uninstall built-in OS software such as Python 3.	Built-in network configuration software may become abnormal. As a result, the server NIC cannot be configured and the node becomes unavailable.	High	Roll back the modification. If the rollback fails, reset the OS. Ensure that your data has been backed up.
Directory/File	Modify key system directories and files of root or opt, such as /etc/hccn.conf and /etc/netplan/roce.yaml.	The system functions may be affected, and the cloud server may be unavailable.	High	Roll back the modification. If the rollback fails, reset the OS. Ensure that your data has been backed up.
Directory/File	Modify the permissions of directories and files.	The service may be abnormal.	High	Roll back the modification.
Server	Perform non-query operations on the server, such as stopping or starting it, while the server instance is being provisioned, initialized, or deleted, or while disks are being added or deleted.	Operations on the cloud server may fail.	Medium	Reset the OS. Ensure that your data has been backed up.
Server	Switch or reset the OS.	The EVS system ID is changed. As a result, the EVS system disk cannot be scaled out, and the message "The order is expired. The capacity cannot be expanded. Renew the order." is displayed.	Low	Mount an EVS or SFS disk for capacity expansion.
Process	Run the service network restart command, or stop key system processes such as sshd and ces-agent.	Services may fail to be provisioned and remote access to the cloud server may fail. In addition, monitoring data may fail to be collected, which affects the reporting of monitoring metrics.	High	Restart the stopped service.
Data disk	Modify the data disk mounting mode and mount point.	Services that are being used may become abnormal.	Low	Ensure that the data disk is not used by any service.
Security group	Modify the port communication protocol, allow high-risk ports such as port 22, or leave the IP address whitelist unconfigured.	The network may be attacked, affecting services on the server.	Medium	Restore the original configuration.
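
Several rows in Table 1 name concrete commands (apt-get upgrade, ifconfig down and ifconfig up, service network restart, iptables rule changes, changes to net.ipv4.ip_forward). As an illustration only, the following Python sketch shows how an O&M script might flag such commands before they are run. The pattern list, the Finding type, and the classify function are hypothetical names introduced here. They are not part of ModelArts Lite Server or any Huawei Cloud tool, and the patterns cover only a few of the operations in the table.

```python
import re
from typing import NamedTuple, Optional


class Finding(NamedTuple):
    severity: str  # "High", "Medium", or "Low", following Table 1
    reason: str    # short summary of the risk described in Table 1


# Hypothetical patterns derived from Table 1. This list is an assumption made
# for illustration; it is neither official nor complete.
RISKY_PATTERNS = [
    (re.compile(r"\bapt(-get)?\s+(-\S+\s+)*(upgrade|dist-upgrade|full-upgrade)\b"),
     Finding("High", "may upgrade the OS kernel and break driver compatibility")),
    (re.compile(r"\bifconfig\s+\S+\s+(down|up)\b"),
     Finding("High", "NIC route may be lost and the node may become unavailable")),
    (re.compile(r"\bservice\s+network\s+restart\b"),
     Finding("High", "restarting the network service may break remote access")),
    (re.compile(r"\biptables\b.*\s-[ADIRFX]\b"),
     Finding("High", "changing iptables rules may reject service requests")),
    (re.compile(r"\bsysctl\s+(-w\s+)?net\.ipv4\.ip_forward\s*=\s*0\b"),
     Finding("Medium", "disabling route forwarding may disconnect the network")),
]


def classify(command: str) -> Optional[Finding]:
    """Return the first matching Finding, or None if no pattern matches."""
    for pattern, finding in RISKY_PATTERNS:
        if pattern.search(command):
            return finding
    return None


if __name__ == "__main__":
    for cmd in ["uname -a", "apt-get upgrade", "ifconfig eth0 down"]:
        print(cmd, "->", classify(cmd))
```

Query-only commands such as uname -a or iptables -L -n match none of these patterns, so classify returns None for them. A None result does not mean a command is safe; it only means that this small, assumed pattern list does not cover it.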