Authorizing the Repair of Lite Server Nodes
Scenario
If a Lite Server node has an unrecoverable fault that requires hardware maintenance, a scheduled event is pushed to the Event Center of the ModelArts console. In the Event Center, you can view each event's type, status, and description. You can then either authorize Huawei technical support to perform O&M on the faulty node or redeploy the node.
Table 1 Event types
Event Type | Event Status | Supported Operations | Applicable Resource Type | Description |
---|---|---|---|---|
System maintenance | Authorization Pending | Authorization and redeployment | Snt9b | Authorizes Huawei technical support to perform systematic maintenance on the faulty node. |
Local disk recovery | Authorization Pending | Authorization and redeployment | Snt9b | Authorizes Huawei technical support to repair the faulty local disk. WARNING: Recovering the local disk erases the data on it. Migrate services and back up data before authorization. |
Supernode maintenance | Authorization Pending | Authorization | Snt9b23 | Authorizes Huawei technical support to recover the faulty node by manually repairing or replacing components. |
Supernode redeployment | Authorization Pending | Authorization | Snt9b23 | Authorizes the Huawei O&M system to recover the faulty node by automatically replacing it. After recovery, the node name, node ID, and IP address remain unchanged; only the underlying physical device changes. |
Supernode local disk recovery | Authorization Pending | Authorization | Snt9b23 | Authorizes Huawei technical support to restore the local disk of the supernode. WARNING: Recovering the local disk erases the data on it. Migrate services and back up data before authorization. |
- Authorization: You authorize Huawei technical support to repair the faulty hardware of the node component by component. This takes longer than redeployment.
- Redeployment: You authorize Huawei technical support to replace the faulty node with a new one. This is faster, but all data on the local disk is lost. Exercise caution, and migrate services and back up data before redeployment.
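For scripting or runbooks, Table 1 can be captured as a lookup that answers which console operations apply to a given event. The following is a minimal sketch, not a ModelArts API: `EventTypeInfo`, `EVENT_TYPES`, and `supported_operations` are hypothetical names, and the data is transcribed from Table 1.

```python
from typing import NamedTuple


class EventTypeInfo(NamedTuple):
    operations: frozenset  # console operations offered for the event type
    resource_type: str     # Ascend node type the event applies to


# Hypothetical transcription of Table 1 (not an official ModelArts structure).
EVENT_TYPES = {
    "System maintenance": EventTypeInfo(frozenset({"Authorize", "Redeploy"}), "Snt9b"),
    "Local disk recovery": EventTypeInfo(frozenset({"Authorize", "Redeploy"}), "Snt9b"),
    "Supernode maintenance": EventTypeInfo(frozenset({"Authorize"}), "Snt9b23"),
    "Supernode redeployment": EventTypeInfo(frozenset({"Authorize"}), "Snt9b23"),
    "Supernode local disk recovery": EventTypeInfo(frozenset({"Authorize"}), "Snt9b23"),
}


def supported_operations(event_type: str, resource_type: str) -> frozenset:
    """Return the operations Table 1 lists for this event, or an empty set."""
    info = EVENT_TYPES.get(event_type)
    # Unknown event types, or a resource type that does not match Table 1,
    # get no operations.
    if info is None or info.resource_type != resource_type:
        return frozenset()
    return info.operations
```

For example, `supported_operations("Local disk recovery", "Snt9b")` returns a set containing both Authorize and Redeploy, while any supernode event returns only Authorize.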
Constraints
- Only Ascend Snt9b and Snt9b23 support hardware maintenance through scheduled events.
- Supernode redeployment is performed only within the same physical supernode. If there are 48 supernodes, redeployment is not supported and the Authorize button is unavailable.
- If a scheduled event does not meet the requirements listed in Table 1, the Authorize button is unavailable.
- Before authorizing a supernode redeployment event, you need to stop the server instance on the Lite Servers page. Otherwise, the authorization fails. After the event is executed, restart the server instance.
- Authorizing a node affects the services running on it. For a Supernode redeployment event, authorization can be performed only after the node is shut down.
- Local disk recovery and supernode local disk recovery erase the data on the local disk. Migrate services and back up data before authorization. After the local disk is restored, log in to the Lite Server node and partition the local disk.
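The constraints above amount to a pre-authorization checklist. The sketch below encodes them as a function that returns the reasons an authorization would fail or be unsafe. It is an illustration under stated assumptions: `ScheduledEvent`, its fields, and `authorization_blockers` are hypothetical; the console does not expose a backup flag, so `data_backed_up` stands for the operator's own confirmation; and the 48-supernode redeployment rule is left out because it depends on capacity information this page does not describe.

```python
from dataclasses import dataclass

# Node types that support hardware maintenance through scheduled events.
SUPPORTED_RESOURCE_TYPES = {"Snt9b", "Snt9b23"}
# Event types whose recovery erases local disk data (per Table 1).
DISK_WIPING_EVENTS = {"Local disk recovery", "Supernode local disk recovery"}


@dataclass
class ScheduledEvent:
    event_type: str
    status: str
    resource_type: str
    instance_stopped: bool  # server instance stopped on the Lite Servers page
    data_backed_up: bool    # hypothetical: operator's own confirmation


def authorization_blockers(event: ScheduledEvent) -> list:
    """Return reasons authorizing this event is blocked or unsafe (empty = OK)."""
    blockers = []
    if event.resource_type not in SUPPORTED_RESOURCE_TYPES:
        blockers.append("only Snt9b and Snt9b23 support scheduled-event maintenance")
    if event.status != "Authorization Pending":
        blockers.append("event is not in the Authorization Pending state")
    if event.event_type == "Supernode redeployment" and not event.instance_stopped:
        blockers.append("stop the server instance first; otherwise authorization fails")
    if event.event_type in DISK_WIPING_EVENTS and not event.data_backed_up:
        blockers.append("migrate services and back up local disk data first")
    return blockers
```

A script could refuse to proceed while the returned list is non-empty and print each reason to the operator.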
Viewing Scheduled Events
Log in to the ModelArts console. In the navigation pane on the left, click Event Center under Resource Management. You can view the event details on the displayed page. By default, events in the Authorization Pending, Authorized, and Executing states are displayed. You can remove the filter criteria to view events in all states.
Table 2 Scheduled event attributes
Attribute | Description | Example |
---|---|---|
Event ID | Unique event ID. | 5ad1df12-e3d2-4f36-b367-xxxxxxxxxxxx |
Node Name/ID | Name and ID of the server node to which the event applies. | devserver-dd50 1e0d95ad-5a9f-46e3-9ba6-c5f8fcxxxx |
Event Type | Type of the event. For details about the event types, see Table 1. | Supernode redeployment |
Event Status | Current status of the event. | Authorization Pending |
Event Description | Cause of the event. | Underlying hardware fault. alarmName=XXXX,bmcip=2409:27ff:1003:0103:0011:0000:0000:xxxx,componentName=XXXX is automatically connected through CAR. |
Obtained At | Time when the event was created. | 2025/02/19 16:05:32 GMT+08:00 |
Executed | Time when the event entered the scheduling and execution phase. | 2025/03/03 16:23:16 GMT+08:00 |
Operation | Authorize: Authorizing a node affects the services running on it. For a Supernode redeployment event, authorization can be performed only after the node is shut down. NOTE: Supernode redeployment is performed only within the same physical supernode. If there are 48 supernodes, redeployment is not supported and the Authorize button is unavailable. | -- |
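If event records are copied or exported from the console, the default filter and the time attributes can be reproduced offline. This is a minimal sketch, not a ModelArts SDK call: it assumes each event is a dict with the hypothetical keys `status`, `obtained_at`, and `executed`, holding values in the format shown in the Example column of Table 2 (`YYYY/MM/DD hh:mm:ss GMT+hh:mm`).

```python
from datetime import datetime, timedelta

# Statuses the Event Center shows by default.
DEFAULT_STATUSES = {"Authorization Pending", "Authorized", "Executing"}

# Assumed layout of console timestamps, e.g. "2025/02/19 16:05:32 GMT+08:00".
# %z accepts "+08:00" (colon-separated offsets) on Python 3.7 and later.
TIME_FORMAT = "%Y/%m/%d %H:%M:%S GMT%z"


def parse_console_time(value: str) -> datetime:
    """Parse a console timestamp into a timezone-aware datetime."""
    return datetime.strptime(value, TIME_FORMAT)


def default_view(events: list) -> list:
    """Keep only events the Event Center displays under its default filter."""
    return [e for e in events if e["status"] in DEFAULT_STATUSES]


def time_to_execution(event: dict) -> timedelta:
    """Time between event creation (Obtained At) and scheduling (Executed)."""
    return parse_console_time(event["executed"]) - parse_console_time(event["obtained_at"])
```

With the example values in Table 2, `time_to_execution` returns 12 days, 17 minutes, and 44 seconds.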
Authorization Operations
If the faulty nodes meet the requirements listed in Table 1, you can authorize Huawei technical support to perform O&M on the faulty nodes.
To do so, log in to the ModelArts console. In the navigation pane on the left, choose Event Center under Resource Management. Locate the target node, click Authorize in the Operation column, and click OK in the displayed dialog box. The following steps show how to authorize Huawei technical support to perform O&M on a supernode.
- Log in to the ModelArts console. In the navigation pane on the left, choose Event Center. On the displayed Event Center page, view events whose Event Type is Supernode maintenance and click Authorize.
- The supernode maintenance event enters the Authorized state.
- After the supernode is repaired, the event status changes to Completed and the node is available again.
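The status changes in the steps above can be modeled as a small state machine, for example to sanity-check status polling in a monitoring script. This sketch models only the statuses named on this page and assumes Executing falls between Authorized and Completed; the Event Center may use other statuses (for example, for failed events), so treat `TRANSITIONS`, `advance`, and `node_available` as hypothetical.

```python
# Hypothetical model of the event statuses named on this page.
TRANSITIONS = {
    "Authorization Pending": {"Authorized"},  # user clicks Authorize
    "Authorized": {"Executing"},              # event enters scheduling/execution
    "Executing": {"Completed"},               # repair finished
    "Completed": set(),                       # terminal state
}


def advance(current: str, new: str) -> str:
    """Return the new status, or raise if the change is not modeled."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"unexpected transition {current!r} -> {new!r}")
    return new


def node_available(status: str) -> bool:
    """Per this page, the node is available again once the event is Completed."""
    return status == "Completed"
```

An unexpected jump (for example, Authorization Pending straight to Completed) raises `ValueError`, which a monitoring script could log for manual review instead of treating as an error.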
After the O&M, Huawei technical support will disable the authorization. You do not need to perform any operation.
After a local disk recovery or supernode local disk recovery event is completed, log in to the Lite Server node and partition the local disk.
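Before partitioning a restored local disk, you first need to identify it. The following sketch assumes a Linux node with util-linux `lsblk` (JSON output via `-J`) and lists whole disks that have no partitions and no mount point. It only reports candidates and does not partition anything. The function names are hypothetical, and a disk without partitions may still be in use (for example, as a raw device), so verify the device before running a partitioning tool such as `parted`.

```python
import json
import subprocess


def unpartitioned_disks(lsblk_json: str) -> list:
    """Return names of whole disks with no partitions and no mount point.

    `lsblk_json` is the output of `lsblk -J -o NAME,TYPE,SIZE,MOUNTPOINT`.
    """
    devices = json.loads(lsblk_json)["blockdevices"]
    return [
        dev["name"]
        for dev in devices
        # lsblk omits "children" for devices without partitions or holders.
        if dev.get("type") == "disk"
        and not dev.get("children")
        and dev.get("mountpoint") is None
    ]


def list_unpartitioned_disks() -> list:
    """Run lsblk on the Lite Server node and return candidate disks."""
    result = subprocess.run(
        ["lsblk", "-J", "-o", "NAME,TYPE,SIZE,MOUNTPOINT"],
        check=True,
        capture_output=True,
        text=True,
    )
    return unpartitioned_disks(result.stdout)
```

Running `print(list_unpartitioned_disks())` on the node prints the candidate device names, which you can cross-check against the disk named in the completed event before partitioning.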
Redeployment Operations
If the faulty node meets the redeployment conditions described in Table 1, log in to the ModelArts console. In the navigation pane on the left, choose Event Center under Resource Management. Locate the target node and click Redeploy in the Operation column. In the displayed dialog box, enter YES and click OK.
After the redeployment, the data on the local disk will be lost. Exercise caution. Migrate services and back up data before redeployment.
If a scheduled event does not meet the redeployment requirements listed in Table 1, the Redeploy button is unavailable.
After the O&M, Huawei technical support will disable the authorization. You do not need to perform any operation.