(Optional) Configuring the Driver
Configure the appropriate driver so that the GPU or Ascend resources on nodes in a dedicated resource pool work correctly and meet service requirements.
Lite Cluster supports two driver configuration methods:
- Method 1: Configuring a Custom Driver When Purchasing a Resource Pool: Some GPU and Ascend resource pools allow custom drivers. Enable Custom Driver and select the required driver version.
- Method 2: Upgrading the Existing Resource Pool Driver: If no custom driver is configured and the default driver does not meet service requirements, upgrade the default driver to the required version.
Method 1: Configuring a Custom Driver When Purchasing a Resource Pool
- Log in to the ModelArts console. In the navigation pane on the left, choose Standard Cluster under Resource Management.
- On the Standard Cluster page, click Buy Standard Cluster, and configure the parameters on the displayed page.
Some GPU and Ascend resource pools allow custom driver installation. When configuring resource allocation, enable Custom Driver and select the required GPU/Ascend driver from the drop-down list. For details about GPU driver version mapping, see Software Versions Required by Different Models.
Figure 1 GPU/Ascend driver
For details about parameters, see Enabling Lite Cluster Resources.
- Click Buy Now, confirm the specifications, and click Submit.
Method 2: Upgrading the Existing Resource Pool Driver
If no custom driver is configured and the default driver does not meet service requirements, upgrade the default driver to the required version.
- The target Lite Cluster resource pool must be running and contain GPU or Ascend resources.
- The upgrade restarts the node. Perform it during off-peak hours to avoid affecting running tasks. You can view node usage on the Node Management page of the resource pool details page.
Upgrading the driver will restart the node, which may result in the loss of any customized configurations made on the host.
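Because the restart may discard host-level customizations, it can help to archive the relevant files before upgrading. The following is a minimal sketch, not a ModelArts feature: the helper name `backup_paths` and the example paths are hypothetical, and you would substitute whichever files you actually customized on the host.

```python
import tarfile
from pathlib import Path


def backup_paths(paths, archive_path):
    """Archive every existing path into a gzip tarball.

    Returns the list of paths that were actually saved; paths that do
    not exist are skipped rather than treated as errors.
    """
    saved = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for p in map(Path, paths):
            if p.exists():
                # Store the path relative to "/" so the archive can be
                # extracted anywhere without overwriting system files.
                tar.add(p, arcname=p.as_posix().lstrip("/"))
                saved.append(str(p))
    return saved
```

For example, `backup_paths(["/etc/sysctl.d/99-custom.conf"], "/root/pre-upgrade-backup.tar.gz")` would save that file if it exists (both paths are illustrative only). Copy the archive off the node before starting the upgrade so that it survives the restart.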
- Log in to the ModelArts console. In the navigation pane on the left, choose Lite Cluster under Resource Management. In the resource pool list, locate the target resource pool, and choose > Upgrade Driver.
Alternatively, click the resource pool name in the list to access its details page. In the navigation pane on the left, choose Node Pool Management. Locate the target node pool and choose More > Upgrade Driver in the Operation column.
- In the displayed dialog box, you can view the driver type, number of instances, current version, target version, upgrade mode, upgrade scope, and rolling switch of the Lite Cluster resource pool.
Set the parameters by referring to Table 2.
- Click OK to start the driver upgrade.
In the resource pool list, locate the target resource pool, and choose > Upgrade Driver. On the displayed page, check whether the current version matches the target version. If it does, the driver has been upgraded.
For details, see Upgrading the Lite Cluster Resource Pool Driver.
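In addition to the console check, you can confirm the driver version directly on a GPU node. The sketch below assumes an NVIDIA GPU node where `nvidia-smi` is on the PATH; the function names and the target version string are hypothetical. For Ascend nodes the equivalent tool is `npu-smi`, whose output format differs and is not handled here.

```python
import subprocess


def parse_driver_versions(smi_output: str) -> list[str]:
    """Return one driver version string per non-empty output line."""
    return [line.strip() for line in smi_output.splitlines() if line.strip()]


def driver_matches(smi_output: str, target_version: str) -> bool:
    """True only if at least one GPU is reported and all report the target."""
    versions = parse_driver_versions(smi_output)
    return bool(versions) and all(v == target_version for v in versions)


def query_gpu_driver_versions() -> str:
    """Ask nvidia-smi for the driver version, one line per GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,  # raise if nvidia-smi fails, e.g. driver not loaded
    )
    return result.stdout
```

On a node, calling `driver_matches(query_gpu_driver_versions(), "535.54.03")` would return `True` only when every GPU reports that version; replace the placeholder version string with the target version shown in the Upgrade Driver dialog box.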
Follow-Up Operation
(Optional) Configuring Image Pre-provisioning: Lite Cluster resource pools support image pre-provisioning, which pulls images onto the nodes in a pool in advance. This speeds up image pulling during inference and large-scale distributed training.