Preparing the Lite Server Environment
Purchasing Lite Server Resources
- Enable Lite Server resources and obtain passwords. Verify SSH access to all servers. Confirm proper network connectivity between them.
- For distributed training on multiple servers, purchase mountable storage disks accessible by the servers. Both Scalable File Service (SFS) and Elastic Volume Service (EVS) can be mounted to Lite Servers. The section below describes the SFS solution. For details about the EVS solution, see Configuring the Storage.
- Ensure that the container can access the public network, because the installation uses git clone and requires internet access. To provide this access, bind an EIP to the resources. For details, see Configuring the Network.
If a container is used or shared by multiple users, block the container from accessing the OpenStack management address (169.254.169.254) so that it cannot obtain host machine metadata. For details, see Forbidding Containers to Obtain Host Machine Metadata.
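The SSH and connectivity checks in the first step above can be partly automated. The following is a minimal sketch, not an official tool: it only tests whether a TCP connection to the SSH port succeeds and does not verify passwords or keys. The function names and the default port 22 are assumptions made for illustration.

```python
import socket


def tcp_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def unreachable_hosts(hosts, port=22, timeout=3.0):
    """Return the hosts (hypothetical list supplied by the caller) whose port is closed."""
    return [h for h in hosts if not tcp_port_open(h, port, timeout)]
```

Running `unreachable_hosts([...])` from each server with the private IP addresses of the other servers gives a quick view of which nodes cannot be reached. An empty list means that every listed server accepted a connection on the SSH port.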
(Optional) Interconnecting with SFS
If you use SFS for storage, SFS Turbo file systems are recommended. SFS Turbo provides high-performance file storage on demand. It features high reliability and availability. It can be elastically expanded and performs better as its capacity grows. The service is suitable for a wide range of scenarios.
- Create a file system on the SFS console. For details, see Creating an SFS Turbo File System. File systems and ECSs in different AZs of the same region can communicate with each other. Therefore, ensure that SFS Turbo and the server are in the same region.
- Mount the created file system to the server. For details, see Mounting an NFS File System to ECSs (Linux).
- Set automatic mounting upon restart on the server to prevent mounting loss. For details, see Mounting a File System Automatically.
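After mounting the file system and restarting the server, you may want to confirm that the mount is still present. The sketch below is one possible check, assuming a Linux server where the mount table is available in /proc/mounts (fields: device, mount point, file system type, options). The mount point /mnt/sfs_turbo and the sample line are hypothetical values used only for illustration.

```python
def is_nfs_mounted(mount_point, mounts_text):
    """Return True if mounts_text lists an NFS file system at mount_point."""
    # Normalize a trailing slash, but keep "/" itself intact.
    target = mount_point.rstrip("/") or "/"
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected layout: device, mount point, fs type, options, ...
        if len(fields) >= 3 and fields[1] == target and fields[2].startswith("nfs"):
            return True
    return False


# Hypothetical sample of a mount table; on a server, read /proc/mounts instead.
sample = "192.168.0.10:/ /mnt/sfs_turbo nfs rw,vers=3,nolock 0 0\n"
print(is_nfs_mounted("/mnt/sfs_turbo", sample))
```

On a real server, pass the contents of /proc/mounts as `mounts_text`. A result of False after a restart suggests that automatic mounting is not configured correctly.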
Snt9b23 nodes lack local hard drives. When buying these nodes, acquire EVS disks through the ModelArts console. For storing model weight files, consider using an SFS Turbo file system.
Choose an SFS Turbo file system that offers at least 500 MB/s/TiB. Ensure its capacity is 1.2 TB or a whole number multiple of 1.2 TB.
The SFS bandwidth can be estimated with the following formula:
Bandwidth (MB/s) ≈ (Size of the weights saved with the optimizer state (GB) x 1,024) x Multiplication coefficient/Storage duration (s)
The multiplication coefficient usually falls between 6 and 8. For optimal resource allocation and smooth performance, 8 is recommended.
The calculation example is as follows:
If the weights saved with the optimizer state total 200 GB and the recommended storage duration is 20 minutes (1,200 seconds), the required bandwidth is:
(200 GB x 1,024 MB/GB x 8)/1,200s ≈ 1,365 MB/s
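The same estimate can be scripted to compare different weight sizes or storage durations. The following is a minimal sketch of the formula above. The function and parameter names are hypothetical, and the default coefficient of 8 follows the recommendation in this section.

```python
def estimate_sfs_bandwidth_mb_s(weights_gb, storage_duration_s, coefficient=8):
    """Estimate the required SFS bandwidth in MB/s.

    weights_gb: size of the weights saved with the optimizer state, in GB
    storage_duration_s: target time for saving the weights, in seconds
    coefficient: multiplication coefficient (usually between 6 and 8)
    """
    if storage_duration_s <= 0:
        raise ValueError("storage_duration_s must be positive")
    # Convert GB to MB (x 1,024), apply the coefficient, divide by the duration.
    return weights_gb * 1024 * coefficient / storage_duration_s


# Example from this section: 200 GB saved within 20 minutes.
required = estimate_sfs_bandwidth_mb_s(200, 20 * 60)
print(f"{required:.0f} MB/s")
```

With the values from the example, the script reproduces the result of about 1,365 MB/s. Lowering the coefficient to 6 gives the lower end of the estimate for the same weights and duration.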