Configuring Password-free SSH Mutual Trust Between Instances for a Training Job Created Using a Custom Image
For distributed training with a custom image that uses MPI or Horovod, the instances must be able to reach each other over SSH without a password. If this mutual trust is not configured, the training job fails.
Setting it up involves adapting the code and configuring the training job parameters.
- Create a custom image with OpenSSH pre-installed. The training framework should be MPI or Horovod.
- Create a boot script file start_sshd.sh.
   MY_SSHD_PORT=${MY_SSHD_PORT:-"38888"}
   mkdir -p /home/ma-user/etc
   ssh-keygen -f /home/ma-user/etc/ssh_host_rsa_key0 -N '' -t rsa > /dev/null
   /usr/sbin/sshd -p $MY_SSHD_PORT -h /home/ma-user/etc/ssh_host_rsa_key0
- Upload the sshd startup script file to the training code directory in OBS.
- Create a training job using the custom image.
   - Code Directory: Select the OBS path where the sshd boot script file is stored.
   - Boot Command: Adapt the boot command so that it runs the sshd boot script first.
        bash ${MA_JOB_DIR}/demo-code/start_sshd.sh && your custom command
     In the command, your custom command stands for the commands you want to run in the training job.
   - Environment Variable: Add MY_SSHD_PORT = 38888.
   - Password-free SSH Between Nodes: Enable it and set Password-free SSH File Directory. Use the default value in most cases. After the training job is delivered, the SSH key files and configuration files (authorized_keys, config, id_rsa, and id_rsa.pub) are automatically generated in the /home/ma-user/.ssh directory of the training container.
 
- After a training job is created, its instances can establish an SSH connection with each other by using the domain name and port number throughout the training process. The sample code is as follows:
   ssh modelarts-job-a0978141-1712-4f9b-8a83-000000000000-worker-1 -p $MY_SSHD_PORT 
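
Instances do not necessarily start sshd at the same moment, so a connection attempted very early in the job may be refused. The following is a minimal sketch of one way to handle that, not part of ModelArts itself: a generic retry helper plus a wrapper that probes a peer over SSH. The names retry and wait_for_peer_sshd, the attempt count, and the delay are all hypothetical choices for illustration. The sketch assumes the automatically generated files in /home/ma-user/.ssh already allow non-interactive login, which is why BatchMode is used.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is exhausted.
# Hypothetical helper for illustration; not provided by the platform.
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        attempt=$((attempt + 1))
        # Pause between attempts, but not after the final failure.
        if [ "$attempt" -le "$max_attempts" ]; then
            sleep "$delay"
        fi
    done
    echo "command still failing after ${max_attempts} attempts: $*" >&2
    return 1
}

# wait_for_peer_sshd PEER_HOSTNAME
# Probes the peer's sshd by running a no-op command over SSH.
# The values 30 (attempts) and 2 (seconds) are assumptions; tune as needed.
wait_for_peer_sshd() {
    peer=$1
    retry 30 2 ssh -o BatchMode=yes -o ConnectTimeout=5 \
        -p "${MY_SSHD_PORT:-38888}" "$peer" true
}
```

A boot command could then call, for example, wait_for_peer_sshd modelarts-job-a0978141-1712-4f9b-8a83-000000000000-worker-1 before launching the MPI or Horovod workload, so that the launcher only starts once the peer accepts SSH connections.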