Configuring Password-free SSH Mutual Trust Between Nodes for a Training Job Created Using a Custom Image
If you use a custom image based on the MPI or Horovod framework for distributed training, you must configure password-free SSH mutual trust between training job nodes. Otherwise, the training will fail.
Setting this up involves adapting the code and configuring training job parameters.
- Create a custom image with OpenSSH pre-installed. The training framework should be MPI or Horovod.
- Create a boot script file named start_sshd.sh with the following content:
MY_SSHD_PORT=${MY_SSHD_PORT:-"38888"}
mkdir -p /home/ma-user/etc
ssh-keygen -f /home/ma-user/etc/ssh_host_rsa_key0 -N '' -t rsa > /dev/null
/usr/sbin/sshd -p $MY_SSHD_PORT -h /home/ma-user/etc/ssh_host_rsa_key0
- Upload the sshd boot script file (start_sshd.sh) to the training code directory in OBS.
- Create a training job using the custom image.
- Code Directory: Select the OBS path where the sshd boot script file is stored.
- Boot Command: Run the sshd boot script first, and then your own training command.
bash ${MA_JOB_DIR}/demo-code/start_sshd.sh && your custom command
In the command, replace your custom command with the command that the training job should execute.
- Environment Variable: Add MY_SSHD_PORT = 38888.
- training_ssh_configure_nodes: Enable it and configure the SSH key directory. Retain the default settings unless you have specific needs. After the training job is delivered, the SSH key and configuration files (authorized_keys, config, id_rsa, and id_rsa.pub) are automatically generated in the /home/ma-user/.ssh directory of the training container.
- After the training job is created, its nodes can connect to each other over SSH by domain name and port number throughout the training process. For example:
ssh modelarts-job-a0978141-1712-4f9b-8a83-000000000000-worker-1 -p $MY_SSHD_PORT
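As a minimal sketch, the mutual trust described above could be checked from any node before the distributed command starts. The helper names (worker_host, missing_ssh_files, check_workers), the job-name prefix argument, and the assumption that workers are numbered from 0 are illustrative and do not come from the platform; adjust them to the hostnames your job actually receives.

```shell
#!/bin/sh
# Sketch only: pre-flight check for password-free SSH between training nodes.
# All function names below are hypothetical helpers introduced for this example.

MY_SSHD_PORT=${MY_SSHD_PORT:-"38888"}

# Build a worker hostname of the form <job-prefix>-worker-<index>.
worker_host() {
    printf '%s-worker-%s\n' "$1" "$2"
}

# Print the expected key/config files that are missing from directory $1.
missing_ssh_files() (
    dir=$1
    for f in authorized_keys config id_rsa id_rsa.pub; do
        [ -f "$dir/$f" ] || printf '%s\n' "$f"
    done
)

# Attempt a non-interactive login to workers 0..count-1 (numbering from 0 is
# an assumption). Host key handling is left to the generated config file.
# Exits non-zero if any worker cannot be reached.
check_workers() (
    prefix=$1
    count=$2
    failed=0
    i=0
    while [ "$i" -lt "$count" ]; do
        host=$(worker_host "$prefix" "$i")
        # BatchMode makes ssh fail instead of prompting for a password.
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 -p "$MY_SSHD_PORT" "$host" true; then
            echo "ok: $host"
        else
            echo "unreachable: $host" >&2
            failed=1
        fi
        i=$((i + 1))
    done
    [ "$failed" -eq 0 ]
)
```

For a two-node job, running missing_ssh_files /home/ma-user/.ssh and then check_workers modelarts-job-a0978141-1712-4f9b-8a83-000000000000 2 after start_sshd.sh has started on every node would list any absent key files and report whether each worker accepts a key-based login on $MY_SSHD_PORT.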