Error Message "no socket interface found" Displayed in Logs
Symptom
With NCCL debug logging enabled as shown below, the error message "no socket interface found" is displayed in the logs.
import os
os.environ["NCCL_DEBUG"] = "INFO"
Possible Causes
The environment variables NCCL_IB_TC, NCCL_IB_GID_INDEX, and NCCL_IB_TIMEOUT are not configured. As a result, communication is slow and unstable, and InfiniBand (IB) communication is interrupted.
Solution
Set the following environment variables in the training code, before NCCL is initialized.
import os
os.environ["NCCL_IB_TC"] = "128"
os.environ["NCCL_IB_GID_INDEX"] = "3"
os.environ["NCCL_IB_TIMEOUT"] = "22"
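The settings above can also be applied through a small helper, which makes it easier to see which values end up in effect. The following is a minimal sketch, not part of the product: the helper name configure_nccl_ib_env and its overwrite parameter are hypothetical, and the values are the ones given in this solution. By default it leaves a variable untouched if it is already set, on the assumption that a value already present (for example, one injected by the platform) should take precedence.

```python
import os

# Values taken from the solution above.
NCCL_IB_SETTINGS = {
    "NCCL_IB_TC": "128",
    "NCCL_IB_GID_INDEX": "3",
    "NCCL_IB_TIMEOUT": "22",
}


def configure_nccl_ib_env(env=os.environ, overwrite=False):
    """Apply the NCCL IB settings to env (hypothetical helper).

    Existing values are kept unless overwrite is True.
    Returns the values that are in effect after the call.
    """
    in_effect = {}
    for name, value in NCCL_IB_SETTINGS.items():
        if overwrite or name not in env:
            env[name] = value
        in_effect[name] = env[name]
    return in_effect
```

Call configure_nccl_ib_env() at the start of the training script, before the distributed backend is initialized, and print the returned dictionary to confirm the values in the job logs. Pass overwrite=True to force the values from this solution.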