When MXNet Creates kvstore, the Program Is Blocked and No Error Is Reported
Symptom
When kv_store = mxnet.kv.create('dist_async') is used to create kvstore, the program is blocked. For example, run the following code. If end is not displayed, the program is blocked.
print('start') kv_store = mxnet.kv.create('dist_async') print('end')
Possible Cause
The possible cause of a worker block is that the server cannot be connected.
Solution
Place the following code before import mxnet in Boot File to check the communication status between nodes. In addition, ps can be resent.
import os os.environ['PS_VERBOSE'] = '2' os.environ['PS_RESEND'] = '1'
In the preceding code, os.environ['PS_VERBOSE'] = '2' indicates that all communication information is printed. os.environ['PS_RESEND'] = '1' indicates that the Van instance resends the message if it does not receive the ACK message within the milliseconds set by PS_RESEND_TIMEOUT.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.