When MXNet Creates kvstore, the Program Is Blocked and No Error Is Reported

Updated on 2024-06-11 GMT+08:00

Symptom

When kv_store = mxnet.kv.create('dist_async') is called to create a kvstore, the program blocks without reporting an error. For example, run the following code. If end is not printed, the program is blocked.

import mxnet

print('start')
# Blocks here if the worker cannot reach the server.
kv_store = mxnet.kv.create('dist_async')
print('end')

Possible Cause

A worker most likely blocks because it cannot connect to the server.
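To rule out a basic network problem first, a plain TCP probe can be run on the worker node. The following is a minimal sketch, not part of MXNet: can_connect is a hypothetical helper, and the sketch assumes the training environment exposes the scheduler address through the ps-lite variables DMLC_PS_ROOT_URI and DMLC_PS_ROOT_PORT. If your job passes the address in another way, substitute the host and port accordingly.

```python
import os
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Assumption: these ps-lite variables are set by the job launcher.
    host = os.environ.get("DMLC_PS_ROOT_URI")
    port = os.environ.get("DMLC_PS_ROOT_PORT")
    if host and port:
        print("scheduler reachable:", can_connect(host, port))
    else:
        print("DMLC_PS_ROOT_URI / DMLC_PS_ROOT_PORT not set; nothing to probe")
```

A successful probe only shows that the port accepts TCP connections. It does not prove that the parameter server processes have registered correctly, so the verbose logging described in the solution is still useful.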

Solution

Place the following code before import mxnet in the boot file. It prints the communication status between nodes and enables the parameter server (ps) communication layer to resend messages.

import os
os.environ['PS_VERBOSE'] = '2'
os.environ['PS_RESEND'] = '1'

In the preceding code, os.environ['PS_VERBOSE'] = '2' prints all communication information. os.environ['PS_RESEND'] = '1' makes the Van instance resend a message if it does not receive the ACK within the number of milliseconds set by PS_RESEND_TIMEOUT.
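If the resend timeout also needs to be set explicitly, the three variables can be grouped in one place. The following is a sketch under stated assumptions: enable_ps_diagnostics is a hypothetical helper name, and the 1000 ms value is purely illustrative, not a recommended setting. Environment values must be strings, so the number is converted before it is stored.

```python
import os


def enable_ps_diagnostics(resend_timeout_ms=1000):
    """Set the ps debugging variables. Call this before `import mxnet`."""
    settings = {
        "PS_VERBOSE": "2",  # print all communication information
        "PS_RESEND": "1",   # resend a message when no ACK arrives in time
        # ACK wait time in milliseconds (illustrative value, adjust as needed)
        "PS_RESEND_TIMEOUT": str(resend_timeout_ms),
    }
    os.environ.update(settings)
    return settings


enable_ps_diagnostics()
# import mxnet  # import only after the variables above are set
```

The variables are set at the top of the boot file because they need to be in the environment before mxnet is imported.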
