Updated on 2024-06-11 GMT+08:00

Error Message "RuntimeError: std::exception" Displayed for a PyTorch 1.0 Engine

Symptom

When a PyTorch 1.0 image is used, the following error message is displayed:
"RuntimeError: std::exception"

Possible Causes

The libmkldnn symbolic links (soft links) in the PyTorch 1.0 image conflict with those of the native Torch installation. For details, see "conv1d fails in PyTorch 1.0".
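To check whether this cause applies, it can help to list the libmkldnn entries in the image and see which ones are soft links. The following is a minimal sketch, not an official ModelArts tool: the directory is the one named in the solution in this article, and the helper name `describe_mkldnn_entries` is hypothetical.

```python
import os

# Directory taken from the solution in this article; adjust it if your
# image keeps Anaconda elsewhere (assumption).
LIB_DIR = "/home/work/anaconda3/lib"


def describe_mkldnn_entries(lib_dir=LIB_DIR, prefix="libmkldnn"):
    """Return (name, link_target_or_None) for each matching entry in lib_dir."""
    if not os.path.isdir(lib_dir):
        return []
    entries = []
    for name in sorted(os.listdir(lib_dir)):
        if name.startswith(prefix):
            path = os.path.join(lib_dir, name)
            # readlink() is only valid for symbolic links, so check first.
            target = os.readlink(path) if os.path.islink(path) else None
            entries.append((name, target))
    return entries


for name, target in describe_mkldnn_entries():
    print(name, "->", target if target else "(regular file)")
```

If the directory does not exist, the helper returns an empty list and nothing is printed.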

Solution

  1. This issue is caused by a library conflict in the environment. To resolve it, add the following code at the very beginning of the boot script:
    import os
    # Remove the conflicting libmkldnn soft links before PyTorch is loaded.
    # -f keeps rm from failing if a file is already absent.
    os.system("rm -f /home/work/anaconda3/lib/libmkldnn.so")
    os.system("rm -f /home/work/anaconda3/lib/libmkldnn.so.0")
  2. Use PyCharm on your local machine to remotely access a notebook instance for debugging.
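The removal in step 1 can also be written without shelling out, which makes it easier to see what was actually deleted. This is a minimal sketch under the same assumptions as the solution above: the two paths come from this article, while the function name `remove_conflicting_libs` and its return value are hypothetical.

```python
import os

# Directory and file names are taken from step 1 above; adjust them if
# your image uses a different Anaconda location (assumption).
LIB_DIR = "/home/work/anaconda3/lib"
LIB_NAMES = ("libmkldnn.so", "libmkldnn.so.0")


def remove_conflicting_libs(lib_dir=LIB_DIR, names=LIB_NAMES):
    """Delete the listed entries from lib_dir and return the paths removed."""
    removed = []
    for name in names:
        path = os.path.join(lib_dir, name)
        # lexists() is also True for a dangling soft link, which exists() misses.
        if not os.path.lexists(path):
            continue
        try:
            os.remove(path)  # removes the link itself, not the file it points to
            removed.append(path)
        except OSError as err:
            print(f"Could not remove {path}: {err}")
    return removed


# Place this at the very beginning of the boot script, as step 1 describes.
print("Removed:", remove_conflicting_libs())
```

Running the script a second time is harmless: entries that are already gone are skipped and the returned list is empty.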

Summary and Suggestions

Before creating a training job, debug the training code in the ModelArts development environment to catch as many code migration errors as possible.