Updated on 2024-06-11 GMT+08:00

Failed to Correctly Read Files

Symptom

  • It is unclear how to read .json and .npy files in a training job.
  • It is unclear how to read files with the cv2 library in a training job.
  • It is unclear how to use the torch package in the MXNet environment.
  • The following error occurs when the training job reads a file:
    NotFoundError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to find any matching files for xxx://xxx

Possible Cause

In ModelArts, user data is stored in OBS buckets, but training jobs run in containers. An OBS path is not a path in the container's local file system, so a training job cannot read an OBS file by opening it as a local path.

Solution

If an error occurs when you read a file, use MoXing to copy the data from OBS to the container, and then read it from the local path in the container. For details, see step 1.
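
The copy-then-read approach can be wrapped in a small helper. The following is a minimal sketch, not a MoXing API: stage_input and to_local_path are hypothetical names, and the /cache root and the mapping from object key to local directory are assumptions. The copy function is passed in so that the sketch runs without MoXing; in a training job you would likely pass mox.file.copy_parallel.

```python
import posixpath
from urllib.parse import urlparse


def to_local_path(path, cache_root="/cache"):
    """Map obs://bucket/key to cache_root/key; return local paths unchanged.

    The cache_root/key layout is an assumption made for this sketch.
    """
    parsed = urlparse(path)
    if parsed.scheme != "obs":
        return path  # not an OBS URL, so treat it as a local path
    return posixpath.join(cache_root, parsed.path.lstrip("/"))


def stage_input(path, copy_fn, cache_root="/cache"):
    """Copy an OBS path into the container if needed and return the local path.

    copy_fn(src, dst) performs the actual copy, for example
    mox.file.copy_parallel inside a training job.
    """
    local_path = to_local_path(path, cache_root)
    if local_path != path:
        copy_fn(path, local_path)
    return local_path
```

For example, stage_input('obs://bucket-name/data_url', mox.file.copy_parallel) would return '/cache/data_url', which the training code can then open with ordinary file APIs.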

You can also read a file directly from OBS using a method that matches the file type. See step 2 for .json files, step 3 for .npy files, and step 4 for reading files with the cv2 library. To use the torch package in the MXNet environment, see step 5.
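
The per-type readers in steps 2 to 4 share one pattern: fetch the raw bytes of the object, then decode them in memory. The sketch below covers only the decoding half, applied to bytes that are already in memory. decode_bytes is a hypothetical helper, not part of MoXing; in a training job the bytes would presumably come from mox.file.read(path, binary=True). numpy and cv2 are imported lazily, on the assumption that they are installed in the training image.

```python
import io
import json


def decode_bytes(raw, suffix):
    """Decode in-memory file content according to its file suffix."""
    if suffix == ".json":
        return json.loads(raw.decode("utf-8"))
    if suffix == ".npy":
        import numpy as np  # assumed to be available in the training image
        # np.load expects a file-like object or a path, so wrap the bytes.
        return np.load(io.BytesIO(raw))
    if suffix in (".jpg", ".jpeg", ".png"):
        import cv2  # assumed to be available in the training image
        import numpy as np
        return cv2.imdecode(np.frombuffer(raw, np.uint8), cv2.IMREAD_COLOR)
    raise ValueError("unsupported file type: " + suffix)
```

Keeping the decoding separate from the OBS read makes it possible to test the readers locally with bytes loaded from any source.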

  1. To copy data from OBS to the container with MoXing and then access it through the local path, run the following code:
    import moxing as mox
    mox.file.make_dirs('/cache/data_url')
    mox.file.copy_parallel('obs://bucket-name/data_url', '/cache/data_url')
  2. To read .json files, run the following code:
    import json
    json.loads(mox.file.read(json_path, binary=True))
  3. To use numpy.load to read .npy files, run the following code:
    • Using the MoXing API to read files from OBS
      import io
      np.load(io.BytesIO(mox.file.read(_SAMPLE_PATHS['rgb'], binary=True)))
    • Using the file module of MoXing to read and write OBS files
      with mox.file.File(_SAMPLE_PATHS['rgb'], 'rb') as f:
          np.load(f)
  4. To use the cv2 library to read files, run the following code:
    cv2.imdecode(np.frombuffer(mox.file.read(img_path, binary=True), np.uint8), cv2.IMREAD_COLOR)
  5. To use the torch package in the MXNet environment, run the following code:
    import os
    os.system('pip install torch')
    import torch
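
The runtime installation in step 5 can be made somewhat more robust by installing only when the import fails and by invoking pip through the running interpreter. This is a minimal sketch under those assumptions; ensure_package is a hypothetical helper name, and it assumes the container has pip and network access to a package index.

```python
import importlib
import subprocess
import sys


def ensure_package(module_name, pip_name=None):
    """Import module_name, installing it with pip first if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Use the current interpreter's pip so the package is installed
        # into the environment that the training script is running in.
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", pip_name or module_name]
        )
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
```

For example, torch = ensure_package('torch') would return the module if it is already installed and otherwise attempt the installation once. Unlike os.system, subprocess.check_call raises an exception if the installation fails, so the job stops at the point of failure instead of at a later import.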