
How Do I Improve Training Efficiency While Reducing Interaction with OBS?

Scenario Description

When ModelArts is used for custom deep learning training, training data is usually stored in OBS. If the volume of training data is large (for example, greater than 200 GB), every training job that runs in the GPU resource pool has to read the data from OBS over the network, resulting in low training efficiency.

To improve training efficiency while reducing interaction with OBS, perform the following optimization.

Optimization Principles

In the GPU resource pool provided by ModelArts, a 500 GB NVMe SSD is attached to each training node free of charge and mounted at the /cache directory. The lifecycle of data in the /cache directory is the same as that of the training job: after the job is complete, all content in /cache is cleared to release space for the next job. Therefore, you can copy data from OBS to the /cache directory at the start of training so that the data is read from the local /cache directory, instead of OBS, for the rest of the training.
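Because the local SSD provides 500 GB, the dataset you copy must fit in /cache. As a minimal sketch (not part of the original example), you can check the free space on the /cache mount with Python's standard library before copying:

import shutil

# Report free space on the /cache SSD before copying data from OBS.
total, used, free = shutil.disk_usage('/cache')
print('Free space in /cache: %.1f GB' % (free / 1024 ** 3))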

Optimization Methods

TensorFlow code is used as an example.

The following is the code before optimization:

...
tf.flags.DEFINE_string('data_url', '', 'dataset directory.')
FLAGS = tf.flags.FLAGS
mnist = input_data.read_data_sets(FLAGS.data_url, one_hot=True)

The following is an example of the optimized code. Data is copied to the /cache directory.

...
tf.flags.DEFINE_string('data_url', '', 'dataset directory.')
FLAGS = tf.flags.FLAGS
import moxing as mox
# Copy the dataset from OBS to the local /cache SSD in parallel.
TMP_CACHE_PATH = '/cache/data'
mox.file.copy_parallel(FLAGS.data_url, TMP_CACHE_PATH)
# Read the data from the local cache instead of directly from OBS.
mnist = input_data.read_data_sets(TMP_CACHE_PATH, one_hot=True)
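
Files written to /cache during training (for example, checkpoints and logs) are also cleared when the job ends. The following is a minimal sketch, not part of the original example, that assumes a hypothetical train_url flag pointing to an OBS output directory; it copies the results from the local cache back to OBS before the job finishes:

import moxing as mox

tf.flags.DEFINE_string('train_url', '', 'OBS directory for training output.')
TMP_OUTPUT_PATH = '/cache/output'

# ... the training loop writes checkpoints and logs to TMP_OUTPUT_PATH ...

# Copy the results back to OBS before the job ends, because /cache is
# cleared automatically after the training job completes.
mox.file.copy_parallel(TMP_OUTPUT_PATH, FLAGS.train_url)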