Updated on 2025-11-18 GMT+08:00

MoXing Functions

Description

When using ModelArts, you may encounter situations where you need to access Object Storage Service (OBS). OBS is an object-based massive storage service. Unlike traditional local file systems, files on OBS cannot be accessed directly through file paths.

When directly accessing OBS files, you may encounter the following issues:

  • You cannot use the open() method as you would with local files.
  • File read and write operations require network requests.
  • Using the OBS Python SDK API can be relatively complex.

How can you more conveniently access and manage OBS files in ModelArts?

ModelArts provides the mox.file API, which offers a simpler solution for OBS operations. Below is an example of using the mox.file API to access OBS files:

# Example of accessing a file from OBS
import moxing as mox

# Open an OBS file.
with mox.file.File('obs://bucket_name/a.txt', 'r') as f:
    print(f.read())

# List OBS directories.
mox.file.list_directory('obs://bucket_name/my_dir/')

Precautions for Using the mox.file API

  1. The mox.file API is primarily designed to enhance the ease of reading and downloading data from OBS.
  2. The mox.file API may have compatibility issues with some APIs of OBS parallel file systems. For production development, use the OBS Python SDK directly.
  3. For details about the APIs, see API Overview of OBS SDK for Python.

Through the mox.file API, you can access and manage OBS resources as conveniently as local files, thereby improving development efficiency.

Constraints

  • The OBS bucket you need to access must be accessible by the current training job.
  • You must have the read and write permissions on the OBS bucket.
  • You must use a standard ModelArts image, or a custom image with the moxing library installed.

File Copy

  • Copy a file. mox.file.copy can only be used to perform operations on files. To perform operations on folders, use mox.file.copy_parallel.
    • Copy an OBS file from an OBS path to another OBS path. For example, copy obs://bucket_name/obs_file.txt to obs://bucket_name/obs_file_2.txt.
      import moxing as mox
      mox.file.copy('obs://bucket_name/obs_file.txt', 'obs://bucket_name/obs_file_2.txt')
      
    • Copy an OBS file to a local path, that is, download an OBS file. For example, download obs://bucket_name/obs_file.txt to /tmp/obs_file.txt.
      import moxing as mox
      mox.file.copy('obs://bucket_name/obs_file.txt', '/tmp/obs_file.txt')
      
    • Copy a local file to OBS, that is, upload a file to OBS. For example, upload /tmp/obs_file.txt to obs://bucket_name/obs_file.txt.
      import moxing as mox
      mox.file.copy('/tmp/obs_file.txt', 'obs://bucket_name/obs_file.txt')
      
    • Copy a local file to another local path. This operation is equivalent to shutil.copyfile. For example, copy /tmp/obs_file.txt to /tmp/obs_file_2.txt.
      import moxing as mox
      mox.file.copy('/tmp/obs_file.txt', '/tmp/obs_file_2.txt')
      
  • For large files, mox.file.copy will use a segmented concurrent download method by default to speed up the process. The relevant parameters can be controlled through environment variables:

    1. Determine if a file is large: If the file size exceeds this threshold, segmented concurrent download will be enabled.
      MOX_FILE_PARTIAL_MAXIMUM_SIZE

      The default size is 5 GB. The unit is byte. To set the size to 5 GB, enter 5368709120.

    2. Manage the size of file segments:
      MOX_FILE_LARGE_FILE_PART_SIZE

      The default size is 10 MB. The unit is byte. Due to the OBS limit of up to 10,000 segments, the segment size should be increased for files larger than 100 GB.

    3. Manage the number of concurrent download processes. The concurrent download process count determines how many threads are launched when downloading large files. In the new version, the default value is 8. If there are fewer nodes or fewer large files, you can increase the concurrency level appropriately to improve download performance.
      MOX_FILE_LARGE_FILE_TASK_NUM

      The default value is 32. If high concurrency causes OBS bucket throttling, reduce the concurrency number.

  • Copy a folder. mox.file.copy_parallel can only be used to perform operations on folders. To perform operations on files, use mox.file.copy.
    • Copy an OBS folder from an OBS path to another OBS path. For example, copy obs://bucket_name/sub_dir_0 to obs://bucket_name/sub_dir_1.
      import moxing as mox
      mox.file.copy_parallel('obs://bucket_name/sub_dir_0', 'obs://bucket_name/sub_dir_1')
      
    • Copy an OBS folder to a local path, that is, download an OBS folder. For example, download obs://bucket_name/sub_dir_0 to /tmp/sub_dir_0.
      import moxing as mox
      mox.file.copy_parallel('obs://bucket_name/sub_dir_0', '/tmp/sub_dir_0')
      
    • Copy a local folder to OBS, that is, upload a folder to OBS. For example, upload /tmp/sub_dir_0 to obs://bucket_name/sub_dir_0.
      import moxing as mox
      mox.file.copy_parallel('/tmp/sub_dir_0', 'obs://bucket_name/sub_dir_0')
      
    • Copy a local folder to another local path. This operation is equivalent to shutil.copytree. For example, copy /tmp/sub_dir_0 to /tmp/sub_dir_1.
      import moxing as mox
      mox.file.copy_parallel('/tmp/sub_dir_0', '/tmp/sub_dir_1')
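
The segment-size guidance for large files can be turned into a quick calculation. The following is a minimal sketch, not part of MoXing: the helper name min_part_size and the 200 GB file size are hypothetical, and it assumes the environment variable is read when the copy runs, so it is set before mox.file.copy is called. It relies only on the 10,000-segment limit and the 10 MB default stated above.

```python
import os

MAX_SEGMENTS = 10_000                 # OBS segment limit stated above
DEFAULT_PART_SIZE = 10 * 1024 ** 2    # 10 MB default segment size, in bytes


def min_part_size(file_size: int) -> int:
    """Return the smallest segment size (bytes) that keeps the file within MAX_SEGMENTS."""
    # Integer ceiling division avoids floating-point rounding on very large sizes.
    needed = (file_size + MAX_SEGMENTS - 1) // MAX_SEGMENTS
    return max(DEFAULT_PART_SIZE, needed)


# Hypothetical 200 GB file: 10 MB segments would need more than 10,000 parts.
file_size = 200 * 1024 ** 3
part_size = min_part_size(file_size)

# Environment variable values must be strings; the unit is bytes.
os.environ["MOX_FILE_LARGE_FILE_PART_SIZE"] = str(part_size)
print(part_size)
```

Files small enough to fit in 10,000 default-sized segments keep the 10 MB default, so the helper only ever raises the segment size.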
      

mox.file.copy_parallel uses the threads parameter to control the number of concurrent copy operations. The default value is 16.

The file_list parameter specifies the files to be copied. For example, upload /tmp/sub_dir_0/train/1.jpg and /tmp/sub_dir_0/eval/2.jpg in /tmp/sub_dir_0 to obs://bucket_name/sub_dir_0.

import moxing as mox
mox.file.copy_parallel('/tmp/sub_dir_0', 'obs://bucket_name/sub_dir_0', file_list=['train/1.jpg', 'eval/2.jpg'])
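
When many files need to be selected, the file_list value can be generated rather than typed by hand. The sketch below is one possible approach under stated assumptions: build_file_list and upload_selected are hypothetical helper names, and it assumes file_list entries are paths relative to the source folder with forward slashes, as in the example above.

```python
import os


def build_file_list(local_dir, suffixes=(".jpg",)):
    """Collect files under local_dir as paths relative to it, using '/' separators."""
    selected = []
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            if name.lower().endswith(tuple(suffixes)):
                rel = os.path.relpath(os.path.join(root, name), local_dir)
                selected.append(rel.replace(os.sep, "/"))
    return sorted(selected)


def upload_selected(local_dir, obs_dir):
    """Upload only the selected files. Requires the moxing library."""
    # Imported lazily so build_file_list can be used and tested without MoXing.
    import moxing as mox
    mox.file.copy_parallel(local_dir, obs_dir, file_list=build_file_list(local_dir))
```

For example, upload_selected('/tmp/sub_dir_0', 'obs://bucket_name/sub_dir_0') would upload every .jpg file under /tmp/sub_dir_0 and skip everything else.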

Read/Write

  • Read an OBS file.
    For example, if you read the obs://bucket_name/obs_file.txt file, the content is returned as a string.
    import moxing as mox
    file_str = mox.file.read('obs://bucket_name/obs_file.txt')
    
    Alternatively, open the file object and read data from it.
    import moxing as mox
    with mox.file.File('obs://bucket_name/obs_file.txt', 'r') as f:
      file_str = f.read()
    
  • Read a line from an OBS file by opening a file object. A string that ends with a newline character is returned.
    import moxing as mox
    with mox.file.File('obs://bucket_name/obs_file.txt', 'r') as f:
      file_line = f.readline()
    
  • Read all lines from a file. A list is returned, in which each element is a line and ends with a newline character.
    import moxing as mox
    with mox.file.File('obs://bucket_name/obs_file.txt', 'r') as f:
      file_line_list = f.readlines()
    
  • Read an OBS file in binary mode.
    For example, if you read the obs://bucket_name/obs_file.bin file, the content is returned as bytes.
    import moxing as mox
    file_bytes = mox.file.read('obs://bucket_name/obs_file.bin', binary=True)
    

    Alternatively, open the file object and read data from it.

    import moxing as mox
    with mox.file.File('obs://bucket_name/obs_file.bin', 'rb') as f:
      file_bytes = f.read()
    

    One or all lines in a file opened in binary mode can be read with the same method.

  • Write a string to a file.
    For example, write Hello World! into the obs://bucket_name/obs_file.txt file.
    import moxing as mox
    mox.file.write('obs://bucket_name/obs_file.txt', 'Hello World!')
    

    You can also open the file object and write data into it. The two methods are equivalent.

    import moxing as mox
    with mox.file.File('obs://bucket_name/obs_file.txt', 'w') as f:
      f.write('Hello World!')
    

    When you open a file in write mode or call mox.file.write, if the file to be written does not exist, the file will be created. If the file to be written already exists, the file is overwritten.

  • Append content to an OBS file.

    For example, append Hello World! to the obs://bucket_name/obs_file.txt file.

    import moxing as mox
    mox.file.append('obs://bucket_name/obs_file.txt', 'Hello World!')
    

    You can also open the file object and append content to it. The two methods are equivalent.

    import moxing as mox
    with mox.file.File('obs://bucket_name/obs_file.txt', 'a') as f:
      f.write('Hello World!')
    

    When you open a file in append mode or call mox.file.append, if the file to be appended does not exist, the file will be created. If the file to be appended already exists, the content is directly appended.

    Appending to a large existing file is slow, for example, when the obs://bucket_name/obs_file.txt file exceeds 5 MB.

    If the file object is opened in write or append mode, when the write function is called, the content to be written is temporarily stored in the cache until the file object is closed (the file object is automatically closed when the with statement exits). Alternatively, you can call the close() or flush() function of the file object to write the file content.
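
Because written content is held in the cache until the file object is closed, many small writes can be grouped inside one with block. The following is a minimal sketch of that pattern, assuming mox.file.File accepts a path and a mode as in the examples above. The helper name write_lines and its opener parameter are hypothetical; the built-in open() is used as the default so the sketch also runs without MoXing.

```python
import os
import tempfile


def write_lines(path, lines, opener=open):
    """Write all lines inside a single with block; the file is closed on exit."""
    with opener(path, "w") as f:
        for line in lines:
            f.write(line + "\n")


# Local demonstration with the built-in open().
demo_path = os.path.join(tempfile.mkdtemp(), "obs_file.txt")
write_lines(demo_path, ["Hello", "World!"])

# For an OBS path, the same helper could be called as follows (requires moxing):
#   import moxing as mox
#   write_lines('obs://bucket_name/obs_file.txt', ['Hello', 'World!'], opener=mox.file.File)
```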

List

  • List an OBS directory. Only the top-level result (relative path) is returned. Recursive listing is not performed.

    For example, if you list obs://bucket_name/object_dir, all files and folders in the directory are returned, but recursive queries are not performed.

    Assume that obs://bucket_name/object_dir is in the following structure:

    bucket_name
          |- object_dir
            |- dir0
              |- file00
            |- file1
    

    Call the following code:

    import moxing as mox
    mox.file.list_directory('obs://bucket_name/object_dir')
    

    The following list is returned:

    ['dir0', 'file1']
  • Recursively list an OBS directory. All files and folders (relative paths) in the directory are returned, and recursive queries are performed.

    Assume that obs://bucket_name/object_dir is in the following structure:

    bucket_name
          |- object_dir
            |- dir0
              |- file00
            |- file1
    

    Call the following code:

    import moxing as mox
    mox.file.list_directory('obs://bucket_name/object_dir', recursive=True)
    

    The following list is returned:

    ['dir0', 'dir0/file00', 'file1']
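
Because the listing returns relative paths, each entry has to be joined onto the listed directory before it can be passed to other mox.file calls. A minimal sketch of that step follows; the helper name to_full_paths is hypothetical, and the listing is hard-coded from the result above so the example runs without MoXing.

```python
def to_full_paths(base_dir, relative_names):
    """Join each relative name from a listing onto the directory it was listed from."""
    base = base_dir.rstrip("/")  # tolerate a trailing slash on the directory
    return [f"{base}/{name}" for name in relative_names]


# Result of the recursive listing shown above.
listing = ['dir0', 'dir0/file00', 'file1']
full_paths = to_full_paths('obs://bucket_name/object_dir', listing)
print(full_paths)
```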

Folder Creation

Create an OBS directory, that is, an OBS folder. Recursive creation is supported: when obs://bucket_name/sub_dir_0/sub_dir_1 is created and the sub_dir_0 folder does not exist, sub_dir_0 is created automatically. A folder that already exists is not created again.

import moxing as mox
mox.file.make_dirs('obs://bucket_name/sub_dir_0/sub_dir_1')

Query

  • Check whether an OBS file exists. If the file exists, True is returned. If the file does not exist, False is returned.
    import moxing as mox
    mox.file.exists('obs://bucket_name/sub_dir_0/file.txt')
    
  • Check whether an OBS folder exists. If the folder exists, True is returned. If the folder does not exist, False is returned.
    import moxing as mox
    mox.file.exists('obs://bucket_name/sub_dir_0/sub_dir_1')
    

    OBS allows a file and a folder with the same name to exist (not allowed in UNIX). If a file or folder with that name exists, for example, obs://bucket_name/sub_dir_0/abc, mox.file.exists returns True regardless of whether abc is a file or a folder.

  • Check whether an OBS path is a folder. If it is a folder, True is returned. If it is not a folder, False is returned.
    import moxing as mox
    mox.file.is_directory('obs://bucket_name/sub_dir_0/sub_dir_1')
    

    OBS allows a file and a folder with the same name to exist (not allowed in UNIX). If both a file and a folder with the same name exist, for example, obs://bucket_name/sub_dir_0/abc, mox.file.is_directory returns True.

  • Obtain the size of an OBS file, in bytes.
    For example, obtain the size of obs://bucket_name/obs_file.txt.
    import moxing as mox
    mox.file.get_size('obs://bucket_name/obs_file.txt')
    
  • Recursively obtain the size of all files in an OBS folder, in bytes.
    For example, obtain the total size of all files in the obs://bucket_name/object_dir directory.
    import moxing as mox
    mox.file.get_size('obs://bucket_name/object_dir', recursive=True)
    
  • Obtain the stat information about an OBS file or folder. The stat information contains the following:
    • length: File size.
    • mtime_nsec: Creation timestamp.
    • is_directory: Specifies whether the path is a folder.
    For example, query the OBS file obs://bucket_name/obs_file.txt. To query a folder, replace the file path with a folder path.
    import moxing as mox
    stat = mox.file.stat('obs://bucket_name/obs_file.txt')
    print(stat.length)
    print(stat.mtime_nsec)
    print(stat.is_directory)
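
The mtime_nsec value is easier to read once converted to a date. The sketch below assumes that mtime_nsec is expressed in nanoseconds since the Unix epoch, which the field name suggests but the text above does not state. The helper name nsec_to_datetime and the sample value are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def nsec_to_datetime(mtime_nsec):
    """Convert a nanosecond timestamp to a timezone-aware UTC datetime."""
    # Split with integer arithmetic so large values do not lose precision.
    seconds, remainder_ns = divmod(int(mtime_nsec), 1_000_000_000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(
        microseconds=remainder_ns // 1000
    )


# Hypothetical value standing in for stat.mtime_nsec.
example_nsec = 1_700_000_000_500_000_000
print(nsec_to_datetime(example_nsec).isoformat())
```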
    

Deletion

  • Delete an OBS file.
    For example, delete obs://bucket_name/obs_file.txt.
    import moxing as mox
    mox.file.remove('obs://bucket_name/obs_file.txt')
    
  • Delete an OBS folder and recursively delete all content in the folder. If the folder does not exist, an error is reported.
    For example, delete all content in obs://bucket_name/sub_dir_0.
    import moxing as mox
    mox.file.remove('obs://bucket_name/sub_dir_0', recursive=True)
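
Since deleting a folder that does not exist reports an error, a cleanup step may want to check for the path first. The following is a minimal sketch using only the exists and remove calls shown in this document. The helper name remove_if_exists is hypothetical, and the file API is passed in as a parameter (mox.file in practice) so the logic can be exercised without MoXing. The check and the deletion are two separate requests, so another process could still remove the path in between.

```python
def remove_if_exists(path, file_api, recursive=False):
    """Delete path only when it exists. Return True if a delete was issued."""
    if not file_api.exists(path):
        return False
    if recursive:
        file_api.remove(path, recursive=True)
    else:
        file_api.remove(path)
    return True


# Possible usage with MoXing (requires the moxing library):
#   import moxing as mox
#   remove_if_exists('obs://bucket_name/sub_dir_0', mox.file, recursive=True)
```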
    

Parameter Configuration

  • Use the mox.file.set_auth function to set all configurable parameters.
    # Configure parameters for MoXing reading. Parameters that do not need to change can be omitted. Call this API once, globally; do not call it multiple times.
    # ak – Access key. String type. In ModelArts, the system configures it by default. For other environments, refer to unified identity authentication.
    # sk – Secret access key. String type. In ModelArts, the system configures it by default. For other environments, refer to unified identity authentication.
    # server – OBS server. In ModelArts, the system configures it by default. For other environments, refer to the configuration at https://support.huaweicloud.com/intl/en-us/productdesc-obs/obs_03_0152.html.
    # port – OBS server port. In ModelArts, the system configures it by default. For other environments, refer to the server configuration. Generally, no special port number is provided and this parameter does not need to be configured.
    # is_secure – Specifies whether to use HTTPS. Boolean type. The default value is True.
    # ssl_verify – Specifies whether to use SSL verification. Boolean type. The default value is False.
    # long_conn_mode – Specifies whether to use long connection mode. Boolean type. The default value is True.
    # path_style – Specifies whether to use path style. Boolean type. HEC sets it to True by default, and public cloud sets it to False by default. Modifying it is generally not recommended.
    # retry – Total number of attempts. Integer type. The default value is 10.
    # retry_wait – Wait time before each retry, in seconds. Float type. The default value is 0.1. If not configured, the wait is 0.1 seconds after the first failure, 0.2 seconds after the second, 0.4 seconds after the third, and so on, increasing exponentially.
    # client_timeout – OBS client timeout, in seconds. Integer type. The default value is 30.
    # list_max_keys – Maximum number of objects listed per page, used by the list_directory API. Integer type. The default value is 1000.
    import moxing as mox
    mox.file.set_auth(ak='xxx', sk='xxx')
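
The retry and retry_wait settings together determine how long a failing request can be delayed. The sketch below reproduces the doubling schedule described for retry_wait. It is an illustration only, not MoXing's implementation: the helper name retry_waits is hypothetical, and it assumes that retry total attempts imply at most retry - 1 waits and that the wait is not capped.

```python
def retry_waits(retry=10, retry_wait=0.1):
    """Return the wait (seconds) after each failed attempt, doubling each time."""
    # Assumption: no wait follows the final attempt, so there are retry - 1 waits.
    return [retry_wait * 2 ** i for i in range(retry - 1)]


waits = retry_waits()
print(waits[:3])    # first three waits
print(sum(waits))   # worst-case total time spent waiting, in seconds
```

Under these assumptions the waits grow quickly, so lowering retry or retry_wait shortens the worst-case delay considerably.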