
Obtaining the Sample Dataset of ModelArts

ModelArts provides multiple samples based on various AI engines to help beginners get started. For details about the samples, see the ModelArts Best Practices. The dataset of each sample is stored in a public OBS bucket, and you can choose the OBS path for your region to obtain it.

For the storage path of each sample dataset, see Sample Dataset Storage Path. The methods you can use to copy a sample dataset to your OBS bucket depend on the region where your bucket resides, as shown in Figure 1. Method 1 works from any region and is recommended for small datasets. For large datasets, method 2 or 3 is recommended, but both require your OBS bucket to be in the same region as the sample dataset.

Figure 1 Copying the sample dataset to your OBS bucket
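The selection rule described above can be expressed as a small decision function. This is a hypothetical helper for illustration only; because the guide gives no size threshold, whether a dataset counts as "large" is left to the caller as a boolean.

```python
from typing import List


def recommend_copy_methods(same_region: bool, is_large: bool) -> List[int]:
    """Return the recommended copy methods (1, 2, 3) for a sample dataset.

    Hypothetical helper encoding the guidance in this guide:
    - Methods 2 and 3 copy bucket to bucket, so your OBS bucket must be
      in the same region as the public sample-dataset bucket.
    - Method 1 (download, then upload) works from any region.
    """
    if not same_region:
        # Bucket-to-bucket copy is not possible across regions.
        return [1]
    if is_large:
        # Large datasets: avoid routing the data through your local PC.
        return [2, 3]
    # Small datasets: downloading and uploading is the simplest option.
    return [1]
```

For example, a large dataset whose public bucket is in a different region from yours still falls back to method 1.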

Sample Dataset Storage Path

Each sample dataset is stored in two formats, a compressed file and decompressed files. Both contain the same data.

  • Compressed file: Convenient to download. Download the file, decompress it, and upload the extracted files to your OBS bucket before using the dataset.
  • Decompressed files: Can be copied directly from the public OBS bucket to your OBS bucket, that is, from one OBS bucket to another. This requires your OBS bucket to be in the same region as the OBS bucket that stores the sample dataset.

Table 1 Sample dataset storage path

| Sample Name | Dataset Format | Region | OBS Path | Sample |
|---|---|---|---|---|
| Yunbao Detection | Compressed file | CN North-Beijing1 | https://modelarts-cnnorth1-market-dataset.obs.cn-north-1.myhuaweicloud.com/dataset-market/Yunbao-Data-Custom/archiver/Yunbao-Data-Custom.zip | HUAWEI CLOUD Mascot Detection (Using ExeML for Object Detection) |
| | | CN North-Beijing4 | https://modelarts-cnnorth4-market-dataset.obs.cn-north-4.myhuaweicloud.com/dataset-market/Yunbao-Data-Custom/archiver/Yunbao-Data-Custom.zip | |
| | Decompressed file | CN North-Beijing1 | obs://modelarts-cnnorth1-market-dataset/dataset-market/Yunbao-Data-Custom/unarchiver | |
| | | CN North-Beijing4 | obs://modelarts-cnnorth4-market-dataset/dataset-market/Yunbao-Data-Custom/unarchiver | |
| Flower Recognition | Compressed file | CN North-Beijing1 | https://modelarts-cnnorth1-market-dataset.obs.cn-north-1.myhuaweicloud.com/dataset-market/Flowers-Data-Set/archiver/Flowers-Data-Set.zip | Flower Recognition (Using a Built-in Algorithm in Training Management for Image Classification) |
| | | CN North-Beijing4 | https://modelarts-cnnorth4-market-dataset.obs.cn-north-4.myhuaweicloud.com/dataset-market/Flowers-Data-Set/archiver/Flowers-Data-Set.zip | |
| | Decompressed file | CN North-Beijing1 | obs://modelarts-cnnorth1-market-dataset/dataset-market/Flowers-Data-Set/unarchiver | |
| | | CN North-Beijing4 | obs://modelarts-cnnorth4-market-dataset/dataset-market/Flowers-Data-Set/unarchiver | |
| Iceberg Detection | Compressed file | CN North-Beijing1 | https://modelarts-cnnorth1-market-dataset.obs.cn-north-1.myhuaweicloud.com/dataset-market/Iceberg-Data-Set/archiver/Iceberg-Data-Set.zip | Iceberg Detection (Using the MoXing Framework for Image Classification) |
| | | CN North-Beijing4 | https://modelarts-cnnorth4-market-dataset.obs.cn-north-4.myhuaweicloud.com/dataset-market/Iceberg-Data-Set/archiver/Iceberg-Data-Set.zip | |
| | Decompressed file | CN North-Beijing1 | obs://modelarts-cnnorth1-market-dataset/dataset-market/Iceberg-Data-Set/unarchiver | |
| | | CN North-Beijing4 | obs://modelarts-cnnorth4-market-dataset/dataset-market/Iceberg-Data-Set/unarchiver | |
| Handwritten Digit Recognition | Compressed file | CN North-Beijing1 | https://modelarts-cnnorth1-market-dataset.obs.cn-north-1.myhuaweicloud.com/dataset-market/Mnist-Data-Set/archiver/Mnist-Data-Set.zip | Use MoXing to Develop Training Scripts for Handwritten Digit Recognition; Using a Notebook for Handwritten Digit Recognition; Using MXNet for Handwritten Digit Recognition; Using TensorFlow for Handwritten Digit Recognition; Using Caffe for Handwritten Digit Recognition |
| | | CN North-Beijing4 | https://modelarts-cnnorth4-market-dataset.obs.cn-north-4.myhuaweicloud.com/dataset-market/Mnist-Data-Set/archiver/Mnist-Data-Set.zip | |
| | Decompressed file | CN North-Beijing1 | obs://modelarts-cnnorth1-market-dataset/dataset-market/Mnist-Data-Set/unarchiver | |
| | | CN North-Beijing4 | obs://modelarts-cnnorth4-market-dataset/dataset-market/Mnist-Data-Set/unarchiver | |
| Caltech Image Recognition | Compressed file | CN North-Beijing1 | https://modelarts-cnnorth1-market-dataset.obs.cn-north-1.myhuaweicloud.com/dataset-market/Caltech101-data-set/archiver/Caltech101-data-set.zip | Using MXNet for Caltech Image Recognition |
| | | CN North-Beijing4 | https://modelarts-cnnorth4-market-dataset.obs.cn-north-4.myhuaweicloud.com/dataset-market/Caltech101-data-set/archiver/Caltech101-data-set.zip | |
| | Decompressed file | CN North-Beijing1 | obs://modelarts-cnnorth1-market-dataset/dataset-market/Caltech101-data-set/unarchiver | |
| | | CN North-Beijing4 | obs://modelarts-cnnorth4-market-dataset/dataset-market/Caltech101-data-set/unarchiver | |
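All paths in Table 1 follow a single naming pattern: the bucket name encodes the region, and each dataset has an archiver folder (compressed file) and an unarchiver folder (decompressed files). The sketch below rebuilds the paths from the sample name and region. It is a hypothetical convenience helper derived only from the rows in Table 1, so it should not be assumed to work for other datasets or regions.

```python
# Region names from Table 1, mapped to the code used in the bucket name
# and the region ID used in the HTTPS endpoint.
REGIONS = {
    "CN North-Beijing1": ("cnnorth1", "cn-north-1"),
    "CN North-Beijing4": ("cnnorth4", "cn-north-4"),
}

# Sample names from Table 1, mapped to their dataset folder names.
DATASETS = {
    "Yunbao Detection": "Yunbao-Data-Custom",
    "Flower Recognition": "Flowers-Data-Set",
    "Iceberg Detection": "Iceberg-Data-Set",
    "Handwritten Digit Recognition": "Mnist-Data-Set",
    "Caltech Image Recognition": "Caltech101-data-set",
}


def sample_dataset_paths(sample: str, region: str) -> dict:
    """Return the compressed-file URL and decompressed OBS path for a sample."""
    code, region_id = REGIONS[region]
    name = DATASETS[sample]
    bucket = f"modelarts-{code}-market-dataset"
    prefix = f"dataset-market/{name}"
    return {
        # HTTPS link used for downloading the .zip file (method 1).
        "compressed": f"https://{bucket}.obs.{region_id}.myhuaweicloud.com/{prefix}/archiver/{name}.zip",
        # obs:// path used for bucket-to-bucket copies (methods 2 and 3).
        "decompressed": f"obs://{bucket}/{prefix}/unarchiver",
    }
```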

Method 1: Download and Upload Files

This method has no region restriction: you can download the dataset from any region. Downloading the compressed file is recommended for efficiency. However, the download and upload speeds depend on your local network conditions.

Figure 2 Operations of method 1
  1. In Table 1, find the storage path of the target sample dataset. Any region can be used, and the compressed file is recommended. Click the link to download the sample dataset to your local PC.

    For example, clicking the CN North-Beijing1 link for the compressed Yunbao Detection dataset downloads Yunbao-Data-Custom.zip to your local PC.

  2. Decompress the downloaded file and upload all folders of the dataset to your OBS directory.
    1. First, create an OBS bucket and a folder for storing the sample dataset.

      For example, create an OBS bucket named test-modelarts and a folder named dataset-yunbao.

    2. Decompress the Yunbao-Data-Custom.zip file to the Yunbao-Data-Custom directory on the local PC.
    3. Upload all files in the Yunbao-Data-Custom directory to the test-modelarts/dataset-yunbao directory on OBS. For details about how to upload files, see Uploading a File.
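The local part of step 2 can be scripted with Python's standard zipfile module. The sketch below extracts the archive and lists the OBS path that each extracted file would receive under the target folder, which lets you check the layout before uploading. The function name is hypothetical, and it does not upload anything; the upload itself still follows Uploading a File.

```python
import os
import zipfile


def extract_and_plan_upload(zip_path: str, extract_dir: str, obs_folder: str) -> list:
    """Extract a downloaded sample dataset and list the OBS paths its files map to.

    extract_dir is the local directory, for example "Yunbao-Data-Custom".
    obs_folder is the target folder, for example "obs://test-modelarts/dataset-yunbao".
    Nothing is uploaded; only the planned target paths are returned.
    """
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(extract_dir)

    targets = []
    for root, _dirs, files in os.walk(extract_dir):
        for file_name in files:
            local_path = os.path.join(root, file_name)
            # OBS object names use "/" regardless of the local path separator.
            relative = os.path.relpath(local_path, extract_dir).replace(os.sep, "/")
            targets.append(f"{obs_folder.rstrip('/')}/{relative}")
    return sorted(targets)
```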

Method 2: Use the MoXing API to Copy a Dataset from the Public Bucket to Your OBS Bucket

This method requires that the sample dataset be in the same region as your OBS bucket and that you are familiar with notebook operations and ModelArts MoXing. Under these conditions, you can copy the sample dataset directly from the public bucket to your OBS bucket.

Obtain the OBS path (obs:// format) of the decompressed dataset from Table 1, create a notebook instance in ModelArts, and then copy the dataset to your OBS bucket as follows:
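Before creating the notebook, you can check the same-region prerequisite from the path alone: in Table 1, the public bucket name contains a region code (cnnorth1 or cnnorth4) whose HTTPS links use the region IDs cn-north-1 and cn-north-4. The sketch below relies on that naming pattern, which is inferred from Table 1 only. The helper names are hypothetical, and you supply the region ID of your own bucket.

```python
import re

# Bucket-name codes seen in Table 1, mapped to the region IDs in the matching URLs.
REGION_CODES = {"cnnorth1": "cn-north-1", "cnnorth4": "cn-north-4"}


def public_dataset_region(obs_path: str) -> str:
    """Return the region ID of a public sample-dataset bucket, e.g. "cn-north-1"."""
    match = re.match(r"obs://modelarts-([a-z0-9]+)-market-dataset/", obs_path)
    if match is None or match.group(1) not in REGION_CODES:
        raise ValueError(f"not a sample-dataset path from Table 1: {obs_path}")
    return REGION_CODES[match.group(1)]


def same_region(obs_path: str, my_bucket_region: str) -> bool:
    """True if a bucket-to-bucket copy (methods 2 and 3) is possible."""
    return public_dataset_region(obs_path) == my_bucket_region
```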

  1. Access the ModelArts management console, create a notebook instance, and create a file on the Jupyter page.
  2. Click the new file to access the development environment.
  3. Check whether the public bucket where the sample dataset resides is accessible.

    For example, obtain the sample dataset of Yunbao Detection in the CN North-Beijing1 region from Table 1. The OBS path is obs://modelarts-cnnorth1-market-dataset/dataset-market/Yunbao-Data-Custom/unarchiver. Run the following command to check whether the public bucket is accessible:

    import moxing as mox
    mox.file.exists('obs://modelarts-cnnorth1-market-dataset/dataset-market/Yunbao-Data-Custom/unarchiver')

    If True is returned, the public bucket is accessible.

  4. Check whether your OBS bucket can be accessed.

    For example, create an OBS bucket named test-modelarts and a folder named dataset-yunbao. Run the following command to check whether your bucket is accessible:

    import moxing as mox
    mox.file.exists('obs://test-modelarts/dataset-yunbao')

    If True is returned, your OBS bucket is accessible.

  5. Check whether you have the write permission on the OBS bucket.

    For example, the path of the target OBS bucket is obs://test-modelarts/dataset-yunbao. Run the following command to check the permission:

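    import moxing as mox
    # Write a temporary test file to the target folder, then delete it.
    mox.file.write('obs://test-modelarts/dataset-yunbao/obs_file.txt', 'Hello, OBS Bucket!')
    mox.file.remove('obs://test-modelarts/dataset-yunbao/obs_file.txt', recursive=False)

    If both commands run without an error, you have the write permission on the bucket.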
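  6. Run the following code in a notebook cell to copy the sample dataset from the public bucket to your OBS bucket. The %%time magic on the first line makes the notebook report the execution time.

    %%time
    import moxing as mox
    # Copy all files under the public dataset folder to your OBS folder.
    mox.file.copy_parallel('obs://modelarts-cnnorth1-market-dataset/dataset-market/Yunbao-Data-Custom/unarchiver',
                           'obs://test-modelarts/dataset-yunbao')
    print('Copy procedure is completed')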

    When "Copy procedure is completed" and the execution time are displayed, the copy has finished. The output is similar to the following:

    Copy procedure is completed
    CPU times: user 117 ms, sys: 92.3 ms, total: 209 ms
    Wall time: 58.3 s
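Steps 3 to 6 can be combined into one function that stops at the first failed check instead of copying blindly. This is a sketch, not part of MoXing: copy_sample_dataset and the InMemoryFileApi stand-in are hypothetical, and the function only calls the mox.file methods already used in the steps above. In a notebook, you would pass mox.file as file_api.

```python
def copy_sample_dataset(file_api, src: str, dst: str) -> None:
    """Run steps 3 to 6 in order using a MoXing-style file API.

    file_api must provide exists(), write(), remove() and copy_parallel(),
    called the same way as in the steps above (for example, mox.file).
    Any exception raised by write() or remove() is propagated unchanged.
    """
    # Step 3: the public bucket must be readable.
    if not file_api.exists(src):
        raise RuntimeError(f"public dataset path not accessible: {src}")
    # Step 4: your target folder must exist.
    if not file_api.exists(dst):
        raise RuntimeError(f"target path not accessible: {dst}")
    # Step 5: probe write permission with a temporary file, then delete it.
    probe = dst.rstrip("/") + "/obs_file.txt"
    file_api.write(probe, "Hello, OBS Bucket!")
    file_api.remove(probe, recursive=False)
    # Step 6: copy the whole decompressed dataset.
    file_api.copy_parallel(src, dst)


class InMemoryFileApi:
    """Hypothetical stand-in for mox.file, for trying the function outside ModelArts."""

    def __init__(self, existing_paths):
        self.paths = set(existing_paths)
        self.copies = []

    def exists(self, path):
        return path in self.paths

    def write(self, path, content):
        self.paths.add(path)

    def remove(self, path, recursive=False):
        self.paths.discard(path)

    def copy_parallel(self, src_url, dst_url):
        # Record the request instead of copying real objects.
        self.copies.append((src_url, dst_url))
```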

Method 3: Use the obsutil Tool of OBS to Copy Files

This method also requires that the sample dataset be in the same region as your OBS bucket. You can use obsutil, the command-line tool provided by OBS, to copy the sample dataset: obtain the OBS path (obs:// format) of the decompressed dataset from Table 1, and then run the object copy command described in Copying an Object to copy the dataset to your OBS bucket.
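If you script the copy from Python, the obsutil command can be assembled and run with subprocess. This is a sketch under stated assumptions: it assumes obsutil is installed, has already been configured with your credentials and endpoint, and accepts cp with the -r (copy all objects under the source path) and -f (skip the interactive confirmation) options. The helper names are hypothetical; check Copying an Object for the exact options and for how the destination folder name is formed.

```python
import shutil
import subprocess


def obsutil_copy_command(src: str, dst: str) -> list:
    """Build an obsutil command that copies a folder between OBS buckets.

    Assumes obsutil's cp command with -r (recursive copy) and -f
    (no interactive confirmation); verify against the obsutil guide.
    """
    return ["obsutil", "cp", src, dst, "-r", "-f"]


def run_obsutil_copy(src: str, dst: str) -> int:
    """Run the copy if obsutil is on PATH and return its exit code."""
    if shutil.which("obsutil") is None:
        raise FileNotFoundError("obsutil is not on PATH")
    return subprocess.run(obsutil_copy_command(src, dst), check=False).returncode
```

For example, run_obsutil_copy('obs://modelarts-cnnorth1-market-dataset/dataset-market/Yunbao-Data-Custom/unarchiver', 'obs://test-modelarts/dataset-yunbao') would copy the Yunbao Detection dataset, and an exit code of 0 indicates that obsutil reported success.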

For details about how to use obsutil, see obsutil in the Object Storage Service Tools Guide.