Using MXNet for Caltech Image Recognition
This section describes how to use the deep learning framework MXNet to train a model on the Caltech dataset on the ModelArts platform and publish the trained model as an available inference service.
Complete the preparation. For details, see Preparations. Then use MXNet to recognize Caltech images as follows:
- Preparing Data: Obtain the Caltech101 dataset and upload it to OBS.
- Training a Model: Use the native MXNet API to compile the model training script and create a training job for model training.
- Deploying the Model: After obtaining the trained model file, create a prediction job to deploy the model as a real-time prediction service.
- Performing Prediction: Initiate a prediction request and obtain the prediction result.
If you understand the MXNet engine and want to perform more operations based on this example, see Advanced Use.
Preparations
- On OBS Console, create a bucket and folders for storing the dataset and sample code. Table 1 lists the bucket and folders to be created in this example.
- Download the ModelArts-Lab project from Gitee, obtain the sample code from the codes folder of the project, and upload the code to OBS. For details about the files and the corresponding OBS paths, see Table 2.
Table 2 Obtaining sample files and uploading the files to OBS

| File | OBS Path |
|---|---|
| train_caltech.py, libimageaugdefault.so, and the symbol directory and all files in it | test-modelarts/codes |
| customize_service.py, config.json | test-modelarts/caltech-log/model |

NOTE:
- Upload customize_service.py and config.json after the training is completed.
- If the training job runs multiple times, multiple versions are generated. That is, directories of multiple versions, such as V0001 and V0002, are generated in the caltech-log directory. Upload the files to the model folder of the corresponding training job version.
Preparing Data
The Caltech101 dataset is an open source image dataset containing 101 classes (including a background class) provided by the California Institute of Technology. Each class has about 40 to 800 images, and the size of each image is close to 300 x 200 pixels. For details about the dataset, see Caltech 101.
ModelArts provides the Caltech101 dataset named Caltech101-data-set. Perform the following operations to obtain the dataset and upload it to an OBS folder, for example, test-modelarts/Caltech/data.
- Download the Caltech101-data-set dataset to the local PC.
- Decompress the Caltech101-data-set.zip file to the Caltech101-data-set directory on the local PC.
- Upload all files in the Caltech101-data-set directory to the test-modelarts/Caltech/data directory on OBS. For details about how to upload files, see Uploading a File.
To simplify preprocessing, the rec files used by MXNet have been created in advance. In deep learning, a dataset is generally divided into a training set, a validation set, and a test set in a proportion of 6:2:2 before training. The training set is used during training, the validation set is used for model evaluation during training, and the test set is used for model evaluation after training. In this example, post-training evaluation is not required, so there is no test set and the dataset is divided into a training set and a validation set in a proportion of 8:2. train indicates the training set, and val indicates the validation set. The dataset also contains lst and idx files. Each lst file lists the image paths, mapping to the images in the train and val sets. The idx files facilitate shuffling; they are not needed during model training and can be ignored throughout this example.
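The rec and lst files are already provided, so no preprocessing is needed here. For intuition only, the following sketch shows how an 8:2 train/val split and MXNet-style .lst rows (tab-separated index, label, relative path) could be produced from a folder of per-class subdirectories. The function name and layout are hypothetical; in practice MXNet's im2rec.py tool generates the lst and rec files.

```python
import os
import random

def make_lst_rows(image_root, split=0.8, seed=0):
    """Walk class subfolders under image_root and build MXNet-style .lst rows:
    "index<TAB>label<TAB>relative/path". Returns (train_rows, val_rows)."""
    classes = sorted(d for d in os.listdir(image_root)
                     if os.path.isdir(os.path.join(image_root, d)))
    samples = []
    for label, cls in enumerate(classes):
        for name in sorted(os.listdir(os.path.join(image_root, cls))):
            samples.append((float(label), os.path.join(cls, name)))
    random.Random(seed).shuffle(samples)   # shuffle before splitting
    cut = int(len(samples) * split)        # 8:2 train/val split
    def rows(items):
        return ["%d\t%.6f\t%s" % (i, lab, path)
                for i, (lab, path) in enumerate(items)]
    return rows(samples[:cut]), rows(samples[cut:])
```

With im2rec.py, the same lst listing is then packed together with the image bytes into the binary rec file that the training script reads.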
Training a Model
After the data is prepared, use the MXNet API to compile the training script code. ModelArts provides a code sample, train_caltech.py. The following operations use this sample to train the model.
In this example, Deep Convolutional Neural Network (DCNN) ResNet is used for training, and possible convolutional layers of ResNet are 18, 34, and 50. More model layers mean a deeper model, longer training time, and higher model accuracy. ResNet is one of the commonly used convolutional neural networks for image classification.
- Upload the required files in the codes directory to OBS, for example, to the test-modelarts/Caltech/code directory. For details, see Preparations.
- The files required for training the model in the codes directory are as follows: train_caltech.py, libimageaugdefault.so, and all files in the symbol directory.
- The folder for storing training scripts in OBS must be named codes. Otherwise, the training job will fail.
- On the ModelArts management console, choose Training Management > Training Jobs, and click Create in the upper left corner.
- On the Create Training Job page, set required parameters based on Figure 1 and Figure 2, and click Next.
Algorithm Source: Set the value to the path of the train_caltech.py sample script.
Data Source: Set the value to the path for storing the Caltech101 sample data.
Running Parameter: Add the max_epoches=10 parameter. One epoch trains over the entire dataset once, so setting max_epoches to 10 performs 10 epochs of training. The value can be changed. If this parameter is not set, the default value 100 is used. Training takes longer as the value of max_epoches increases. For more information about running parameters, see Table 3.
- Set AI Engine to MXNet and MXNet-1.2.1-python3.6. The sample code is written for this engine version, so the training job will fail if MXNet-1.2.1-python2.7 is used.
- GPU resources are recommended because the data volume of this training job is large and the job takes a long time to run.
- On the Confirm tab page, check the parameters of the training job and click Submit.
- On the Training Jobs page, when the training job status changes to Running Success, the model training is completed. If any exception occurs, click the job name to go to the job details page and view the training job logs.
The training job may take more than 10 minutes to complete. If it runs longer than expected (for example, more than one hour), manually stop it to release resources; otherwise it continues to incur charges, especially when GPUs are used.
- (Optional) During or after model training, you can create a visualization job to view parameter statistics. For details, see Creating a Visualization Job.
In Training Output Path, select the value of Training Output Path specified for the training job. Complete visualization job creation as prompted.
Table 3 Running parameters

| Parameter | Description |
|---|---|
| num_epochs | Number of training epochs. The default value is 30. |
| batch_size | Number of samples in each training batch. The default value is 128. |
| lr | Learning rate. The default value is 0.1. |
| lr_step | Epochs at which the learning rate is multiplied by 0.1. The default value is 16,24,27: at the end of the 16th epoch the learning rate drops to 0.1 times the original value (that is, 0.01), and it drops again in the same way after epochs 24 and 27. |
| num_layers | Number of convolutional layers of the ResNet model. Possible values are 18, 34, and 50. The default value is 34. |
| disp_batches | Number of batches between training metric outputs in the logs. The default value is 20. |
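To make the interaction between lr and lr_step concrete, the following minimal sketch (a hypothetical helper, assuming the fixed decay factor of 0.1 described above) computes the learning rate in effect at any epoch under the defaults:

```python
def lr_at_epoch(epoch, base_lr=0.1, lr_steps=(16, 24, 27), factor=0.1):
    """Return the learning rate used during a given (1-based) epoch.
    The rate is multiplied by `factor` after each epoch listed in lr_steps."""
    lr = base_lr
    for step in lr_steps:
        if epoch > step:   # the drop takes effect after that epoch ends
            lr *= factor
    return lr

# With the defaults: epochs 1-16 run at 0.1, epoch 17 at about 0.01,
# epoch 25 at about 0.001, and epoch 28 at about 0.0001.
schedule = {e: lr_at_epoch(e) for e in (1, 16, 17, 25, 28)}
```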
Deploying the Model
After the model training is completed, create a prediction job and deploy the model as a real-time prediction service. Before deploying the model, obtain the customize_service.py inference code and the config.json configuration file and upload them to OBS. These are sample files provided by ModelArts. You can also develop your own inference code and configuration file based on Model Package Specifications.
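For orientation only, a config.json for an image-classification model has roughly the following shape. This is an illustrative sketch, not the exact schema; use the provided config.json and follow Model Package Specifications for the authoritative format.

```json
{
  "model_type": "MXNet",
  "model_algorithm": "image_classification",
  "apis": [
    {
      "protocol": "http",
      "url": "/",
      "method": "post",
      "request": {
        "Content-type": "multipart/form-data",
        "data": {
          "type": "object",
          "properties": { "images": { "type": "file" } }
        }
      },
      "response": {
        "Content-type": "application/json",
        "data": {
          "type": "object",
          "properties": {
            "predicted_label": { "type": "string" },
            "scores": { "type": "array" }
          }
        }
      }
    }
  ]
}
```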
- Before deploying the model, upload the inference code and configuration files to the corresponding OBS path. For details, see Preparations. In this example, the OBS path is /test-modelarts/Caltech/output/.
- On the ModelArts management console, choose Model Management > Models in the left navigation pane. On the displayed Models page, click Import.
- On the Import Model page, set required parameters as shown in Figure 3 and click Next.
In the Meta Model Source area, select OBS. Set Meta Model to the path specified by Training Output Path in the training job.
On the Models page, if the model status changes to Normal, the model has been imported successfully.
- Click the triangle next to a model name to expand all versions of the model. In the row of a version, choose Deploy > Real-Time Services in the Operation column to deploy the model as a real-time service.
On the Deploy page, the system automatically selects the model imported in the previous step. Create a real-time service as prompted.

Figure 4 Deploying the model as a real-time service
Performing Prediction
After the model is deployed, wait until the service deployment is completed. If the service status changes to Running, the service has been deployed successfully.
- On the Real-Time Services page, click the name of the real-time service. The real-time service details page is displayed.
- Choose Prediction. Click Upload in the Image File text box to upload an image for prediction, and then click Predict.
predicted_label: image class determined by the model. The output is laptop, which indicates that the model prediction is correct.
scores: confidence scores, that is, the predicted probabilities that the image belongs to each class. The 5 classes with the highest probability are output, computed from the last layer of the network followed by a Softmax layer.
Figure 5 Image prediction
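The scores above come from normalizing the network's raw outputs with Softmax and keeping the five largest probabilities. A minimal sketch of that step (the class list and raw outputs here are hypothetical; the real service uses the trained network's outputs for all 101 classes):

```python
import math

def top5_scores(logits, class_names):
    """Apply Softmax to raw network outputs and return the 5
    (class, probability) pairs with the highest probability."""
    m = max(logits)                           # subtract max for stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(class_names, probs), key=lambda p: -p[1])
    return ranked[:5]

classes = ["airplane", "camera", "chair", "laptop", "pizza", "umbrella"]
top5 = top5_scores([0.5, 1.0, 0.2, 4.0, 1.5, 0.1], classes)
predicted_label = top5[0][0]   # "laptop": the highest-scoring class
```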
- If you do not need to use this model and real-time service any more, clear related resources to avoid unnecessary fees.
- On the Real-Time Services page, choose More > Stop or Delete to stop or delete the created real-time service.
- Go to OBS and delete the uploaded data, the folders, and the OBS bucket created for this example.
Advanced Use
- Model optimization
Not all images can be predicted correctly. For example, if all default parameters are used for training, the accuracy on the validation set is only about 78%. If you are not satisfied with the result or want to improve the model, you can adjust the following parameters: num_epochs, batch_size, lr, lr_step, and num_layers. For details about the parameters, see Table 3. In the training logs, the performance of the current model on the validation set is output after each epoch, as shown in Figure 6. Observe these metrics to understand how modifying the preceding parameters affects the trained model.
Table 4 Parameter description

| Parameter | Description |
|---|---|
| Validation-accuracy | Proportion of predictions in which the class with the highest confidence score is the correct class, that is, the proportion of correct predicted_label results among all predictions during validation. A larger value means a better model. In deep learning, this metric is called top-1 accuracy. |
| Validation-cross-entropy | Cross entropy between the predicted probabilities and the correct classes, used to measure the prediction loss for each class. A smaller value is better. |
| Validation-top_k_accuracy_5 | Proportion of predictions in which the correct class is among the top 5 classes with the highest confidence scores. In deep learning, this metric is generally called top-5 accuracy. A larger value means a better model. |
A change in these values does not necessarily mean an intuitive improvement in model quality, so testing with more images is recommended. Some images can be predicted correctly using the existing model, but some cannot. For example, as shown in Figure 7, the class with the second highest confidence score is the correct class; that is, the prediction is correct in top-5 mode but not in top-1 mode. Tuning the parameters can improve the prediction so that the class with the highest confidence score is the correct one, which also improves the model's capability. If you know how to use MXNet, you can also modify the code or create your own rec file for training. ModelArts provides the DevEnviron function for modifying code on the cloud. For details, see Modifying a Model Using DevEnviron.
- Modifying a model using DevEnviron
The following operations are intended for users who want to use the DevEnviron function to modify code or files on the cloud.
- On the ModelArts management console, choose DevEnviron > Notebook in the navigation pane on the left. On the Notebook page, click Create to create a notebook instance.
- On the Create Notebook page that is displayed, set parameters as prompted. See Figure 8. Complete the notebook instance creation as prompted.
- When the notebook instance status changes to Running, click Open in the Operation column. The Jupyter page is displayed. On the Jupyter page, you can open the corresponding file, modify it, and save the changes. You can also click New in the upper right corner to create a Python environment, or open a Terminal (a Linux shell in the cloud development environment), to debug code.
If a notebook instance is not required any more, stop it on the Notebook page in a timely manner to avoid unnecessary fees.
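The three per-epoch validation metrics listed in Table 4 can be sketched in a few lines. This hypothetical helper is shown for intuition only, not as the code the training engine runs:

```python
import math

def validation_metrics(probs, labels):
    """Compute top-1 accuracy, top-5 accuracy, and average cross-entropy
    over a validation set. probs is a list of per-class probability lists;
    labels is the list of correct class indices."""
    top1 = top5 = 0
    ce = 0.0
    for p, y in zip(probs, labels):
        ranked = sorted(range(len(p)), key=lambda i: -p[i])
        top1 += ranked[0] == y        # correct class has the highest score
        top5 += y in ranked[:5]       # correct class is among the top 5
        ce += -math.log(max(p[y], 1e-12))   # clamp to avoid log(0)
    n = len(labels)
    return top1 / n, top5 / n, ce / n
```

A model tuned as described above should raise the first two values and lower the third from one epoch to the next.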