Common Methods of Optimizing Model Precision in Model Optimization

Overview

In deep learning competitions, many tricks have emerged. One of the more debated ones is to augment the input images at test time to generate multiple copies, feed all of them to the model, and combine the inference results. This method is called test time augmentation (TTA). This section describes the principles of TTA and gives suggestions on using it.

Principles

  • TTA process

    The basic TTA process is as follows: augment the original images to obtain multiple augmented samples, and form a data group together with the original images. Run inference on all of these samples, combine the inference results using some method to obtain the final result, and then calculate the precision.

    Figure 1 TTA process
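The three-step flow above (augment, infer, combine) can be sketched as follows. The "model" and the augmentation here are toy stand-ins for illustration, not the document's actual network:

```python
# A minimal sketch of the TTA flow described above. The "model" and the
# augmentation are hypothetical toy stand-ins.
def identity(image):
    return image

def hflip(image):
    # toy augmentation: mirror each row of a 2-D list image
    return [row[::-1] for row in image]

def model(image):
    # toy "model": two-class scores derived from the first pixel column
    s = sum(row[0] for row in image)
    return [s, 100 - s]

def average(outputs):
    # combine per-sample results by element-wise averaging
    n = len(outputs)
    return [sum(col) / n for col in zip(*outputs)]

def tta_predict(image, augmentations, combine):
    samples = [aug(image) for aug in augmentations]   # 1) augment
    outputs = [model(s) for s in samples]             # 2) infer on each sample
    return combine(outputs)                           # 3) combine the results

logits = tta_predict([[1, 2], [3, 4]], [identity, hflip], average)
```

The two open questions below (which augmentation to use, and how to combine the results) correspond to the choice of `augmentations` and `combine` in this sketch.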

    The following problems need to be confirmed:

    1. What augmentation method is used to generate samples for the original images?
    2. What method is used to integrate the inference results obtained using the samples?

    The following describes how TTA works and how to use it with the functions provided by the ModelArts platform.

  • Example of using the TTA
    • Dataset: The following figure shows a dataset example. The left part consists of 754 normal images, and the right part consists of 358 images of abnormal electrical boards. After augmentation, the number of normal images increases to 1508, and the number of abnormal images increases to 1432.
      Figure 2 Dataset example
    • Framework and algorithm: See ImageNet open source code.
    • Training policy: 50 epochs, initial learning rate 0.001, batch size 16, trained with the Adam optimizer
    Model precision

    Precision            Normal    Abnormal
    Recall               97.2%     71.3%
    Accuracy (overall)   89.13%

  • TTA process
    1. Select an augmentation method to obtain the samples. You can select the method as follows:
      1. Select the augmentation method used in training.

        For example, in the ImageNet training code provided by PyTorch, the operator transforms.RandomHorizontalFlip() randomly flips images horizontally during training. The model has therefore seen many horizontally flipped images, so horizontal flipping can be used as an augmentation method.

      2. Evaluate the model and analyze the augmentation method to be used based on the model evaluation result.

        Evaluate the original model. The evaluation code below is obtained by modifying the forward-inference part of the validation section of the open source code:

        pred_list = []    # collected logits for each image
        target_list = []  # collected labels for each image
        with torch.no_grad():
            end = time.time()
            for i, (images, target) in enumerate(val_loader):
                if args.gpu is not None:
                    images = images.cuda(args.gpu, non_blocking=True)
                target = target.cuda(args.gpu, non_blocking=True)

                # compute output
                output_origin = model(images)
                output = output_origin
                loss = criterion(output, target)
                pred_list += output.cpu().numpy()[:, :2].tolist()
                target_list += target.cpu().numpy().tolist()

                # measure accuracy and record loss
                acc1, acc5 = accuracy(output, target, topk=(1, 5))
                losses.update(loss.item(), images.size(0))
                top1.update(acc1[0], images.size(0))
                top5.update(acc5[0], images.size(0))

                # measure elapsed time
                batch_time.update(time.time() - end)
                end = time.time()

                if i % args.print_freq == 0:
                    progress.display(i)

            # TODO: this should also be done with the ProgressMeter
            print(' * Acc@1 {top1.avg:.3f} Acc@5 {top5.avg:.3f}'
                  .format(top1=top1, top5=top5))

        name_list = val_loader.dataset.samples
        for idx in range(len(name_list)):
            name_list[idx] = name_list[idx][0]
        analyse(task_type='image_classification', save_path='./',
                pred_list=pred_list, label_list=target_list, name_list=name_list)

        The evaluation requires three lists. The logits are collected into pred_list, which stores the prediction result of each image, for example, [[8.725419998168945, 21.92235565185547]...[xxx, xxx]]. target_list consists of the label of each image, for example, [0, 1, 0, 1, 1..., 1, 0]. name_list consists of the paths of the original image files, for example, [xxx.jpg, ... xxx.jpg]. The analyse interface in the deep_moxing library is called to generate a model_analysis_results.json file in save_path. Upload this file to the output directory of any training task on the page, and the model evaluation result is displayed on the evaluation page.
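To make the expected shapes concrete, here is a small illustration of the three lists; the paths and values are hypothetical placeholders, not outputs of a real run:

```python
# Illustrative shapes of the three lists required by the evaluation.
# Paths and values below are hypothetical placeholders.
pred_list = [[8.725419998168945, 21.92235565185547],  # logits per image
             [12.1, 3.4]]
target_list = [1, 0]                                  # label per image
name_list = ['abnormal/0001.jpg',                     # file path per image
             'normal/0002.jpg']

# Each list must contain exactly one entry per validation image.
consistent = len(pred_list) == len(target_list) == len(name_list)

# The analyse interface would then be called as in the text:
# analyse(task_type='image_classification', save_path='./',
#         pred_list=pred_list, label_list=target_list, name_list=name_list)
```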

        Figure 3 Viewing the evaluation result

        The model sensitivity needs to be analyzed.

        Table 1 Analysis of the sensitivity of the model to image clarity (F1 score per class)

        Feature Distribution   Class 0   Class 1
        0%-20%                 0.7929    0.8727
        20%-40%                0.8816    0.7429
        40%-60%                0.9363    0.7229
        60%-80%                0.9462    0.7912
        80%-100%               0.9751    0.7619
        Standard deviation     0.0643    0.0523
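The "Standard deviation" row of Table 1 can be reproduced directly from the column values; it is the population standard deviation of each class's F1 scores:

```python
# Quick check on Table 1: the "Standard deviation" row equals the
# population standard deviation of each column's F1 scores.
from statistics import pstdev

f1_class0 = [0.7929, 0.8816, 0.9363, 0.9462, 0.9751]
f1_class1 = [0.8727, 0.7429, 0.7229, 0.7912, 0.7619]

std0 = round(pstdev(f1_class0), 4)  # matches the 0.0643 in Table 1
std1 = round(pstdev(f1_class1), 4)  # matches the 0.0523 in Table 1
```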

        As shown in the preceding table, the F1 score of class 0 (the normal class) increases with image clarity; that is, the model detects the normal class better on clear images. The F1 score of class 1 (the abnormal class) is highest on the blurriest images and generally decreases as clarity increases, so blurring the input can help the model detect the abnormal class. Because this model focuses on identifying abnormal classes, image blurring can be used as a TTA method.

    2. Add the TTA to PyTorch.

      The advantage of PyTorch is that you can directly obtain the tensor before it is fed to the model and perform the required operations on it. For example, in the validation loop:

      with torch.no_grad():
          end = time.time()
          for i, (images, target) in enumerate(val_loader):
              if args.gpu is not None:
                  images = images.cuda(args.gpu, non_blocking=True)

      The images obtained here are the preprocessed image data of one batch. Two augmentation methods were determined in step 1: horizontal flipping and blurring.

      If the version is later than 0.4.0, you can use the following code for flipping in PyTorch:

      def flip(x, dim):
          indices = [slice(None)] * x.dim()
          indices[dim] = torch.arange(x.size(dim) - 1, -1, -1, dtype=torch.long, device=x.device)
          return x[tuple(indices)]

      dim indicates the axis to flip. For an NCHW tensor, 2 flips the image vertically (along the height axis), 3 flips it horizontally (along the width axis), and 1 reverses the channel order. Use img_flip = flip(images, 3) to obtain horizontally flipped images, matching the flipping used during training.
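The behavior of the flip() helper can be verified on a tiny NCHW batch; in recent PyTorch versions, torch.flip gives the same result:

```python
import torch

# Sanity check of the flip() helper from the text on a tiny NCHW batch.
def flip(x, dim):
    indices = [slice(None)] * x.dim()
    indices[dim] = torch.arange(x.size(dim) - 1, -1, -1,
                                dtype=torch.long, device=x.device)
    return x[tuple(indices)]

x = torch.tensor([[[[0., 1.],
                    [2., 3.]]]])  # shape (1, 1, 2, 2)

vertical = flip(x, 2)    # reverses the height axis (rows)
horizontal = flip(x, 3)  # reverses the width axis (columns)
```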

      You can use the blur operation provided by OpenCV (cv2) to blur the batch. Note that cv2.blur expects an HWC image, so each CHW sample is transposed before and after blurring:

      img = images.cpu().numpy()
      for idx in range(img.shape[0]):
          # blur each image of the batch in HWC layout, then restore CHW
          img[idx] = cv2.blur(img[idx].transpose(1, 2, 0), (3, 3)).transpose(2, 0, 1)
      images_blur = torch.from_numpy(img)
    3. Combine the results.

      Three outputs are obtained: origin_result (inference results of the original images), flip_output (inference results of the flipped images), and blur_output (inference results of the blurred images).

      How are they combined?

      For flip_output, consider what proportion of images were flipped during training and what weight a flipped image should contribute to the final output. RandomHorizontalFlip flips each training image with probability 0.5, so about half of the training images are flipped, and the flipped copy is given a contribution weight of 0.5. This yields the following formula:

      logits = 0.5 x origin_result + 0.5 x flip_output
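The 0.5/0.5 combination can be sketched on hypothetical two-class logits for a batch of two images (the numbers below are illustrative, not real model outputs):

```python
# Sketch of the 0.5/0.5 combination above, applied to hypothetical
# two-class logits for a batch of two images.
origin_result = [[8.0, 22.0], [10.0, 3.0]]
flip_output = [[10.0, 20.0], [9.0, 2.0]]

# element-wise weighted sum of the two inference results
logits = [[0.5 * o + 0.5 * f for o, f in zip(o_row, f_row)]
          for o_row, f_row in zip(origin_result, flip_output)]

# the predicted class of each image is the argmax of its combined logits
preds = [row.index(max(row)) for row in logits]
```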

      In this case, the precision of the model is as follows:

      Table 2 Model precision result

      Operation                  Accuracy   Recall of Normal Class   Recall of Abnormal Class
      Originals                  89.13%     97.2%                    71.3%
      Flipping result combining  87.74%     93.7%                    72.7%

      Although the recall of the normal class decreases, the recall of the abnormal class increases.

      For blur_output, Table 1 shows that the model performs best on the abnormal class when image clarity is in the 0%-20% interval, although its performance on the normal class drops there. Since blurring is used precisely to improve detection of the abnormal class, assume that the blurred image also contributes a weight of 0.5:

      logits = 0.5 x origin_result + 0.5 x blur_output

      In this case, the precision of the model is as follows:

      Table 3 Model precision result

      Operation                  Accuracy   Recall of Normal Class   Recall of Abnormal Class
      Originals                  89.13%     97.2%                    71.3%
      Blurring result combining  88.117%    94.8%                    73.3%

      As the preceding table shows, the recall of the normal class decreases while that of the abnormal class increases, which is consistent with the model evaluation analysis.

      In conclusion, the combination lowers the recall of the normal class and the overall accuracy slightly, but this matches the model analysis: the goal of the adjustment is to improve the recall of the abnormal class, and on that metric the TTA result is better than inference on the original images alone.

Summary

In this test, two TTA methods are used: one reuses an augmentation method built into the training pipeline, and the other analyzes model sensitivity to determine the image feature interval that helps inference the most. Note that TTA increases inference time, so for AI algorithms with demanding inference latency requirements, select a solution carefully.