Fine-grained Classification Optimization Using Center Loss

Symptom

Fine-grained classification refers to classification tasks in which the classes are highly similar, such as bird species or vehicle models. In these images, each class differs from the others only slightly. The following figure shows data for 21 species of sparrows. Each column represents one species, and each species has ten images. These sparrows are highly similar, so the features that a convolutional neural network (CNN) extracts from such images are usually very close to each other. In this case, it is difficult to separate these similar features using the Softmax cross-entropy loss function alone.

Figure 1 Fine-grained classification

Solution

Center loss

To increase the separation between these features, center loss is proposed in A Discriminative Feature Learning Approach for Deep Face Recognition. The principle of this loss function is to learn a center point for each class and pull the features of each class toward their respective centers, so that the intra-class distance decreases and the inter-class distance increases. The loss is L_C = (1/2) Σ_{i=1}^{m} ‖x_i − c_{y_i}‖², where x_i is the feature of the i-th sample and c_{y_i} is the center of its class y_i; the centers c are updated as the model trains.

There are various ways to implement center loss. For details about a PyTorch implementation of center loss, visit https://github.com/KaiyangZhou/pytorch-center-loss.

import torch
import torch.nn as nn


class CenterLoss(nn.Module):
    """Center loss.

    Reference:
    Wen et al. A Discriminative Feature Learning Approach for Deep Face Recognition. ECCV 2016.

    Args:
        num_classes (int): number of classes.
        feat_dim (int): feature dimension.
    """
    def __init__(self, num_classes=10, feat_dim=2, use_gpu=True):
        super(CenterLoss, self).__init__()
        self.num_classes = num_classes
        self.feat_dim = feat_dim
        self.use_gpu = use_gpu

        if self.use_gpu:
            self.centers = nn.Parameter(torch.randn(self.num_classes, self.feat_dim).cuda())
        else:
            self.centers = nn.Parameter(torch.randn(self.num_classes, self.feat_dim))

    def forward(self, x, labels):
        """
        Args:
            x: feature matrix with shape (batch_size, feat_dim).
            labels: ground truth labels with shape (batch_size).
        """
        batch_size = x.size(0)
        distmat = torch.pow(x, 2).sum(dim=1, keepdim=True).expand(batch_size, self.num_classes) + \
                  torch.pow(self.centers, 2).sum(dim=1, keepdim=True).expand(self.num_classes, batch_size).t()
        # distmat = ||x||^2 + ||c||^2 - 2 * x @ c^T (squared Euclidean distances)
        distmat.addmm_(x, self.centers.t(), beta=1, alpha=-2)

        classes = torch.arange(self.num_classes).long()
        if self.use_gpu:
            classes = classes.cuda()
        labels = labels.unsqueeze(1).expand(batch_size, self.num_classes)
        mask = labels.eq(classes.expand(batch_size, self.num_classes))

        dist = distmat * mask.float()
        loss = dist.clamp(min=1e-12, max=1e+12).sum() / batch_size

        return loss
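As a sanity check, the quantity this class computes is simply the average squared distance between each feature and its own class center, which can also be written directly with indexing. The following minimal sketch uses toy random tensors (all names and sizes here are illustrative) and confirms that the direct form matches the masked distance-matrix form used by CenterLoss:

```python
import torch

torch.manual_seed(0)
features = torch.randn(4, 2)            # batch of 4 features, feat_dim = 2
labels = torch.tensor([0, 2, 1, 0])     # class index for each sample
centers = torch.randn(3, 2)             # one center per class

# Direct form: mean squared distance to each sample's own class center.
diff = features - centers[labels]
loss_direct = (diff * diff).sum() / features.size(0)

# Expanded form used by CenterLoss: full distance matrix, then select
# the entry belonging to each sample's ground-truth class.
distmat = (features ** 2).sum(1, keepdim=True) \
        + (centers ** 2).sum(1) \
        - 2.0 * features @ centers.t()
loss_masked = distmat[torch.arange(4), labels].sum() / features.size(0)

print(torch.allclose(loss_direct, loss_masked, atol=1e-5))  # → True
```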

In the preceding project, the MNIST dataset is used for testing. The following figure shows the feature dimensionality reduction. The first graph shows plain Softmax, and the second graph shows Softmax plus center loss. You can clearly observe that the features are more separated.

Figure 2 Feature dimension reduction

Verification

Center loss was initially used in face recognition tasks, and most open-source projects reproduce it on the MNIST dataset. The following describes how center loss performs in a natural-image scenario.

The experiment data comes from the CUB-200 dataset, from which 21 species of sparrows are selected. The training set contains 629 images and the test set contains 615 images.

The classification network is ResNet-50, trained with the main.py script provided by the PyTorch examples. A model pre-trained on ImageNet is used, the learning rate is set to 0.001, the batch size is set to 32, and training runs for 30 epochs.

A classification accuracy of 73.6% is obtained when only the conventional loss function is used. The following figure shows the dimensionality-reduced visualization of the features from the last convolutional layer.

Figure 3 Visual feature dimension reduction

The features of the last convolutional layer and the labels are input to center loss to calculate a loss value, which is added to the normal Softmax loss with a weight of 0.001.

loss = softmax_loss(pred_output, target_label) + 0.001 * center_loss(feature, target_label)
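A toy end-to-end sketch of this combined objective follows. All module names and sizes here are illustrative stand-ins (a linear "backbone" instead of ResNet-50), not the experiment's actual code; the 0.001 weight matches the experiment above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
num_classes, feat_dim = 3, 4

backbone = nn.Linear(6, feat_dim)              # stand-in for the CNN backbone
classifier = nn.Linear(feat_dim, num_classes)  # Softmax head
centers = nn.Parameter(torch.randn(num_classes, feat_dim))

# Optimize network weights and class centers jointly.
optimizer = torch.optim.SGD(
    list(backbone.parameters()) + list(classifier.parameters()) + [centers],
    lr=0.1,
)

inputs = torch.randn(8, 6)
targets = torch.randint(0, num_classes, (8,))

losses = []
for _ in range(20):
    feats = backbone(inputs)
    logits = classifier(feats)
    center = ((feats - centers[targets]) ** 2).sum() / inputs.size(0)
    loss = F.cross_entropy(logits, targets) + 0.001 * center  # weight 0.001
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
# The combined loss should trend downward over the 20 steps.
```

Note that the reference repository additionally uses a separate optimizer (with its own learning rate) for the centers; a single optimizer is used here for brevity.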

In this case, the classification accuracy rises to 75.7%. The following figure shows the dimensionality-reduced visualization of the last-layer features. For the classes circled in the figure, the intra-class distance decreases and the inter-class distance increases. However, the effect is less pronounced than in the MNIST test because the image backgrounds are far more complex.

Figure 4 Visual feature dimension reduction of the last layer

Suggestions

Center loss can improve fine-grained classification in natural scenes. However, its optimization effect diminishes as background complexity increases.

Convolutional feature visualization can help effectively analyze model performance. This function will be rolled out in the model evaluation module of ModelArts.