Sensitivity of Object Detection Models to Bounding Box Brightness: Analysis and Solution
Symptom
In an object detection task, the brightness inside the labeled bounding boxes may differ across datasets; this variation is measured by a bounding box brightness sensitivity metric. Bounding box brightness affects both model training and inference. The following figure shows the brightness values of the labeled bounding boxes in an image.
Solution
Dropout is a widely used regularization technique in deep learning. However, it randomly drops individual units at the feature layer, so semantic information shared with adjacent feature units is dropped as well. DropBlock solves this problem: it drops features in a feature block (a contiguous region of a feature map), regularizing the network in a structured way. DropBlock is otherwise similar to Dropout; the main difference is that DropBlock drops contiguous regions of a feature map instead of separate random units. For details, see the DropBlock paper (https://arxiv.org/pdf/1810.12890.pdf).
DropBlock has two parameters: block_size and γ.
- block_size: size (height and width) of the dropped block. When block_size is 1, DropBlock degenerates to traditional Dropout. Typical values are 3, 5, and 7.
- γ: drop probability for each block seed, that is, the parameter of the Bernoulli distribution used to sample block centers.
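As a small illustration (not from the original article), γ can be computed from keep_prob, block_size, and the feature map size exactly as the `seed_drop_rate` expression in the implementation below does; the function name here is ours:

```python
def dropblock_gamma(keep_prob, block_size, height, width):
    """Bernoulli seed probability (gamma) for DropBlock.

    Mirrors the seed_drop_rate computation in the TensorFlow
    implementation: the (1 - keep_prob) budget is spread over
    block_size**2 cells, restricted to the region where a full
    block fits inside the feature map.
    """
    total_size = height * width
    valid_region = (width - block_size + 1) * (height - block_size + 1)
    return (1.0 - keep_prob) * total_size / block_size ** 2 / valid_region


# For a 32x32 feature map with keep_prob=0.9 and block_size=7,
# gamma is roughly 0.0031.
gamma = dropblock_gamma(0.9, 7, 32, 32)
```

Note that with block_size = 1 the formula collapses to 1 - keep_prob, matching the observation above that DropBlock then degenerates to Dropout.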
The following figure compares Dropout and DropBlock: graph (b) shows Dropout, and graph (c) shows DropBlock.
Official TensorFlow implementation:
import tensorflow as tf
from absl import logging


class Dropblock(object):
  """DropBlock: a regularization method for convolutional neural networks.

  DropBlock is a form of structured dropout, where units in a contiguous
  region of a feature map are dropped together. DropBlock works better than
  dropout on convolutional layers due to the fact that activation units in
  convolutional layers are spatially correlated.
  See https://arxiv.org/pdf/1810.12890.pdf for details.
  """

  def __init__(self,
               dropblock_keep_prob=None,
               dropblock_size=None,
               data_format='channels_last'):
    self._dropblock_keep_prob = dropblock_keep_prob
    self._dropblock_size = dropblock_size
    self._data_format = data_format

  def __call__(self, net, is_training=False):
    """Builds Dropblock layer.

    Args:
      net: `Tensor` input tensor.
      is_training: `bool` if True, the model is in training mode.

    Returns:
      A version of input tensor with DropBlock applied.
    """
    if (not is_training or self._dropblock_keep_prob is None or
        self._dropblock_keep_prob == 1.0):
      return net

    logging.info('Applying DropBlock: dropblock_size %d, net.shape %s',
                 self._dropblock_size, net.shape)

    if self._data_format == 'channels_last':
      _, height, width, _ = net.get_shape().as_list()
    else:
      _, _, height, width = net.get_shape().as_list()

    total_size = width * height
    dropblock_size = min(self._dropblock_size, min(width, height))
    # seed_drop_rate is the gamma parameter of DropBlock.
    seed_drop_rate = (
        1.0 - self._dropblock_keep_prob) * total_size / dropblock_size**2 / (
            (width - dropblock_size + 1) *
            (height - dropblock_size + 1))

    # Forces the block to be inside the feature map.
    w_i, h_i = tf.meshgrid(tf.range(width), tf.range(height))
    valid_block = tf.logical_and(
        tf.logical_and(w_i >= int(dropblock_size // 2),
                       w_i < width - (dropblock_size - 1) // 2),
        tf.logical_and(h_i >= int(dropblock_size // 2),
                       h_i < height - (dropblock_size - 1) // 2))

    if self._data_format == 'channels_last':
      valid_block = tf.reshape(valid_block, [1, height, width, 1])
    else:
      valid_block = tf.reshape(valid_block, [1, 1, height, width])

    randnoise = tf.random_uniform(net.shape, dtype=tf.float32)
    valid_block = tf.cast(valid_block, dtype=tf.float32)
    seed_keep_rate = tf.cast(1 - seed_drop_rate, dtype=tf.float32)
    block_pattern = (1 - valid_block + seed_keep_rate + randnoise) >= 1
    block_pattern = tf.cast(block_pattern, dtype=tf.float32)

    if self._data_format == 'channels_last':
      ksize = [1, self._dropblock_size, self._dropblock_size, 1]
    else:
      ksize = [1, 1, self._dropblock_size, self._dropblock_size]
    # Expand each dropped seed to a block_size x block_size region by
    # min-pooling, implemented as a negated max-pooling.
    block_pattern = -tf.nn.max_pool(
        -block_pattern,
        ksize=ksize,
        strides=[1, 1, 1, 1],
        padding='SAME',
        data_format='NHWC' if self._data_format == 'channels_last' else 'NCHW')

    # Rescale so the expected activation magnitude is preserved.
    percent_ones = tf.cast(tf.reduce_sum(block_pattern), tf.float32) / tf.cast(
        tf.size(block_pattern), tf.float32)
    net = net / tf.cast(percent_ones, net.dtype) * tf.cast(
        block_pattern, net.dtype)
    return net
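For readers who want to see the mask logic without TensorFlow, the same construction can be sketched in NumPy (a simplified single-channel version; the function name and loop-based block expansion are ours, not part of the official code):

```python
import numpy as np


def dropblock_mask(height, width, block_size, keep_prob, rng):
    """Build a binary keep-mask for one feature map (NumPy sketch)."""
    # Gamma: same formula as seed_drop_rate in the TF implementation.
    gamma = ((1.0 - keep_prob) * height * width / block_size ** 2 /
             ((width - block_size + 1) * (height - block_size + 1)))
    w_i, h_i = np.meshgrid(np.arange(width), np.arange(height))
    # Seeds may only appear where a full block fits inside the map.
    valid = ((w_i >= block_size // 2) & (w_i < width - (block_size - 1) // 2) &
             (h_i >= block_size // 2) & (h_i < height - (block_size - 1) // 2))
    seeds = valid & (rng.random((height, width)) < gamma)
    mask = np.ones((height, width))
    half = block_size // 2
    for y, x in zip(*np.nonzero(seeds)):
        # Zero out the block_size x block_size block centered on each seed
        # (the loop plays the role of the negated max-pooling above).
        mask[y - half:y - half + block_size,
             x - half:x - half + block_size] = 0.0
    return mask
```

With keep_prob = 1.0 the mask is all ones (nothing is dropped), and with block_size = 1 each seed drops a single unit, i.e., plain Dropout.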
Verification
The open source Canine Coccidiosis Parasite dataset is used for verification. The dataset contains only one class. Table 1 describes the sensitivity of the model to bounding box brightness before DropBlock is used.
| Feature Distribution | coccidia |
|---|---|
| 0% - 20% | 0.8065 |
| 20% - 40% | 0.871 |
| 40% - 60% | 0.9355 |
| 60% - 80% | 0.8065 |
| 80% - 100% | 0.9677 |
| Standard deviation | 0.0658 |
Table 2 describes the sensitivity of the model to bounding box brightness after DropBlock is used. With DropBlock, the bounding box brightness sensitivity decreases from 0.0658 to 0.0204. In model evaluation, DropBlock significantly reduces the bounding box brightness sensitivity of object detection models.
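The sensitivity figure appears to be the (population) standard deviation of the per-brightness-bin mAP values; under that assumption, the 0.0658 value in Table 1 can be reproduced directly:

```python
import numpy as np

# Per-brightness-bin mAP values for class "coccidia" from Table 1.
map_by_bin = [0.8065, 0.871, 0.9355, 0.8065, 0.9677]

# Brightness sensitivity, assuming it is the population standard
# deviation (np.std with the default ddof=0) across the five bins.
sensitivity = np.std(map_by_bin)
print(round(sensitivity, 4))  # → 0.0658
```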
Suggestions
If, in the model inference results, the detected classes are highly sensitive to bounding box brightness, you are advised to apply DropBlock during training to optimize and enhance the model.