Operator Usage Suggestions
Principles
On the Ascend 310 chip, algorithm performance is optimized by keeping the Cube (matrix computation) unit busy, which in turn requires reducing data transfers and vector computation. The general principles are as follows:
- Network structure
- Mainstream network topologies such as ResNet and MobileNet are recommended because their performance has been optimized.
- Earlier network topologies such as VGG and AlexNet use large models and put heavy pressure on memory bandwidth.
- In matrix multiplication, set M, K, and N to multiples of 16 (see the padding sketch after this list). Increase the number of channels in the algorithm where possible, and do not reduce the effective channel count by splitting channels into groups.
- Increasing the parameter reuse rate reduces bandwidth pressure, so raising the filter reuse rate improves performance: for example, use larger feature maps, smaller stride values, and smaller dilation values.
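As an illustration of the multiple-of-16 rule, the following minimal NumPy sketch (the `pad_to_multiple` helper is hypothetical, not an Ascend API) zero-pads the operands of a matrix multiplication so that M, K, and N are all aligned; the true result is the top-left corner of the padded output.

```python
import numpy as np

def pad_to_multiple(x, multiple=16):
    """Zero-pad a 2-D matrix so both dimensions become multiples of `multiple`."""
    m, n = x.shape
    pad_m = (multiple - m % multiple) % multiple
    pad_n = (multiple - n % multiple) % multiple
    return np.pad(x, ((0, pad_m), (0, pad_n)))

# A (M x K) @ B (K x N) with M=100, K=250, N=30: none is a multiple of 16.
a = np.random.rand(100, 250).astype(np.float32)
b = np.random.rand(250, 30).astype(np.float32)
a_pad, b_pad = pad_to_multiple(a), pad_to_multiple(b)  # 112x256 and 256x32
c = a_pad @ b_pad        # aligned GEMM; the valid result is c[:100, :30]
```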
- Conv operator
- In non-quantization mode, it is recommended that the numbers of input and output channels of a Conv operator be integer multiples of 16.
- In quantization mode, it is recommended that the numbers of input and output channels of a Conv operator be integer multiples of 32 (see the channel-alignment sketch below).
- In quantization mode, insert as few pooling operators as possible between consecutive Conv operators.
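A minimal PyTorch-style sketch of how channel counts might be rounded up to the recommended alignment; the `aligned` helper and the concrete channel counts are assumptions for illustration, not part of any Ascend API.

```python
import torch.nn as nn

def aligned(channels, quantized=False):
    """Round a channel count up to 16 (non-quantized) or 32 (quantized)."""
    m = 32 if quantized else 16
    return ((channels + m - 1) // m) * m

# A designer wanting roughly 50 input and 120 output channels would pick
# 64 and 128, which satisfy both the 16- and 32-alignment recommendations.
conv = nn.Conv2d(in_channels=aligned(50), out_channels=aligned(120),
                 kernel_size=3, padding=1)
```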
- Fully connected (FC) operator
If FC operators exist on the network, batch multiple inputs and run inference on them at the same time, as in the sketch below.
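A minimal sketch of the batching idea, using PyTorch's `nn.Linear` as a stand-in for an FC operator: stacking pending inputs into one batch lets the FC weights be loaded once and reused across all samples.

```python
import torch

fc = torch.nn.Linear(1024, 256)                  # stand-in for an FC operator
inputs = [torch.randn(1024) for _ in range(16)]  # 16 pending requests
batch = torch.stack(inputs)                      # shape (16, 1024)
out = fc(batch)                # one pass instead of 16 single-sample passes
```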
- Concat operator
- In non-quantization mode, it is recommended that the number of input channels of the Concat operator be an integer multiple of 16.
- In quantization mode, it is recommended that the number of input channels of the Concat operator be an integer multiple of 32 (see the sketch below).
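A small PyTorch-style sketch of channel-aligned concatenation; the shapes are illustrative assumptions. Both inputs have channel counts that are multiples of 16, so the concatenated output stays aligned as well.

```python
import torch

x1 = torch.randn(1, 32, 56, 56)   # 32 channels, a multiple of 16
x2 = torch.randn(1, 48, 56, 56)   # 48 channels, a multiple of 16
y = torch.cat([x1, x2], dim=1)    # concat along C; y has 80 channels, still aligned
```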
- Conv fusion operator
The Conv+BatchNorm+Scale+Relu/Relu6 combination is recommended because its performance has been optimized (see the sketch below).
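A minimal PyTorch-style sketch of the fusion-friendly pattern. Note that PyTorch's `BatchNorm2d` already includes the affine scale step that Caffe models express as a separate Scale layer; the layer sizes are illustrative assumptions.

```python
import torch.nn as nn

# Conv -> BatchNorm (+Scale) -> ReLU6: a pattern that deployment tools can
# typically fold into a single convolution for inference.
block = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(64),      # BatchNorm with affine scale
    nn.ReLU6(inplace=True),
)
```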
- Norm operator
- The BatchNorm operator is recommended because it uses normalization parameters learned during training.
- Operators such as LRN, which must compute the normalization parameters online, are not recommended (see the sketch below).
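To make the distinction concrete, a short PyTorch-style sketch: in inference mode BatchNorm reduces to a fixed per-channel affine transform using stored statistics, whereas LRN must compute cross-channel statistics at run time.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)

bn = nn.BatchNorm2d(64).eval()   # uses stored running mean/var: a fixed affine op
y_bn = bn(x)

lrn = nn.LocalResponseNorm(size=5)  # must compute neighborhood sums online
y_lrn = lrn(x)
```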
- Detection operator
Mainstream detection network topologies such as Faster R-CNN and SSD are recommended because their performance has been optimized.
Tips
- The performance of Conv+(BatchNorm+Scale)+Relu is better than that of Conv+(BatchNorm+Scale)+Tanh. Avoid using complex activation functions.
- When tensors are concatenated along the C (channel) dimension by a Concat operator, performance is better if the channel count of each input tensor is a multiple of 16.
- When the batch size is a multiple of 16, performance is better.
- A continuous convolution structure performs better. If vector operators (such as pooling) are inserted between convolution layers, performance degrades; the effect is especially pronounced in INT8 models (see the sketch below).
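A PyTorch-style sketch contrasting the two structures; the layer sizes are illustrative assumptions.

```python
import torch.nn as nn

# Preferred: back-to-back convolutions keep the work on the Cube unit.
conv_chain = nn.Sequential(
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
)

# Slower pattern, especially for INT8: a vector operator (pooling) between
# the convolutions forces extra data movement off the Cube pipeline.
conv_pool_conv = nn.Sequential(
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
)
```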
- In early networks such as AlexNet and GoogLeNet, LRN is used as the normalization operator, and its computation is complex. As algorithms evolved, it was replaced with operators such as BatchNorm, and LRN is no longer used in mainstream network structures such as ResNet and Inception. On the Ascend 310 platform, it is therefore recommended that BatchNorm be used on the network.