Updated on 2022-03-13 GMT+08:00

Caffe Operator Boundaries

No.

Operator

Description

Boundary

1

Absval

Computes the absolute value of the input.

[Inputs]

One input

[Arguments]

engine: (optional) enum, default to 0, CAFFE = 1, CUDNN = 2

[Restrictions]

None

[Quantization tool supporting]

Yes

2

Argmax

Computes the index of the maximum values.

[Inputs]

One input

[Arguments]

  • out_max_val: (optional) bool, default to false
  • top_k: (optional) uint32, default to 1
  • axis: (optional) int32

[Restrictions]

None

[Quantization tool supporting]

No

3

BatchNorm

Normalizes the input:

(x – avg(x)) / sqrt(var(x) + eps)

[Inputs]

One input

[Arguments]

  • use_global_stats: bool, must be true
  • moving_average_fraction: (optional) float, default to 0.999
  • eps: (optional) float, default to 1e–5

[Restrictions]

Only the C dimension can be normalized.
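As an illustration only, per-channel normalization with use_global_stats = true can be sketched as follows. The function name and the flat list layout are hypothetical, and the expression is the standard batch normalization formula, which is assumed (not confirmed) to match this operator.

```python
import math


def batch_norm_channel(values, mean, variance, eps=1e-5):
    """Normalize the values of one channel with stored (global) statistics.

    Hypothetical helper: mean and variance are the precomputed statistics
    of the channel, and eps guards against division by zero.
    """
    inv_std = 1.0 / math.sqrt(variance + eps)
    return [(v - mean) * inv_std for v in values]
```

Each channel along the C dimension would be normalized independently with its own mean and variance.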

[Quantization tool supporting]

Yes

4

Concat

Concatenates the input along the given dimension.

[Inputs]

Multiple inputs

[Arguments]

  • concat_dim: (optional) uint32, default to 1, greater than 0
  • axis: (optional) int32, default to 1, exclusive with concat_dim. When axis is –1, four input dimensions are required. Otherwise, the result may be incorrect.

[Restrictions]

  • The number of dimensions of the input tensors must match, and all dimensions except axis must be equal.
  • The number of input tensors must be in the range [1, 1000].

[Quantization tool supporting]

Yes

5

ConvolutionDepthwise

Performs depthwise convolution.

[Inputs]

One input, with a constant filter and four dimensions

[Arguments]

  • num_output: (optional) uint32
  • bias_term: (optional) bool, default to true
  • pad: uint32, default to 0, array
  • kernel_size: uint32, array
  • stride: uint32, default to 1, array
  • dilation: uint32, only dilation=1 is supported, array
  • pad_h: (optional) uint32, default to 0 (2D only)
  • pad_w: (optional) uint32, default to 0 (2D only)
  • kernel_h: (optional) uint32 (2D only)
  • kernel_w: (optional) uint32 (2D only)
  • stride_h: (optional) uint32 (2D only)
  • stride_w: (optional) uint32 (2D only)
  • group: (optional) uint32, default to 1
  • weight_filler: This parameter is not supported.
  • bias_filler: This parameter is not supported.
  • engine: (optional) enum, default to 0, CAFFE = 1, CUDNN = 2
  • force_nd_im2col: (optional) bool, default to false
  • axis: (optional) int32, default to 1

[Restrictions]

filterN = inputC = group

((W + 15)/16 * 16) * filter.W * 32 ≤ 32 * 1024, where W is the width of the operator input and filter.W is the width of the filter.
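The width restriction above can be checked before model conversion with a small helper. This is a sketch under the assumption that "/" denotes integer division, so that (W + 15)/16 * 16 rounds W up to a multiple of 16; the function names are hypothetical.

```python
def align16(value: int) -> int:
    """Round value up to the next multiple of 16 (assumed meaning of (W + 15)/16 * 16)."""
    return (value + 15) // 16 * 16


def depthwise_width_ok(input_w: int, filter_w: int) -> bool:
    """Check align16(W) * filter.W * 32 <= 32 * 1024."""
    return align16(input_w) * filter_w * 32 <= 32 * 1024
```

For example, an input width of 224 passes with a filter width of 3 (224 * 3 * 32 = 21504) but not with a filter width of 5 (224 * 5 * 32 = 35840).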

[Quantization tool supporting]

Yes

6

Convolution

Convolves the input.

[Inputs]

One input, with a constant filter and four dimensions

[Arguments]

  • num_output: (optional) uint32
  • bias_term: (optional) bool, default to true
  • pad: uint32, default to 0, array
  • kernel_size: uint32, array
  • stride: uint32, default to 1, array
  • dilation: uint32, default to 1, array
  • pad_h: (optional) uint32, default to 0 (2D only)
  • pad_w: (optional) uint32, default to 0 (2D only)
  • kernel_h: (optional) uint32 (2D only)
  • kernel_w: (optional) uint32 (2D only)
  • stride_h: (optional) uint32 (2D only)
  • stride_w: (optional) uint32 (2D only)
  • group: (optional) uint32, default to 1
  • weight_filler: This parameter is not supported.
  • bias_filler: This parameter is not supported.
  • engine: (optional) enum, default to 0, CAFFE = 1, CUDNN = 2
  • force_nd_im2col: (optional) bool, default to false
  • axis: (optional) int32, default to 1

[Restrictions]

  • (inputW + padWHead + padWTail) ≥ (((FilterW – 1) * dilationW) + 1)
  • (inputW + padWHead + padWTail)/StrideW + 1 ≤ 2147483647
  • (inputH + padHHead + padHTail) ≥ (((FilterH – 1) * dilationH) + 1)
  • (inputH + padHHead + padHTail)/StrideH + 1 ≤ 2147483647
  • 0 ≤ Pad < 256, 0 < FilterSize < 256, 0 < Stride < 64, 1 ≤ dilationsize < 256
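The restrictions above apply to the H and W axes in the same way, so one per-axis check can cover both. The following is a minimal sketch; the function name is hypothetical, "/" is assumed to be integer division, and the Pad bound is assumed to apply to the head and tail padding separately.

```python
INT32_MAX = 2147483647


def conv_axis_ok(size, pad_head, pad_tail, kernel, stride=1, dilation=1):
    """Check the Convolution restrictions along one spatial axis (H or W)."""
    in_range = (
        0 <= pad_head < 256
        and 0 <= pad_tail < 256
        and 0 < kernel < 256
        and 0 < stride < 64
        and 1 <= dilation < 256
    )
    if not in_range:
        return False
    padded = size + pad_head + pad_tail
    # Extent covered by the dilated filter.
    effective_kernel = (kernel - 1) * dilation + 1
    return padded >= effective_kernel and padded // stride + 1 <= INT32_MAX
```

A layer is acceptable only if the check passes for both axes, for example conv_axis_ok(inputH, ...) and conv_axis_ok(inputW, ...).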

[Quantization tool supporting]

Yes

7

Crop

Crops the input.

[Inputs]

Two inputs

[Arguments]

  • axis: (optional) int32, default to 2. When axis is –1, four input dimensions are required.
  • offset: uint32, array

[Restrictions]

None

[Quantization tool supporting]

No

8

Deconvolution

Performs deconvolution (transposed convolution).

[Inputs]

One input, with a constant filter and four dimensions

[Arguments]

  • num_output: (optional) uint32
  • bias_term: (optional) bool, default to true
  • pad: uint32, default to 0, array
  • kernel_size: uint32, array
  • stride: uint32, default to 1, array
  • dilation: uint32, default to 1, array
  • pad_h: (optional) uint32, default to 0 (2D only)
  • pad_w: (optional) uint32, default to 0 (2D only)
  • kernel_h: (optional) uint32 (2D only)
  • kernel_w: (optional) uint32 (2D only)
  • stride_h: (optional) uint32 (2D only)
  • stride_w: (optional) uint32 (2D only)
  • group: (optional) uint32, default to 1
  • weight_filler: This parameter is not supported.
  • bias_filler: This parameter is not supported.
  • engine: (optional) enum, default to 0, CAFFE = 1, CUDNN = 2
  • force_nd_im2col: (optional) bool, default to false
  • axis: (optional) int32, default to 1

[Restrictions]

  • group = 1
  • dilation = 1
  • filterH – padHHead – 1 ≥ 0
  • filterW – padWHead – 1 ≥ 0
  • Restrictions involving intermediate variables:

1. a = ALIGN(filter_num, 16) * ALIGN(filter_c, 16) * filter_h * filter_w * 2

If ALIGN(filter_c, 16)%32 = 0, a = a/2

2. conv_input_width = (deconvolution input W – 1) * strideW + 1

3. b = (conv_input_width) * filter_h * ALIGN(filter_num, 16) * 2 * 2

4. a + b ≤ 1024 * 1024
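The intermediate variables above can be evaluated mechanically. The sketch below assumes that ALIGN(x, 16) rounds x up to a multiple of 16; the function and parameter names are hypothetical.

```python
def align(value: int, unit: int = 16) -> int:
    """Round value up to a multiple of unit (assumed meaning of ALIGN)."""
    return (value + unit - 1) // unit * unit


def deconv_ok(filter_num, filter_c, filter_h, filter_w, input_w, stride_w,
              pad_h_head=0, pad_w_head=0, group=1, dilation=1):
    """Check the Deconvolution restrictions listed above."""
    if group != 1 or dilation != 1:
        return False
    if filter_h - pad_h_head - 1 < 0 or filter_w - pad_w_head - 1 < 0:
        return False
    # Step 1: a, halved when the aligned filter channel count is a multiple of 32.
    a = align(filter_num) * align(filter_c) * filter_h * filter_w * 2
    if align(filter_c) % 32 == 0:
        a //= 2
    # Step 2: width of the equivalent convolution input.
    conv_input_width = (input_w - 1) * stride_w + 1
    # Step 3: b.
    b = conv_input_width * filter_h * align(filter_num) * 2 * 2
    # Step 4: the sum must not exceed 1024 * 1024.
    return a + b <= 1024 * 1024
```

With 64 filters of shape 64 x 4 x 4, an input width of 32, and strideW = 2, a = 65536 and b = 64512, so the sum (130048) is within the limit.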

[Quantization tool supporting]

Yes

9

DetectionOutput

Generates detection results and outputs FSR.

[Inputs]

Three inputs

[Arguments]

  • num_classes: (mandatory) int32, indicating the number of classes to be predicted
  • share_location: (optional) bool, default to true, indicating that classes share one BBox
  • background_label_id: (optional) int32, default to 0
  • nms_param: (optional) indicating non-maximum suppression (NMS)
  • save_output_param: (optional) indicating whether to save the detection result
  • code_type: (optional) default to CENTER_SIZE
  • variance_encoded_in_target: (optional) bool, default to true. The value true indicates that the variance is encoded in the target, otherwise the prediction offset needs to be adjusted accordingly.
  • keep_top_k: (optional) int32, indicating the total number of BBoxes to be reserved for each image after NMS
  • confidence_threshold: (optional) float, indicating that only the detection whose confidence is above the threshold is considered. If this parameter is not set, all boxes are considered.
  • nms_threshold: (optional) float
  • top_k: (optional) int32
  • boxes: (optional) int32, default to 1
  • relative: (optional) bool, default to true
  • objectness_threshold: (optional) float, default to 0.5
  • class_threshold: (optional) float, default to 0.5
  • biases: array
  • general_nms_param: (optional)

[Restrictions]

  • Used for Faster R-CNN
  • The non-maximum suppression (NMS) ratio nmsThreshold must be in the range (0, 1).
  • The probability threshold postConfThreshold must be in the range (0, 1).
  • At least two classes
  • Input box count ≤ 1024
  • Output W dimension = 16

[Quantization tool supporting]

Yes

10

Eltwise

Computes element-wise operations (PROD, MAX, and SUM).

[Inputs]

At least two inputs

[Arguments]

  • operation: (optional) enum, (PROD = 0; SUM = 1; MAX = 2), default to SUM
  • coeff: array, float
  • stable_prod_grad: (optional) bool, default to true

[Restrictions]

  • Up to four inputs
  • Compared with the native operator, this operator does not support the stable_prod_grad parameter.
  • PROD, MAX, and SUM operations are supported.

[Quantization tool supporting]

Yes

11

Elu

Activation function

[Inputs]

One input

[Arguments]

alpha: (optional) float, default to 1

[Restrictions]

None

[Quantization tool supporting]

No

12

Exp

Computes an exponential of the input x. By default (base = –1.0), e is used as the base.

[Inputs]

One input

[Arguments]

  • base: (optional) float, default to –1.0
  • scale: (optional) float, default to 1.0
  • shift: (optional) float, default to 0.0

[Restrictions]

None
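For reference, the upstream Caffe Exp layer computes y = base^(shift + scale * x), where base = –1.0 selects e. The sketch below follows that convention; whether this platform matches it in every detail is an assumption, and the function name is hypothetical.

```python
import math


def exp_layer(x, base=-1.0, scale=1.0, shift=0.0):
    """Scalar sketch of y = base ** (shift + scale * x); base == -1.0 means e."""
    exponent = shift + scale * x
    if base == -1.0:
        return math.exp(exponent)
    return base ** exponent
```

With the default arguments the function reduces to e^x; with base = 2.0 and x = 3.0 it returns 2^3.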

[Quantization tool supporting]

No

13

Flatten

Converts an input n * c * h * w into a vector n * (c * h * w).

[Inputs]

One input

top_size ≠ bottom_size ≠ 1

When axis is –1, four input dimensions are required.

[Arguments]

  • axis: (optional) int32, default to 1
  • end_axis: (optional) int32, default to –1

[Restrictions]

axis < end_axis
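The effect of axis and end_axis on the output shape can be sketched as follows. This is an illustrative helper with a hypothetical name; negative indices are assumed to count from the last dimension, as in upstream Caffe.

```python
from math import prod


def flatten_shape(shape, axis=1, end_axis=-1):
    """Return the shape after merging dimensions axis..end_axis (inclusive)."""
    rank = len(shape)
    start = axis + rank if axis < 0 else axis
    end = end_axis + rank if end_axis < 0 else end_axis
    # The restriction above requires axis to come strictly before end_axis.
    if not 0 <= start < end < rank:
        raise ValueError("expected 0 <= axis < end_axis < rank")
    merged = prod(shape[start:end + 1])
    return tuple(shape[:start]) + (merged,) + tuple(shape[end + 1:])
```

With the default arguments, an input of shape (n, c, h, w) becomes (n, c * h * w).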

[Quantization tool supporting]

Yes

14

FullConnection

Computes an inner product.

[Inputs]

One input

[Arguments]

  • num_output: (optional) uint32
  • bias_term: (optional) bool, default to true
  • weight_filler: This parameter is not supported.
  • bias_filler: This parameter is not supported.
  • axis: (optional) int32, default to 1
  • transpose: (optional) bool, default to false

[Restrictions]

  • transpose = false, axis = 1
  • In the quantization scenario, Bias_C ≤ 59136; in the non-quantization scenario, Bias_C ≤ 118272.
  • To quantize the model, the following dimension restrictions must be satisfied:
  • When N = 1: 2 * CEIL(C, 16) * 16 * xH * xW ≤ 1024 * 1024;
  • When N > 1: 2 * 16 * CEIL(C, 16) * 16 * xH * xW ≤ 1024 * 1024.
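The size limits above can be checked with a short helper. This sketch assumes that CEIL(C, 16) means C divided by 16 and rounded up, and that Bias_C is the number of bias channels; both readings, and the function name, are assumptions.

```python
def full_connection_ok(n, c, x_h, x_w, bias_c, quantized=True):
    """Check the FullConnection size limits for an input of shape (n, c, x_h, x_w)."""
    if not quantized:
        return bias_c <= 118272
    if bias_c > 59136:
        return False
    c_blocks = -(-c // 16)  # ceiling division: assumed meaning of CEIL(C, 16)
    factor = 2 if n == 1 else 2 * 16
    return factor * c_blocks * 16 * x_h * x_w <= 1024 * 1024
```

For a 512 x 7 x 7 input with N = 1, the left-hand side is 2 * 32 * 16 * 49 = 50176, well below the limit.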

[Quantization tool supporting]

Yes

15

Interp

Interpolation layer

[Inputs]

One input

[Arguments]

  • height: (optional) int32, default to 0
  • width: (optional) int32, default to 0
  • zoom_factor: (optional) int32, default to 1
  • shrink_factor: (optional) int32, default to 1
  • pad_beg: (optional) int32, default to 0
  • pad_end: (optional) int32, default to 0

Note:

  • zoom_factor and shrink_factor are exclusive.
  • height and zoom_factor are exclusive.
  • height and shrink_factor are exclusive.

[Restrictions]

(outputH * outputW)/(inputH * inputW) > 1/30

[Quantization tool supporting]

No

16

Log

Performs a logarithmic operation on the input.

[Inputs]

One input

[Arguments]

  • base: (optional) float, default to –1.0
  • scale: (optional) float, default to 1.0
  • shift: (optional) float, default to 0.0

[Restrictions]

None

[Quantization tool supporting]

No

17

LRN

Normalizes the input in a local region.

[Inputs]

One non-constant input

[Arguments]

  • local_size: (optional) uint32, default to 5
  • alpha: (optional) float, default to 1
  • beta: (optional) float, default to 0.75
  • norm_region: (optional) enum, default to ACROSS_CHANNELS (ACROSS_CHANNELS = 0, WITHIN_CHANNEL = 1)
  • lrnk: (optional) float, default to 1
  • engine: (optional) enum, default to 0, CAFFE = 1, CUDNN = 2

[Restrictions]

  • local_size is an odd number greater than 0.
  • Inter-channel: if local_size is in the range [1, 15], lrnK > 0.00001 and beta > 0.01; otherwise, lrnK and beta can take any value. lrnK and alpha cannot both be 0. When the C dimension is greater than 1776, local_size < 1728.
  • Intra-channel: lrnK = 1, local_size is in the range [1, 15], and beta > 0.01.

[Quantization tool supporting]

Yes

18

LSTM

Long short-term memory (LSTM) network

[Inputs]

Two or three inputs

  • X: time sequence data (T * B * Xt). In the 4D NCHW format, the following conditions must be met: N is the time sequence length T, C is the batch size B, H is the input data Xt at time step t, and W is 1.
  • Cont: sequence continuity flag (T * B)
  • Xs: (optional) static data (B * Xt)

[Arguments]

  • num_output: (optional) uint32, default to 0
  • weight_filler: This parameter is not supported.
  • bias_filler: This parameter is not supported.
  • debug_info: (optional) bool, default to false
  • expose_hidden: (optional) bool, default to false

[Restrictions]

  • Restrictions involving intermediate variables (ht and output both equal the argument num_output):

a = (ALIGN(xt,16) + ALIGN(output,16)) * 16 * 2 * 2

b = (ALIGN(xt,16) + ALIGN(output,16)) * 16 * 4 * 2 * 2

d = 16 * ALIGN(ht,16) * 2

e = B * 4

That is:

a + b ≤ 1024 * 1024

d ≤ 256 * 1024/8

e ≤ 256 * 1024/32

  • B ≤ 16, T ≤ 768
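The LSTM limits above can be evaluated for a given configuration as sketched below. ALIGN(x, 16) is assumed to round x up to a multiple of 16, and the function name is hypothetical.

```python
def align16(value: int) -> int:
    """Round value up to a multiple of 16 (assumed meaning of ALIGN(x, 16))."""
    return (value + 15) // 16 * 16


def lstm_ok(xt, num_output, batch, time_steps):
    """Check the LSTM restrictions; ht and output both equal num_output."""
    ht = output = num_output
    width = align16(xt) + align16(output)
    a = width * 16 * 2 * 2
    b = width * 16 * 4 * 2 * 2
    d = 16 * align16(ht) * 2
    e = batch * 4
    return (
        a + b <= 1024 * 1024
        and d <= 256 * 1024 // 8
        and e <= 256 * 1024 // 32
        and batch <= 16
        and time_steps <= 768
    )
```

For Xt = 128, num_output = 256, B = 8, and T = 100, the values are a = 24576, b = 98304, d = 8192, and e = 32, all within the limits.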

[Quantization tool supporting]

No

19

Normalize

Normalization layer

[Inputs]

One input

[Arguments]

  • across_spatial: (optional) bool, default to true
  • scale_filler: This parameter is not supported.
  • channel_shared: (optional) bool, default to true
  • eps: (optional) float, default to 1e–10

[Restrictions]

  • 1e–7 < eps ≤ 0.1 + 1e–6
  • across_spatial must be true for Caffe, indicating normalization by channel

[Quantization tool supporting]

Yes

20

Permute

Permutes the input dimensions according to a given mode.

[Inputs]

One input

[Arguments]

order: uint32, array

[Restrictions]

None

[Quantization tool supporting]

Yes

21

Pooling

Pools the input.

[Inputs]

One input

[Arguments]

  • pool: (optional) enum, indicating the pooling method, MAX = 0, AVE = 1, and STOCHASTIC = 2, default to MAX
  • pad: (optional) uint32, default to 0
  • pad_h: (optional) uint32, default to 0
  • pad_w: (optional) uint32, default to 0
  • kernel_size: (optional) uint32, exclusive with kernel_h/kernel_w
  • kernel_h: (optional) uint32
  • kernel_w: (optional) uint32, used in pair with kernel_h
  • stride: (optional) uint32, default to 1
  • stride_h: (optional) uint32
  • stride_w: (optional) uint32
  • engine: (optional) enum, default to 0, CAFFE = 1, CUDNN = 2
  • global_pooling: (optional) bool, default to false
  • ceil_mode: (optional) bool, default to true
  • round_mode: (optional) enum, CEIL = 0, FLOOR = 1, default to CEIL

[Restrictions]

  • kernelH ≤ inputH + padTop + padBottom
  • kernelW ≤ inputW + padLeft + padRight
  • padTop < windowH
  • padBottom < windowH
  • padLeft < windowW
  • padRight < windowW
  • Only the global pooling mode is supported. The following restrictions must be satisfied:

1) outputH == 1 && outputW == 1 && kernelH ≥ inputH && kernelW ≥ inputW

2) inputH * inputW ≤ 10000

[Quantization tool supporting]

Yes

22

Power

y = (scale * x + shift)^power

[Inputs]

One input

[Arguments]

  • power: (optional) float, default to 1.0
  • scale: (optional) float, default to 1.0
  • shift: (optional) float, default to 0.0

[Restrictions]

  • power ≠ 1
  • scale * x + shift > 0
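A scalar sketch of the formula y = (scale * x + shift)^power together with the two restrictions above is shown below. The function name is hypothetical, and raising ValueError is an illustrative choice rather than the platform's behavior.

```python
def power_layer(x, power=1.0, scale=1.0, shift=0.0):
    """Compute (scale * x + shift) ** power, enforcing the listed restrictions."""
    if power == 1:
        raise ValueError("power = 1 is not supported")
    base = scale * x + shift
    if base <= 0:
        raise ValueError("scale * x + shift must be greater than 0")
    return base ** power
```

For example, x = 3.0 with scale = 2.0, shift = 1.0, and power = 2.0 gives (2 * 3 + 1)^2 = 49.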

[Quantization tool supporting]

Yes

23

Prelu

Activation function

[Inputs]

One input

[Arguments]

  • filler: This parameter is not supported.
  • channel_shared: (optional) bool, indicating whether to share slope parameters across channels, default to false

[Restrictions]

None

[Quantization tool supporting]

Yes

24

PriorBox

Obtains the real location of the target from the box proposals.

[Inputs]

Two inputs:

  • Original input image of model, data format: NCHW;
  • FeatureMap, data format: NCHW.

[Arguments]

  • min_size: (mandatory) indicating the minimum frame size (in pixels)
  • max_size: (mandatory) indicating the maximum frame size (in pixels)
  • aspect_ratio: array, float. A repeated ratio is ignored. If no aspect ratio is provided, the default ratio 1 is used.
  • flip: (optional) bool, default to true. The value true indicates that each aspect ratio is reversed. For example, for aspect ratio r, the aspect ratio 1.0/r is generated.
  • clip: (optional) bool, default to false. The value true indicates that the prior boxes are clipped to the range [0, 1].
  • variance: array, used to adjust the variance of the BBoxes
  • img_size: (optional) uint32, exclusive with img_h/img_w
  • img_h: (optional) uint32
  • img_w: (optional) uint32
  • step: (optional) float, exclusive with step_h/step_w
  • step_h: (optional) float
  • step_w: (optional) float
  • offset: (optional) float, default to 0.5

[Restrictions]

Used for the SSD network only

Output dimensions: [n, 2, detection frame * 4, 1]

[Quantization tool supporting]

Yes

25

Proposal

Sorts the box proposals by (proposal, score) and obtains the top N proposals by using the NMS.

[Inputs]

Three inputs: scores, bbox_pred, im_info

[Arguments]

  • feat_stride: (optional) float
  • base_size: (optional) float
  • min_size: (optional) float
  • ratio: array (optional), float
  • scale: array (optional), float
  • pre_nms_topn: (optional) int32
  • post_nms_topn: (optional) int32
  • nms_thresh: (optional) float

[Restrictions]

Used only for Faster R-CNN

  • ProposalParameter and PythonParameter are exclusive.

    1. Value range of preTopK: 1–6144

    2. Value range of postTopK: 1–1024

    3. scaleCnt * ratioCnt ≤ 64

    4. 0 < nmsThresh ≤ 1 (threshold for box filtering)

    5. minSize: minimum edge length of a proposal. A box with any side smaller than minSize is removed.

    6. featStride: H/W stride between the two adjacent boxes used in default box generation

    7. baseSize: base box size used in default box generation

    8. ratio and scale: used in default box generation

    9. imgH and imgW: height and width of the image input to the network. The values must be greater than 0.

  • Restrictions on the input dimensions:

    clsProb: C = 2 * scaleCnt * ratioCnt

    bboxPred: C = 4 * scaleCnt * ratioCnt

    bboxPrior: N = clsProb.N, C = 4 * scaleCnt * ratioCnt

    imInfo: N = clsProb.N, C = 3
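The channel relationships above can be verified from the input shapes before conversion. The sketch below covers the three inputs named in [Inputs] (scores, bbox_pred, and im_info); the function name and the tuple-based shape representation are hypothetical.

```python
def proposal_shapes_ok(cls_prob, bbox_pred, im_info, scale_cnt, ratio_cnt):
    """Check the Proposal input shapes; each shape is a tuple starting with (N, C, ...)."""
    anchors = scale_cnt * ratio_cnt
    return (
        anchors <= 64
        and cls_prob[1] == 2 * anchors
        and bbox_pred[1] == 4 * anchors
        and im_info[0] == cls_prob[0]
        and im_info[1] == 3
    )
```

With 3 scales and 3 ratios there are 9 anchors per position, so clsProb needs 18 channels and bboxPred needs 36.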

[Quantization tool supporting]

Yes

26

PSROIPooling

Position-sensitive region-of-interest pooling (PSROIPooling)

[Inputs]

Two inputs

[Arguments]

  • spatial_scale: (mandatory) float
  • output_dim: (mandatory) int32, indicating the number of output channels
  • group_size: (mandatory) int32, indicating the number of groups to encode position-sensitive score maps

[Restrictions]

Used for the Region-based Fully Convolutional Network (R-FCN)

  • ROI coordinates [roiN, roiC, roiH, roiW]: 1 ≤ roiN ≤ 65535, roiC == 5, roiH == 1, roiW == 1
  • Dimensions of the input feature map: [xN, xC, xH, xW]
  • pooledH == pooledW == groupSize ≤ 128

pooledH and pooledW indicate the length and width of the pooled ROI.

Output format: y [yN, yC, yH, yW]

  • poolingMode == avg pooling, pooledH == pooledW == groupSize, pooledH ≤ 128, spatialScale > 0, groupSize > 0, outputDim > 0
  • 1 ≤ xN ≤ 65535, roisN % xN == 0
  • HW_LIMIT is the limit of xH and xW.

xHW = xH * xW

pooledHW = pooledH * pooledW

HW_LIMIT = (64 * 1024 – 8 * 1024)/32,

xH ≥ pooledH, xW ≥ pooledW

xHW ≥ pooledHW

xHW/pooledHW ≤ HW_LIMIT

  • In multi-batch scenarios, the ROIs are allocated equally to the batches, and the batch order of the ROIs is the same as that of the feature map.
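The HW_LIMIT arithmetic above works out to (64 * 1024 – 8 * 1024)/32 = 1792. A minimal check of the size-related restrictions might look as follows; the function name is hypothetical, and xHW/pooledHW is treated as an exact ratio, which is an assumption.

```python
HW_LIMIT = (64 * 1024 - 8 * 1024) // 32  # 1792


def psroi_pooling_ok(x_n, x_h, x_w, roi_n, group_size):
    """Check the PSROIPooling size restrictions for a feature map and its ROIs."""
    pooled_h = pooled_w = group_size
    if not 0 < group_size <= 128:
        return False
    if not (1 <= x_n <= 65535 and 1 <= roi_n <= 65535 and roi_n % x_n == 0):
        return False
    if x_h < pooled_h or x_w < pooled_w:
        return False
    x_hw = x_h * x_w
    pooled_hw = pooled_h * pooled_w
    # x_hw / pooled_hw <= HW_LIMIT, written without division.
    return x_hw <= HW_LIMIT * pooled_hw
```

Because xH ≥ pooledH and xW ≥ pooledW are checked first, xHW ≥ pooledHW follows automatically.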

[Quantization tool supporting]

Yes

27

Relu

Activation function, including common ReLU and Leaky ReLU, which can be specified by parameters

[Inputs]

One input

[Arguments]

  • negative_slope: (optional) float, default to 0
  • engine: (optional) enum, default to 0, CAFFE = 1, CUDNN = 2

[Restrictions]

None

[Quantization tool supporting]

Yes

28

Reshape

Reshapes the input.

[Inputs]

One input

[Arguments]

  • shape: constant, int64 or int
  • axis: (optional) int32, default to 0
  • num_axes: (optional) int32, default to –1

[Restrictions]

None

[Quantization tool supporting]

Yes

29

ROIAlign

Aggregates features using ROIs.

[Inputs]

At least two inputs

[Arguments]

  • pooled_h: (optional) uint32, default to 0
  • pooled_w: (optional) uint32, default to 0
  • spatial_scale: (optional) float, default to 1
  • sampling_ratio: (optional) int32, default to –1

[Restrictions]

Mainly used for Mask R-CNN

Restrictions on the feature map:

  • H * W ≤ 5248 or W * C < 40960
  • C ≤ 1280
  • ((C – 1)/128+1) * pooledW ≤ 216

Restrictions on the ROI:

  • C = 5 (Caffe), H = 1, W = 1
  • samplingRatio * pooledW ≤ 128, samplingRatio * pooledH ≤ 128
  • H ≥ pooledH, W ≥ pooledW

[Quantization tool supporting]

Yes

30

ROIPooling

Maps ROI proposals to a feature map.

[Inputs]

At least two inputs

[Arguments]

  • pooled_h: (mandatory) uint32, default to 0
  • pooled_w: (mandatory) uint32, default to 0
  • spatial_scale: (mandatory) float, default to 1. The multiplication spatial scale factor is used to convert ROI coordinates from the input scale to the pool scale.

[Restrictions]

Mainly used for Faster R-CNN

  • Input dimensions: H * W ≤ 8160, H ≤ 120, W ≤ 120
  • Output dimensions: pooledH ≤ 20, pooledW ≤ 20

[Quantization tool supporting]

Yes

31

Scale

out = alpha * Input + beta

[Inputs]

Two inputs, each with four dimensions

[Arguments]

  • axis: (optional) int32, 1 (default) or –3
  • num_axes: (optional) int32, default to 1
  • filler: This parameter is not supported.
  • bias_term: (optional) bool, default to false, indicating whether to learn a bias (equivalent to ScaleLayer + BiasLayer, but may be more efficient).
  • bias_filler: This parameter is not supported.

[Restrictions]

shape of scale and bias: (n, c, 1, 1), with the C dimension equal to that of the input

[Quantization tool supporting]

Yes

32

ShuffleChannel

Shuffles information across the feature channels.

[Inputs]

One input

[Arguments]

group: (optional) uint32, default to 1

[Restrictions]

None

[Quantization tool supporting]

Yes

33

Sigmoid

Activation function

[Inputs]

One input

[Arguments]

engine: (optional) enum, default to 0, CAFFE = 1, CUDNN = 2

[Restrictions]

None

[Quantization tool supporting]

Yes

34

Slice

Slices an input into multiple outputs.

[Inputs]

One input

[Arguments]

  • slice_dim: (optional) uint32, default to 1, exclusive with axis
  • slice_point: array, uint32
  • axis: (optional) int32, default to 1, indicating slicing along the channel dimension

[Restrictions]

None

[Returns]

No restrictions

[Quantization tool supporting]

Yes

35

Softmax

Normalized exponential function

[Inputs]

One input

[Arguments]

  • engine: (optional) enum, default to 0, CAFFE = 1, CUDNN = 2
  • axis: (optional) int32, default to 1, indicating the axis along which softmax is performed

[Restrictions]

If the input has four dimensions, softmax can be performed along any of them, subject to the following restrictions by axis:

  • When axis = 1: C ≤ ((256 * 1024/4) – 8 * 1024 – 256)/2
  • When axis = 0: N ≤ (56 * 1024 – 256)/2
  • When axis = 2: W = 1, 0 < H < (1024 * 1024/32)
  • When axis = 3: 0 < W < (1024 * 1024/32)

If the input contains fewer than four dimensions, softmax is performed only on the last dimension, with the last dimension ≤ 46080.
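Evaluating the limits above gives 28544 for axis = 0 and axis = 1, and 32768 for axis = 2 and axis = 3. The following sketch encodes them; the function name is hypothetical, and "/" is assumed to be integer division.

```python
def softmax_ok(shape, axis=1):
    """Check the Softmax restrictions for an input shape and a softmax axis."""
    if len(shape) == 4:
        n, c, h, w = shape
        if axis == 0:
            return n <= (56 * 1024 - 256) // 2
        if axis == 1:
            return c <= ((256 * 1024 // 4) - 8 * 1024 - 256) // 2
        if axis == 2:
            return w == 1 and 0 < h < 1024 * 1024 // 32
        if axis == 3:
            return 0 < w < 1024 * 1024 // 32
        return False
    # Fewer than four dimensions: only the last dimension is normalized,
    # so the axis argument is not consulted here.
    return shape[-1] <= 46080
```

A typical classifier output of shape (1, 1000, 1, 1) with axis = 1 is well within the limit.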

[Quantization tool supporting]

Yes

36

Tanh

Activation function

[Inputs]

One input

[Arguments]

engine: (optional) enum, default to 0, CAFFE = 1, CUDNN = 2

[Restrictions]

The number of tensor elements cannot exceed INT32_MAX.

[Quantization tool supporting]

Yes

37

Upsample

Backward propagation of max pooling

[Inputs]

Two inputs

[Arguments]

scale: (optional) int32, default to 1

[Restrictions]

None

[Quantization tool supporting]

Yes

38

SSDDetectionOutput

SSD network detection output

[Inputs]

Three inputs

[Arguments]

  • num_classes: (mandatory) int32, indicating the number of classes to be predicted
  • share_location: (optional) bool, default to true, indicating that classes share one BBox
  • background_label_id: (optional) int32, default to 0
  • nms_param: (optional) indicating non-maximum suppression (NMS)
  • save_output_param: (optional) indicating whether to save the detection result
  • code_type: (optional) default to CENTER_SIZE
  • variance_encoded_in_target: (optional) bool, default to true. The value true indicates that the variance is encoded in the target, otherwise the prediction offset needs to be adjusted accordingly.
  • keep_top_k: (optional) int32, indicating the total number of BBoxes to be reserved for each image after NMS
  • confidence_threshold: (optional) float, indicating that only the detection whose confidence is above the threshold is considered. If this parameter is not set, all boxes are considered.
  • nms_threshold: (optional) float
  • top_k: (optional) int32
  • boxes: (optional) int32, default to 1
  • relative: (optional) bool, default to true
  • objectness_threshold: (optional) float, default to 0.5
  • class_threshold: (optional) float, default to 0.5
  • biases: array
  • general_nms_param: (optional)

[Restrictions]

  • Used for the SSD network
  • Value range of preTopK and postTopK: 1–1024
  • shareLocation = true
  • nmsEta = 1
  • Value range of numClasses: 1–2048
  • code_type = CENTER_SIZE
  • Value range of nms_threshold and confidence_threshold: 0.0–1.0

[Quantization tool supporting]

Yes

39

Reorg

Real-time object detection

[Inputs]

One input

[Arguments]

  • stride: (optional) uint32, default to 2
  • reverse: (optional) bool, default to false

[Restrictions]

Used only for YOLOv2

[Quantization tool supporting]

No

40

Reverse

Reverses the input along the given axis.

[Inputs]

One input

[Arguments]

axis: (optional) int32, default to 1. Controls the axis to be reversed. The content layout will not be reversed.

[Restrictions]

None

[Quantization tool supporting]

No

41

LeakyRelu

LeakyRelu activation function

[Inputs]

One input

[Arguments]

Same as Relu

[Restrictions]

None

[Quantization tool supporting]

Yes

42

YOLODetectionOutput

YOLO network detection output

[Inputs]

Four inputs

[Arguments]

  • num_classes: (mandatory) int32, indicating the number of classes to be predicted
  • share_location: (optional) bool, default to true, indicating that classes share one BBox
  • background_label_id: (optional) int32, default to 0
  • nms_param: (optional) indicating non-maximum suppression (NMS)
  • save_output_param: (optional) indicating whether to save the detection result
  • code_type: (optional) default to CENTER_SIZE
  • variance_encoded_in_target: (optional) bool, default to true. The value true indicates that the variance is encoded in the target, otherwise the prediction offset needs to be adjusted accordingly.
  • keep_top_k: (optional) int32, indicating the total number of BBoxes to be reserved for each image after NMS
  • confidence_threshold: (optional) float, indicating that only the detection whose confidence is above the threshold is considered. If this parameter is not set, all boxes are considered.
  • nms_threshold: (optional) float
  • top_k: (optional) int32
  • boxes: (optional) int32, default to 1
  • relative: (optional) bool, default to true
  • objectness_threshold: (optional) float, default to 0.5
  • class_threshold: (optional) float, default to 0.5
  • biases: array
  • general_nms_param: (optional)

[Restrictions]

  • Used only for YOLOv2
  • classNum < 10240, anchorBox < 5
  • W ≤ 1536
  • The layer preceding YOLODetectionOutput must be the yoloregion operator.

[Quantization tool supporting]

No