Updated on 2024-09-30 GMT+08:00

Caffe Operator Boundaries

For the Caffe framework, if an operator's input does not have four dimensions and the operator has an axis parameter, axis cannot be set to a negative number.

Table 1 shows the boundaries of Caffe operators supported by .om models.

Table 1 Caffe operator boundaries

No.

Operator

Definition

Boundary

1

Absval

Computes the absolute value of the input.

[Input]

One input

[Parameter]

engine: (optional) enum, default to 0, CAFFE = 1, CUDNN = 2

[Constraint]

None

[Quantitative tool support]

Yes

2

Argmax

Returns the index number corresponding to the maximum input value.

[Input]

One input

[Parameter]

  • out_max_val: (optional) bool, default to false
  • top_k: (optional) uint32, default to 1
  • axis: (optional) int32

[Constraint]

None

[Quantitative tool support]

No

3

BatchNorm

Normalizes the input:

(x – avg(x))/sqrt(variance(x) + eps)

[Input]

One input

[Parameter]

  • use_global_stats: bool, must be true
  • moving_average_fraction: (optional) float, default to 0.999
  • eps: (optional) float, default to 1e-5

[Constraint]

Only the C dimension can be normalized.

[Quantitative tool support]

Yes

4

Concat

Concatenates the input along the given dimension.

[Input]

Multiple inputs

[Parameter]

  • concat_dim: (optional) uint32, default to 1, greater than 0
  • axis: (optional) int32, default to 1, exclusive with concat_dim. When axis is –1, four input dimensions are required. Otherwise, the result may be incorrect.

[Constraint]

  • For the input tensor, the sizes of its dimensions must be the same except the dimension for concatenation.
  • The range of the input tensor count is [1, 1,000].
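The Concat constraints can be checked on input shapes before conversion. The following is a minimal sketch, not part of any Huawei tool: the function name `concat_output_shape` and its arguments are hypothetical, and shapes are assumed to be plain tuples of integers.

```python
def concat_output_shape(shapes, axis=1):
    """Validate Concat inputs and return the concatenated shape.

    Hypothetical helper: checks the documented input count range,
    the 4D requirement for a negative axis, and that all dimensions
    except the concatenation dimension match.
    """
    if not 1 <= len(shapes) <= 1000:
        raise ValueError("input tensor count must be within [1, 1000]")
    first = shapes[0]
    if axis < 0:
        # A negative axis is only documented for four input dimensions.
        if len(first) != 4:
            raise ValueError("a negative axis requires 4D inputs")
        axis += 4
    if not 0 <= axis < len(first):
        raise ValueError("axis out of range for the input rank")
    for shape in shapes[1:]:
        if len(shape) != len(first):
            raise ValueError("all inputs must have the same rank")
        for dim, (a, b) in enumerate(zip(first, shape)):
            if dim != axis and a != b:
                raise ValueError(f"dimension {dim} differs: {a} vs {b}")
    out = list(first)
    out[axis] = sum(shape[axis] for shape in shapes)
    return tuple(out)
```

For example, two inputs of shape (1, 64, 56, 56) and (1, 32, 56, 56) concatenated along axis 1 give (1, 96, 56, 56), while inputs that differ in H or W are rejected.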

[Quantitative tool support]

Yes

5

DepthwiseConvolution

Depthwise convolution

[Input]

One 4D input, with a constant filter

[Parameter]

  • num_output: (optional) uint32
  • bias_term: (optional) bool, default to true
  • pad: uint32, default to 0, array
  • kernel_size: uint32, array
  • stride: uint32, default to 1, array
  • dilation: uint32, default to 1, array
  • pad_h: (optional) uint32, default to 0 (2D only)
  • pad_w: (optional) uint32, default to 0 (2D only)
  • kernel_h: (optional) uint32 (2D only)
  • kernel_w: (optional) uint32 (2D only)
  • stride_h: (optional) uint32 (2D only)
  • stride_w: (optional) uint32 (2D only)
  • group: (optional) uint32, default to 1
  • weight_filler: (optional) FillerParameter
  • bias_filler: (optional) FillerParameter
  • engine: (optional) enum, default to 0, CAFFE = 1, CUDNN = 2
  • force_nd_im2col: (optional) bool, default to false
  • axis: (optional) int32, default to 1

[Constraint]

filterN=inputC=group

[Quantitative tool support]

Yes

6

Convolution

Convolution

[Input]

One 4D input, with a constant filter

[Parameter]

  • num_output: (optional) uint32
  • bias_term: (optional) bool, default to true
  • pad: uint32, default to 0, array
  • kernel_size: uint32, array
  • stride: uint32, default to 1, array
  • dilation: uint32, default to 1, array
  • pad_h: (optional) uint32, default to 0 (2D only)
  • pad_w: (optional) uint32, default to 0 (2D only)
  • kernel_h: (optional) uint32 (2D only)
  • kernel_w: (optional) uint32 (2D only)
  • stride_h: (optional) uint32 (2D only)
  • stride_w: (optional) uint32 (2D only)
  • group: (optional) uint32, default to 1
  • weight_filler: (optional) FillerParameter
  • bias_filler: (optional) FillerParameter
  • engine: (optional) enum, default to 0, CAFFE = 1, CUDNN = 2
  • force_nd_im2col: (optional) bool, default to false
  • axis: (optional) int32, default to 1

[Constraint]

  • (inputW + padWHead + padWTail) ≥ (((FilterW-1) x dilationW) + 1)
  • (inputW + padWHead + padWTail)/StrideW + 1 ≤ 2147483647
  • (inputH + padHHead + padHTail) ≥ (((FilterH-1) x dilationH) + 1)
  • (inputH + padHHead + padHTail)/StrideH + 1 ≤ 2147483647
  • 0 ≤ Pad < 256, 0 < FilterSize < 256, 0 < Stride < 64, 1 ≤ dilationsize < 256
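The Convolution constraints above can be evaluated before model conversion. The following is a minimal sketch under stated assumptions: the function name `check_conv_bounds` and its arguments are hypothetical, padding is passed as (head, tail) pairs, and the division in the output-size bound is treated as integer division.

```python
INT32_MAX = 2147483647

def check_conv_bounds(input_h, input_w, filter_h, filter_w,
                      pad_h=(0, 0), pad_w=(0, 0),
                      stride_h=1, stride_w=1,
                      dilation_h=1, dilation_w=1):
    """Return a list of violated constraints; an empty list means all pass."""
    errors = []

    # Range checks: 0 <= Pad < 256, 0 < FilterSize < 256,
    # 0 < Stride < 64, 1 <= dilation < 256.
    if not all(0 <= p < 256 for p in (*pad_h, *pad_w)):
        errors.append("pad out of range [0, 256)")
    if not all(0 < k < 256 for k in (filter_h, filter_w)):
        errors.append("filter size out of range (0, 256)")
    if not all(0 < s < 64 for s in (stride_h, stride_w)):
        errors.append("stride out of range (0, 64)")
    if not all(1 <= d < 256 for d in (dilation_h, dilation_w)):
        errors.append("dilation out of range [1, 256)")

    # Per-axis checks on the padded input size.
    axes = (
        ("H", input_h + sum(pad_h), filter_h, stride_h, dilation_h),
        ("W", input_w + sum(pad_w), filter_w, stride_w, dilation_w),
    )
    for name, padded, kernel, stride, dilation in axes:
        if padded < (kernel - 1) * dilation + 1:
            errors.append(f"padded input{name} smaller than dilated filter{name}")
        # Integer division is an assumption; skipped when the stride is invalid.
        if stride > 0 and padded // stride + 1 > INT32_MAX:
            errors.append(f"output size bound exceeded on {name}")
    return errors
```

A 224 x 224 input with a 3 x 3 filter and padding of 1 on each side passes every check, whereas a 2 x 2 input with a 5 x 5 filter and no padding fails on both axes.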

[Quantitative tool support]

Yes

7

Crop

Crops the input.

[Input]

Two inputs

[Parameter]

  • axis: (optional) int32, default to 2. When axis is –1, four input dimensions are required.
  • offset: uint32, array

[Constraint]

None

[Quantitative tool support]

No

8

Deconvolution

Deconvolution

[Input]

One 4D input, with a constant filter

[Parameter]

  • num_output: (optional) uint32
  • bias_term: (optional) bool, default to true
  • pad: uint32, default to 0, array
  • kernel_size: uint32, array
  • stride: uint32, default to 1, array
  • dilation: uint32, default to 1, array
  • pad_h: (optional) uint32, default to 0 (2D only)
  • pad_w: (optional) uint32, default to 0 (2D only)
  • kernel_h: (optional) uint32 (2D only)
  • kernel_w: (optional) uint32 (2D only)
  • stride_h: (optional) uint32 (2D only)
  • stride_w: (optional) uint32 (2D only)
  • group: (optional) uint32, default to 1
  • weight_filler: (optional) FillerParameter
  • bias_filler: (optional) FillerParameter
  • engine: (optional) enum, default to 0, CAFFE = 1, CUDNN = 2
  • force_nd_im2col: (optional) bool, default to false
  • axis: (optional) int32, default to 1

[Constraint]

  • group = 1
  • dilation = 1
  • filterH - padHHead - 1 ≥ 0
  • filterW - padWHead - 1 ≥ 0

Restrictions involving intermediate variables:

  1. a = ALIGN(filter_num, 16) x ALIGN(filter_c, 16) x filter_h x filter_w x 2
  2. If ALIGN(filter_c, 16)%32 = 0, a = a/2
  3. conv_input_width = (deconvolution input W – 1) x strideW + 1
  4. b = (conv_input_width) x filter_h x ALIGN(filter_num, 16) x 2 x 2
  5. a + b ≤ 1024 x 1024
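The intermediate variables above can be computed as follows. This is a sketch only: it assumes that ALIGN(x, 16) rounds x up to the next multiple of 16, and the function names and argument names are hypothetical.

```python
def align(value, multiple=16):
    """Round value up to the nearest multiple (assumed meaning of ALIGN)."""
    return (value + multiple - 1) // multiple * multiple

def deconv_buffer_size(filter_num, filter_c, filter_h, filter_w,
                       input_w, stride_w):
    """Return a + b as defined by the Deconvolution restrictions."""
    a = align(filter_num) * align(filter_c) * filter_h * filter_w * 2
    if align(filter_c) % 32 == 0:
        a //= 2  # a is always even, so this division is exact
    conv_input_width = (input_w - 1) * stride_w + 1
    b = conv_input_width * filter_h * align(filter_num) * 2 * 2
    return a + b

def deconv_fits(filter_num, filter_c, filter_h, filter_w, input_w, stride_w):
    """Check the documented bound a + b <= 1024 x 1024."""
    size = deconv_buffer_size(filter_num, filter_c, filter_h, filter_w,
                              input_w, stride_w)
    return size <= 1024 * 1024
```

With filter_num = 16, filter_c = 16, a 4 x 4 filter, an input width of 10, and strideW = 2, a is 8192 and b is 4864, so a + b = 13056, which is well within the limit.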

[Quantitative tool support]

Yes

9

DetectionOutput

Generates detection results and outputs FSR.

[Input]

Three inputs

[Parameter]

  • num_classes: (mandatory) int32, indicating the number of classes to be predicted
  • share_location: (optional) bool, default to true, indicating that classes share one bounding box
  • background_label_id: (optional) int32, default to 0
  • nms_param: (optional) indicating non-maximum suppression (NMS)
  • save_output_param: (optional) indicating whether to save the detection result
  • code_type: (optional) default to CENTER_SIZE
  • variance_encoded_in_target: (optional) bool, default to true. The value true indicates that the variance is encoded in the target; otherwise, the prediction offset needs to be adjusted accordingly.
  • keep_top_k: (optional) int32, indicating the total number of BBoxes to be reserved for each image after NMS
  • confidence_threshold: (optional) float, indicating that only the detection whose confidence is above the threshold is considered. If this parameter is not set, all boxes are considered.
  • nms_threshold: (optional) float
  • top_k: (optional) int32
  • boxes: (optional) int32, default to 1
  • relative: (optional) bool, default to true
  • objectness_threshold: (optional) float, default to 0.5
  • class_threshold: (optional) float, default to 0.5
  • biases: array
  • general_nms_param: optional

[Constraint]

  • Used for Faster R-CNN
  • Non-maximum suppression (NMS) ratio nmsThreshold is within (0, 1)
  • Probability threshold postConfThreshold is within (0, 1)
  • Classes ≥ 2
  • Input box count ≤ 1,024
  • Output W dimension = 16

[Quantitative tool support]

Yes

10

Eltwise

Computes element-wise operations (PROD, MAX, and SUM).

[Input]

At least two inputs

[Parameter]

  • operation: (optional) enum, PROD = 0, SUM = 1, MAX = 2; default to SUM
  • coeff: float array
  • stable_prod_grad: (optional) bool, default to true

[Constraint]

  • Up to four inputs
  • Compared with the native operator, this operator does not support the stable_prod_grad parameter.
  • PROD, MAX, and SUM operations are supported.

[Quantitative tool support]

Yes

11

Elu

Activation function

[Input]

One input

[Parameter]

alpha: (optional) float, default to 1

[Constraint]

None

[Quantitative tool support]

No

12

Exp

Applies e as the base and x as the exponent.

[Input]

One input

[Parameter]

  • base: (optional) float, default to –1.0
  • scale: (optional) float, default to 1.0
  • shift: (optional) float, default to 0.0

[Constraint]

None

[Quantitative tool support]

No

13

Flatten

Converts an input of shape N * C * H * W to a vector output of shape N * (C * H * W).

[Input]

One input

(top_size ≠ bottom_size ≠ 1. When axis is –1, four input dimensions are required.)

[Parameter]

  • axis: (optional) int32, default to 1
  • end_axis: (optional) int32, default to -1

[Constraint]

axis < end_axis

[Quantitative tool support]

Yes

14

FullConnection

Computes an inner product.

[Input]

One input

[Parameter]

  • num_output: (optional) uint32
  • bias_term: (optional) bool, default to true
  • weight_filler: (optional) FillerParameter, 2D
  • bias_filler: (optional) FillerParameter, 1D
  • axis: (optional) int32, default to 1
  • transpose: (optional) bool, default to false

[Constraint]

  • transpose = false, axis = 1
  • Bias_C ≤ 56832
  • To quantize the model, the following dimension restrictions must be satisfied:

    − When N = 1, then 2 x CEIL(C, 16) x 16 x xH x xW ≤ 1024 x 1024

    − When N > 1, then 2 x 16 x CEIL(C, 16) x 16 x xH x xW ≤ 1024 x 1024
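The quantization restriction for FullConnection can be expressed as a short check. This sketch assumes that CEIL(C, 16) means C divided by 16 and rounded up, and that xH and xW are the height and width of the input; the function name `fc_quantizable` is hypothetical.

```python
def fc_quantizable(n, c, h, w):
    """Check the FullConnection quantization bound for an N x C x H x W input."""
    blocks = -(-c // 16)  # CEIL(C, 16), assumed to mean ceil(C / 16)
    size = 2 * blocks * 16 * h * w
    if n > 1:
        size *= 16  # extra factor of 16 applies when N > 1
    return size <= 1024 * 1024
```

For instance, an input of shape 1 x 512 x 7 x 7 needs 2 x 32 x 16 x 7 x 7 = 50176, which satisfies the bound, while 2 x 1024 x 7 x 7 needs 1605632 and does not.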

[Quantitative tool support]

Yes

15

Interp

Interpolation layer

[Input]

One input

[Parameter]

  • height: (optional) int32, default to 0
  • width: (optional) int32, default to 0
  • zoom_factor: (optional) int32, default to 1
  • shrink_factor: (optional) int32, default to 1
  • pad_beg: (optional) int32, default to 0
  • pad_end: (optional) int32, default to 0
NOTE:

  • zoom_factor and shrink_factor are exclusive.
  • height and zoom_factor are exclusive.
  • height and shrink_factor are exclusive.

[Constraint]

(outputH x outputW)/(inputH x inputW) > 1/30

[Quantitative tool support]

No

16

Log

Performs logarithmic operation on the input.

[Input]

One input

[Parameter]

  • base: (optional) float, default to –1.0
  • scale: (optional) float, default to 1.0
  • shift: (optional) float, default to 0.0

[Constraint]

None

[Quantitative tool support]

No

17

LRN

Normalizes the input in a local region.

[Input]

One non-constant input

[Parameter]

  • local_size: (optional) uint32, default to 5
  • alpha: (optional) float, default to 1
  • beta: (optional) float, default to 0.75
  • norm_region: (optional) enum, default to ACROSS_CHANNELS (ACROSS_CHANNELS = 0, WITHIN_CHANNEL = 1)
  • lrnk: (optional) float, default to 1
  • engine: (optional) enum, default to 0, CAFFE = 1, CUDNN = 2

[Constraint]

  • local_size is an odd number greater than 0.
  • Inter-channel: If local_size is within [1, 15], lrnK > 0.00001 and beta > 0.01; otherwise, lrnK and beta can be any values. lrnK and alpha cannot both be 0. When the C dimension is greater than 1,776, local_size < 1728.
  • Intra-channel: lrnK = 1, local_size is within [1, 15], beta > 0.01.

[Quantitative tool support]

Yes

18

LSTM

Long short-term memory (LSTM) network

[Input]

Two or three inputs

  • X: time sequence data (T x B x Xt) in the NCHW 4D format, where N corresponds to the time sequence length T, C corresponds to the batch size B, H corresponds to the input data Xt at time point t, and W is fixed at 1.
  • Cont: sequence continuity flag (T x B)
  • Xs: (optional) static data (B x Xt)

[Parameter]

  • num_output: (optional) uint32, default to 0
  • weight_filler: (optional) FillerParameter
  • bias_filler: (optional) FillerParameter
  • debug_info: (optional) bool, default to false
  • expose_hidden: (optional) bool, default to false

[Constraint]

Restrictions involving intermediate variables:

a = (ALIGN(xt, 16) + ALIGN(output, 16)) x 16 x 2 x 2

b = (ALIGN(xt, 16) + ALIGN(output, 16)) x 16 x 4 x 2 x 2

c = use_projection ? (ALIGN(ht, 16) x ALIGN(output, 16) x 2) : 0

d = 16 x ALIGN(ht, 16) x 2

e = batchNum x 4

The constraints are as follows:

a + b + c ≤ 1024 x 1024

d ≤ 256 x 1024/8

e ≤ 256 x 1024/32
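The LSTM restrictions can be evaluated together as shown below. This is a sketch under stated assumptions: ALIGN(x, 16) is taken to round x up to the next multiple of 16, and the function and argument names (`lstm_within_bounds`, `xt`, `output`, `ht`, `batch_num`) are hypothetical labels for the quantities used in the formulas.

```python
def align(value, multiple=16):
    """Round value up to the nearest multiple (assumed meaning of ALIGN)."""
    return (value + multiple - 1) // multiple * multiple

def lstm_within_bounds(xt, output, ht, batch_num, use_projection=False):
    """Return True if the three documented LSTM constraints hold."""
    width = align(xt) + align(output)
    a = width * 16 * 2 * 2
    b = width * 16 * 4 * 2 * 2
    c = align(ht) * align(output) * 2 if use_projection else 0
    d = 16 * align(ht) * 2
    e = batch_num * 4
    return (a + b + c <= 1024 * 1024
            and d <= 256 * 1024 // 8
            and e <= 256 * 1024 // 32)
```

With xt = 128, output = 256, ht = 256, and a batch size of 8, a + b is 122880, d is 8192, and e is 32, so all three constraints hold. A batch size of 4096 gives e = 16384, which exceeds 256 x 1024/32 = 8192.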

[Quantitative tool support]

No

19

Normalize

Normalization layer

[Input]

One input

[Parameter]

  • across_spatial: (optional) bool, default to true
  • scale_filler: (optional) default to 1.0
  • channel_shared: (optional) bool, default to true
  • eps: (optional) float, default to 1e-10

[Constraint]

  • 1e-7 < eps ≤ 0.1 + 1e-6
  • across_spatial can only be true for Caffe, indicating normalization by channel.

[Quantitative tool support]

Yes

20

Permute

Permutes the input dimensions according to a given mode.

[Input]

One input

[Parameter]

order: uint32, array

[Constraint]

None

[Quantitative tool support]

Yes

21

Pooling

Pools the input.

[Input]

One input

[Parameter]

  • pool: (optional) enum, indicating the pooling method, MAX = 0, AVE = 1, and STOCHASTIC = 2, default to MAX
  • pad: (optional) uint32, default to 0
  • pad_h: (optional) uint32, default to 0
  • pad_w: (optional) uint32, default to 0
  • kernel_size: (optional) uint32, exclusive with kernel_h/kernel_w
  • kernel_h: (optional) uint32
  • kernel_w: (optional) uint32, used in pair with kernel_h
  • stride: (optional) uint32, default to 1
  • stride_h: (optional) uint32
  • stride_w: (optional) uint32
  • engine: (optional) enum, default to 0, CAFFE = 1, CUDNN = 2
  • global_pooling: (optional) bool, default to false
  • ceil_mode: (optional) bool, default to true
  • round_mode: (optional) enum, CEIL = 0, FLOOR = 1, default to CEIL

[Constraint]

  • kernelH ≤ inputH + padTop + padBottom
  • kernelW ≤ inputW + padLeft + padRight
  • padTop < windowH
  • padBottom < windowH
  • padLeft < windowW
  • padRight < windowW

In addition to common restrictions, the following restrictions must be satisfied.

The global pool mode supports only the following ranges:

  1. outputH==1 && outputW==1 && kernelH>=inputH && kernelW>=inputW
  2. inputH*inputW ≤ 10,000
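The Pooling restrictions can be collected into one check. The sketch below assumes that windowH and windowW are the same as kernelH and kernelW, and it verifies only the kernel and input-size conditions for global pooling rather than computing the output size; the function name `pooling_violations` and its arguments are hypothetical.

```python
def pooling_violations(input_h, input_w, kernel_h, kernel_w,
                       pad_top=0, pad_bottom=0, pad_left=0, pad_right=0,
                       global_pooling=False):
    """Return a list of violated Pooling constraints (empty if none)."""
    errors = []
    if kernel_h > input_h + pad_top + pad_bottom:
        errors.append("kernelH exceeds padded input height")
    if kernel_w > input_w + pad_left + pad_right:
        errors.append("kernelW exceeds padded input width")
    # Assumption: windowH/windowW equal kernelH/kernelW.
    if max(pad_top, pad_bottom) >= kernel_h:
        errors.append("vertical padding must be smaller than windowH")
    if max(pad_left, pad_right) >= kernel_w:
        errors.append("horizontal padding must be smaller than windowW")
    if global_pooling:
        if kernel_h < input_h or kernel_w < input_w:
            errors.append("global pooling requires the kernel to cover the input")
        if input_h * input_w > 10000:
            errors.append("global pooling requires inputH*inputW <= 10000")
    return errors
```

A 7 x 7 global pool over a 7 x 7 input passes, whereas a 128 x 128 global pool fails because 128 x 128 = 16384 exceeds 10,000.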

[Quantitative tool support]

Yes

22

Power

Computes the output y as (scale * x + shift)^power.

[Input]

One input

[Parameter]

  • power: (optional) float, default to 1.0
  • scale: (optional) float, default to 1.0
  • shift: (optional) float, default to 0.0

[Constraint]

  • power != 1
  • scale * x + shift > 0
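As an illustration of the Power definition and its two constraints, the following sketch computes the output for a single scalar element. The function name `power_layer` is hypothetical, and raising an error on a constraint violation is a choice made for this example, not documented behavior of the converter.

```python
def power_layer(x, power=1.0, scale=1.0, shift=0.0):
    """Compute (scale * x + shift) ** power for one element.

    Raises ValueError when a documented constraint is violated.
    """
    if power == 1:
        raise ValueError("power must not be 1")
    base = scale * x + shift
    if base <= 0:
        raise ValueError("scale * x + shift must be greater than 0")
    return base ** power
```

For example, x = 3 with scale = 2, shift = 1, and power = 2 gives (2 x 3 + 1)^2 = 49.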

[Quantitative tool support]

Yes

23

Prelu

Activation function

[Input]

One input

[Parameter]

  • filler: optional
  • channel_shared: (optional) bool, indicating whether to share slope parameters across channels, default to false

[Constraint]

None

[Quantitative tool support]

Yes

24

PriorBox

Obtains the real location of the target from the box proposals.

[Input]

One input

[Parameter]

  • min_size: (mandatory) indicating the minimum frame size (in pixels)
  • max_size: (mandatory) indicating the maximum frame size (in pixels)
  • aspect_ratio: array, float. A repeated ratio is ignored. If no aspect ratio is provided, the default ratio 1 is used.
  • flip: (optional) bool, default to true. The value true indicates that each aspect ratio is reversed. For example, for aspect ratio r, the aspect ratio 1.0/r is generated.
  • clip: (optional) bool, default to false. The value true indicates that the previous value is clipped to the range [0, 1].
  • variance: array, used to adjust the variance of the BBoxes
  • img_size: (optional) uint32, exclusive with img_h or img_w
  • img_h: (optional) uint32
  • img_w: (optional) uint32
  • step: (optional) float, exclusive with step_h or step_w
  • step_h: (optional) float
  • step_w: (optional) float
  • offset: float, default to 0.5

[Constraint]

Used for the SSD network only

Output dimensions: [n, 2, detected boxes x 4, 1]

[Quantitative tool support]

Yes

25

Proposal

Sorts the box proposals by (proposal, score) and obtains the top N proposals by using the NMS.

[Input]

Three inputs (scores, bbox_pred, im_info)

[Parameter]

  • feat_stride: (optional) float
  • base_size: (optional) float
  • min_size: (optional) float
  • ratio: float array
  • scale: float array
  • pre_nms_topn: (optional) int32
  • post_nms_topn: (optional) int32
  • nms_thresh: (optional) float

[Constraint]

  • Used only for Faster R-CNN
  • ProposalParameter and PythonParameter are exclusive.
  • Value range of preTopK: 1-6,144
  • Value range of postTopK: 1-1,024
  • scaleCnt x ratioCnt ≤ 64
  • nmsThresh: threshold for Intersection-over-Union (IoU) box filtering, 0 < nmsThresh ≤ 1
  • minSize: minimum edge length of a box. A value less than this parameter is filtered out.
  • featStride: H/W stride between the two adjacent boxes used in default box generation
  • baseSize: default box size used in default box generation
  • ratio and scale: used in default box generation
  • imgH and imgW: height and width of the image input to the network. The values must be greater than 0.
  • Restrictions on the input dimensions:

    clsProb: C = 2 x scaleCnt x ratioCnt

    bboxPred: C = 4 x scaleCnt x ratioCnt

    bboxPrior: N = clsProb.N, C = 4 x scaleCnt x ratioCnt

    imInfo: N = clsProb.N, C = 3
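The channel counts required of the Proposal inputs follow directly from the number of scales and ratios. The sketch below derives them; the function name `proposal_expected_channels` is hypothetical, and the dictionary keys reuse the input names from the restrictions above.

```python
def proposal_expected_channels(scale_cnt, ratio_cnt):
    """Return the expected C dimension of each Proposal input."""
    anchors = scale_cnt * ratio_cnt
    if anchors > 64:
        raise ValueError("scaleCnt x ratioCnt must not exceed 64")
    return {
        "clsProb": 2 * anchors,
        "bboxPred": 4 * anchors,
        "bboxPrior": 4 * anchors,
        "imInfo": 3,
    }
```

With three scales and three ratios there are nine anchors per location, so clsProb needs C = 18 and bboxPred needs C = 36.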

[Quantitative tool support]

Yes

26

PSROIPooling

Position-sensitive region-of-interest pooling (PSROIPooling)

[Input]

Two inputs

[Parameter]

  • spatial_scale: (mandatory) float
  • output_dim: (mandatory) int32, indicating the number of output channels
  • group_size: (mandatory) int32, indicating the number of groups to encode position-sensitive score maps

[Constraint]

Used for the Region-based Fully Convolutional Network (R-FCN)

  • ROI coordinates [roiN, roiC, roiH, roiW]: 1 ≤ roiN ≤ 65535, roiC == 5, roiH == 1, roiW == 1
  • Dimensions of the input feature map: [xN, xC, xH, xW]

    pooledH == pooledW == groupSize ≤ 128

    pooledH and pooledW indicate the length and width of the pooled ROI.

  • Output format: y [yN, yC, yH, yW]
  • poolingMode == avg pooling, pooledH == pooledW == groupSize, pooledH ≤ 128, spatialScale > 0, groupSize > 0, outputDim > 0
  • 1 ≤ xN ≤ 65535, roisN % xN == 0
  • HW_LIMIT defines the limits of xH and xW.

    xHW = xH * xW

    pooledHW = pooledH * pooledW

    HW_LIMIT = (64 x 1024 – 8 x 1024)/32

    xH ≥ pooledH, xW ≥ pooledW

    xHW ≥ pooledHW

    xHW/pooledHW ≤ HW_LIMIT

  • In multi-batch scenarios, the ROIs are allocated equally to the batches. In addition, the batch sequence of the ROIs is the same as the feature.
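The HW_LIMIT restriction on the PSROIPooling feature map can be checked as follows. This is a sketch only: the function name `psroi_feature_map_ok` is hypothetical, and the ratio xHW/pooledHW is compared by cross-multiplication, which assumes real (not truncating) division.

```python
HW_LIMIT = (64 * 1024 - 8 * 1024) // 32  # evaluates to 1792

def psroi_feature_map_ok(x_h, x_w, group_size):
    """Check the xH/xW limits for a given groupSize (pooledH == pooledW)."""
    if not 0 < group_size <= 128:
        return False
    pooled_h = pooled_w = group_size
    x_hw = x_h * x_w
    pooled_hw = pooled_h * pooled_w
    return (x_h >= pooled_h
            and x_w >= pooled_w
            and x_hw <= HW_LIMIT * pooled_hw)
```

For groupSize = 7, a 38 x 63 feature map passes (2394 ≤ 1792 x 49), while a 400 x 400 feature map does not (160000 > 87808).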

[Quantitative tool support]

Yes

27

Relu

Activation function, including common ReLU and Leaky ReLU, which can be specified by parameters

[Input]

One input

[Parameter]

  • negative_slope: (optional) float, default to 0
  • engine: (optional) enum, default to 0, CAFFE = 1, CUDNN = 2

[Constraint]

None

[Quantitative tool support]

Yes

28

Reshape

Reshapes the input.

[Input]

One input

[Parameter]

  • shape: constant, int64 or int
  • axis: (optional) int32, default to 0
  • num_axes: (optional) int32, default to -1

[Constraint]

None

[Quantitative tool support]

Yes

29

ROIAlign

Aggregates features using ROIs.

[Input]

At least two inputs

[Parameter]

  • pooled_h: (optional) uint32, default to 0
  • pooled_w: (optional) uint32, default to 0
  • spatial_scale: (optional) float, default to 1
  • sampling_ratio: (optional) int32, default to -1

[Constraint]

Mainly used for Mask R-CNN

  • Restrictions on the feature map:

1) H x W ≤ 5,248 (N > 1) or W x C < 40,960 (N = 1)

2) C ≤ 1280

3) ((C - 1)/128 + 1) x pooledW ≤ 216

  • Restrictions on the ROI:

1) C = 5 (Caffe), H = 1, W = 1

2) samplingRatio * pooledW ≤ 128, samplingRatio * pooledH ≤ 128

3) H ≥ pooledH, W ≥ pooledW

[Quantitative tool support]

Yes

30

ROIPooling

Maps ROI proposals to a feature map.

[Input]

At least two inputs

[Parameter]

  • pooled_h: (mandatory) uint32, default to 0
  • pooled_w: (mandatory) uint32, default to 0
  • spatial_scale: (mandatory) float, default to 1. The multiplication spatial scale factor is used to convert ROI coordinates from the input scale to the pool scale.

[Constraint]

Mainly used for Faster R-CNN

  • Input dimensions: H x W ≤ 8,160, H ≤ 120, W ≤ 120
  • Output dimensions: pooledH ≤ 20, pooledW ≤ 20

[Quantitative tool support]

Yes

31

Scale

out = alpha x Input + beta

[Input]

Two inputs, each with four dimensions

[Parameter]

  • axis: (optional) int32, default to 1. Only 1 or –3 is supported.
  • num_axes: (optional) int32, default to 1
  • filler: (optional) ignored unless only one bottom is given and scale is a learned parameter
  • bias_term: (optional) bool, default to false, indicating whether to learn a bias (equivalent to ScaleLayer + BiasLayer, but may be more efficient). Initialized with bias_filler.
  • bias_filler: (optional) default to 0

[Constraint]

shape of scale and bias: (n, c, 1, 1), with the C dimension equal to that of the input

[Quantitative tool support]

Yes

32

ShuffleChannel

Shuffles information across the feature channels.

[Input]

One input

[Parameter]

group: (optional) uint32, default to 1

[Constraint]

None

[Quantitative tool support]

Yes

33

Sigmoid

Activation function

[Input]

One input

[Parameter]

engine: (optional) enum, default to 0, CAFFE = 1, CUDNN = 2

[Constraint]

None

[Quantitative tool support]

Yes

34

Slice

Slices an input into multiple outputs.

[Input]

One input

[Parameter]

  • slice_dim: (optional) uint32, default to 1, exclusive with axis
  • slice_point: array, uint32
  • axis: (optional) int32, default to 1, indicating slicing along the channel dimension

[Constraint]

None

[Output]

None

[Quantitative tool support]

Yes

35

Softmax

Normalized exponential function

[Input]

One input

[Parameter]

  • engine: (optional) default to 0, CAFFE = 1, CUDNN = 2
  • axis: (optional) int32, default to 1, indicating the axis along which softmax is performed

[Constraint]

Softmax can be performed on each of the four input dimensions.

According to axis:

  • When axis = 1: C ≤ ((256 x 1024/4) – 8 x 1024 – 256)/2
  • When axis = 0: n ≤ (56 x 1024 – 256)/2
  • When axis = 2: W = 1, 0 < h < (1024 x 1024/32)
  • When axis = 3: 0 < W < (1024 x 1024/32)

If the input contains fewer than four dimensions, softmax is performed only on the last dimension, with the last dimension ≤ 46,080.
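The per-axis Softmax limits evaluate to fixed numbers, which the sketch below spells out and applies to an input shape. The function name `softmax_supported` and the constant names are hypothetical, only non-negative axis values are handled, and the divisions are assumed to be exact integer divisions.

```python
SOFTMAX_C_MAX = ((256 * 1024 // 4) - 8 * 1024 - 256) // 2  # 28544
SOFTMAX_N_MAX = (56 * 1024 - 256) // 2                     # 28544
SOFTMAX_HW_LIMIT = 1024 * 1024 // 32                       # 32768
SOFTMAX_LAST_DIM_MAX = 46080

def softmax_supported(shape, axis=1):
    """Return True if the shape satisfies the documented Softmax limits."""
    if len(shape) > 4:
        raise ValueError("only inputs with up to four dimensions are documented")
    if len(shape) < 4:
        # Softmax is performed only on the last dimension in this case.
        return shape[-1] <= SOFTMAX_LAST_DIM_MAX
    n, c, h, w = shape
    if axis == 0:
        return n <= SOFTMAX_N_MAX
    if axis == 1:
        return c <= SOFTMAX_C_MAX
    if axis == 2:
        return w == 1 and 0 < h < SOFTMAX_HW_LIMIT
    if axis == 3:
        return 0 < w < SOFTMAX_HW_LIMIT
    raise ValueError("axis must be 0, 1, 2, or 3")
```

A typical classifier output of shape (1, 1000, 1, 1) with axis = 1 is supported, whereas 30,000 channels would exceed the limit of 28,544.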

[Quantitative tool support]

Yes

36

Tanh

Activation function

[Input]

One input

[Parameter]

engine: (optional) enum, default to 0, CAFFE = 1, CUDNN = 2

[Constraint]

The number of tensor elements cannot exceed INT32_MAX.

[Quantitative tool support]

Yes

37

Upsample

Backward propagation of max pooling

[Input]

Two inputs

[Parameter]

scale: (optional) int32, default to 1

[Constraint]

None

[Quantitative tool support]

Yes

38

SSDDetectionOutput

SSD network detection output

[Input]

Three inputs

[Parameter]

  • num_classes: (mandatory) int32, indicating the number of classes to be predicted
  • share_location: (optional) bool, default to true, indicating that classes share one bounding box
  • background_label_id: (optional) int32, default to 0
  • nms_param: (optional) indicating non-maximum suppression (NMS)
  • save_output_param: (optional) indicating whether to save the detection result
  • code_type: (optional) default to CENTER_SIZE
  • variance_encoded_in_target: (optional) bool, default to true. The value true indicates that the variance is encoded in the target; otherwise, the prediction offset needs to be adjusted accordingly.
  • keep_top_k: (optional) int32, indicating the total number of BBoxes to be reserved for each image after NMS
  • confidence_threshold: (optional) float, indicating that only the detection whose confidence is above the threshold is considered. If this parameter is not set, all boxes are considered.
  • nms_threshold: (optional) float
  • top_k: (optional) int32
  • boxes: (optional) int32, default to 1
  • relative: (optional) bool, default to true
  • objectness_threshold: (optional) float, default to 0.5
  • class_threshold: (optional) float, default to 0.5
  • biases: array
  • general_nms_param: optional

[Constraint]

  • Used for the SSD network only
  • Value range of preTopK and postTopK: 1–1024
  • shareLocation = true
  • nmsEta = 1
  • Value range of numClasses: 1–2048
  • code_type = CENTER_SIZE
  • Value range of nms_threshold and confidence_threshold: 0.0–1.0

[Quantitative tool support]

Yes

39

Reorg

Real-time object detection

[Input]

One input

[Parameter]

  • stride: (optional) uint32, default to 2
  • reverse: (optional) bool, default to false

[Constraint]

Used only for YOLOv2

[Quantitative tool support]

No

40

Reverse

Reverses the input along the specified axis.

[Input]

One input

[Parameter]

axis: (optional) int32, default to 1. Controls the axis to be reversed. The content layout will not be reversed.

[Constraint]

None

[Quantitative tool support]

No

41

LeakyRelu

LeakyRelu activation function

[Input]

One input

[Parameter]

Same as ReLU

[Constraint]

None

[Quantitative tool support]

Yes

42

YOLODetectionOutput

YOLO network detection output

[Input]

Four inputs

[Parameter]

  • num_classes: (mandatory) int32, indicating the number of classes to be predicted
  • share_location: (optional) bool, default to true, indicating that classes share one bounding box
  • background_label_id: (optional) int32, default to 0
  • nms_param: (optional) indicating non-maximum suppression (NMS)
  • save_output_param: (optional) indicating whether to save the detection result
  • code_type: (optional) default to CENTER_SIZE
  • variance_encoded_in_target: (optional) bool, default to true. The value true indicates that the variance is encoded in the target; otherwise, the prediction offset needs to be adjusted accordingly.
  • keep_top_k: (optional) int32, indicating the total number of BBoxes to be reserved for each image after NMS
  • confidence_threshold: (optional) float, indicating that only the detection whose confidence is above the threshold is considered. If this parameter is not set, all boxes are considered.
  • nms_threshold: (optional) float
  • top_k: (optional) int32
  • boxes: (optional) int32, default to 1
  • relative: (optional) bool, default to true
  • objectness_threshold: (optional) float, default to 0.5
  • class_threshold: (optional) float, default to 0.5
  • biases: array
  • general_nms_param: optional

[Constraint]

  • Used only for YOLOv2
  • classNum < 10,240; anchorBox ≤ 8
  • W ≤ 1,536
  • The layer immediately preceding YOLODetectionOutput must be the YoloRegion operator.

[Quantitative tool support]

No