Evaluation Metrics

The available evaluation metrics are evaluation overview, precision evaluation, sensitivity analysis, computing performance analysis, heatmap, abstract feature distribution, and adversarial sample evaluation. They cover image classification, object detection, and image semantic segmentation scenarios. Evaluation overview, precision evaluation, and sensitivity analysis have scenario-specific parameters, whereas computing performance analysis, heatmap, abstract feature distribution, and adversarial sample evaluation are available only in image classification scenarios.

Common Part

Table 1 Parameters for evaluation overview

| Parameter | Description |
| --- | --- |
| Overall Metric | Accuracy for image classification, mAP for object detection, and PA for image semantic segmentation. For details, see the metric description in specific scenarios. |
| Prediction Results | Displays the prediction results, label status, and confidence levels. |
| Overall Evaluation | Provides phenomena and optimization suggestions based on the analysis of prediction results and datasets, and displays the higher-priority phenomena and optimization suggestions. |

Image Classification

Each column of a confusion matrix represents the actual label statistics, and each row represents the prediction result statistics. The data on the diagonal of the matrix represents the correct predictions. Several concepts are used to calculate precision: for binary classification tasks, these are true positive (TP), false positive (FP), true negative (TN), and false negative (FN).

Table 2 Concepts involved in the confusion matrix for image classification

| | Actual: Positive | Actual: Negative |
| --- | --- | --- |
| Predicted: Positive | TP | FP |
| Predicted: Negative | FN | TN |
| Total samples | P = TP + FN | N = FP + TN |
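As a minimal sketch of how the four counts in Table 2 are obtained (assuming labels are encoded as 1 for positive and 0 for negative, which is an assumption for illustration):

```python
# Count TP, FP, FN, TN for a binary classification task.
# Encoding assumed here: 1 = positive, 0 = negative.

def confusion_counts(actual, predicted):
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    return tp, fp, fn, tn

tp, fp, fn, tn = confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(tp, fp, fn, tn)  # 2 1 1 1
```

Note that P = TP + FN and N = FP + TN recover the column totals of Table 2.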

Table 3 Evaluation metrics for image classification models

| Metric | Parameter | Description |
| --- | --- | --- |
| Precision Evaluation | Category Distribution | Statistics on the number of samples in each category. |
| | Confusion Matrix | For details about the confusion matrix, see Table 2. |
| | Recall (R) | Ratio of the number of correct positive predictions to the total number of actual positives. A larger value indicates a smaller false negative rate (FNR). The calculation formula is R = TP/(TP + FN), that is, the number of correct predictions in a column of the confusion matrix divided by the sum of samples in that column. |
| | Precision (P) | Ratio of the number of correct positive predictions to the total number of positive predictions. A larger value indicates fewer false positives. The calculation formula is P = TP/(TP + FP), that is, the number of correct predictions in a row of the confusion matrix divided by the sum of samples in that row. |
| | F1 Score | Harmonic mean of the precision and recall: F1 = 2 × P × R/(P + R). |
| | ROC Curve | The ROC curve plots the true positive rate (TPR, vertical axis) against the false positive rate (FPR, horizontal axis) at different classification thresholds. The closer the ROC curve is to the upper left corner, the better the classifier performs. |
| Sensitivity Analysis | Accuracy in Different Feature Value Ranges | Divides the images into several subsets based on feature values, such as brightness and clarity, tests the precision of each subset, and draws a chart. |
| | Feature Distribution | Displays the distribution of image feature values using charts. |
| | F1 Score | Displays the F1 scores of different types of data in different feature value ranges to determine the feature value ranges in which the model performs better. |
| Computing Performance Analysis (not displayed by default; supported only by the built-in algorithm resnet_v1_50) | Operator Duration Ratio and Parameter Quantity Ratio | Calculates the ratios of the parameter quantities of various operators in the network, such as convolution and pooling, as well as their time-consumption ratios in a forward pass. |
| | Other Metrics | Includes basic model information such as the GPU usage, time required, model size, total number of parameters, and total computing amount. |
| Heatmap (not displayed by default; supported only by the built-in algorithm resnet_v1_50) | Display Heatmap | Heatmap drawn using the Grad-CAM++ algorithm. The highlighted area indicates the area the model uses to determine the image inference result. |
| Abstract Feature Distribution (not displayed by default; supported only by the built-in algorithm resnet_v1_50) | Feature Distribution | Extracts the output of the convolutional layer just before the fully connected layer of the image classification backbone. For example, a ResNet-50 network outputs a 1 × 2048 vector for each image. The output is reduced to two dimensions and drawn on a 2D scatter chart. |
| Adversarial Sample Evaluation (not displayed by default; supported only by the built-in algorithm resnet_v1_50) | PSNR | The peak signal-to-noise ratio (PSNR) is the ratio between the maximum possible power of a signal and the power of the distorting noise that affects the quality of its representation. |
| | SSIM | The Structural Similarity Index (SSIM) measures the similarity between two images. It is often used to compare a distortion-free image with a distorted one. |
| | ACAC | Average Confidence of Adversarial Class (ACAC). |
| | ACTC | Average Confidence of True Class (ACTC). This parameter further evaluates the extent to which the attack deviates from the actual value. |
| | MR | Misclassification Ratio (MR): proportion of adversarial examples that are classified as incorrect classes or as the target classes. |
| | ALD | The Average Lp Distortion (ALD) is the average Lp-norm distortion of successful adversarial examples. A smaller value indicates that adversarial examples are less likely to be detected. |
| | Others | Similar to the metrics in the precision evaluation. |
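As an illustrative sketch (not the service's implementation), the PSNR in Table 3 can be computed from the mean squared error between two images; 255 is assumed here as the maximum pixel value for 8-bit images:

```python
import math

def psnr(original, distorted, max_val=255.0):
    """Peak signal-to-noise ratio between two equally sized images,
    given here as flat lists of pixel values for simplicity.
    Identical images give infinity (zero noise power)."""
    mse = sum((o - d) ** 2 for o, d in zip(original, distorted)) / len(original)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_val ** 2 / mse)

print(psnr([100, 120, 140, 160], [101, 119, 142, 157]))
```

A higher PSNR indicates that the distorted (for example, adversarial) image is closer to the original.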

Computing Performance Analysis supports only built-in TensorFlow-based image classification algorithms. Heatmap, Abstract Feature Distribution, and Adversarial Sample Evaluation support only TensorFlow-based image classification algorithms. To display these metrics, you need to modify the files required for generating the evaluation code. For details, see the image classification part in Sample Code for Model Evaluation.
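To make the ROC construction described earlier concrete, the following sketch (illustrative only, with assumed label encoding 1 = positive, 0 = negative) computes (FPR, TPR) points by sweeping a classification threshold over predicted scores:

```python
def roc_points(labels, scores, thresholds):
    """Return one (FPR, TPR) pair per threshold.
    labels: 1 = positive, 0 = negative; scores: predicted positive score."""
    points = []
    for t in thresholds:
        preds = [1 if s >= t else 0 for s in scores]
        tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
        fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
        fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
        tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points

labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.4, 0.65, 0.8, 0.3]
print(roc_points(labels, scores, [0.5, 0.7]))
```

Plotting these pairs with FPR on the horizontal axis and TPR on the vertical axis yields the ROC curve.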

Object Detection

Table 4 Evaluation metrics for object detection models

| Metric | Parameter | Description |
| --- | --- | --- |
| Precision Evaluation | Category Distribution | Statistics on the number of bounding boxes in each category. |
| | Precision-Recall Curve (P-R Curve) | Sorts the samples of each class by confidence score, adds them to the positive predictions one by one, and calculates the precision and recall at each step. The curve drawn from this series of precision and recall values is the P-R curve of the corresponding class. |
| | mAP with Different IoUs | Calculates the mAP under different IoU thresholds and draws a curve to present the IoU with the highest mAP. The IoU threshold determines whether a predicted bounding box and an actual bounding box are considered to cover the same object. For details, see Figure 1. |
| | F1 Scores with Different Confidence Thresholds | Calculates the average F1 score under different confidence thresholds, draws a curve, and reports the threshold with the highest F1 score. |
| | False Positive Analysis | From the perspective of prediction results, collects statistics on accurate detections, class false positives, background false positives, and position deviations, and draws a pie chart using the proportion of each type of error. For details about error types, see Figure 2. |
| | False Negative Analysis | From the perspective of actual labels, collects statistics on accurate detections, class false negatives, background false negatives, and position deviations, and draws a pie chart using the proportion of each type of error. For details about error types, see Figure 3. |
| Sensitivity Analysis | Accuracy in Different Feature Value Ranges | Similar to that for image classification, but you can select more features related to the target bounding boxes, such as the overlap between target bounding boxes and the number of target bounding boxes. |
| | Feature Distribution | Similar to that for image classification, but you can select more features related to the target bounding boxes, such as the overlap between target bounding boxes and the number of target bounding boxes. |

Figure 1 IoU calculation
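As a sketch of the IoU calculation in Figure 1 (illustrative only), the IoU of two axis-aligned boxes is the area of their intersection divided by the area of their union:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, two unit areas overlap by 1
```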

From the perspective of prediction results, if the IoU between a predicted bounding box and an actual bounding box is greater than 0.5 but the predicted class is inconsistent with the actual class, a class false positive error occurred. If the IoU is greater than 0.1 and less than 0.5 while the predicted class is consistent with the actual class, a position false positive error occurred. If the IoU is less than 0.1, a background false positive error occurred.

Figure 2 False positive analysis

From the perspective of the actual bounding box, if the IoU between the actual bounding box and a predicted bounding box is greater than 0.5 but the predicted class is inconsistent with the actual class, a class false negative error occurred. If the IoU is greater than 0.1 and less than 0.5 while the class is consistent, a position false negative error occurred. If the IoU is less than 0.1, a background false negative error occurred.

Figure 3 False negative analysis
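The false positive analysis of Figure 2 can be sketched as a small classifier of prediction errors. The 0.5 and 0.1 thresholds come from the text; the handling of boundary values and of class mismatches in the middle IoU range is an assumption for illustration:

```python
def classify_false_positive(iou_value, class_match):
    """Classify a predicted box against its best-matching actual box,
    following the false positive analysis rules (thresholds 0.5 and 0.1).
    Boundary handling (>= vs >) is assumed here."""
    if iou_value >= 0.5:
        return "accurate detection" if class_match else "class false positive"
    if iou_value >= 0.1:
        return "position false positive" if class_match else "class false positive"
    return "background false positive"

print(classify_false_positive(0.7, True))   # accurate detection
print(classify_false_positive(0.3, True))   # position false positive
print(classify_false_positive(0.05, True))  # background false positive
```

The false negative analysis of Figure 3 applies the same thresholds from the perspective of each actual bounding box.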

Image Semantic Segmentation

Table 5 Evaluation metrics for image semantic segmentation models

| Metric | Parameter | Description |
| --- | --- | --- |
| Precision Evaluation | Pixel Category Distribution | Statistics on the number of pixels in each category. |
| | IoU | Calculates the IoU between the prediction result and the label set for each class. The mean IoU is obtained by averaging the per-class values. Assume that the total number of classes is k + 1, p_ii indicates the number of pixels of the ith class that are correctly identified, and p_ij indicates the number of pixels of the ith class that are identified as the jth class. The IoU of the ith class is then IoU_i = p_ii / (Σ_j p_ij + Σ_j p_ji − p_ii), where the sums run over j = 0, …, k. |
| | Dice Coefficient | The value ranges from 0 to 1, and a value closer to 1 indicates a better model. With the same notation as above, the Dice coefficient of the ith class is Dice_i = 2 × p_ii / (Σ_j p_ij + Σ_j p_ji). |
| | Confusion Matrix | The same as that for image classification, except that this confusion matrix is computed per pixel instead of per image. |
| Sensitivity Analysis | Sensitivity Analysis | The same as that for image classification, except that the evaluation metric is changed from F1 score to IoU. |
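Using the notation above (p_ij = number of pixels of actual class i identified as class j), per-class IoU and Dice can be sketched as follows. The matrix layout (rows = actual classes, columns = predicted classes) is an assumption matching that notation:

```python
def iou_and_dice(p):
    """Per-class IoU and Dice from matrix p, where p[i][j] is the number of
    pixels of actual class i predicted as class j (assumed layout)."""
    n = len(p)
    ious, dices = [], []
    for i in range(n):
        row = sum(p[i])                       # all pixels of actual class i
        col = sum(p[j][i] for j in range(n))  # all pixels predicted as class i
        inter = p[i][i]                       # correctly identified pixels
        ious.append(inter / (row + col - inter))
        dices.append(2 * inter / (row + col))
    return ious, dices

p = [[3, 1],
     [1, 5]]
ious, dices = iou_and_dice(p)
print(ious, dices)  # [0.6, 5/7] [0.75, 5/6]
```

Averaging the per-class IoU values gives the mean IoU reported by the precision evaluation.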