Evaluation Metrics
The available evaluation metrics are evaluation overview, precision evaluation, sensitivity analysis, computing performance analysis, heatmap, abstract feature distribution, and adversarial analysis. They apply to image classification, object detection, and image semantic segmentation scenarios. The evaluation overview, precision evaluation, and sensitivity analysis metrics have scenario-specific parameters, while the computing performance analysis, heatmap, abstract feature distribution, and adversarial analysis metrics are available only in image classification scenarios.
Common Part
| Parameter | Description |
|---|---|
| Overall Metric | It is Accuracy for image classification, mAP for object detection, and PA for image semantic segmentation. For details, see the metric description in specific scenarios. |
| Prediction Results | Displays the prediction results, label status, and confidence levels. |
| Overall Evaluation | Provides phenomena and optimization suggestions based on the analysis of the prediction results and datasets, and displays the phenomena and optimization suggestions with a higher priority. |
Image Classification
Each column of the confusion matrix represents the statistics of actual labels, and each row represents the statistics of prediction results. The data on the diagonal of the matrix represents all correct predictions. Several concepts are used to calculate precision: for binary classification tasks, these are true positive (TP), false positive (FP), true negative (TN), and false negative (FN).
| Parameter | Actual: Positive | Actual: Negative |
|---|---|---|
| Predicted: Positive | TP | FP |
| Predicted: Negative | FN | TN |
| Total samples | P = TP + FN | N = FP + TN |
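The counts in the table above and the derived metrics can be sketched in a few lines. This is an illustrative sketch, not the evaluation service's implementation; the function name `binary_metrics` and the 0/1 label encoding are assumptions.

```python
# Sketch: deriving the binary-classification metrics from actual and
# predicted 0/1 labels, following the definitions in the table above.
def binary_metrics(actual, predicted):
    """Compute TP/FP/FN/TN and the derived precision, recall, and F1 score."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0   # P = TP/(TP + FP)
    recall = tp / (tp + fn) if tp + fn else 0.0      # R = TP/(TP + FN)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn,
            "precision": precision, "recall": recall, "f1": f1}

m = binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
# TP=2, FP=1, FN=1, TN=2, so precision = recall = F1 = 2/3
```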
| Metric | Parameter | Description |
|---|---|---|
| Precision Evaluation | Category Distribution | Statistics on the number of samples in each category. |
| | Confusion Matrix | For details about the confusion matrix, see Table 2. |
| | Recall (R) | Ratio of the number of correctly predicted positives to the total number of actual positives. A larger value indicates a smaller false negative rate (FNR). The formula is R = TP/(TP + FN), that is, the number of correct predictions in a column of the confusion matrix divided by the total number of samples in that column. |
| | Precision (P) | Ratio of the number of correctly predicted positives to the total number of predicted positives. A larger value indicates a smaller false positive rate (FPR). The formula is P = TP/(TP + FP), that is, the number of correct predictions in a row of the confusion matrix divided by the total number of samples in that row. |
| | F1 Score | Harmonic mean of the precision and recall. The formula is F1 = 2 x P x R/(P + R). |
| | ROC Curve | The ROC curve plots the true positive rate (TPR, vertical axis) against the false positive rate (FPR, horizontal axis) at different classification thresholds. The closer the ROC curve is to the upper left corner, the better the classifier performs. |
| Sensitivity Analysis | Accuracy in Different Feature Value Ranges | Divides the images into several groups based on feature values, such as brightness and clarity, tests the precision on each group, and draws a chart. |
| | Feature Distribution | Displays the distribution of image feature values in charts. |
| | F1 Score | Displays the F1 scores of different types of data in different feature value ranges, to determine the feature value ranges in which the model performs better. |
| Computing Performance Analysis (This parameter is not displayed by default and is supported only by the built-in algorithm resnet_v1_50.) | Operator Duration Ratio and Parameter Quantity Ratio | Calculates the parameter-quantity ratio of each operator type in the network, such as convolution and pooling, as well as its time-consumption ratio in a forward pass. |
| | Other Metrics | Includes basic model information such as the GPU usage, time required, model size, total number of parameters, and total amount of computation. |
| Heatmap (This parameter is not displayed by default and is supported only by the built-in algorithm resnet_v1_50.) | Display Heatmap | Heatmap drawn using the Grad-CAM++ algorithm. The highlighted area indicates the region the model relied on to determine the image inference result. |
| Abstract Feature Distribution (This parameter is not displayed by default and is supported only by the built-in algorithm resnet_v1_50.) | Feature Distribution | Extracts the output of the convolutional layer immediately before the fully connected layer of the classification network. For example, a ResNet-50 network outputs a 1 x 2048 feature vector for each image. The output is reduced to two dimensions and drawn on a 2D scatter chart. |
| Adversarial Sample Evaluation (This parameter is not displayed by default and is supported only by the built-in algorithm resnet_v1_50.) | PSNR | The peak signal-to-noise ratio (PSNR) is the ratio between the maximum possible power of a signal and the power of the distorting noise that affects the quality of its representation. |
| | SSIM | The Structural Similarity Index (SSIM) measures the similarity between two images. It is often used to compare a distortion-free image with its distorted version. |
| | ACAC | Average Confidence of Adversarial Class (ACAC): the average prediction confidence of the incorrect class for successful adversarial examples. |
| | ACTC | Average Confidence of True Class (ACTC): used to further evaluate the extent to which the attack deviates from the ground truth. |
| | MR | Misclassification Ratio (MR): proportion of adversarial examples that are classified as an incorrect class or as the target class. |
| | ALD | Average Lp Distortion (ALD): the average Lp distortion of successful adversarial examples. A smaller value indicates that the adversarial examples are less likely to be detected. |
| | Others | Similar to the metrics in the precision evaluation. |
Computing Performance Analysis supports only built-in TensorFlow-based image classification algorithms. Heatmap, Abstract Feature Distribution, and Adversarial Sample Evaluation support only TensorFlow-based image classification algorithms. To display these metrics, you need to modify the files required for generating the evaluation code. For details, see the image classification part in Sample Code for Model Evaluation.
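As a concrete illustration of the PSNR metric listed above, the following sketch applies the standard definition PSNR = 10 x log10(MAX^2 / MSE), where MAX is the peak pixel value (255 for 8-bit images) and MSE is the mean squared error between the two images. The function name `psnr` and the flat pixel-list input are assumptions for illustration, not the evaluation service's code.

```python
import math

def psnr(original, distorted, max_value=255.0):
    """Peak signal-to-noise ratio between two images given as flat pixel lists."""
    mse = sum((o - d) ** 2 for o, d in zip(original, distorted)) / len(original)
    if mse == 0:
        return float("inf")  # identical images: no distortion
    return 10 * math.log10(max_value ** 2 / mse)

# A distortion of 1 on every pixel gives MSE = 1, so PSNR = 10*log10(255^2) ≈ 48.13 dB
```

A higher PSNR means the adversarial perturbation is smaller and harder to notice.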
Object Detection
| Metric | Parameter | Description |
|---|---|---|
| Precision Evaluation | Category Distribution | Statistics on the number of bounding boxes in each category. |
| | Precision-Recall Curve (P-R Curve) | Sorts the samples of each class by confidence score, adds them to the positive predictions one by one, and calculates the precision and recall at each step. The curve drawn from this series of precision and recall values is the P-R curve of the class. |
| | mAP with Different IoUs | Calculates the mAP at different IoU thresholds and draws a curve to present the IoU with the highest mAP. The IoU is a threshold used by NMS to filter overlapping boxes that may be predictions of the same object. For details, see Figure 1. |
| | F1 Scores with Different Confidence Thresholds | Calculates the average F1 score at different confidence thresholds, draws a curve, and reports the threshold with the highest F1 score. |
| | False Positive Analysis | From the perspective of prediction results, collects statistics on accurate detections, class false positives, background false positives, and position deviations, and draws a pie chart of the proportion of each error type. For details about the error types, see Figure 2. |
| | False Negative Analysis | From the perspective of actual labels, collects statistics on accurate detections, class false negatives, background false negatives, and position deviations, and draws a pie chart of the proportion of each error type. For details about the error types, see Figure 3. |
| Sensitivity Analysis | Accuracy in Different Feature Value Ranges | Similar to that for image classification, but more features related to the target bounding boxes can be selected, such as the overlap between target bounding boxes and the number of target bounding boxes. |
| | Feature Distribution | Similar to that for image classification, but more features related to the target bounding boxes can be selected, such as the overlap between target bounding boxes and the number of target bounding boxes. |
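A rough sketch of how a P-R curve and the average precision (AP) of one class can be derived: detections are sorted by confidence and added one by one, recomputing precision and recall at each step. The simplified input format (a list of `(confidence, is_correct)` tuples) and the rectangular-summation AP are assumptions for illustration, not the service's exact algorithm.

```python
def pr_curve(detections, num_ground_truths):
    """detections: list of (confidence, is_correct) tuples for one class."""
    points = []
    tp = fp = 0
    for _, correct in sorted(detections, key=lambda d: -d[0]):
        if correct:
            tp += 1
        else:
            fp += 1
        points.append((tp / (tp + fp), tp / num_ground_truths))  # (precision, recall)
    return points

def average_precision(points):
    """Area under the P-R curve using simple rectangular summation."""
    ap, prev_recall = 0.0, 0.0
    for precision, recall in points:
        ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap

pts = pr_curve([(0.9, True), (0.8, False), (0.7, True)], num_ground_truths=2)
# precision/recall points: (1.0, 0.5), (0.5, 0.5), (2/3, 1.0)
```

Averaging the AP over all classes gives the mAP reported in the table above.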
From the perspective of prediction results: if the IoU between a predicted bounding box and the actual bounding box is greater than 0.5 but the predicted class is inconsistent with the actual class, a class false positive error has occurred. If the IoU is greater than 0.1 but not greater than 0.5 while the predicted class is consistent with the actual class, a position deviation (false positive) error has occurred. If the IoU is less than 0.1, a background false positive error has occurred.
From the perspective of actual labels: if the IoU between an actual bounding box and the predicted bounding box is greater than 0.5 but the actual class is inconsistent with the predicted class, a class false negative error has occurred. If the IoU is greater than 0.1 but not greater than 0.5 while the classes are consistent, a position deviation (false negative) error has occurred. If the IoU is less than 0.1, a background false negative error has occurred.
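The IoU computation and the error categories above can be sketched as follows. The box representation `(x1, y1, x2, y2)` and the function names are assumptions for illustration; the 0.5 and 0.1 thresholds match the text.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def classify_false_positive(pred_box, pred_class, gt_box, gt_class):
    """Categorize a prediction against its best-matching ground truth box."""
    overlap = iou(pred_box, gt_box)
    if overlap > 0.5:
        return "accurate" if pred_class == gt_class else "class false positive"
    if overlap > 0.1:
        return "position deviation"
    return "background false positive"

# Two 2x2 boxes offset by half their width overlap with IoU = 2/6 = 1/3,
# so a correctly classified prediction at that offset is a position deviation.
```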
Image Semantic Segmentation
| Metric | Parameter | Description |
|---|---|---|
| Precision Evaluation | Pixel Category Distribution | Statistics on the number of pixels in each category. |
| | IoU | Calculates the IoU between the prediction results and the labels for each class. The mean IoU is obtained by averaging the values of all classes. Assume that the total number of classes is k+1, pii indicates the number of pixels of the ith class that are correctly identified, and pij indicates the number of pixels of the ith class that are identified as the jth class. The IoU of the ith class is then IoUi = pii/(Σj pij + Σj pji - pii). |
| | Dice Coefficient | The value ranges from 0 to 1, and a value closer to 1 indicates a better model. With the same notation as for the IoU, the Dice coefficient of the ith class is Dicei = 2 x pii/(Σj pij + Σj pji). |
| | Confusion Matrix | The same as that for image classification, except that this confusion matrix is computed for each pixel instead of each image. |
| Sensitivity Analysis | Sensitivity Analysis | The same as that for image classification, except that the evaluation metric is changed from F1 to IoU. |
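The per-class IoU and Dice formulas above can be sketched from a pixel-level confusion matrix. This is an illustrative sketch under the assumption that `p[i][j]` counts the pixels of class i that were predicted as class j; the function names are not the evaluation service's API.

```python
def class_iou_and_dice(p, i):
    """Per-class IoU and Dice from a pixel confusion matrix p (rows = actual)."""
    row = sum(p[i])                             # all pixels labeled class i
    col = sum(p[r][i] for r in range(len(p)))   # all pixels predicted as class i
    iou = p[i][i] / (row + col - p[i][i])       # IoUi = pii/(Σj pij + Σj pji - pii)
    dice = 2 * p[i][i] / (row + col)            # Dicei = 2*pii/(Σj pij + Σj pji)
    return iou, dice

def mean_iou(p):
    """Mean IoU: average the per-class IoU over all classes."""
    return sum(class_iou_and_dice(p, i)[0] for i in range(len(p))) / len(p)

# Two-class example: p = [[3, 1], [1, 3]]
# class 0: IoU = 3/(4 + 4 - 3) = 0.6, Dice = 6/8 = 0.75
```

Note that the two metrics are related: Dice = 2 x IoU/(1 + IoU), so they always rank models the same way.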