Evaluation Metrics

The available evaluation metrics are evaluation overview, precision evaluation, sensitivity analysis, computing performance analysis, heatmap, abstract feature distribution, and adversarial sample evaluation. They cover image classification, object detection, and image semantic segmentation scenarios. Evaluation overview, precision evaluation, and sensitivity analysis have scenario-specific parameters, whereas computing performance analysis, heatmap, abstract feature distribution, and adversarial sample evaluation are available only in image classification scenarios.

Common Part

Table 1 Parameters for evaluation overview

| Parameter | Description |
| --- | --- |
| Overall Metric | Accuracy for image classification, mAP for object detection, and PA for image semantic segmentation. For details, see the metric description in specific scenarios. |
| Prediction Results | Displays the prediction results, label status, and confidence levels. |
| Overall Evaluation | Provides phenomena and optimization suggestions based on the analysis of prediction results and datasets, and displays the higher-priority phenomena and optimization suggestions. |

Image Classification

Each column of a confusion matrix represents the actual label statistics, and each row represents the prediction result statistics. The data on the diagonal of the matrix represents the correct predictions. Several concepts are used to calculate precision: for binary classification tasks, these are true positive (TP), false positive (FP), true negative (TN), and false negative (FN).

Table 2 Concepts involved in the confusion matrix for image classification

| | Actual: Positive | Actual: Negative |
| --- | --- | --- |
| Predicted: Positive | TP | FP |
| Predicted: Negative | FN | TN |
| Total samples | P = TP + FN | N = FP + TN |
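As a minimal sketch of how the four counts in Table 2 are obtained (assuming labels are encoded as 1 for positive and 0 for negative, which is an assumption for illustration):

```python
# Count TP, FP, FN, TN for a binary classification task.
# Encoding assumed here: 1 = positive, 0 = negative.

def confusion_counts(actual, predicted):
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    return tp, fp, fn, tn

tp, fp, fn, tn = confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(tp, fp, fn, tn)  # 2 1 1 1
```

Note that P = TP + FN and N = FP + TN recover the column totals of Table 2.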

Table 3 Evaluation metrics for image classification models

| Metric | Parameter | Description |
| --- | --- | --- |
| Precision Evaluation | Category Distribution | Statistics on the number of samples in each category. |
| | Confusion Matrix | For details about the confusion matrix, see Table 2. |
| | Recall (R) | Ratio of the number of correct positive predictions to the total number of actual positives. A larger value indicates a smaller false negative rate (FNR). The calculation formula is R = TP/(TP + FN), that is, the number of correct predictions in a column of the confusion matrix divided by the sum of samples in that column. |
| | Precision (P) | Ratio of the number of correct positive predictions to the total number of positive predictions. A larger value indicates fewer false positives. The calculation formula is P = TP/(TP + FP), that is, the number of correct predictions in a row of the confusion matrix divided by the sum of samples in that row. |
| | F1 Score | Harmonic mean of the precision and recall: F1 = 2 × P × R/(P + R). |
| | ROC Curve | The ROC curve plots the true positive rate (TPR, vertical axis) against the false positive rate (FPR, horizontal axis) at different classification thresholds. The closer the ROC curve is to the upper left corner, the better the classifier performs. |
| Sensitivity Analysis | Accuracy in Different Feature Value Ranges | Divides the images into several subsets based on feature values, such as brightness and clarity, tests the precision of each subset, and draws a chart. |
| | Feature Distribution | Displays the distribution of image feature values using charts. |
| | F1 Score | Displays the F1 scores of different types of data in different feature value ranges to determine the feature value ranges in which the model performs better. |
| Computing Performance Analysis (not displayed by default; supported only by the built-in algorithm resnet_v1_50) | Operator Duration Ratio and Parameter Quantity Ratio | Calculates the ratios of the parameter quantities of various operators in the network, such as convolution and pooling, as well as their time-consumption ratios in a forward pass. |
| | Other Metrics | Includes basic model information such as the GPU usage, time required, model size, total number of parameters, and total computing amount. |
| Heatmap (not displayed by default; supported only by the built-in algorithm resnet_v1_50) | Display Heatmap | Heatmap drawn using the Grad-CAM++ algorithm. The highlighted area indicates the area the model uses to determine the image inference result. |
| Abstract Feature Distribution (not displayed by default; supported only by the built-in algorithm resnet_v1_50) | Feature Distribution | Extracts the output of the convolutional layer just before the fully connected layer of the image classification backbone. For example, a ResNet-50 network outputs a 1 × 2048 vector for each image. The output is reduced to two dimensions and drawn on a 2D scatter chart. |
| Adversarial Sample Evaluation (not displayed by default; supported only by the built-in algorithm resnet_v1_50) | PSNR | The peak signal-to-noise ratio (PSNR) is the ratio between the maximum possible power of a signal and the power of the distorting noise that affects the quality of its representation. |
| | SSIM | The Structural Similarity Index (SSIM) measures the similarity between two images. It is often used to compare a distortion-free image with a distorted one. |
| | ACAC | Average Confidence of Adversarial Class (ACAC). |
| | ACTC | Average Confidence of True Class (ACTC). This parameter further evaluates the extent to which the attack deviates from the actual value. |
| | MR | Misclassification Ratio (MR): proportion of adversarial examples that are classified as incorrect classes or as the target classes. |
| | ALD | The Average Lp Distortion (ALD) is the average Lp-norm distortion of successful adversarial examples. A smaller value indicates that adversarial examples are less likely to be detected. |
| | Others | Similar to the metrics in the precision evaluation. |
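As an illustrative sketch (not the service's implementation), the PSNR in Table 3 can be computed from the mean squared error between two images; 255 is assumed here as the maximum pixel value for 8-bit images:

```python
import math

def psnr(original, distorted, max_val=255.0):
    """Peak signal-to-noise ratio between two equally sized images,
    given here as flat lists of pixel values for simplicity.
    Identical images give infinity (zero noise power)."""
    mse = sum((o - d) ** 2 for o, d in zip(original, distorted)) / len(original)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_val ** 2 / mse)

print(psnr([100, 120, 140, 160], [101, 119, 142, 157]))
```

A higher PSNR indicates that the distorted (for example, adversarial) image is closer to the original.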

Computing Performance Analysis supports only built-in TensorFlow-based image classification algorithms. Heatmap, Abstract Feature Distribution, and Adversarial Sample Evaluation support only TensorFlow-based image classification algorithms. To display these metrics, you need to modify the files required for generating the evaluation code. For details, see the image classification part in Sample Code for Model Evaluation.
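To make the ROC construction described earlier concrete, the following sketch (illustrative only, with assumed label encoding 1 = positive, 0 = negative) computes (FPR, TPR) points by sweeping a classification threshold over predicted scores:

```python
def roc_points(labels, scores, thresholds):
    """Return one (FPR, TPR) pair per threshold.
    labels: 1 = positive, 0 = negative; scores: predicted positive score."""
    points = []
    for t in thresholds:
        preds = [1 if s >= t else 0 for s in scores]
        tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
        fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
        fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
        tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points

labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.4, 0.65, 0.8, 0.3]
print(roc_points(labels, scores, [0.5, 0.7]))
```

Plotting these pairs with FPR on the horizontal axis and TPR on the vertical axis yields the ROC curve.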

Object Detection

Table 4 Evaluation metrics for object detection models

| Metric | Parameter | Description |
| --- | --- | --- |
| Precision Evaluation | Category Distribution | Statistics on the number of bounding boxes in each category. |
| | Precision-Recall Curve (P-R Curve) | Sorts the samples of each class by confidence score, adds them to the positive predictions one by one, and calculates the precision and recall at each step. The curve drawn from this series of precision and recall values is the P-R curve of the corresponding class. |
| | mAP with Different IoUs | Calculates the mAP under different IoU thresholds and draws a curve to present the IoU with the highest mAP. The IoU threshold determines whether a predicted bounding box and an actual bounding box are considered to cover the same object. For details, see Figure 1. |
| | F1 Scores with Different Confidence Thresholds | Calculates the average F1 score under different confidence thresholds, draws a curve, and reports the threshold with the highest F1 score. |
| | False Positive Analysis | From the perspective of prediction results, collects statistics on accurate detections, class false positives, background false positives, and position deviations, and draws a pie chart using the proportion of each type of error. For details about error types, see Figure 2. |
| | False Negative Analysis | From the perspective of actual labels, collects statistics on accurate detections, class false negatives, background false negatives, and position deviations, and draws a pie chart using the proportion of each type of error. For details about error types, see Figure 3. |
| Sensitivity Analysis | Accuracy in Different Feature Value Ranges | Similar to that for image classification, but you can select more features related to the target bounding boxes, such as the overlap between target bounding boxes and the number of target bounding boxes. |
| | Feature Distribution | Similar to that for image classification, but you can select more features related to the target bounding boxes, such as the overlap between target bounding boxes and the number of target bounding boxes. |

Figure 1 IoU calculation
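As a sketch of the IoU calculation in Figure 1 (illustrative only), the IoU of two axis-aligned boxes is the area of their intersection divided by the area of their union:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, two unit areas overlap by 1
```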

From the perspective of prediction results, if the IoU between a predicted bounding box and an actual bounding box is greater than 0.5 but the predicted class is inconsistent with the actual class, a class false positive error occurred. If the IoU is greater than 0.1 and less than 0.5 while the predicted class is consistent with the actual class, a position false positive error occurred. If the IoU is less than 0.1, a background false positive error occurred.

Figure 2 False positive analysis

From the perspective of the actual bounding box, if the IoU between the actual bounding box and a predicted bounding box is greater than 0.5 but the predicted class is inconsistent with the actual class, a class false negative error occurred. If the IoU is greater than 0.1 and less than 0.5 while the class is consistent, a position false negative error occurred. If the IoU is less than 0.1, a background false negative error occurred.

Figure 3 False negative analysis
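The false positive analysis of Figure 2 can be sketched as a small classifier of prediction errors. The 0.5 and 0.1 thresholds come from the text; the handling of boundary values and of class mismatches in the middle IoU range is an assumption for illustration:

```python
def classify_false_positive(iou_value, class_match):
    """Classify a predicted box against its best-matching actual box,
    following the false positive analysis rules (thresholds 0.5 and 0.1).
    Boundary handling (>= vs >) is assumed here."""
    if iou_value >= 0.5:
        return "accurate detection" if class_match else "class false positive"
    if iou_value >= 0.1:
        return "position false positive" if class_match else "class false positive"
    return "background false positive"

print(classify_false_positive(0.7, True))   # accurate detection
print(classify_false_positive(0.3, True))   # position false positive
print(classify_false_positive(0.05, True))  # background false positive
```

The false negative analysis of Figure 3 applies the same thresholds from the perspective of each actual bounding box.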

Image Semantic Segmentation

Table 5 Evaluation metrics for image semantic segmentation models

| Metric | Parameter | Description |
| --- | --- | --- |
| Precision Evaluation | Pixel Category Distribution | Statistics on the number of pixels in each category. |
| | IoU | Calculates the IoU between the prediction result and the label set for each class. The mean IoU is obtained by averaging the per-class values. Assume that the total number of classes is k + 1, p_ii indicates the number of pixels of the ith class that are correctly identified, and p_ij indicates the number of pixels of the ith class that are identified as the jth class. The IoU of the ith class is then IoU_i = p_ii / (Σ_j p_ij + Σ_j p_ji − p_ii), where the sums run over j = 0, …, k. |
| | Dice Coefficient | The value ranges from 0 to 1, and a value closer to 1 indicates a better model. With the same notation as above, the Dice coefficient of the ith class is Dice_i = 2 × p_ii / (Σ_j p_ij + Σ_j p_ji). |
| | Confusion Matrix | The same as that for image classification, except that this confusion matrix is computed per pixel instead of per image. |
| Sensitivity Analysis | Sensitivity Analysis | The same as that for image classification, except that the evaluation metric is changed from F1 score to IoU. |
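Using the notation above (p_ij = number of pixels of actual class i identified as class j), per-class IoU and Dice can be sketched as follows. The matrix layout (rows = actual classes, columns = predicted classes) is an assumption matching that notation:

```python
def iou_and_dice(p):
    """Per-class IoU and Dice from matrix p, where p[i][j] is the number of
    pixels of actual class i predicted as class j (assumed layout)."""
    n = len(p)
    ious, dices = [], []
    for i in range(n):
        row = sum(p[i])                       # all pixels of actual class i
        col = sum(p[j][i] for j in range(n))  # all pixels predicted as class i
        inter = p[i][i]                       # correctly identified pixels
        ious.append(inter / (row + col - inter))
        dices.append(2 * inter / (row + col))
    return ious, dices

p = [[3, 1],
     [1, 5]]
ious, dices = iou_and_dice(p)
print(ious, dices)  # [0.6, 5/7] [0.75, 5/6]
```

Averaging the per-class IoU values gives the mean IoU reported by the precision evaluation.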