Data Feature Analysis

Updated on 2024-04-30 GMT+08:00

Images or target bounding boxes are analyzed based on image features, such as blur and brightness, and the results are plotted as curves to help you process datasets.

You can also select multiple versions of a dataset to view their curves for comparison and analysis.

Background

  • Data feature analysis is only available for image datasets of the image classification and object detection types.
  • Data feature analysis is only available for published dataset versions in Default format.
  • The data scope for feature analysis varies depending on the dataset type.
    • In a dataset of the object detection type, if the number of labeled samples is 0, the View Data Feature tab page is unavailable and data features are not displayed after a version is published. After the images are labeled and the version is published, the data features of the labeled images are displayed.
    • In a dataset of the image classification type, if the number of labeled samples is 0, the View Data Feature tab page is unavailable and data features are not displayed after a version is published. After the images are labeled and the version is published, the data features of all images are displayed.
  • The analysis result is valid only when the number of images in a dataset reaches a certain level. Generally, more than 1,000 images are required.
  • Image classification supports the following data feature metrics: Resolution, Aspect Ratio, Brightness, Saturation, Blur Score, and Colorfulness. Object detection supports all data feature metrics. For details, see Supported Data Feature Metrics.

Data Feature Analysis

  1. Log in to the ModelArts management console. In the navigation pane, choose Data Management > Datasets.
  2. Locate the target dataset, click More in the Operation column, and select View Data Feature. The View Data Feature tab of the dataset is displayed.

    You can also click a dataset name to go to the dataset page and click the View Data Feature tab.

  3. By default, feature analysis is not started for published datasets. You need to manually start feature analysis tasks for each dataset version. On the View Data Feature tab, click Analyze Features.
  4. In the dialog box that is displayed, configure the dataset version for feature analysis and click Yes to start analysis.
    Version: Select a published version of the dataset.
    Figure 1 Starting a data feature analysis task
  5. After a data feature analysis task is started, it takes some time to complete, depending on the data volume. When the selected version appears in the Version drop-down list and can be selected, the analysis is complete.
  6. View the data feature analysis result.

    Version: Select the versions to be compared from the drop-down list. You can also select only one version.

    Type: Select the type to be analyzed. The value can be all, train, eval, or inference.

    Data Feature Metric: Select metrics to be displayed from the drop-down list. For details, see Supported Data Feature Metrics.

    Then, the selected version and metrics are displayed on the page. The displayed chart helps you understand data distribution for better data processing.

  7. View historical records of the analysis task.

    After data feature analysis is complete, you can click Task History on the right of the View Data Feature tab page to view historical analysis tasks and their statuses in the dialog box that is displayed.

Supported Data Feature Metrics

Table 1 Data feature metrics

Resolution

Description: Image resolution. The image area is used as the statistical value.

Explanation: The analysis result is used to check for outliers. If outliers exist, you can resize or delete those images.
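As an illustration of the Resolution metric, the following is a minimal sketch, not the ModelArts implementation: it computes the area of every image in a folder and flags outliers with a simple z-score rule. The folder layout, file extensions, and threshold are illustrative assumptions.

```python
from pathlib import Path

import numpy as np
from PIL import Image

def image_areas(folder: str) -> dict[str, int]:
    """Return {file name: width * height} for all JPEG/PNG files in a folder."""
    areas = {}
    for path in Path(folder).glob("*"):
        if path.suffix.lower() in {".jpg", ".jpeg", ".png"}:
            with Image.open(path) as img:
                areas[path.name] = img.width * img.height
    return areas

def resolution_outliers(areas: dict[str, int], z_thresh: float = 3.0) -> list[str]:
    """Flag images whose area deviates from the mean by more than z_thresh sigmas."""
    values = np.array(list(areas.values()), dtype=float)
    z = np.abs((values - values.mean()) / (values.std() + 1e-9))
    return [name for name, score in zip(areas, z) if score > z_thresh]
```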

Aspect Ratio

Description: Ratio of an image's width to its height.

Explanation: The chart of this metric is typically in normal distribution. It is generally used to compare the difference between the training set and the dataset used in the real scenario.

Brightness

Description: Perceived brightness elicited by the luminance of the image. A larger value indicates a brighter image.

Explanation: The chart of this metric is typically in normal distribution. You can determine whether the overall brightness of the dataset is high or low based on the distribution center and adjust it to match your application scenario. For example, if the application scenario is at night, the brightness should be lower.

Saturation

Description: Color saturation of an image. A larger value indicates that colors in the image are easier to distinguish.

Explanation: The chart of this metric is typically in normal distribution. It is generally used to compare the difference between the training set and the dataset used in the real scenario.
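The Brightness and Saturation metrics above can be approximated per image as shown in the following minimal sketch. It assumes OpenCV is available; the exact formulas used by ModelArts are not published here, so the mean V and S channels of the HSV representation are used as stand-ins.

```python
import cv2
import numpy as np

def brightness_and_saturation(image_path: str) -> tuple[float, float]:
    """Return (mean brightness, mean saturation), both in the range 0-255."""
    bgr = cv2.imread(image_path)                  # OpenCV loads images as BGR
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)    # convert to H, S, V channels
    saturation = float(np.mean(hsv[:, :, 1]))     # S channel: color saturation
    brightness = float(np.mean(hsv[:, :, 2]))     # V channel: brightness
    return brightness, saturation
```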

Blur Score

Description: Image clarity, calculated using the Laplace operator. A larger value indicates clearer edges and higher clarity.

Explanation: Determine whether the clarity meets your requirements based on the application scenario. For example, data collected from HD cameras requires higher clarity. You can sharpen or blur the images and add noise to adjust the clarity.
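The following is a minimal sketch of a Laplacian-based blur score. ModelArts documents the Laplace operator but not its exact scoring formula, so using the variance of the Laplacian, a common clarity measure, is an assumption here.

```python
import cv2

def blur_score(image_path: str) -> float:
    """Variance of the Laplacian of the grayscale image; larger = sharper edges."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    return float(cv2.Laplacian(gray, cv2.CV_64F).var())
```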

Colorfulness

Description:
Horizontal coordinate: Colorfulness of an image. A larger value indicates richer colors.
Vertical coordinate: Number of images

Explanation: Perceived colorfulness of an image, which is generally used to compare the difference between the training set and the dataset used in the real scenario.
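A minimal sketch of a colorfulness measure follows, using the Hasler and Süsstrunk (2003) opponent-channel formula. ModelArts does not publish its exact formula, so this particular measure is an assumption.

```python
import cv2
import numpy as np

def colorfulness(image_path: str) -> float:
    """Hasler & Suesstrunk colorfulness: larger = richer colors."""
    bgr = cv2.imread(image_path).astype("float32")
    b, g, r = cv2.split(bgr)
    rg = r - g                          # red-green opponent channel
    yb = 0.5 * (r + g) - b              # yellow-blue opponent channel
    std_root = np.sqrt(rg.std() ** 2 + yb.std() ** 2)
    mean_root = np.sqrt(rg.mean() ** 2 + yb.mean() ** 2)
    return float(std_root + 0.3 * mean_root)
```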

Bounding Box Number

Description:
Horizontal coordinate: Number of bounding boxes in an image
Vertical coordinate: Number of images

Explanation: It is difficult for a model to detect a large number of bounding boxes in a single image. Therefore, more images containing many bounding boxes are required for training.

Std of Bounding Boxes Area Per Image

Description: Standard deviation of the bounding box areas in each image.
Horizontal coordinate: Standard deviation of the bounding box areas in an image. If an image has only one bounding box, the standard deviation is 0. A larger standard deviation indicates greater variation in bounding box sizes within an image.
Vertical coordinate: Number of images

Explanation: It is difficult for a model to detect many bounding boxes of different sizes in a single image. You can add training data for such scenarios, or delete such data if those scenarios do not occur in your application.
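The following minimal sketch computes this metric for one image. The box format [x_min, y_min, x_max, y_max] is an assumption, not the ModelArts internal representation.

```python
import numpy as np

def bbox_area_std(boxes: list[list[float]]) -> float:
    """Standard deviation of box areas in one image; 0.0 if there is a single box."""
    areas = np.array([(x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in boxes])
    return float(areas.std()) if len(areas) > 1 else 0.0
```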

Aspect Ratio of Bounding Boxes

Description:
Horizontal coordinate: Aspect ratio of the target bounding boxes
Vertical coordinate: Number of bounding boxes in all images

Explanation: The chart of this metric is generally in Poisson distribution and is closely related to the application scenario. The metric is mainly used to compare the differences between the training set and the validation set. For example, if the bounding boxes in the training set are mostly elongated rectangles, the result will be significantly affected when those in the validation set are close to squares.

Area Ratio of Bounding Boxes

Description:
Horizontal coordinate: Area ratio of a target bounding box, that is, the ratio of the bounding box area to the area of the entire image. A larger value indicates that the object occupies a larger proportion of the image.
Vertical coordinate: Number of bounding boxes in all images

Explanation: The metric is used to determine the distribution of anchors used in the model. If the target bounding boxes are large, set the anchors to larger values.
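A minimal sketch of the area ratio follows; the box format [x_min, y_min, x_max, y_max] is an assumption.

```python
def bbox_area_ratios(boxes: list[list[float]],
                     image_width: int, image_height: int) -> list[float]:
    """Area ratio in [0, 1] for every bounding box in one image."""
    image_area = float(image_width * image_height)
    return [((x2 - x1) * (y2 - y1)) / image_area for x1, y1, x2, y2 in boxes]
```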

Marginalization Value of Bounding Boxes

Description:
Horizontal coordinate: Marginalization degree, that is, the ratio of the distance between the center of the target bounding box and the center of the image to the total distance of the image. A larger value indicates that the object is closer to the edge. (The total distance of an image is the distance from the image center to the point where a ray starting at the image center and passing through the bounding box center intersects the image border.)
Vertical coordinate: Number of bounding boxes in all images

Explanation: Generally, the chart of this metric is in normal distribution. The metric is used to determine whether an object is at the edge of an image. If part of an object is at the edge of an image, you can add more data or leave the object unlabeled.
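A minimal sketch of the marginalization degree defined above follows. The box format [x_min, y_min, x_max, y_max] is an assumption; the value is 0 for a box centered on the image and 1 for a box whose center lies on the border.

```python
def marginalization(box: list[float], image_width: int, image_height: int) -> float:
    """Distance(image center -> box center) / distance(image center -> border along the same ray)."""
    x1, y1, x2, y2 = box
    cx, cy = image_width / 2.0, image_height / 2.0           # image center
    dx, dy = (x1 + x2) / 2.0 - cx, (y1 + y2) / 2.0 - cy      # offset of the box center
    if dx == 0 and dy == 0:
        return 0.0
    # Along the ray through the box center, the border is reached when the offset is
    # scaled by 1 / max(|dx|/cx, |dy|/cy), so the distance ratio simplifies to:
    return max(abs(dx) / cx, abs(dy) / cy)
```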

Overlap Score of Bounding Boxes

Description:
Horizontal coordinate: Overlap degree, that is, the fraction of a single bounding box that is overlapped by other bounding boxes. The value ranges from 0 to 1. A larger value indicates that a larger part of the box is covered by other boxes.
Vertical coordinate: Number of bounding boxes in all images

Explanation: The metric is used to determine the degree of overlap between objects to be detected. Overlapped objects are difficult to detect. You can add more data or leave some objects unlabeled based on your needs.
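The following minimal sketch computes the overlapped fraction of one box. It assumes integer pixel coordinates in the format [x_min, y_min, x_max, y_max] and uses a rasterized occupancy mask for simplicity rather than an analytic union of rectangles.

```python
import numpy as np

def overlap_score(boxes: list[list[int]], index: int,
                  image_width: int, image_height: int) -> float:
    """Fraction (0-1) of boxes[index] covered by the union of all other boxes."""
    mask = np.zeros((image_height, image_width), dtype=bool)
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        if i != index:
            mask[y1:y2, x1:x2] = True                 # pixels covered by other boxes
    x1, y1, x2, y2 = boxes[index]
    box_area = (x2 - x1) * (y2 - y1)
    covered = int(mask[y1:y2, x1:x2].sum())           # overlapped pixels inside the box
    return covered / box_area if box_area > 0 else 0.0
```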

Brightness of Bounding Boxes

Description:
Horizontal coordinate: Brightness of the image region inside the target bounding box. A larger value indicates a brighter region.
Vertical coordinate: Number of bounding boxes in all images

Explanation: Generally, the chart of this metric is in normal distribution. The metric is used to determine the brightness of the objects to be detected. In some special scenarios, the brightness of an object is low and may not meet the requirements.

Blur Score of Bounding Boxes

Description:
Horizontal coordinate: Clarity of the image region inside the target bounding box. A larger value indicates higher clarity.
Vertical coordinate: Number of bounding boxes in all images

Explanation: The metric is used to determine whether the objects to be detected are blurred. For example, a moving object may become blurred during collection and its data needs to be collected again.
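For the per-box Brightness and Blur Score metrics above, a minimal sketch is to crop the box region and apply the same whole-image measures sketched earlier. The box format [x_min, y_min, x_max, y_max] in pixels is an assumption.

```python
import cv2
import numpy as np

def box_brightness_and_blur(image_path: str, box: list[int]) -> tuple[float, float]:
    """Return (mean brightness, Laplacian-variance blur score) of the box region."""
    x1, y1, x2, y2 = box
    crop = cv2.imread(image_path)[y1:y2, x1:x2]
    hsv = cv2.cvtColor(crop, cv2.COLOR_BGR2HSV)
    brightness = float(np.mean(hsv[:, :, 2]))               # V channel of the crop
    gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
    blur = float(cv2.Laplacian(gray, cv2.CV_64F).var())     # larger = sharper
    return brightness, blur
```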
