Updated on 2023-12-15 GMT+08:00

Data Management

AI development involves processing massive volumes of data, and data preparation and labeling usually take more than half of the total development time. ModelArts data management provides an efficient framework for managing and labeling data. It supports image, text, audio, and video data in a range of labeling scenarios, such as image classification, object detection, speech paragraph labeling, and text classification, so it can be used in AI projects such as computer vision, natural language processing, and audio and video analysis. ModelArts data management also provides data filtering, data analysis, data processing, team labeling, and version management, enabling you to manage the full data labeling process. Figure 1 shows the data labeling process.

Figure 1 Data labeling process

ModelArts data management analyzes and processes data using functions such as clustering analysis, data feature analysis, data cleansing, data verification, data augmentation, and data selection, helping you obtain high-value data that meets development or project requirements.
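
As a rough illustration of what data verification and data selection can involve, the following Python sketch filters a list of image samples. It does not use the ModelArts API. The Sample class, the clean_samples function, the accepted file extensions, and the label set are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical record for one labeled image (not a ModelArts type)."""
    path: str
    label: str


# Assumed set of accepted image extensions, for illustration only.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def clean_samples(samples, allowed_labels):
    """Keep samples that pass three checks, preserving input order.

    1. Verification: the file has a supported image extension.
    2. Verification: the label belongs to the allowed label set.
    3. Cleansing: the path has not been seen before (drop duplicates).
    """
    seen_paths = set()
    kept = []
    for sample in samples:
        if not sample.path.lower().endswith(IMAGE_EXTENSIONS):
            continue
        if sample.label not in allowed_labels:
            continue
        if sample.path in seen_paths:
            continue
        seen_paths.add(sample.path)
        kept.append(sample)
    return kept


if __name__ == "__main__":
    raw = [
        Sample("a.jpg", "cat"),
        Sample("a.jpg", "cat"),   # duplicate path
        Sample("b.txt", "dog"),   # unsupported file type
        Sample("c.png", "bird"),  # label outside the allowed set
        Sample("d.PNG", "dog"),
    ]
    for sample in clean_samples(raw, {"cat", "dog"}):
        print(sample.path, sample.label)
```

In practice, the checks performed by the service depend on the dataset type and the processing function you select. The sketch only shows the general shape of such a filter.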

With data management, ModelArts allows you to label data online for image classification, object detection, speech paragraph labeling, text triplet labeling, and video labeling. You can also use intelligent labeling to label data automatically with built-in or custom algorithms, which improves labeling efficiency.

To support large-scale collaborative labeling, data management provides team labeling with team management, personnel management, and data management. These cover the full project management process, from project creation, data allocation, progress control, labeling, and review through to acceptance. This improves labeling efficiency and minimizes project management costs.
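
To make the data allocation step more concrete, the following Python sketch distributes sample IDs evenly among team members in round-robin order. It is a minimal illustration under stated assumptions, not how ModelArts allocates tasks. The allocate function, the sample IDs, and the labeler names are hypothetical.

```python
def allocate(sample_ids, labelers):
    """Assign samples to labelers in round-robin order.

    Returns a dict that maps each labeler name to a list of sample IDs.
    Every labeler receives either floor(n / k) or ceil(n / k) samples,
    where n is the number of samples and k is the number of labelers.
    """
    if not labelers:
        raise ValueError("at least one labeler is required")
    assignments = {name: [] for name in labelers}
    for index, sample_id in enumerate(sample_ids):
        owner = labelers[index % len(labelers)]
        assignments[owner].append(sample_id)
    return assignments


if __name__ == "__main__":
    plan = allocate(["s1", "s2", "s3", "s4", "s5"], ["alice", "bob"])
    for name, items in plan.items():
        print(name, items)
```

Round-robin keeps workloads balanced without needing to know anything about the samples. A real project might instead weight the allocation by labeler capacity or reserve a share of samples for review.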

ModelArts data management ensures the security and privacy of user data and allows data to be used only within the authorized scope.

In the new version of data management, datasets and data labeling are decoupled to facilitate your operations.