Data Management

During AI development, massive volumes of data need to be processed, and data preparation and labeling usually take more than half of the development time. ModelArts data management provides an efficient data management and labeling framework. It supports different data types such as image, text, audio, and video, and covers a range of labeling scenarios such as image classification, object detection, speech paragraph labeling, and text classification. It applies to AI projects such as computer vision, natural language processing, and audio and video analysis. In addition, it provides functions such as data filtering, data analysis, data processing, intelligent labeling, team labeling, and version management. This framework enables AI developers to complete the entire data labeling process. See the following figure.

Figure 1 Data labeling process

ModelArts data management provides clustering analysis, data cleansing, data augmentation, data selection, feature analysis, and other processing functions. These functions help you further understand, filter, and mine your data to obtain high-value data that meets development objectives or project requirements.

You can select the appropriate labeling tool in data management to label data in the specified scenario. Models trained by built-in algorithms or custom algorithms can be selected for intelligent labeling, so that only a small amount of manual labeling and correction is required to obtain accurate labeling results. You can also create a team to perform collaborative labeling, which improves labeling efficiency. ModelArts supports project-based labeling management for individual developers, small-scale labeling by small teams, and large-scale labeling by professional teams.
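The idea behind intelligent labeling, where a model pre-labels data and people correct only part of it, can be illustrated with a minimal sketch. This is not the ModelArts API: the `Prediction` class, the `triage` function, and the 0.9 confidence threshold are all hypothetical names and values chosen for illustration. The sketch assumes that each pre-label comes with a confidence score and that low-confidence samples are routed to manual review.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """A model-generated pre-label (hypothetical structure, not a ModelArts type)."""
    sample_id: str
    label: str
    confidence: float  # model score in [0, 1]


def triage(predictions, threshold=0.9):
    """Split pre-labeled samples into auto-accepted and needs-manual-review lists."""
    accepted, review = [], []
    for p in predictions:
        if p.confidence >= threshold:
            accepted.append(p)
        else:
            review.append(p)
    return accepted, review


# Illustrative data only.
preds = [
    Prediction("img_001", "cat", 0.97),
    Prediction("img_002", "dog", 0.55),
    Prediction("img_003", "cat", 0.91),
]
accepted, review = triage(preds)
print("auto-accepted:", [p.sample_id for p in accepted])
print("manual review:", [p.sample_id for p in review])
```

In a scheme like this, raising the threshold sends more samples to manual review and lowers the risk of accepting wrong labels, while lowering it reduces manual effort.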

For large-scale team labeling, ModelArts provides team management, personnel management, and data management to cover the entire process, from project creation, allocation, management, and labeling to acceptance. For small-scale labeling by individuals and small teams, ModelArts provides an easy-to-use labeling tool to minimize project management costs.
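The allocation step in team labeling can be pictured as dividing the samples of a labeling job among team members. The following is a simplified sketch under stated assumptions, not the way ModelArts allocates tasks: the `allocate` function and the labeler names are hypothetical, and a plain round-robin split is assumed.

```python
def allocate(sample_ids, labelers):
    """Distribute samples among labelers in round-robin order.

    Hypothetical helper for illustration; each sample goes to exactly one labeler.
    """
    if not labelers:
        raise ValueError("at least one labeler is required")
    tasks = {name: [] for name in labelers}
    for i, sample_id in enumerate(sample_ids):
        tasks[labelers[i % len(labelers)]].append(sample_id)
    return tasks


# Illustrative data only.
tasks = allocate(["s1", "s2", "s3", "s4", "s5"], ["alice", "bob"])
for name, assigned in tasks.items():
    print(name, assigned)
```

With such a split, each member sees only a portion of the dataset, and the workload differs by at most one sample between members.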

In addition, the labeling platform ensures data security. User data is used only within the authorized scope. The labeling object allocation policy ensures user data privacy and implements data anonymization.