Updated on 2024-10-29 GMT+08:00

Introduction to Data Preparation

Data management is being upgraded and is not visible to users who have not used it before.

The driving forces behind AI are computing power, algorithms, and data. Data quality affects model precision: in general, a large amount of high-quality data is more likely to produce a high-precision AI model. Models trained using ordinary data typically achieve 85% to 90% accuracy, while commercial applications have higher requirements. To improve model accuracy to 96% or even 99%, a large amount of high-quality data is required, and that data must be more refined, scenario-based, and professional. Preparing such data has become a challenging issue in AI development.

ModelArts is a one-stop AI development platform that supports the full AI development lifecycle, including data processing, algorithm development, model training, and model deployment. In addition, ModelArts provides AI Gallery, which can be used to share data, algorithms, and models. ModelArts data management provides end-to-end data preparation, processing, and labeling.

ModelArts data management provides the following functions for you to obtain high-quality AI data:

  • Data acquisition
    • Allows you to import data from OBS, MRS, DLI, and GaussDB(DWS).
    • Provides 18+ data augmentation operators to increase data volume for training.
  • Improved data quality
    • Allows you to preview data in various formats, including images, text, audio, and video, helping you assess data quality.
    • Allows you to filter data by multiple search criteria, such as sample attributes and labeling information.
    • Provides 12+ labeling tools for refined, scenario-based, and professional data labeling.
    • Performs feature analysis based on samples and labeling results, helping you understand data quality.
  • More efficient data preparation
    • Allows you to manage data by version, so that you can track and reuse dataset changes more efficiently.
    • Provides capabilities such as interactive labeling and auto labeling for more efficient data labeling.
    • Enables team labeling and team labeling management for labeling a large amount of data.
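
To make the filtering and feature-analysis functions above more concrete, the following is a minimal Python sketch of the general idea: selecting samples by attribute and labeling information, then counting labels to get a rough view of data quality. It is not the ModelArts API. The `Sample` class, the `filter_samples` and `label_distribution` functions, and the file paths are all hypothetical names introduced for illustration only.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Sample:
    """Hypothetical in-memory sample record (not a ModelArts type)."""
    path: str
    media_type: str  # for example "image", "text", "audio", or "video"
    labels: list = field(default_factory=list)


def filter_samples(samples, media_type=None, label=None, labeled=None):
    """Return the samples that match every criterion that is not None."""
    result = []
    for s in samples:
        if media_type is not None and s.media_type != media_type:
            continue
        if label is not None and label not in s.labels:
            continue
        if labeled is not None and bool(s.labels) != labeled:
            continue
        result.append(s)
    return result


def label_distribution(samples):
    """Count how often each label occurs across the given samples."""
    return Counter(label for s in samples for label in s.labels)


# Made-up example data, for illustration only.
samples = [
    Sample("img/001.jpg", "image", ["cat"]),
    Sample("img/002.jpg", "image", ["dog"]),
    Sample("img/003.jpg", "image"),
    Sample("txt/001.txt", "text", ["positive"]),
]

# Find images that still need labeling.
unlabeled_images = filter_samples(samples, media_type="image", labeled=False)
print([s.path for s in unlabeled_images])

# Inspect class balance across the labeled samples.
print(label_distribution(samples))
```

A heavily skewed label count or a large unlabeled subset in a check like this usually indicates that more labeling or augmentation is needed before training.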