Collecting Data
ModelArts provides automatic hard example identification, which uses built-in rules to filter hard examples from the inference data sent to an existing model. The function mines the data most likely to improve model precision, which reduces the labeling effort required when a model is updated. You only need to confirm and label the useful data and add it to a training dataset. Retraining on that dataset then yields a model with higher precision.
For models deployed as batch services, data generated when the service is invoked is stored in an OBS directory by default. ModelArts can automatically filter hard examples from this data based on configured rules and output them to a dataset for future model training.
For batch services, data synchronization and hard example filtering involve the following scenarios, as shown in Figure 1.
- Synchronizing Data to a Dataset: Synchronize the input data of a batch service to a dataset for unified management and application.
- Hard Example Filtering: Enable the hard example filtering function to filter hard examples from the input data of a batch service using built-in algorithms. The hard examples are then stored in the corresponding dataset for retraining.
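To make the idea of a hard example concrete, the following is a minimal sketch of one common heuristic: flagging inputs where the model's two highest class scores are close together. This is not the built-in ModelArts algorithm, which this page does not describe. The function name `filter_hard_examples`, the `margin_threshold` parameter, and the sample scores are all hypothetical.

```python
from typing import Dict, List


def filter_hard_examples(
    predictions: Dict[str, List[float]],
    margin_threshold: float = 0.2,
) -> List[str]:
    """Return input names whose top-two class scores differ by less than margin_threshold.

    Illustrative heuristic only; not the rule set that ModelArts applies.
    """
    hard = []
    for name, scores in predictions.items():
        if not scores:
            # No scores available for this input, so nothing to compare.
            continue
        top_two = sorted(scores, reverse=True)[:2]
        # With a single score there is no runner-up; use the score itself as the margin.
        margin = top_two[0] - top_two[1] if len(top_two) == 2 else top_two[0]
        if margin < margin_threshold:
            hard.append(name)
    return hard


# Hypothetical per-class scores for three images sent to a batch service.
predictions = {
    "img_001.jpg": [0.97, 0.02, 0.01],  # confident prediction
    "img_002.jpg": [0.48, 0.45, 0.07],  # two classes nearly tied
    "img_003.jpg": [0.60, 0.30, 0.10],  # moderately confident
}

print(filter_hard_examples(predictions))  # ['img_002.jpg']
```

Only the near-tie image is selected, which mirrors the goal described above: a person confirms and labels a small set of ambiguous inputs instead of the whole batch.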
Synchronizing Data to a Dataset
The input data of batch services can be synchronized to a dataset. Synchronization only stores the data in the dataset; it does not filter hard examples. You can select an existing dataset or create a dataset to store the data.
- Log in to the ModelArts management console and choose Service Deployment > Batch Services.
- Click the service name to go to the service details page, and click the Sample tab. Alternatively, choose More > Sample Collection in the Operation column in the service list.
Figure 2 Accessing the data collection page from the Batch Services page
- On the Sample tab page, click Synchronize Data to Dataset.
- In the displayed dialog box, select a labeling type and a dataset, and click OK to synchronize data to the dataset. The synchronized data will be displayed on the Unlabeled tab page of the dataset.
If a batch service has no input data, data cannot be synchronized.
Figure 3 Synchronizing data to the dataset for a batch service
Hard Example Filtering
To filter hard examples from batch service data and store the results in a dataset, you need to enable a hard example filtering task.
If a batch service has finished running and hard example filtering was disabled, no hard example filtering task is executed. After configuring a hard example filtering task, you need to restart the batch service to execute the task.
- Log in to the ModelArts management console and choose Service Deployment > Batch Services.
- Enable a hard example filtering task in either of the following ways:
- When deploying a model as a batch service, enable Hard Example Filtering on the Deploy page.
Figure 4 Enabling the Hard Example Filtering function on the Deploy page
- After a batch service is deployed, click the service name to go to the service details page. Click the edit icon next to Hard Example Filtering to enable a hard example filtering task.
Figure 5 Enabling the Hard Example Filtering function on the details page
- Set the parameters related to hard example filtering. For details, see Table 1. Unlike real-time services, batch services filter hard examples from all data, so no filtering policy is required.
Table 1 Hard example filtering parameters
| Parameter | Description |
| --- | --- |
| Model Type | Model application type. Currently, only Image classification and Object detection are supported. |
| Existing Training Dataset Path | Optional. A model is trained on a dataset and can then be deployed as a batch service. The process is as follows: provide training scripts and a dataset > train a model on the dataset > deploy the model as a batch service. When filtering hard examples, you can import the manifest file of the dataset that the batch service's model was trained on to find data problems underlying the model. You are advised to import the dataset to improve training precision. Currently, only the manifest file of the dataset can be imported. If a dataset is managed on ModelArts, publish the dataset to obtain its manifest file. If your dataset is not managed on ModelArts, import the manifest file by referring to Specifications for Importing the Manifest File. |
| Hard Example Output | Dataset in which the filtered hard example data is saved. You can select an existing dataset or create a new dataset. The dataset type must match the model type. For example, if the model type is image classification, the dataset that receives the hard example data must also be an image classification dataset. |
Figure 6 Enabling hard example filtering
- After the hard example filtering task is configured and executed, view the task status on the Sample tab page of the batch service. After the task is complete, its status changes to Dataset imported. You can click the dataset link to quickly access the corresponding dataset. The filtered hard examples will be displayed on the To Be Confirmed tab page.
Figure 7 Status of a data collection task
Figure 8 Hard example filtering result
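Before filling in the parameters in Table 1, it can help to sanity-check the inputs locally. The sketch below counts labels in an image classification manifest and checks that the output dataset type matches the model type. It assumes the manifest holds one JSON object per line with `source` and `annotation` fields, where each annotation has a `name`. Treat that layout as an assumption and confirm it against Specifications for Importing the Manifest File. The bucket paths, function names, and type strings are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def count_labels(manifest_lines: Iterable[str]) -> Counter:
    """Count annotation names across manifest lines (assumed JSON Lines layout)."""
    counts: Counter = Counter()
    for line in manifest_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for annotation in record.get("annotation", []):
            name = annotation.get("name")
            if name is not None:
                counts[name] += 1
    return counts


def output_dataset_matches(model_type: str, dataset_type: str) -> bool:
    """Return True if the two type strings match, ignoring case, spaces, and underscores."""
    def normalize(value: str) -> str:
        return value.strip().lower().replace(" ", "_")
    return normalize(model_type) == normalize(dataset_type)


# Hypothetical manifest content; the paths are placeholders, not real objects.
sample_manifest = [
    '{"source": "s3://example-bucket/train/1.jpg", "annotation": [{"type": "modelarts/image_classification", "name": "cat"}]}',
    '{"source": "s3://example-bucket/train/2.jpg", "annotation": [{"type": "modelarts/image_classification", "name": "dog"}]}',
    '{"source": "s3://example-bucket/train/3.jpg", "annotation": [{"type": "modelarts/image_classification", "name": "cat"}]}',
]

print(count_labels(sample_manifest).most_common())  # [('cat', 2), ('dog', 1)]
print(output_dataset_matches("Image classification", "image_classification"))  # True
```

A heavily skewed label count is one example of the kind of data problem the Existing Training Dataset Path parameter is meant to help expose. The sketch reads only the `name` field, so it does not cover manifests whose annotations are stored in separate files.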