Combining Image Datasets Based on a Specific Ratio
Data combination merges multiple datasets according to a specified ratio and publishes the result as a combined dataset. A well-chosen ratio helps keep the combined dataset diverse, balanced, and representative.
If a single dataset meets your requirements, skip this section and proceed with Publishing Image Datasets.
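To make the idea of a ratio concrete, the following is a minimal Python sketch of ratio-based sampling from several image lists. It is illustrative only: the function name `combine_by_ratio`, the `total` parameter, and the sampling strategy are assumptions, not the platform's actual implementation.

```python
import random


def combine_by_ratio(datasets, ratios, total, seed=0):
    """Sample about `total` items from `datasets` in proportion to `ratios`.

    Hypothetical helper for illustration; not a platform API.
    """
    if len(datasets) != len(ratios) or len(datasets) < 2:
        raise ValueError("provide at least two datasets and one ratio per dataset")
    if any(r <= 0 for r in ratios):
        raise ValueError("ratios must be positive")

    rng = random.Random(seed)
    weight_sum = sum(ratios)
    combined = []
    for items, ratio in zip(datasets, ratios):
        # Share of the combined dataset that this source should contribute.
        quota = round(total * ratio / weight_sum)
        # Never request more items than the source dataset holds.
        quota = min(quota, len(items))
        combined.extend(rng.sample(items, quota))
    rng.shuffle(combined)
    return combined


cats = [f"cat_{i}.jpg" for i in range(100)]
dogs = [f"dog_{i}.jpg" for i in range(100)]
mixed = combine_by_ratio([cats, dogs], ratios=[3, 1], total=40)
print(len(mixed), sum(name.startswith("cat_") for name in mixed))
```

With a 3:1 ratio, three quarters of the sampled items come from the first dataset. If a source has fewer items than its quota, this sketch takes everything available rather than oversampling.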
Creating an Image Data Combination Task
To create an image dataset combination task, perform the following steps:
- Log in to ModelArts Studio Large Model Development Platform. In the My Spaces area, click the required workspace.
Figure 1 My Spaces
- In the navigation pane, choose Data Engineering > Data Processing > Combine Task. On the displayed page, click Create data combine in the upper right corner.
- In the Select Dataset area, select at least two image datasets and click Next.
- On the Data Combine page, set the ratio of different datasets and click Next.
- After the data combination configuration is complete, click Next in the lower right corner to go to the resource configuration page and select whether to automatically generate a processed dataset.
- Resource Allocation
Click the expand icon to expand the resource configuration and set task resources. To add custom parameters, click Add Parameters and enter the parameter name and value.
Table 1 Parameter configuration
| Parameter Name | Description |
|---|---|
| numExecutors | Number of executors. The default value is 2. numExecutors x executorMemory must be greater than or equal to 4 and less than or equal to 16. |
| executorCores | Number of CPU cores used by each executor process. The default value is 2. numExecutors x executorMemory must be greater than or equal to 4 and less than or equal to 16. The ratio of executorCores to executorMemory must be in the range of 1:2 to 1:4. |
| executorMemory | Memory size used by each executor process. The default value is 4. The ratio of executorCores to executorMemory must be in the range of 1:2 to 1:4. |
| driverCores | Number of CPU cores used by each driver process. The default value is 2. The ratio of driverCores to driverMemory must be in the range of 1:2 to 1:4. |
| driverMemory | Memory size used by the driver process. The default value is 4. The ratio of driverCores to driverMemory must be in the range of 1:2 to 1:4. |
Figure 2 Resource Allocation
- Automatically Generate Processing Dataset
Select this option and configure the generated dataset, as shown in Figure 3. Click OK in the lower right corner. The platform starts the data combination task. After the task is successfully executed, a processed dataset is automatically generated.
If you do not select this option, click OK in the lower right corner. The platform starts the combination task. After the combination task is successfully executed, manually generate a processed dataset.
- (Optional) Extended Info
You can select the industry and language, or customize dataset properties.
Figure 4 Extended Info
- Click OK. On the Data Combine Task page, after the task is executed successfully, check that the status is Success.
- Click Generate in the Operation column to generate a published dataset.
To view the published dataset, choose Data Engineering > Data Management > Datasets, and click the Published Dataset tab.
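The resource constraints listed in Table 1 can be checked before a task is submitted. The sketch below encodes those rules exactly as the table states them. The function name `check_resources` and the dictionary layout are assumptions for illustration; the platform performs its own validation, and this is not its API.

```python
# Default values as listed in Table 1.
DEFAULTS = {
    "numExecutors": 2,
    "executorCores": 2,
    "executorMemory": 4,
    "driverCores": 2,
    "driverMemory": 4,
}


def check_resources(overrides=None):
    """Return a list of violated constraints (empty if the settings are valid).

    Hypothetical helper; the rules mirror Table 1 as written.
    """
    cfg = {**DEFAULTS, **(overrides or {})}
    problems = []

    # Table 1: numExecutors x executorMemory must be between 4 and 16.
    product = cfg["numExecutors"] * cfg["executorMemory"]
    if not 4 <= product <= 16:
        problems.append(
            f"numExecutors x executorMemory = {product}, expected 4 to 16"
        )

    # Table 1: the cores-to-memory ratio must be between 1:2 and 1:4,
    # that is, memory must be 2 to 4 times the number of cores.
    for role in ("executor", "driver"):
        cores = cfg[f"{role}Cores"]
        memory = cfg[f"{role}Memory"]
        if not 2 * cores <= memory <= 4 * cores:
            problems.append(
                f"{role}Cores:{role}Memory = {cores}:{memory}, expected 1:2 to 1:4"
            )
    return problems


print(check_resources())                      # default settings
print(check_resources({"numExecutors": 8}))   # product exceeds the upper bound
```

Returning a list of messages rather than raising on the first failure lets all violations be reported at once, which is convenient when several parameters are changed together.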