Using PyCharm ToolKit to Train a Model
This section describes how to recognize handwritten digits in images using MXNet, helping you quickly train and deploy a model from your local PC using PyCharm ToolKit. For more information about PyCharm ToolKit, see the Tool Guide.
MNIST is a dataset of handwritten digits and is often used as an introductory example of deep learning. In this example, a model training script written with the native MXNet interface (provided by ModelArts by default) is used to train a model on the MNIST dataset through PyCharm ToolKit, and the model is then deployed as a real-time service. After the deployment is complete, you can use the real-time service to identify the digit contained in an input image.
Before you start, carefully complete the preparations described in Preparations. The procedure for building a model is as follows:
Preparations
- PyCharm 2019.2 or later has been installed on the local PC. The Windows version (Community or Professional) is recommended. Download the PyCharm ToolKit and install it on the local PC.
- You have registered with HUAWEI CLOUD and checked the account status before using ModelArts. The account cannot be in arrears or frozen.
- On the ModelArts management console, access authorization has been configured for the current account.
For details, see Configuring Agency Authorization. If your account was authorized using access keys, you are advised to clear that authorization and configure agency authorization instead.
- You have created a bucket and folders in OBS for storing the sample dataset and model. In this example, create a bucket named test-modelarts and the folders listed in Table 1. For details about how to create OBS buckets and folders, see Creating a Bucket and Creating a Folder. Ensure that the OBS directory you use and ModelArts are in the same region.
Step 1: Install PyCharm ToolKit and Add an Access Key
- Download the PyCharm ToolKit installation package.
- Start PyCharm on the local PC.
- On the PyCharm interface, choose File > Settings. The Settings dialog box is displayed.
- In the Settings dialog box, click Plugins in the left navigation pane. Click the setting icon on the right, and choose Install Plugin from Disk. The dialog box for selecting files is displayed.
Figure 1 Selecting a plug-in from the local host
- In the displayed dialog box, select the ToolKit package from the local directory and click OK.
Figure 2 Choosing the plug-in file
- Click Restart IDE to restart PyCharm. In the displayed dialog box, click Restart.
Figure 3 Restarting PyCharm
- If the ModelArts tab page is displayed on the PyCharm toolbar after the restart, ToolKit has been installed.
Figure 4 Successful installation
- Obtain the access key and add it to PyCharm.
- Obtain the access key (AK/SK) of the account. For details, see Obtaining an Access Key.
- After PyCharm ToolKit is installed, add the access key to ToolKit. For details, see Using Access Keys for Login.
Figure 5 Entering the region and access key
- View the verification result.
In the Event Log area, if information similar to the following is displayed, the access key has been successfully added:
16:01 Validate Credential Success: The HUAWEI CLOUD credential is valid.
Step 2: Prepare Data
ModelArts provides a sample MNIST dataset named Mnist-Data-Set. This example uses this dataset to build a model. Perform the following operations to upload the dataset to the OBS directory test-modelarts/dataset-mnist created in Preparations.
- Download the Mnist-Data-Set dataset to the local PC.
- Decompress the Mnist-Data-Set.zip file to the Mnist-Data-Set directory on the local PC.
- Upload all files in the Mnist-Data-Set folder to the test-modelarts/dataset-mnist directory on OBS in batches. For details about how to upload files, see Uploading a File.
The Mnist-Data-Set dataset contains the following files. The .gz files are the compressed packages.
- t10k-images-idx3-ubyte: validation set, containing 10,000 samples
- t10k-images-idx3-ubyte.gz: compressed package of the validation set
- t10k-labels-idx1-ubyte: labels of the validation set, containing labels for the 10,000 samples
- t10k-labels-idx1-ubyte.gz: compressed package of the validation set labels
- train-images-idx3-ubyte: training set, containing 60,000 samples
- train-images-idx3-ubyte.gz: compressed package of the training set
- train-labels-idx1-ubyte: labels of the training set, containing labels for the 60,000 samples
- train-labels-idx1-ubyte.gz: compressed package of the training set labels
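The idx-ubyte files use a simple big-endian binary layout: a magic number, the dimension sizes, and then the raw bytes. The following is a minimal reader sketch, demonstrated on a synthetically constructed buffer rather than the real dataset files (the function name and helper are illustrative, not part of ModelArts):

```python
import struct

def parse_idx3_images(buf):
    """Parse an idx3-ubyte image buffer: magic, count, rows, cols, then pixels."""
    magic, count, rows, cols = struct.unpack_from(">IIII", buf, 0)
    assert magic == 2051, "not an idx3-ubyte image file"
    pixels = buf[16:]
    # Return one bytes object per image (rows * cols pixels each).
    return [pixels[i * rows * cols:(i + 1) * rows * cols] for i in range(count)]

# Build a tiny synthetic buffer with two blank 28 x 28 images to demonstrate.
header = struct.pack(">IIII", 2051, 2, 28, 28)
body = bytes(2 * 28 * 28)
images = parse_idx3_images(header + body)
print(len(images), len(images[0]))  # 2 784
```

The same layout applies to the label files, except the magic number is 2049 and there is one byte per label.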
Step 3: Compile Training Code
- Go to the ModelArts-Lab project on Gitee, click Clone/Download, and then click Download ZIP to download the project.
- After the project is downloaded, decompress the ModelArts-Lab-master.zip file and obtain training code file train_mnist.py from the \ModelArts-Lab-master\official_examples\Using_MXNet_to_Create_a_MNIST_Dataset_Recognition_Application\codes directory.
- Open PyCharm and choose File > New Project to create a project. Create an src directory in the project, and copy the training code file train_mnist.py to the src directory.
Figure 6 Copying training code to the src directory
Step 4: Train a Model
After the data and code are ready, you can create a training job. Select the MXNet engine and generate a usable model from the local train_mnist.py training script. This example provides a prepared script based on the native MXNet interface. If you use your own code, select an engine and version supported by ModelArts.
- On the PyCharm toolbar, choose ModelArts > Edit Training Job Configuration.
- In the displayed dialog box, set the training parameters as follows:
- Job Name: A job name is automatically generated. You can specify a name when submitting a training job for the first time.
- AI Engine: Select MXNet whose version is MXNet-1.2.1-python3.6.
- Algorithm Source: Select Frequently-used, which indicates a frequently-used framework.
- Specifications: Select GPU specifications.
- OBS Path: Enter the output path created in Preparations. The path is used to store the training output model and log files.
- Data Path in OBS: Enter the OBS directory to which data is uploaded in Step 2: Prepare Data. The path must be a complete OBS path, including the OBS bucket name.
The following figure shows an example. Change the value to your OBS bucket and path.
- Boot File Path: Select the local training script train_mnist.py.
- Code Directory: Select the src directory where the boot script is located.
- Running Parameters: Enter the input parameters required by the training script. This example requires none, so leave this field blank.
After setting the parameters, click Apply and Run to submit the training job to ModelArts.
The MNIST dataset is large, so a GPU resource pool is selected here to improve training efficiency. However, GPUs cost more than CPUs; select an available resource pool based on your actual needs.
Figure 7 Setting training job parameters
- After the training job is submitted, view the training logs in the lower part of the page. If information similar to Current training job status: Successful is displayed in the training log, the training job has been successfully executed.
Figure 8 Viewing training logs
ModelArts Event Log shows logs printed by the tool, and ModelArts Training Log shows logs printed by the training script.
The logs show that the tool automatically uploads code of the local project to OBS on the cloud and then submits a training job. After the job is submitted, the tool obtains logs from the training environment on the cloud in real time, and displays the logs on the ModelArts Training Log page until the job is complete.
- On the left menu bar of PyCharm, click ModelArts Explorer, select the submitted job, and double-click the version number V0001 to view the job details.
Figure 9 Selecting the training job and version
Figure 10 Training job details
Step 5: Compile the Inference Code and Configuration Files and Upload Them to the Path of the Model
ModelArts provides the inference code customize_service.py and the configuration file config.json required in this example. These sample files are located in the same directory as the training code in the downloaded project.
If the training job has run multiple times, the Training Output Path contains multiple version directories, such as V0001 and V0002, under the mnist-output directory. Upload the files to the model folder of the corresponding training job version.
Go to the OBS Console, find the test-modelarts bucket, go to the test-modelarts/mnist-output/MA-mnist-11-30-16/output/V0001/model directory, and click Upload Object to upload the two files. For details about how to upload files to OBS, see Uploading a File.
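For orientation only, a ModelArts model configuration file generally has the shape sketched below. This is an illustrative fragment with assumed field values, not the file to use; upload the config.json provided in the downloaded project.

```json
{
    "model_algorithm": "image_classification",
    "model_type": "MXNet",
    "metrics": {},
    "apis": [
        {
            "protocol": "http",
            "url": "/",
            "method": "post",
            "request": {
                "Content-type": "multipart/form-data",
                "data": {
                    "type": "object",
                    "properties": {
                        "images": {"type": "file"}
                    }
                }
            }
        }
    ]
}
```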
Step 6: Deploy the Model as a Real-Time Service
The trained model is stored in the OBS path. You can import the model to ModelArts and deploy it as a real-time service.
- Right-click the training job version, and choose Deploy to Service from the shortcut menu.
Figure 11 Deploying the model as a real-time service
- In the displayed dialog box, set the parameters as follows:
- Service Name: The parameter value is automatically generated or can be customized.
- Auto Stop: If you select Auto Stop, the service automatically stops at the specified time.
- Model Path: This parameter is automatically filled in. The value must be the same as the name and version of the selected training job.
- Environment Variables: Set it to input_data_name=images;input_data_shape=0,1,28,28;output_data_shape=0,10.
- input_data_name: The value must be images. This parameter is mandatory when you develop the training script yourself. You can set it when importing the model, or write it into the inference code.
- input_data_shape: input data shape in NCHW format. In this example, the value 0,1,28,28 indicates that input images must be 28 x 28 pixels.
- output_data_shape: confidence scores. In this example, the value 0,10 indicates that the output lists the 10 classes (digits 0 to 9) and the probability of each class.
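To illustrate what the NCHW shapes above mean for the data flowing through the service, here is a small NumPy sketch (the arrays are placeholders, not actual service input or output):

```python
import numpy as np

# A 28 x 28 grayscale image (pixel values 0-255), e.g. decoded from a PNG.
img = np.zeros((28, 28), dtype=np.float32)

# input_data_shape 0,1,28,28 is NCHW: (batch, channels, height, width).
batch = img[np.newaxis, np.newaxis, :, :]
print(batch.shape)  # (1, 1, 28, 28)

# output_data_shape 0,10: one confidence per class (digits 0-9) per image.
scores = np.ones((1, 10)) / 10.0  # uniform placeholder confidences
print(scores.shape)  # (1, 10)
```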
After setting the parameters, click OK to start model deployment.
Figure 12 Deploying the model as a real-time service
View the deployment progress in the log area at the bottom of the page.
Figure 13 Viewing the deployment progress
It takes several minutes to deploy the model. If information similar to "Service status is running" is displayed, the service has been successfully deployed. After the deployment is complete, you can click a link to quickly switch to the real-time service page on the ModelArts management console.
You need to enter your HUAWEI CLOUD account and password for the first login.
Figure 14 Deployment complete
Step 7: Test the Service
After the real-time service is deployed, access the service to send a prediction request for test.
- After the deployment is successful, click the link provided on the page to access the real-time service.
- On the real-time service details page, click the Prediction tab.
- Click Upload next to Image File to upload an image with a white handwritten digit on a black background and click Predict.
After the prediction is completed, the prediction result is displayed in the Test Result pane. According to the prediction result, the probability of the digit 8 is 1.
- As specified in the inference code and configuration files, the image used for prediction must be 28 x 28 pixels and must contain a white handwritten digit on a black background.
- You are advised not to use images from the dataset files. You can draw an image for prediction using, for example, the drawing tool provided by the Windows operating system.
Figure 15 Prediction result
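Besides the Prediction tab, the service can be called programmatically. The sketch below only builds (without sending) a request of the kind the Prediction tab issues: a multipart image upload authenticated with an IAM token. The endpoint URL and token are placeholders; take the real values from the service details page and IAM. The file field name images matches the input_data_name set in Step 6.

```python
import requests

def build_predict_request(endpoint_url, auth_token, image_bytes):
    """Build (without sending) a prediction request: a multipart upload of an
    image file, authenticated via the X-Auth-Token header. endpoint_url and
    auth_token are placeholders for the real service URL and IAM token."""
    req = requests.Request(
        "POST",
        endpoint_url,
        headers={"X-Auth-Token": auth_token},
        files={"images": ("digit.png", image_bytes, "image/png")},
    )
    return req.prepare()

prepared = build_predict_request(
    "https://example.com/v1/infers/placeholder",  # hypothetical endpoint
    "placeholder-token",
    b"\x89PNG",  # stand-in for real PNG bytes
)
print(prepared.method, prepared.headers["X-Auth-Token"])
```

To actually send the request, pass the prepared request to a `requests.Session`.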
Step 8: Delete Related Resources to Avoid Unnecessary Billing
To avoid unnecessary billing, you are advised to delete related resources, such as the real-time service, training job, and OBS directories after trial use.
- Go to the ModelArts management console. To delete a real-time service, go to the Real-Time Services page, and choose More > Delete in the Operation column.
- Go to the ModelArts management console. To delete a training job, go to the Training Jobs page and click Delete in the Operation column.
- Log in to OBS Console and delete the OBS bucket created during data preparation. Delete folders and files in the bucket one by one and then delete the bucket.