Using Spark MLlib for Targeted Recommendations
Shopping can be a pleasant or frustrating experience, and time is a scarce resource: consumers do not want to spend long searching for items they like. Fast, accurate recommendations significantly reduce the time consumers need to find desired goods, markedly improving the consumer experience and increasing merchants' sales.
On the ModelArts platform, you can use Spark MLlib algorithms to guide precision marketing and support decision-making, increasing the conversion rate of consumer goods and merchants' profits while improving the consumer experience.
Before using the following sample, complete the necessary operations described in Preparations. The following figure shows the process of targeted recommendation.
- Preparing Data: Download the dataset and sample code, and upload them to the OBS bucket.
- Training a Model: Compile a model training script based on the ALS algorithm in Spark MLlib and create a training job for model training.
- Deploying the Model: After obtaining the trained model file, create a prediction job to deploy the model as a real-time prediction service.
- Performing Prediction: Initiate a prediction request and obtain the prediction result.
Preparing Data
ModelArts provides the dataset and sample code for training. Perform the following steps to download the dataset and sample code and upload them to OBS:
- Go to the ModelArts-Lab project on Gitee, click Clone/Download, and then click Download ZIP to download the project.
- After the download is completed, decompress the ModelArts-Lab-master.zip file and obtain training dataset ratings.csv and sample code files trainmodelsp.py and customize_service.py from the ModelArts-Lab-master\official_examples\Using_Spark_MLlib_to_Create_a_Precise_Recommendation_Application directory.
Table 1 File description

| File | Description |
|------|-------------|
| ratings.csv | Training dataset. For details about the dataset, see Table 2 and Table 3. |
| trainmodelsp.py | Training script. The sample code is a Python training script that uses the ALS algorithm. |
| customize_service.py | Custom prediction script. It must be uploaded to the same OBS path as the training script trainmodelsp.py. During training, the script is automatically copied to the corresponding model directory. |
- On OBS Console, create a bucket and folders for storing the training dataset and sample code. For example, create the test-modelarts bucket and create the sparkml/data and sparkml/code folders in the bucket.
Use sparkml/data to store training dataset ratings.csv, and sparkml/code to store sample code files trainmodelsp.py and customize_service.py.
- Upload the files obtained in Step 2 to the sparkml/data and sparkml/code folders in the corresponding OBS directory. For details about how to upload files to OBS, see Uploading a File.
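The console upload described above can also be scripted with the OBS Python SDK (esdk-obs-python). The following is only a sketch: the AK/SK credentials and region endpoint are placeholders you must replace, and the bucket and folder names follow the example layout above.

```python
def object_key(filename):
    """Map each sample file to its folder in the example bucket:
    the dataset goes to sparkml/data, scripts go to sparkml/code."""
    folder = "data" if filename.endswith(".csv") else "code"
    return "sparkml/{}/{}".format(folder, filename)

if __name__ == "__main__":
    # Requires: pip install esdk-obs-python
    from obs import ObsClient
    # Placeholder credentials and endpoint; substitute your own AK/SK and region.
    client = ObsClient(access_key_id="<AK>", secret_access_key="<SK>",
                       server="https://obs.<region>.myhuaweicloud.com")
    for name in ("ratings.csv", "trainmodelsp.py", "customize_service.py"):
        client.putFile("test-modelarts", object_key(name), name)
    client.close()
```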
Training a Model
- On the ModelArts management console, choose Training Management > Training Jobs, and click Create in the upper left corner.
- Set the parameters related to the training job based on Figure 1 and Figure 2, and click Next.
Set Data Source and Algorithm Source to the OBS path and files in Preparing Data. For Training Output Path, you are advised to create an OBS folder to store the training output model and prediction files, for example, sparkml/output.
- On the Confirm tab page, check the configurations and click Submit to create a training job.
- On the Training Management > Training Jobs page, when the training job status changes to Running Success, the model training is completed. If any exception occurs, click the job name to go to the job details page and view the training job logs.
The training job may take more than 10 minutes to complete. If it runs far longer than expected (for example, over one hour), manually stop it to release resources; otherwise it continues to incur charges and may exhaust the account balance, especially for models trained on GPUs.
Deploying the Model
After the training job is completed, the trained model can be published as a prediction service.
- On the Models page, click Import. The Import Model page is displayed.
- Set parameters as shown in Figure 3 and click Next.
Set Meta Model to the path specified by Training Output Path in the training job. At the same time, the system automatically matches the inference code generated by the training job from the selected path.
- On the Models page, when the status of the created model is Normal, the model is successfully imported. Click the triangle next to a model name to expand all versions of the model. In the row of a version, choose Deploy > Real-Time Services in the Operation column to deploy the model as a real-time service.
- On the Deploy page, set parameters by referring to Figure 4 and click Next.
- On the Confirm tab page, check the configurations and click Submit to create a real-time service.
- After the real-time service is created, the Service Deployment > Real-Time Services page is displayed. The service deployment takes some time. When the service status changes to Running, the service is successfully deployed.
Performing Prediction
After the model is deployed as a real-time service, verify that the published prediction service works properly.
- Choose Service Deployment > Real-Time Services, and click the service name to go to the details page.
- On the Prediction tab page, enter the prediction code and click Predict, as shown in Figure 5. In the Response area on the right, view the prediction result.
The prediction request code is as follows:
```json
{
  "data": {
    "req_data": [
      {
        "input_1": 2,
        "input_2": 21
      }
    ]
  }
}
```

- On the Usage Guides tab page, obtain the API to be called and use the Postman tool to perform the test.
Figure 6 Usage Guides
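Besides Postman, the published API can be invoked from code. Below is a minimal sketch using the Python requests library; the endpoint URL and IAM token are placeholders that must be copied from the Usage Guides tab of the real-time service.

```python
import requests

def build_request(user_id, item_id):
    """Assemble the prediction payload in the format the service expects."""
    return {"data": {"req_data": [{"input_1": user_id, "input_2": item_id}]}}

def predict(url, token, user_id, item_id):
    """POST a prediction request to the real-time service and return the JSON response."""
    headers = {"Content-Type": "application/json", "X-Auth-Token": token}
    resp = requests.post(url, json=build_request(user_id, item_id),
                         headers=headers)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    # Placeholder endpoint and token; copy the real values from Usage Guides.
    result = predict("https://<endpoint>/v1/infers/<service-id>",
                     "<your-iam-token>", 2, 21)
    print(result)
```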