
Using Spark MLlib for Targeted Recommendations

Shopping can be a pleasant or frustrating experience, and time is money. Consumers do not want to spend long searching for products they like. Fast, accurate recommendations significantly reduce the time needed to find desired goods, improving the consumer experience and increasing merchants' sales.

On the ModelArts platform, you can use a Spark MLlib algorithm to build a recommendation model that guides precision marketing and supports decision-making, increasing the conversion rate of consumer goods and merchants' profits while improving the consumer experience.

Before using the following sample, complete the necessary operations described in Preparations. The targeted recommendation process consists of the following steps.

  1. Preparing Data: Download the dataset and sample code, and upload them to the OBS bucket.
  2. Training a Model: Compile a model training script based on the ALS algorithm in Spark MLlib and create a training job for model training.
  3. Deploying the Model: After obtaining the trained model file, import the model and deploy it as a real-time prediction service.
  4. Performing Prediction: Initiate a prediction request and obtain the prediction result.
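Step 2 relies on the ALS (alternating least squares) algorithm, which factorizes the sparse user-item rating matrix into low-rank user and item factor vectors by alternately fixing one side and solving a regularized least-squares problem for the other. The actual training uses Spark MLlib's ALS implementation; the following is only a minimal pure-Python, rank-1 sketch of the idea, using the sample ratings from Table 3 as toy data.

```python
# Toy rank-1 ALS: alternately solve for user and item factors in closed form.
# Ratings follow the dataset schema: (user ID, product ID, rating 1-5).
ratings = {
    (1, 146): 5.0, (1, 1198): 4.0, (1, 611): 4.0,
    (2, 914): 3.0, (2, 146): 4.0,
}
users = sorted({u for u, _ in ratings})
items = sorted({i for _, i in ratings})
lam = 0.1                        # ridge regularization term
uf = {u: 1.0 for u in users}     # user factors (scalars at rank 1)
itf = {i: 1.0 for i in items}    # item factors

def sse():
    """Sum of squared errors of the current factorization."""
    return sum((r - uf[u] * itf[i]) ** 2 for (u, i), r in ratings.items())

for _ in range(20):
    # Fix item factors, solve each user factor in closed form.
    for u in users:
        num = sum(r * itf[i] for (uu, i), r in ratings.items() if uu == u)
        den = lam + sum(itf[i] ** 2 for (uu, i), _ in ratings.items() if uu == u)
        uf[u] = num / den
    # Fix user factors, solve each item factor in closed form.
    for i in items:
        num = sum(r * uf[u] for (u, ii), r in ratings.items() if ii == i)
        den = lam + sum(uf[u] ** 2 for (u, ii), _ in ratings.items() if ii == i)
        itf[i] = num / den

def predict(user, item):
    """Predicted rating: dot product of the factors (a product at rank 1)."""
    return uf[user] * itf[item]
```

Spark MLlib performs the same alternation at higher rank, in parallel, over a distributed rating matrix.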

Preparing Data

ModelArts provides the dataset and sample code for training. Perform the following steps to download the dataset and sample code and upload them to OBS:

  1. Go to the ModelArts-Lab project on Gitee, click Clone/Download, and then click Download ZIP to download the project.
  2. After the download is completed, decompress the ModelArts-Lab-master.zip file and obtain training dataset ratings.csv and sample code files trainmodelsp.py and customize_service.py from the ModelArts-Lab-master\official_examples\Using_Spark_MLlib_to_Create_a_Precise_Recommendation_Application directory.
    Table 1 File description

    | File                 | Description                                                                                                                                                                                                |
    |----------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
    | ratings.csv          | Training dataset. For details about the dataset, see Table 2 and Table 3.                                                                                                                                  |
    | trainmodelsp.py      | Training script: a Python script that trains a model using the ALS algorithm.                                                                                                                              |
    | customize_service.py | Custom prediction script. It must be uploaded to the same OBS path as the training script trainmodelsp.py. During training, the script is automatically copied to the corresponding model directory.        |

    Table 2 Parameters and meanings of data sources

    | Parameter | Meaning    | Type | Description                        |
    |-----------|------------|------|------------------------------------|
    | attr_1    | User ID    | Int  | Consumer ID                        |
    | attr_2    | Product ID | Int  | Consumer goods ID                  |
    | attr_3    | Rating     | Real | Consumer's rating of goods (1-5)   |

    Table 3 Sample data of the dataset

    | attr_1 | attr_2 | attr_3 |
    |--------|--------|--------|
    | 1      | 146    | 5      |
    | 1      | 1198   | 4      |
    | 1      | 611    | 4      |
    | 2      | 914    | 3      |
    | 2      | 146    | 4      |

  3. On the OBS console, create a bucket and folders for storing the training dataset and sample code. For example, create the test-modelarts bucket and create the sparkml/data and sparkml/code folders in the bucket.

    Use sparkml/data to store training dataset ratings.csv, and sparkml/code to store sample code files trainmodelsp.py and customize_service.py.

  4. Upload the files obtained in step 2 to the sparkml/data and sparkml/code folders in the corresponding OBS directory. For details about how to upload files to OBS, see Uploading a File.
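Assuming ratings.csv stores one (user ID, product ID, rating) triple per line in the Table 2/3 schema with no header row (an assumption for illustration; check the downloaded file), a short stdlib sketch of loading it:

```python
import csv
import io

def load_ratings(f):
    """Yield (user_id, product_id, rating) tuples from a CSV file object
    in the Table 2 schema: Int user ID, Int product ID, Real rating."""
    for row in csv.reader(f):
        if not row:
            continue  # skip blank lines
        user, item, rating = row[:3]
        yield int(user), int(item), float(rating)

# Example with the sample rows from Table 3; a real run would use
# open("ratings.csv") instead of this in-memory buffer.
sample = io.StringIO("1,146,5\n1,1198,4\n1,611,4\n2,914,3\n2,146,4\n")
rows = list(load_ratings(sample))
```

The training script itself reads the dataset from the OBS path configured as the training job's data source.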

Training a Model

  1. On the ModelArts management console, choose Training Management > Training Jobs, and click Create in the upper left corner.
  2. Set the parameters related to the training job as shown in Figure 1 and Figure 2, and click Next.
    Set Data Source and Algorithm Source to the OBS path and files in Preparing Data. For Training Output Path, you are advised to create an OBS folder to store the training output model and prediction files, for example, sparkml/output.
    Figure 1 Basic information for creating a training job
    Figure 2 Parameter settings for creating a training job
  3. On the Confirm tab page, check the configurations and click Submit to create a training job.
  4. On the Training Management > Training Jobs page, when the training job status changes to Running Success, the model training is completed. If any exception occurs, click the job name to go to the job details page and view the training job logs.

    The training job may take more than 10 minutes to complete. If the training runs much longer than expected (for example, more than one hour), stop the job manually to release resources and avoid unnecessary charges to your account, especially for models trained on GPUs.

Deploying the Model

After the training job is completed, the trained model can be published as a prediction service.

  1. On the Models page, click Import. The Import Model page is displayed.
  2. Set parameters as shown in Figure 3 and click Next.

    Set Meta Model to the path specified by Training Output Path in the training job. At the same time, the system automatically matches the inference code generated by the training job from the selected path.

    Figure 3 Import Model
  3. On the Models page, when the status of the created model is Normal, the model is successfully imported. Click the triangle next to a model name to expand all versions of the model. In the row of a version, choose Deploy > Real-Time Services in the Operation column to deploy the model as a real-time service.
  4. On the Deploy page, set parameters by referring to Figure 4 and click Next.
    Figure 4 Deploy
  5. On the Confirm tab page, check the configurations and click Submit to create a real-time service.
  6. After the real-time service is created, the Service Deployment > Real-Time Services page is displayed. The service deployment takes some time. When the service status changes to Running, the service is successfully deployed.

Performing Prediction

After the model is deployed, verify that the published prediction service works properly.

  1. Choose Service Deployment > Real-Time Services, and click the service name to go to the details page.
  2. On the Prediction tab page, enter the prediction code and click Predict, as shown in Figure 5. In the Response area on the right, view the prediction result.

    The prediction request body is as follows:

    {
        "data": {
            "req_data": [
                {
                    "input_1": 2,
                    "input_2": 21
                }
            ]
        }
    }
    Figure 5 Testing the server

  3. On the Usage Guides tab page, obtain the API to be called and use the Postman tool to perform the test.
    Figure 6 Usage Guides
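Besides Postman, the prediction request shown above can be built and sent with a few lines of Python. This is a stdlib-only sketch: the service URL and the X-Auth-Token value are placeholders, and the actual endpoint and authentication details must be taken from the service's Usage Guides tab.

```python
import json
import urllib.request

def build_request(user_id, item_id):
    """Build the prediction request body shown in the Prediction tab."""
    return {"data": {"req_data": [{"input_1": user_id, "input_2": item_id}]}}

def call_service(url, token, user_id, item_id):
    """POST the request to the real-time service API.
    url and token are placeholders; obtain the real values from the
    Usage Guides tab of the deployed service."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_request(user_id, item_id)).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Build the same payload used in the console test above.
payload = build_request(2, 21)
```

The response arrives as JSON and can be inspected the same way the console's Response area displays it.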