
Using Spark MLlib for Vehicle Satisfaction Survey

Use the k-Nearest Neighbors (kNN) classification algorithm to conduct a vehicle satisfaction survey. The Car Evaluation dataset, which contains six features, is used to predict user satisfaction with vehicles.

The kNN algorithm can generate a binary or multiclass model. The basic idea is that if most of the k samples closest to a sample in the feature space belong to a certain class, the sample also belongs to that class. The algorithm is robust to noise and can mitigate the impact of unbalanced sample quantities. k is an integer specified by the user, and its value depends on the input data. Generally, a larger value of k suppresses the impact of noise but also blurs the classification boundary. The algorithm suits classification scenarios with fewer than 1 million data records and fewer than 100 dimensions. In binary classification, setting k to an odd number helps avoid tied votes.
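The voting idea described above can be sketched in plain Python. This is a minimal illustration of the principle, not the Spark MLlib implementation used later in this example; the sample points and labels are made up.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    dists = [
        (math.dist(query, p), label)
        for p, label in zip(train_points, train_labels)
    ]
    dists.sort(key=lambda t: t[0])
    # Majority vote among the k closest neighbors
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Illustrative 2-D points; "acc"/"unacc" mirror the dataset's labels.
points = [(1, 1), (1, 2), (4, 4), (5, 4)]
labels = ["acc", "acc", "unacc", "unacc"]
print(knn_predict(points, labels, (1.5, 1.5), k=3))  # prints "acc"
```

With k=3, the two nearest "acc" points outvote the single "unacc" neighbor, which is why an odd k is useful in binary classification.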

This example describes how to use the kNN classification algorithm of the Spark MLlib engine to conduct a vehicle satisfaction survey. The procedure is as follows:

  1. Preparing Data: Download the dataset and sample code, and upload them to the OBS bucket.
  2. Training a Model: Compile a model training script based on the kNN algorithm in Spark MLlib and create a training job for model training.
  3. Deploying the Model: After obtaining the trained model file, create a prediction job to deploy the model as a real-time prediction service.
  4. Performing Prediction: Initiate a prediction request and obtain the prediction result.

Preparing Data

ModelArts provides the dataset and sample code for training. Perform the following steps to download the dataset and sample code and upload them to OBS:

  1. Go to the ModelArts-Lab project on Gitee, click Clone/Download, and then click Download ZIP to download the project.
  2. After the download is completed, decompress the ModelArts-Lab-master.zip file and obtain the training dataset and sample code from the \ModelArts-Lab-master\official_examples\Using_Spark_MLlib_to_Create_a_Car_Evaluation_Application directory.
    Table 1 File description

    File                     Description
    car.csv                  Training dataset. For details about the dataset, see Table 2.
    car_meta.desc            Metadata file
    knn_classification.py    Training script compiled with the kNN algorithm
    customize_service.py     Custom prediction script. It must be uploaded to the same OBS path as the training script knn_classification.py. During training, the script is automatically copied to the corresponding model directory.

    Table 2 Sample data of the dataset

    buying_price  maint_price  doors  persons  lug_boot  safety  acceptability
    vhigh         vhigh        2      2        small     low     unacc
    vhigh         vhigh        2      2        small     med     unacc
    vhigh         vhigh        2      2        small     high    unacc
    vhigh         vhigh        2      2        med       low     unacc

  3. On OBS Console, create a bucket and folders for storing the training dataset and sample code. For example, create the test-modelarts2 bucket and create the sparkml/car/data and sparkml/car/code folders in the bucket.
  4. Upload the files obtained in 2 to the sparkml/car/data and sparkml/car/code folders in the corresponding OBS directory. For details about how to upload files to OBS, see Uploading a File.
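All six features in car.csv are categorical, so a distance-based algorithm such as kNN needs them mapped to numbers before distances can be computed. The sketch below illustrates one such encoding; the inline sample mirrors Table 2, and the level ordering is illustrative (the actual mapping is handled by the training script and metadata file).

```python
import csv
from io import StringIO

# Inline sample mirroring the car.csv layout shown in Table 2.
sample = """buying_price,maint_price,doors,persons,lug_boot,safety,acceptability
vhigh,vhigh,2,2,small,low,unacc
vhigh,vhigh,2,2,small,med,unacc
vhigh,vhigh,2,2,small,high,unacc
"""

rows = list(csv.DictReader(StringIO(sample)))

# Map each ordered category to an integer so distances are meaningful.
safety_levels = {"low": 0, "med": 1, "high": 2}
encoded_safety = [safety_levels[r["safety"]] for r in rows]
print(encoded_safety)  # prints [0, 1, 2]
```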

Training a Model

  1. On the ModelArts management console, choose Training Management > Training Jobs, and click Create in the upper left corner.
  2. Set parameters related to the training job and click Next. See Figure 1 and Figure 2.

    Set Data Source and Algorithm Source to the OBS paths and files prepared in Preparing Data. For Training Output Path, you are advised to create a dedicated OBS folder to store the trained model and prediction files, for example, test-modelarts2/sparkml/car/output.

    Figure 1 Basic information for creating a training job
    Figure 2 Parameters for creating a training job
  3. On the Confirm tab page, check the configurations and click Submit to create a training job.
  4. On the Training Jobs page, when the training job status changes to Running Success, the model training is completed. If any exception occurs, click the job name to go to the job details page and view the training job logs.

    The training job may take more than 10 minutes to complete. If the training runs much longer than expected (for example, more than one hour), manually stop it to release resources and avoid unnecessary charges, especially for training jobs that use GPUs.

Deploying the Model

After the training job is completed, the trained model can be published as a prediction service.

  1. On the Models page, click Import. The Import Model page is displayed.
  2. Set parameters as shown in Figure 3 and click Next.
    Set Meta Model to the path specified by Training Output Path in the training job. The system then automatically matches the AI engine and the inference code generated by the training job from the selected path.
    Figure 3 Import Model
  3. On the Models page, when the status of the created model is Normal, the model is successfully imported. Choose Deploy > Real-Time Services in the Operation column to deploy the model as a real-time service.
  4. On the Deploy page, set parameters by referring to Figure 4 and click Next.
    Figure 4 Deploying a service
  5. On the Confirm tab page, check the configurations and click Submit to create a real-time service.
  6. After the real-time service is created, the Service Deployment > Real-Time Services page is displayed. The service deployment takes some time. When the service status changes to Running, the service is successfully deployed.

Performing Prediction

After the model is deployed, verify that the published prediction service works properly.

  1. Choose Service Deployment > Real-Time Services, and click the service name to go to the details page.
  2. On the Prediction tab page, enter the prediction code and click Predict. See Figure 5. In the Response area on the right, view the prediction result.
    The prediction request code is as follows:
    {
    	"data": {
    		"req_data": [
    			{
    				"buying_price": "high",
    				"maint_price": "high",
    				"doors": "2",
    				"persons": "2",
    				"lug_boot": "small",
    				"safety": "low",
    				"acceptability": "acc"
    			},
    			{
    				"buying_price": "high",
    				"maint_price": "high",
    				"doors": "2",
    				"persons": "2",
    				"lug_boot": "small",
    				"safety": "low",
    				"acceptability": "acc"
    			}
    		]
    	}
    }
    Figure 5 Testing the server

  3. On the Usage Guides tab page, obtain the API to be called and use the Postman tool to perform the test.
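As an alternative to Postman, the same request can be sent programmatically with the Python standard library. The endpoint URL and token below are placeholders: copy the real API address from the Usage Guides tab and obtain a valid token before sending.

```python
import json
import urllib.request

# Placeholders: replace with the API address from the Usage Guides tab
# and a valid authentication token for your account.
url = "https://example.com/v1/infer/your-service-id"  # hypothetical
token = "REPLACE_WITH_X_AUTH_TOKEN"

payload = {
    "data": {
        "req_data": [
            {
                "buying_price": "high",
                "maint_price": "high",
                "doors": "2",
                "persons": "2",
                "lug_boot": "small",
                "safety": "low",
                "acceptability": "acc",
            }
        ]
    }
}

# Supplying `data` makes this a POST request.
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json", "X-Auth-Token": token},
)
# Uncomment to send the request once url and token are filled in:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```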