
Using Spark MLlib for Iris Flower Classification

The iris flower dataset is a classic dataset used for multivariate analysis. It contains 150 data records classified into three classes; each class contains 50 records, and each record has four attributes. The four attributes (sepal length, sepal width, petal length, and petal width) are used to predict which type an iris flower belongs to: Setosa, Versicolour, or Virginica.
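Each record in iris.csv is one comma-separated line: the four numeric attributes followed by the class label. As a quick illustration of this format (plain Python, independent of Spark; the inline rows below are hypothetical samples written in the dataset's format):

```python
import csv
from collections import Counter
from io import StringIO

# A few rows in the iris.csv format: four numeric attributes plus a class label.
sample = StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)

# Parse each line into (features, label); the label is the fifth field.
rows = [(list(map(float, r[:4])), r[4]) for r in csv.reader(sample)]
labels = Counter(label for _, label in rows)
print(labels)  # each class appears once in this tiny sample
```

In the full dataset, the same counting would show 50 records per class.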

This example describes how to use the Spark MLlib engine for iris flower classification. The procedure is as follows:

  1. Preparing Data: Download the dataset and sample code, and upload them to the OBS bucket.
  2. Training a Model: Compile a model training script based on the ALS algorithm in Spark MLlib and create a training job for model training.
  3. Deploying the Model: After obtaining the trained model file, create a prediction job to deploy the model as a real-time prediction service.
  4. Performing Prediction: Initiate a prediction request and obtain the prediction result.

Preparing Data

ModelArts provides the dataset and sample code for training. Perform the following steps to download the dataset and sample code and upload them to OBS:

  1. Go to the ModelArts-Lab project on Gitee, click Clone/Download, and then click Download ZIP to download the project.
  2. After the download is complete, decompress the ModelArts-Lab-master.zip file and obtain the training dataset iris.csv and the sample code file trainmodelsp.py from the \ModelArts-Lab-master\official_examples\Using_Spark_MLlib_to_Create_a_Flower_Classification_Application directory.
    Table 1 File description

    File             Description
    ---------------  ------------------------------------------------------------
    iris.csv         Training dataset. For details about the dataset, see Table 2
                     and Table 3.
    trainmodelsp.py  Training script compiled with the ALS algorithm.

    Table 2 Parameters and meanings of data sources

    Parameter     Meaning       Type
    ------------  ------------  ------
    sepal-length  Sepal length  number
    sepal-width   Sepal width   number
    petal-length  Petal length  number
    petal-width   Petal width   number
    class         Type          string

    Table 3 Sample data of the dataset

    sepal-length  sepal-width  petal-length  petal-width  class
    ------------  -----------  ------------  -----------  -----------
    5.1           3.5          1.4           0.2          Iris-setosa
    4.9           3.0          1.4           0.2          Iris-setosa
    4.7           3.2          1.3           0.2          Iris-setosa
    4.6           3.1          1.5           0.2          Iris-setosa
    5.0           3.6          1.4           0.2          Iris-setosa

  3. On OBS Console, create a bucket and folders for storing the training dataset and sample code. For example, create the test-modelarts bucket and create the iris/data and iris/code folders in the bucket.
  4. Upload the files obtained in step 2 to the corresponding OBS folders: iris.csv to iris/data and trainmodelsp.py to iris/code. For details about how to upload files to OBS, see Uploading a File.

Training a Model

  1. On the ModelArts management console, choose Training Management > Training Jobs, and click Create in the upper left corner.
  2. Set parameters related to the training job and click Next. See Figure 1 and Figure 2.
    Set Data Source and Algorithm Source to the OBS paths of the dataset and code files uploaded in Preparing Data. For Training Output Path, you are advised to create a dedicated OBS folder to store the trained model and prediction files, for example, iris/output.
    Figure 1 Basic information for creating a training job
    Figure 2 Parameter settings for creating a training job
  3. On the Confirm tab page, check the configurations and click Submit to create a training job.
  4. On the Training Jobs page, when the training job status changes to Running Success, the model training is completed. If any exception occurs, click the job name to go to the job details page and view the training job logs.

    The training job may take more than 10 minutes to complete. If it runs much longer than expected (for example, more than one hour), manually stop it to release resources and avoid unnecessary charges. This is especially important for training jobs that use GPUs.
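The actual training logic lives in trainmodelsp.py and runs on the Spark MLlib engine in the training job. As a rough local illustration of the classification task itself (plain Python, no Spark; the data rows are hand-written samples in the iris.csv format, and the nearest-centroid classifier is a stand-in for illustration, not the algorithm the sample script uses):

```python
# Toy stand-in for the training step: average each class's feature vectors
# into a centroid, then classify a new flower by its nearest centroid.
from math import dist

train = [
    ([5.1, 3.5, 1.4, 0.2], "Iris-setosa"),
    ([4.9, 3.0, 1.4, 0.2], "Iris-setosa"),
    ([7.0, 3.2, 4.7, 1.4], "Iris-versicolor"),
    ([6.4, 3.2, 4.5, 1.5], "Iris-versicolor"),
    ([6.3, 3.3, 6.0, 2.5], "Iris-virginica"),
    ([5.8, 2.7, 5.1, 1.9], "Iris-virginica"),
]

def fit(rows):
    """Average the feature vectors of each class into one centroid."""
    by_class = {}
    for features, label in rows:
        by_class.setdefault(label, []).append(features)
    return {
        label: [sum(col) / len(col) for col in zip(*vectors)]
        for label, vectors in by_class.items()
    }

def predict(centroids, features):
    """Return the class whose centroid is closest to the given flower."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

centroids = fit(train)
print(predict(centroids, [5.0, 3.6, 1.4, 0.2]))  # prints "Iris-setosa"
```

The real job replaces this with a distributed Spark MLlib pipeline and writes the resulting model to the Training Output Path in OBS.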

Deploying the Model

After the training job is completed, the trained model can be published as a prediction service.

  1. On the Models page, click Import. The Import Model page is displayed.
  2. Set parameters as shown in Figure 3 and click Next.
    Set Meta Model to the path specified by Training Output Path in the training job. At the same time, the system automatically matches the AI engine and inference code generated by the training job from the selected path.
    Figure 3 Import Model
  3. On the Models page, when the status of the created model is Normal, the model is successfully imported. Choose Deploy > Real-Time Services in the Operation column to deploy the model as a real-time service.
  4. On the Deploy page, set parameters by referring to Figure 4 and click Next.
    Figure 4 Deploy
  5. On the Confirm tab page, check the configurations and click Submit to create a real-time service.
  6. After the real-time service is created, the Service Deployment > Real-Time Services page is displayed. The service deployment takes some time. When the service status changes to Running, the service is successfully deployed.

Performing Prediction

After the model is deployed as a real-time service, verify that the published prediction service works properly.

  1. Choose Service Deployment > Real-Time Services, and click the service name to go to the details page.
  2. On the Prediction tab page, enter the prediction code and click Predict. See Figure 5. In the Response area on the right, view the prediction result.

    The prediction request code is as follows:

    {
    	"data": {
    		"req_data": [
    			{
    			"sepal-length": 5.1,
    			"sepal-width": 3.5,
    			"petal-length": 1.4,
    			"petal-width": 0.2
    			}
    		]
    	}
    }
    Figure 5 Testing the server
  3. On the Usage Guides tab page, obtain the API to be called and use the Postman tool to perform the test.
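The test in step 3 can also be scripted instead of using Postman. The sketch below builds the same prediction request with Python's standard library; the endpoint URL and the X-Auth-Token value are placeholders, so obtain the real ones from the Usage Guides tab before sending:

```python
import json
import urllib.request

# Placeholder endpoint and token: copy the real values from the service's
# Usage Guides tab. This script only builds the request; sending it requires
# a reachable ModelArts real-time service.
url = "https://<modelarts-endpoint>/v1/infers/<service-id>"
token = "<your-auth-token>"

# Same payload as the prediction request code above.
payload = {
    "data": {
        "req_data": [
            {
                "sepal-length": 5.1,
                "sepal-width": 3.5,
                "petal-length": 1.4,
                "petal-width": 0.2,
            }
        ]
    }
}

request = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json", "X-Auth-Token": token},
    method="POST",
)

# Uncomment to send the request once the placeholders are filled in:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
print(request.get_method(), len(request.data), "bytes")
```

A successful call returns a JSON response containing the predicted class for the submitted flower.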