Updated on 2023-11-17 GMT+08:00

Using Spark2x to Analyze IoV Drivers' Driving Behavior

The best practices for Huawei Cloud MapReduce Service (MRS) guide you through the basic functions of MRS. This practice shows how to use the Spark2x component of MRS to analyze driver behavior data, collect statistics on it, and obtain the analysis results.

This practice applies only to MRS 3.1.0. Create a cluster of this version as instructed in Creating a Cluster.

You can get started by reading the following topics:

  1. Scenario
  2. Creating a Cluster
  3. Preparing a Spark2x Sample Program and Sample Data
  4. Creating a Job
  5. Viewing the Job Execution Results

Scenario

In this case, the raw data is driver behavior information, including abrupt acceleration, abrupt deceleration, neutral sliding (coasting in neutral), overspeed, and fatigue driving. Using the Spark2x component, you can analyze the driver behavior information recorded in a specified period and obtain statistics on it.
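The kind of statistic described above can be sketched in plain Python. This is only an illustration, not the logic of the sample program: this document does not show the record layout of the sample data, so the field names (`driver_id`, `time`, and one counter per behavior) and the three sample records are invented for the example.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Behavior types named in the scenario. The field names are hypothetical.
BEHAVIORS = (
    "abrupt_acceleration",
    "abrupt_deceleration",
    "neutral_sliding",
    "overspeed",
    "fatigue_driving",
)


def summarize(records, start, end):
    """Count each behavior per driver for records with start <= time < end."""
    stats = defaultdict(Counter)
    for rec in records:
        when = datetime.strptime(rec["time"], "%Y-%m-%d %H:%M:%S")
        if not (start <= when < end):
            continue  # outside the requested period
        for name in BEHAVIORS:
            # A missing field counts as zero occurrences.
            stats[rec["driver_id"]][name] += int(rec.get(name, 0))
    return {driver: dict(counts) for driver, counts in stats.items()}


# Made-up records, used only to exercise the function.
sample = [
    {"driver_id": "d1", "time": "2017-01-01 08:00:00", "overspeed": 1},
    {"driver_id": "d1", "time": "2017-01-01 09:00:00", "overspeed": 1,
     "fatigue_driving": 1},
    {"driver_id": "d2", "time": "2017-02-01 08:00:00", "abrupt_acceleration": 1},
]
result = summarize(sample, datetime(2017, 1, 1), datetime(2017, 1, 2))
print(result)
```

On a cluster, Spark2x performs this kind of grouping and counting in parallel across the input files.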

Creating a Cluster

  1. Go to the Buy Cluster page.
  2. Click the Custom Config tab.

    Configure cluster software information according to Table 1.

    Table 1 Software configurations

    | Parameter | Configuration |
    | --- | --- |
    | Region | CN-Hong Kong. NOTE: This document uses CN-Hong Kong as an example. If you want to perform operations in other regions, ensure that all operations are performed in the same region. |
    | Billing Mode | Pay-per-use |
    | Cluster Name | mrs_demo |
    | Cluster Type | Analysis cluster (for offline data analysis) |
    | Version Type | Normal |
    | Cluster Version | MRS 3.1.0. NOTE: This practice applies only to MRS 3.1.0. |
    | Component | All components |
    | Metadata | Local |

    Figure 1 Software configurations

  3. Click Next to configure hardware.

    Configure cluster hardware information according to Table 2.

    Table 2 Hardware configurations

    | Parameter | Configuration |
    | --- | --- |
    | AZ | AZ2 |
    | Enterprise Project | default |
    | VPC | Select the VPC in which you want to create the cluster. Click View VPC to view the name and ID of the VPC. If no VPC is available, create one. |
    | Subnet | Select the subnet in which you want to create the cluster. You can enter the VPC to view the name and ID of the subnet. If no subnet has been created under the VPC, click Create Subnet to create one. |
    | Security Group | Auto create |
    | EIP | Bind later |
    | Cluster Node | Default settings |

    Figure 2 Hardware configurations

  4. Click Next. On the Set Advanced Options page, set the following parameters by referring to Table 3 and retain the default settings for other parameters.

    Table 3 Advanced configurations

    | Parameter | Configuration |
    | --- | --- |
    | Kerberos Authentication | Disable Kerberos authentication. |
    | Username | Name of the administrator of MRS Manager. admin is used by default. |
    | Password | Password of the MRS Manager administrator. |
    | Confirm Password | Enter the password of the Manager administrator again. |
    | Login Mode | Select Password. |
    | Username | Name of the user for logging in to ECSs. root is used by default. |
    | Password | Password for logging in to ECSs. |
    | Confirm Password | Enter the password for logging in to ECSs again. |

    Figure 3 Advanced configurations

  5. Click Next. On the Confirm Configuration page, check the cluster configuration information. To adjust the configuration, go back to the corresponding tab page and configure the parameters again.
  6. Select Secure Communications and click Buy Now.
  7. Click Back to Cluster List to view the cluster status.

    Cluster creation takes some time. The initial status of the cluster is Starting. After the cluster has been created successfully, the cluster status becomes Running.
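If you track the cluster status from a script rather than from the cluster list, the wait can be sketched as a simple polling loop. The `get_status` callable below is hypothetical: this sketch does not call any MRS API, and you would supply your own way of reading the status. Only the status names Starting and Running come from this document.

```python
import time


def wait_until_running(get_status, interval_s=30.0, timeout_s=3600.0,
                       sleep=time.sleep):
    """Poll get_status() until it returns "Running"; return the seconds waited.

    get_status is a caller-supplied (hypothetical) function that returns the
    current cluster status as a string, for example "Starting".
    """
    waited = 0.0
    while True:
        status = get_status()
        if status == "Running":
            return waited
        if waited >= timeout_s:
            raise TimeoutError(f"cluster is still {status!r} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s


# Usage with a stand-in status source that reports Running on the third call.
statuses = iter(["Starting", "Starting", "Running"])
waited = wait_until_running(lambda: next(statuses), interval_s=1,
                            sleep=lambda seconds: None)
print(waited)
```

The interval and timeout defaults are arbitrary placeholders. Adjust them to how long cluster creation takes in your environment.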

Preparing a Spark2x Sample Program and Sample Data

  1. Create an OBS parallel file system to store the Spark sample program, sample data, job execution results, and logs.

    1. Log in to the Huawei Cloud management console.
    2. In the Service List, choose Storage > Object Storage Service.
    3. In the navigation pane on the left, choose Parallel File System and click Create Parallel File System to create a file system named obs-demo-analysis-hwt4. Retain the default values for parameters such as Policy.
      Figure 4 Creating a parallel file system

  2. Click the name of the file system. In the navigation pane on the left, choose Files. On the displayed page, click Create Folder to create the program and input folders, as shown in Figure 5.

    Figure 5 Creating a folder

  3. Download the sample program driver_behavior.jar from https://mrs-obs-ap-southeast-1.obs.ap-southeast-1.myhuaweicloud.com/mrs-demon-samples/demon/driver_behavior.jar to the local PC.
  4. Go to the program folder. Click Upload File and select the local driver_behavior.jar sample program.
  5. Click Upload to upload the sample program to the OBS bucket.
  6. Obtain Spark sample data from https://mrs-obs-ap-southeast-1.obs.ap-southeast-1.myhuaweicloud.com/mrs-demon-samples/demon/detail-records.zip.
  7. Decompress the downloaded detail-records.zip package to obtain the sample data.

    Figure 6 Sample data

  8. Go to the input folder. Click Upload File, select the sample data decompressed in step 7, and click Upload to upload the data to the OBS bucket.
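If you prefer to decompress the package with a script, the decompression step can be sketched with the Python standard library. The function name and the destination folder are choices made for this example. The archive name detail-records.zip is the one used in this document.

```python
import zipfile
from pathlib import Path


def extract_sample_data(zip_path, dest_dir):
    """Extract the archive into dest_dir and return the paths of its files."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        # Directory entries end with "/" and are skipped.
        names = [name for name in archive.namelist() if not name.endswith("/")]
    return [dest / name for name in names]


# Example call, assuming the package was downloaded to the current directory:
# for path in extract_sample_data("detail-records.zip", "detail-records"):
#     print(path)
```

The returned paths are the files to upload to the input folder.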

Creating a Job

  1. In the left navigation pane of the MRS console, choose Clusters > Active Clusters. On the displayed page, click the mrs_demo cluster.
  2. Click the Jobs tab and then Create to create a job.

    Figure 7 Creating a job

  3. Set job parameters by referring to Table 4 and Figure 8.

    Table 4 Configuring job parameters

    | Parameter | Configuration |
    | --- | --- |
    | Type | Select SparkSubmit. |
    | Name | Enter driver_behavior_task. |
    | Program Path | Click OBS and select the driver_behavior.jar package uploaded in Preparing a Spark2x Sample Program and Sample Data. |
    | Program Parameter | Select --class in Parameter, and enter com.huawei.bigdata.spark.examples.DriverBehavior in Value. |
    | Parameters | Enter AK SK 1 Input path Output path. See the notes below the table. |
    | Service Parameter | This parameter is left blank by default. Retain the default settings. |

    Notes on Parameters:

    • For details about how to obtain the AK/SK, see the steps described in NOTE.
    • 1 is a fixed input that specifies the program function invoked during job execution.
    • Input path is the OBS path of the input data, that is, the input folder to which you uploaded the sample data, for example, obs://obs-demo-analysis-hwt4/input/.
    • Output path must be a directory that does not exist, for example, obs://obs-demo-analysis-hwt4/output/.

    NOTE:

    To obtain the AK/SK, perform the following steps:

    1. Log in to the Huawei Cloud management console.
    2. Click the username in the upper right corner and choose My Credentials.
    3. In the navigation pane on the left, choose Access Keys.
    4. Click Create Access Key to add a key. Enter the password and verification code as prompted. The browser automatically downloads the credentials.csv file. The file is in CSV format, with fields separated by commas (,). In the file, the middle field is the AK and the last field is the SK.

    Figure 8 Creating a job

  4. Click OK to start executing the program.
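The value of Parameters can be assembled with a short script, sketched below. It assumes that each data row of credentials.csv has exactly three fields (a name, the AK, and the SK) and that a header row may precede it. The header text, user name, and key values in the usage example are placeholders, and the obs:// prefix check is a convenience added here rather than a rule stated by MRS. The input path in the example assumes the sample data is in the input folder created earlier.

```python
import csv
import io


def read_ak_sk(csv_text):
    """Return (AK, SK) from the last non-empty row of credentials.csv text."""
    rows = [row for row in csv.reader(io.StringIO(csv_text))
            if any(cell.strip() for cell in row)]
    if not rows or len(rows[-1]) != 3:
        raise ValueError("expected three comma-separated fields per row")
    fields = [cell.strip() for cell in rows[-1]]
    return fields[1], fields[2]  # middle field is the AK, last field is the SK


def build_parameters(ak, sk, input_path, output_path):
    """Join the five values in the order AK SK 1 Input path Output path."""
    for path in (input_path, output_path):
        if not path.startswith("obs://"):
            raise ValueError(f"not an OBS path: {path}")
    # "1" is the fixed input that selects the program function.
    return " ".join([ak, sk, "1", input_path, output_path])


# Placeholder file content. Real keys must be kept secret.
csv_text = "User Name,Access Key Id,Secret Access Key\nalice,EXAMPLEAK,EXAMPLESK\n"
ak, sk = read_ak_sk(csv_text)
params = build_parameters(
    ak, sk,
    "obs://obs-demo-analysis-hwt4/input/",
    "obs://obs-demo-analysis-hwt4/output/",
)
print(params)
```

Avoid storing the resulting string in shared scripts or logs, because it contains the AK and SK in plain text.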

Viewing the Job Execution Results

  1. Go to the Jobs page to view the job execution status.

    Figure 9 Execution status

  2. Wait 1 to 2 minutes and log in to the OBS console. Go to the output path of the obs-demo-analysis-hwt4 file system to view the execution result. Click Download in the Operation column of the generated CSV file to download the file to your local PC.

    Figure 10 Viewing the job execution results

  3. Open the downloaded CSV file in Excel and organize the data into columns according to the fields defined in the program to obtain the job execution results.

    Figure 11 Execution result
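As an alternative to Excel, the downloaded file can be read with a short script, sketched below. This document does not list the fields defined in the program, so the column names are supplied by the caller. The names and values in the usage example are placeholders, and the sketch assumes the file has no header row.

```python
import csv


def load_result(lines, column_names=None):
    """Parse CSV lines; label each row with column_names if they are given."""
    rows = [row for row in csv.reader(lines) if row]
    if column_names is None:
        return rows
    labeled = []
    for row in rows:
        if len(row) != len(column_names):
            raise ValueError(f"expected {len(column_names)} fields, got {len(row)}")
        labeled.append(dict(zip(column_names, row)))
    return labeled


# Placeholder lines standing in for the downloaded file.
demo = load_result(["a,1,2", "b,3,4"], ["driver", "count_x", "count_y"])
print(demo)

# With a real file (the file name is an example):
# with open("result.csv", newline="") as f:
#     rows = load_result(f)
```

All values are returned as strings. Convert them to numbers as needed once you know which columns hold counts.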