Updated on 2024-09-23 GMT+08:00

Running a HadoopStreaming Job

MRS allows you to submit and run your own programs and obtain the results. This section describes how to submit a Hadoop Streaming job in an MRS cluster.

Prerequisites

  • You have uploaded the program packages and data files required by jobs to OBS or HDFS.
  • If the job program needs to read and analyze data in the OBS file system, you need to configure storage-compute decoupling for the MRS cluster. For details, see Configuring Storage-Compute Decoupling for an MRS Cluster.
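
As an illustration of the kind of program package mentioned above, the following is a minimal sketch of a word-count mapper and reducer in Python. It relies only on the general Hadoop Streaming contract (records are read from standard input and written to standard output as tab-separated key/value lines, and the reducer receives its input sorted by key). The function names and the "map"/"reduce" command-line switch are hypothetical choices for this sketch, not something MRS requires.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts per word; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Hypothetical convention for this sketch: the script is started with a single
# argument, "map" or "reduce", and streams stdin to stdout.
if __name__ == "__main__" and len(sys.argv) == 2:
    step = {"map": map_lines, "reduce": reduce_lines}[sys.argv[1]]
    for record in step(sys.stdin):
        print(record)
```

Before uploading, the two phases can be checked locally by piping a sample file through the mapper, a sort, and then the reducer, which mimics the shuffle step on a small scale.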

Submitting a Hadoop Streaming Job

  1. Log in to the MRS console.
  2. On the Active Clusters page, select a running cluster and click its name to switch to the cluster details page.
  3. In the Basic Information area of the Dashboard page, click Synchronize on the right side of IAM User Sync to synchronize IAM users.

    Perform this step only when Kerberos authentication is enabled for the cluster.

    • After the IAM user synchronization is complete, wait for 5 minutes before submitting a job. For details about IAM user synchronization, see Synchronizing IAM Users to MRS.
    • When the policy of the user group that an IAM user belongs to changes from MRS ReadOnlyAccess to MRS CommonOperations, MRS FullAccess, or MRS Administrator, or vice versa, the System Security Services Daemon (SSSD) cache on the cluster nodes takes time to refresh. To prevent job submission failures, wait for 5 minutes after user synchronization is complete before submitting a job under the new policy.
    • If the IAM username contains spaces (for example, admin 01), jobs cannot be added.

  4. Click Job Management. On the displayed job list page, click Create.
  5. Set Type to HadoopStreaming. Configure job information by referring to Table 1.

    Table 1 Job parameters

    Parameter: Name
    Description: Job name. It contains 1 to 64 characters. Only letters, digits, hyphens (-), and underscores (_) are allowed.
    Example: hadoop_job

    Parameter: Program Parameter
    Description: (Optional) Optimization parameters such as threads, memory, and vCPUs for the job, used to optimize resource usage and improve job execution performance. Table 2 describes the common parameters of a running program.
    Example: -

    Parameter: Parameters
    Description: (Optional) Key parameters for program execution. They are defined by the custom program itself. MRS is only responsible for loading them.
      Multiple parameters are separated by spaces. The value can contain a maximum of 150,000 characters and can be left blank. The value cannot contain special characters such as ;|&><'$
      CAUTION: When entering a parameter that contains sensitive information (for example, a login password), you can add an at sign (@) before the parameter name to encrypt the parameter value. This prevents the sensitive information from being persisted in plaintext. When you view job information on the MRS console, the sensitive information is displayed as *. For example: username=testuser @password=User password
    Example: -

    Parameter: Service Parameter
    Description: (Optional) Service parameters for the job.
      Change this parameter to modify the configuration for the current job only. To change the configuration permanently for the entire cluster, modify the cluster component parameters by referring to Modifying the Configuration Parameters of an MRS Cluster Component.
      Click the add icon on the right to add more parameters.
      If a job needs to access OBS using an AK/SK, add the following service configuration parameters:
      • fs.obs.access.key: key ID for accessing OBS.
      • fs.obs.secret.key: key corresponding to the key ID for accessing OBS.
    Example: -

    Parameter: Command Reference
    Description: Commands submitted to the background when the job is submitted.
    Example: -

    Table 2 Program parameters

    Parameter: -ytm
    Description: Memory size of each TaskManager container. The unit is optional and defaults to MB.
    Example value: 1024

    Parameter: -yjm
    Description: Memory size of the JobManager container. The unit is optional and defaults to MB.
    Example value: 1024

    Parameter: -yn
    Description: Number of Yarn containers allocated to the application. The value is the same as the number of TaskManagers. For MRS 3.x or later, the -yn parameter is not supported.
    Example value: 2

    Parameter: -ys
    Description: Number of TaskManager cores.
    Example value: 2

    Parameter: -ynm
    Description: Custom name of the application on Yarn.
    Example value: test

    Parameter: -c
    Description: Class of the program entry method (for example, the main or getPlan() method). This parameter is required only when the JAR file does not specify the class in its manifest.
    Example value: com.bigdata.mrs.test

  6. Confirm the job configuration and click OK.

    After the job is created, you can manage it.
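
The input rules for the Name and Parameters fields in Table 1 can be checked before the values are typed into the console. The following is a minimal sketch under stated assumptions: "letters" is taken to mean ASCII letters, only the special characters explicitly listed in Table 1 are rejected (the table says "such as", so the real list may be longer), and the masking helper merely imitates how the console is described to display @-prefixed values. All function and constant names are hypothetical and not part of any MRS API.

```python
import re

# Assumption: "letters" in Table 1 means ASCII letters.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")
# Assumption: only the characters explicitly listed in Table 1 are checked.
FORBIDDEN_CHARS = set(";|&><'$")
MAX_PARAMETERS_LENGTH = 150_000


def is_valid_job_name(name):
    """Return True if the name has 1 to 64 allowed characters."""
    return NAME_PATTERN.fullmatch(name) is not None


def parameter_problems(value):
    """Return a list of rule violations for a Parameters value (empty if none)."""
    problems = []
    if len(value) > MAX_PARAMETERS_LENGTH:
        problems.append("too long")
    found = sorted(FORBIDDEN_CHARS.intersection(value))
    if found:
        problems.append("forbidden characters: " + " ".join(found))
    return problems


def mask_sensitive(value):
    """Replace the value of each @-prefixed key=value token with '*'."""
    masked = []
    for token in value.split():
        key, sep, _ = token.partition("=")
        masked.append(f"{key}=*" if key.startswith("@") and sep else token)
    return " ".join(masked)
```

For instance, is_valid_job_name("hadoop_job") accepts the example name from Table 1, while a name containing a space is rejected. Because the helper splits on spaces, it only illustrates the masking idea for values that do not themselves contain spaces.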