Updated on 2024-09-23 GMT+08:00

MRS Job Types

Category

An MRS job is the program execution unit of MRS, used to process and analyze user data. You can create jobs online on the MRS console or submit them in the background through the cluster client.

MRS jobs typically process data from OBS or HDFS. To create a job, first upload the data to be analyzed to OBS. MRS then uses the data stored in OBS for computing and analysis.

MRS also allows you to import data from OBS into HDFS for computing and analysis. After the analysis is complete, you can store the results in HDFS or export them to OBS. Both HDFS and OBS can store compressed data in bz2 and gz formats.
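As a quick illustration of the two supported compression formats, both can be produced with standard library tools before uploading to OBS or HDFS. A minimal sketch (the file names below are illustrative, not MRS conventions):

```python
import bz2
import gzip

data = b"id,value\n1,alpha\n2,beta\n"

# Write gz- and bz2-compressed copies of the same dataset
# (local file names are hypothetical; real targets would be OBS/HDFS paths).
with gzip.open("dataset.csv.gz", "wb") as f:
    f.write(data)
with bz2.open("dataset.csv.bz2", "wb") as f:
    f.write(data)

# Both copies decompress back to the original bytes.
with gzip.open("dataset.csv.gz", "rb") as f:
    assert f.read() == data
with bz2.open("dataset.csv.bz2", "rb") as f:
    assert f.read() == data
```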

You can create the following types of jobs online in an MRS cluster:

  • MapReduce can quickly process large-scale data in parallel. It is a distributed data processing model and execution environment. MRS supports the submission of MapReduce JAR programs.
  • Spark is a distributed in-memory computing framework. MRS supports SparkSubmit, Spark Script, and Spark SQL jobs.
    • SparkSubmit: You can submit Spark JAR and Spark Python programs to run Spark applications that compute and process user data.
    • SparkScript: You can submit SparkScript scripts and batch execute Spark SQL statements.
    • Spark SQL: You can use Spark SQL statements (similar to SQL statements) to query and analyze user data in real time.
  • Hive is an open-source data warehouse based on Hadoop. MRS allows you to submit HiveScript scripts and directly execute Hive SQL statements.
  • Flink is a distributed big data processing engine that can perform stateful computations over both unbounded and bounded data streams.
  • HadoopStreaming works similarly to a standard Hadoop job, where you can define the input and output HDFS paths, as well as the mapper and reducer executable programs.
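The mapper and reducer executables of a streaming job are plain programs that read lines from stdin and write tab-separated key/value lines to stdout. A minimal word-count sketch that simulates the map, shuffle, and reduce phases locally (the function names and simulated shuffle are illustrative, not part of the MRS API):

```python
import itertools

def mapper(lines):
    """Emit one 'word\t1' record per word, like a streaming mapper on stdin."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(records):
    """Sum counts per key; the framework delivers records sorted by key."""
    for key, group in itertools.groupby(records, key=lambda r: r.split("\t")[0]):
        total = sum(int(r.split("\t")[1]) for r in group)
        yield f"{key}\t{total}"

# Simulate the shuffle phase locally by sorting mapper output by key.
sample = ["to be or not", "to be"]
shuffled = sorted(mapper(sample))
result = dict(r.split("\t") for r in reducer(shuffled))
print(result)  # {'be': '2', 'not': '1', 'or': '1', 'to': '2'}
```

In a real HadoopStreaming job, the two functions would be standalone scripts passed as the mapper and reducer programs, with input and output HDFS paths given in the job definition.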

Job Execution Permission Description

For a security cluster with Kerberos authentication enabled, an IAM user must be synchronized before a job can be submitted on the MRS web UI. After the synchronization completes, the MRS system generates a user with the same name as the IAM user. Whether that user can submit jobs depends on the IAM policy bound to the user during IAM synchronization. For details about the job submission policy, see Table 1 in Synchronizing IAM Users to MRS.

When a submitted job uses resources of a specific component, such as HDFS directories or Hive tables, user admin (the Manager administrator) must grant the relevant permissions to the user:

  1. Log in to Manager of the cluster as user admin.
  2. Add the role of the component whose permission is required by the user. For details, see Managing MRS Cluster Roles.
  3. Modify the user group that the job submitter belongs to by binding the new component role to it. For details, see Managing MRS Cluster User Groups.

    After the component role bound to the user group to which the user belongs is modified, it takes some time for the role permissions to take effect.