Updated on 2025-01-22 GMT+08:00

Job Management

The MRS job management function provides an entry point for submitting jobs to a cluster, including MapReduce, Spark, HQL, and Spark SQL jobs.

MRS works with Huawei Cloud DataArts Studio to provide a one-stop big data collaboration development environment and fully-managed big data scheduling capabilities, helping you effortlessly build big data processing centers.

DataArts Studio allows you to develop and debug MRS HQL/Spark SQL scripts online, build MRS jobs with drag-and-drop operations, and migrate and integrate data between MRS and more than 20 heterogeneous data sources. Powerful job scheduling together with flexible monitoring and alarming simplifies data and job O&M.

You can create the following types of jobs on the console in an MRS cluster:

  • MapReduce is a distributed data processing model and execution environment that quickly processes large-scale data in parallel. MRS supports the submission of MapReduce JAR programs.
  • Spark is a distributed in-memory computing framework. MRS supports SparkSubmit, SparkScript, and Spark SQL jobs.
    • SparkSubmit: You can submit Spark JAR and Spark Python programs to run a Spark application that computes and processes user data.
    • SparkScript: You can submit SparkScript scripts to execute Spark SQL statements in batches.
    • Spark SQL: You can use Spark SQL statements (similar to standard SQL) to query and analyze user data in real time.
  • Hive is an open-source data warehouse based on Hadoop. MRS allows you to submit HiveScript scripts and directly execute Hive SQL statements.
  • Flink is a distributed big data processing engine that can perform stateful computations over both unbounded and bounded data streams.
  • HadoopStreaming works similarly to a standard Hadoop job, where you can define the input and output HDFS paths, as well as the mapper and reducer executable programs.
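To make the HadoopStreaming model above concrete, the following is a minimal word-count sketch of the mapper and reducer executables such a job expects. It is a hypothetical example, not MRS-specific code: in a real job the two stages would typically be separate executables passed to the streaming JAR via `-mapper` and `-reducer`; here both stages live in one file and are selected with a command-line argument.

```python
#!/usr/bin/env python3
"""Word-count mapper/reducer sketch for a HadoopStreaming job (illustrative)."""
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, the format Hadoop Streaming expects."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Sum the counts for each word; Hadoop delivers reducer input sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__":
    # Run as "wordcount.py map" for the map stage, anything else for reduce.
    stage = mapper if sys.argv[1:2] == ["map"] else reducer
    for record in stage(sys.stdin):
        print(record)
```

A job would then reference the script (paths and JAR location are illustrative) roughly as `hadoop jar hadoop-streaming.jar -input /user/in -output /user/out -mapper "wordcount.py map" -reducer "wordcount.py reduce" -file wordcount.py`.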