Updated on 2022-07-11 GMT+08:00

REST API

Function Description

Use the HTTP REST API to view more information about MapReduce tasks. Currently, the MapReduce REST API can be used to query the status of completed tasks. For details about the API, see the official website:

http://hadoop.apache.org/docs/r3.1.1/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/HistoryServerRest.html

Preparing the Running Environment

  1. Install the client, for example, to the /opt/client directory on the node. For details, see section "Installing a Client."
  2. Go to the client installation directory and run the following commands to configure the environment variables and authenticate:

    source bigdata_env

    kinit service user

    A kinit ticket is valid for 24 hours. If you run the sample more than 24 hours after authenticating, run the kinit command again.

  3. HTTPS-based access differs from HTTP-based access. Because HTTPS uses SSL encryption, the SSL protocol used by the curl command must be one that the cluster supports. If the cluster does not support that protocol, change the SSL protocols enabled in the cluster. For example, if curl supports only TLSv1, perform the following steps:

    Log in to FusionInsight Manager and choose Cluster > Name of the desired cluster > Services > Yarn > Configurations > All Configurations. Search for hadoop.ssl.enabled.protocols and check whether its value contains TLSv1. If it does not, add TLSv1 to the value. Then clear the value of ssl.server.exclude.cipher.list; otherwise, you cannot access Yarn using HTTPS. Click Save, and then choose More > Restart Service to restart the service.

    • The values of MapReduce configuration items hadoop.ssl.enabled.protocols and ssl.server.exclude.cipher.list directly reference the values of the corresponding configuration items in Yarn. Therefore, you need to change the values of the corresponding configuration items in Yarn and restart the Yarn and MapReduce services.
    • TLSv1 has security vulnerabilities. Exercise caution when using it.
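    Before changing cluster settings, it can help to confirm which protocol the server actually accepts. The following is a minimal Python sketch, not part of the product: it pins a client to a single TLS version and attempts a handshake. The function names are hypothetical, the host and port are placeholders you must supply, and certificate verification is disabled to mirror curl -k. A False result can also mean that the port is unreachable, and the local OpenSSL build may itself refuse very old versions such as TLSv1.

```python
import socket
import ssl


def make_probe_context(version: ssl.TLSVersion) -> ssl.SSLContext:
    """Build a client context pinned to exactly one TLS protocol version."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Mirror `curl -k`: only protocol negotiation is tested, not trust.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    ctx.minimum_version = version
    ctx.maximum_version = version
    return ctx


def server_accepts(host: str, port: int, version: ssl.TLSVersion,
                   timeout: float = 5.0) -> bool:
    """Return True if a handshake with the pinned version succeeds.

    Handshake failures and connection errors both return False, so an
    unreachable port is indistinguishable from an unsupported protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            ctx = make_probe_context(version)
            with ctx.wrap_socket(raw, server_hostname=host) as tls:
                return tls.version() is not None
    except OSError:  # ssl.SSLError is a subclass of OSError
        return False
```

    For example, server_accepts("10.120.85.2", 26014, ssl.TLSVersion.TLSv1_2) would probe the JobHistoryServer address used later in this section, assuming that address is reachable from the client node.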

Procedure

Obtain detailed information about tasks that have been completed on MapReduce.

  • Commands for the operation:
    curl -k -i --negotiate -u : "https://10.120.85.2:26014/ws/v1/history/mapreduce/jobs"

    In the preceding command, 10.120.85.2 is the value of JHS_FLOAT_IP for MapReduce, and 26014 is the port number of the JobHistoryServer node.

    On Red Hat 6.x and CentOS 6.x, the curl command has a compatibility problem when accessing the JobHistoryServer, so the correct result cannot be returned.

  • You can view the status information about historical tasks, such as the task IDs, start time, end time, and task execution status.
  • Execution result:
    {
        "jobs":{
            "job":[
                {
                    "submitTime":1525693184360,
                    "startTime":1525693194840,
                    "finishTime":1525693215540,
                    "id":"job_1525686535456_0001",
                    "name":"QuasiMonteCarlo",
                    "queue":"default",
                    "user":"mapred",
                    "state":"SUCCEEDED",
                    "mapsTotal":1,
                    "mapsCompleted":1,
                    "reducesTotal":1,
                    "reducesCompleted":1
                }
            ]
        }
    }
  • Result analysis:

    Using this API, you can query the completed MapReduce tasks in the current cluster and obtain information listed in Table 1.

    Table 1 Common information

    Parameter     Description
    submitTime    Time when a task is submitted
    startTime     Start time
    finishTime    End time
    queue         Task queue
    user          User who submits the task
    state         Task state, success or failure
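
    The query and the result analysis above can be sketched in Python. This is an illustrative sketch under stated assumptions, not part of the product: the helper names (history_jobs_url, fetch_jobs, to_utc, summarize_jobs) are hypothetical, fetch_jobs simply shells out to curl with the same authentication flags as the command above (dropping -i and adding -s so that only the JSON body is returned) and requires a valid kinit ticket, and the time fields are treated as milliseconds since the Unix epoch, as described in the Hadoop History Server REST API documentation.

```python
import json
import subprocess
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def history_jobs_url(host, port):
    """Build the JobHistoryServer URL that lists completed jobs."""
    return f"https://{host}:{port}/ws/v1/history/mapreduce/jobs"


def fetch_jobs(host, port):
    """Call curl with SPNEGO authentication and return the JSON body."""
    cmd = ["curl", "-k", "-s", "--negotiate", "-u", ":",
           history_jobs_url(host, port)]
    done = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return done.stdout


def to_utc(ms):
    """Convert epoch milliseconds to an ISO 8601 UTC timestamp."""
    return (EPOCH + timedelta(milliseconds=ms)).isoformat(
        timespec="milliseconds")


def summarize_jobs(body):
    """Extract the fields from Table 1 and derive wait and run times."""
    jobs = (json.loads(body).get("jobs") or {}).get("job", [])
    rows = []
    for job in jobs:
        rows.append({
            "id": job["id"],
            "user": job["user"],
            "queue": job["queue"],
            "state": job["state"],
            "submitted": to_utc(job["submitTime"]),
            # Seconds between submission and start, and start and finish.
            "wait_s": (job["startTime"] - job["submitTime"]) / 1000,
            "run_s": (job["finishTime"] - job["startTime"]) / 1000,
        })
    return rows


# Trimmed copy of the execution result shown above.
SAMPLE = """
{"jobs": {"job": [{
    "submitTime": 1525693184360,
    "startTime": 1525693194840,
    "finishTime": 1525693215540,
    "id": "job_1525686535456_0001",
    "name": "QuasiMonteCarlo",
    "queue": "default",
    "user": "mapred",
    "state": "SUCCEEDED"
}]}}
"""

for row in summarize_jobs(SAMPLE):
    print(row)
```

    On a node with a valid ticket, summarize_jobs(fetch_jobs("10.120.85.2", 26014)) would apply the same summary to the live response, with the address and port replaced by the values for your cluster.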