Updated on 2022-11-18 GMT+08:00

REST API

Function Description

The Spark REST API exposes some web UI metrics in JSON format, which makes it simpler to build new visualization and monitoring tools. It can be used to query information about both running and historical applications. The open-source Spark REST API lets users query information about jobs, stages, storage, environment, and executors. FusionInsight adds REST APIs for querying SQL, JDBC/ODBC server, and Streaming information. For more information about the open-source REST API, see https://spark.apache.org/docs/3.1.1/monitoring.html#rest-api.

Preparing the Running Environment

Install a FusionInsight client on the node, for example, in the /opt/client directory.

REST API

You can use the following commands to bypass the REST API filter and obtain application information directly:

In normal mode, JobHistory supports only HTTP. Therefore, use http:// in the URLs of the following commands.
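The commands below use two URL patterns: direct access to the JobHistory node (http://host:port/api/v1/applications/...) and access to a running application through the ResourceManager proxy (http://host:port/proxy/application ID/api/v1/applications/application ID/...). The following minimal Python sketch builds both patterns and fetches the JSON response with the standard library. The function names are hypothetical, and the sketch assumes normal (non-security) mode over plain HTTP; it does no Kerberos or TLS handling.

```python
import json
import urllib.parse
import urllib.request

# Query string used by every command in this section.
QUERY = urllib.parse.urlencode({"mode": "monitoring"})


def _join(base: str, path: str) -> str:
    """Append an optional sub-path (e.g. "executors") and the query string."""
    if path:
        base = f"{base}/{path.strip('/')}"
    return f"{base}?{QUERY}"


def jobhistory_url(host: str, port: int, path: str = "") -> str:
    """Hypothetical helper: direct JobHistory URL, e.g. path="application_..._0008"."""
    return _join(f"http://{host}:{port}/api/v1/applications", path)


def yarn_proxy_url(rm_host: str, rm_port: int, app_id: str, path: str = "") -> str:
    """Hypothetical helper: running-application URL via the ResourceManager proxy."""
    base = f"http://{rm_host}:{rm_port}/proxy/{app_id}/api/v1/applications/{app_id}"
    return _join(base, path)


def fetch_json(url: str, timeout: float = 10.0):
    """HTTP GET the URL and decode the JSON body (no authentication handling)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, fetch_json(yarn_proxy_url("192.168.169.84", 8088, "application_1478570725074_0046", "allexecutors")) requests the same URL as the all-executors curl command.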

  • Obtaining information about all applications on the JobHistory node:
    • Command:
      curl "http://192.168.227.16:4040/api/v1/applications?mode=monitoring" --insecure

      "192.168.227.16" indicates the service IP address of the JobHistory node; "4040" indicates the port number of the JobHistory node.

    • Command output:
      [ {
        "id" : "application_1517290848707_0008",
        "name" : "Spark Pi",
        "attempts" : [ {
          "startTime" : "2018-01-30T15:05:37.433CST",
          "endTime" : "2018-01-30T15:06:04.625CST",
          "lastUpdated" : "2018-01-30T15:06:04.848CST",
          "duration" : 27192,
          "sparkUser" : "sparkuser",
          "completed" : true,
          "startTimeEpoch" : 1517295937433,
          "endTimeEpoch" : 1517295964625,
          "lastUpdatedEpoch" : 1517295964848
        } ]
      }, {
        "id" : "application_1517290848707_0145",
        "name" : "Spark shell",
        "attempts" : [ {
          "startTime" : "2018-01-31T15:20:31.286CST",
          "endTime" : "1970-01-01T07:59:59.999CST",
          "lastUpdated" : "2018-01-31T15:20:47.086CST",
          "duration" : 0,
          "sparkUser" : "admintest",
          "completed" : false,
          "startTimeEpoch" : 1517383231286,
          "endTimeEpoch" : -1,
          "lastUpdatedEpoch" : 1517383247086
        } ]
      }]
    • Analysis:
      After running this command, you can query information about all Spark applications in the current cluster. Table 1 describes the parameters in the response.

      Table 1 Parameter description

      | Parameter | Description |
      | --- | --- |
      | id | Application ID. |
      | name | Application name. |
      | attempts | Attempts made by the application, including each attempt's start time, end time, the user who initiated it, and whether it has completed. |

  • Obtaining information about a specific application on the JobHistory node:
    • Command:
      curl "http://192.168.227.16:4040/api/v1/applications/application_1517290848707_0008?mode=monitoring" --insecure

      "192.168.227.16" indicates the service IP address of the JobHistory node; "4040" indicates the port number of the JobHistory node; "application_1517290848707_0008" indicates the application ID.

    • Command output:
      {
        "id" : "application_1517290848707_0008",
        "name" : "Spark Pi",
        "attempts" : [ {
          "startTime" : "2018-01-30T15:05:37.433CST",
          "endTime" : "2018-01-30T15:06:04.625CST",
          "lastUpdated" : "2018-01-30T15:06:04.848CST",
          "duration" : 27192,
          "sparkUser" : "sparkuser",
          "completed" : true,
          "startTimeEpoch" : 1517295937433,
          "endTimeEpoch" : 1517295964625,
          "lastUpdatedEpoch" : 1517295964848
        } ]
      }
    • Analysis:

      After running this command, you can query information about the specified Spark application. For descriptions of the parameters in the response, see Table 1.

  • Obtaining information about the executors of a running application:
    • Command to list active executors:
      curl "http://192.168.169.84:8088/proxy/application_1478570725074_0046/api/v1/applications/application_1478570725074_0046/executors?mode=monitoring" --insecure
    • Command to list all executors (active and dead):
      curl "http://192.168.169.84:8088/proxy/application_1478570725074_0046/api/v1/applications/application_1478570725074_0046/allexecutors?mode=monitoring" --insecure

      "192.168.169.84" indicates the service IP address of the master node of the ResourceManager; "8088" indicates the port number of the ResourceManager; "application_1478570725074_0046" indicates the application ID in YARN.

    • Command output:
      [{
        "id" : "driver",
        "hostPort" : "192.168.169.84:23886",
        "isActive" : true,
        "rddBlocks" : 0,
        "memoryUsed" : 0,
        "diskUsed" : 0,
        "activeTasks" : 0,
        "failedTasks" : 0,
        "completedTasks" : 0,
        "totalTasks" : 0,
        "totalDuration" : 0,
        "totalInputBytes" : 0,
        "totalShuffleRead" : 0,
        "totalShuffleWrite" : 0,
        "maxMemory" : 278019440,
        "executorLogs" : { }
      }, {
        "id" : "1",
        "hostPort" : "192.168.169.84:23902",
        "isActive" : true,
        "rddBlocks" : 0,
        "memoryUsed" : 0,
        "diskUsed" : 0,
        "totalCores" : 1,
        "maxTasks" : 1,
        "activeTasks" : 0,
        "failedTasks" : 0,
        "completedTasks" : 0,
        "totalTasks" : 0,
        "totalDuration" : 0,
        "totalGCTime" : 139,
        "totalInputBytes" : 0,
        "totalShuffleRead" : 0,
        "totalShuffleWrite" : 0,
        "maxMemory" : 555755765,
        "executorLogs" : {
          "stdout" : "https://XTJ-224:26010/node/containerlogs/container_1478570725074_0049_01_000002/admin/stdout?start=-4096",
          "stderr" : "https://XTJ-224:26010/node/containerlogs/container_1478570725074_0049_01_000002/admin/stderr?start=-4096"
        }
      } ]
    • Analysis:
      After running this command, you can query information about all executors (including the driver) of the current application. Table 2 describes the parameters in the response.

      Table 2 Parameter description

      | Parameter | Description |
      | --- | --- |
      | id | Executor ID ("driver" for the driver). |
      | hostPort | IP address and port number of the node where the executor resides, in the format IP address:port number. |
      | executorLogs | URLs of the executor's stdout and stderr logs. |
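The application and executor responses above are plain JSON arrays, so they are easy to post-process. The following minimal Python sketch uses trimmed copies of the sample outputs above to list applications whose latest attempt has not completed, convert the durations of completed applications from milliseconds to seconds, and aggregate executor counters while skipping the driver entry. The helper names are hypothetical; treating duration as milliseconds is inferred from the sample, where 27192 equals endTimeEpoch minus startTimeEpoch.

```python
import json

# Trimmed copies of the sample responses above (only the fields used here).
APPS_JSON = """[
 {"id": "application_1517290848707_0008", "name": "Spark Pi",
  "attempts": [{"completed": true, "duration": 27192, "startTimeEpoch": 1517295937433}]},
 {"id": "application_1517290848707_0145", "name": "Spark shell",
  "attempts": [{"completed": false, "duration": 0, "startTimeEpoch": 1517383231286}]}
]"""

EXECUTORS_JSON = """[
 {"id": "driver", "isActive": true, "completedTasks": 0},
 {"id": "1", "isActive": true, "completedTasks": 0, "totalGCTime": 139}
]"""


def latest_attempt(app):
    """Pick the attempt with the newest start time (independent of list order)."""
    return max(app["attempts"], key=lambda a: a["startTimeEpoch"])


def running_applications(apps):
    """IDs of applications whose latest attempt has not completed."""
    return [app["id"] for app in apps if not latest_attempt(app)["completed"]]


def completed_durations_s(apps):
    """Map application ID -> duration in seconds, for completed applications only."""
    out = {}
    for app in apps:
        attempt = latest_attempt(app)
        if attempt["completed"]:
            out[app["id"]] = attempt["duration"] / 1000.0  # duration is in ms
    return out


def executor_summary(executors):
    """Aggregate the non-driver entries of an executors/allexecutors response."""
    workers = [e for e in executors if e["id"] != "driver"]
    return {
        "executors": len(workers),
        "active": sum(1 for e in workers if e.get("isActive", False)),
        "completedTasks": sum(e.get("completedTasks", 0) for e in workers),
        # The driver entry in the sample has no totalGCTime, so default to 0.
        "totalGCTime": sum(e.get("totalGCTime", 0) for e in workers),
    }


apps = json.loads(APPS_JSON)
executors = json.loads(EXECUTORS_JSON)
print(running_applications(apps))
print(completed_durations_s(apps))
print(executor_summary(executors))
```

In practice, apps and executors would come from the responses of the commands above rather than from embedded strings.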

Enhanced REST API

  • SQL-related: obtaining all SQL statements and the SQL statements with the longest execution time.
    • SparkUI command:
      curl "http://192.168.195.232:8088/proxy/application_1476947670799_0053/api/v1/applications/application_1476947670799_0053/SQL?mode=monitoring" --insecure

      "192.168.195.232" indicates the service IP address of the master node of the ResourceManager; "8088" indicates the port number of the ResourceManager; "application_1476947670799_0053" indicates the application ID in YARN.

    • JobHistory command:
      curl "http://192.168.227.16:4040/api/v1/applications/application_1478570725074_0004/SQL?mode=monitoring" --insecure

      "192.168.227.16" indicates the service IP address of the JobHistory node; "4040" indicates the port number of the JobHistory node; "application_1478570725074_0004" indicates the application ID.

    • Command output:

      The SparkUI and JobHistory commands return output in the same format:

      {
        "longestDurationOfCompletedSQL" : [ {
          "id" : 0,
          "status" : "COMPLETED",
          "description" : "getCallSite at SQLExecution.scala:48",
          "submissionTime" : "2016/11/08 15:39:00",
          "duration" : "2 s",
          "runningJobs" : [ ],
          "successedJobs" : [ 0 ],
          "failedJobs" : [ ]
        } ],
        "sqls" : [ {
          "id" : 0,
          "status" : "COMPLETED",
          "description" : "getCallSite at SQLExecution.scala:48",
          "submissionTime" : "2016/11/08 15:39:00",
          "duration" : "2 s",
          "runningJobs" : [ ],
          "successedJobs" : [ 0 ],
          "failedJobs" : [ ]
        }]
      }
    • Analysis:
      After running this command, you can obtain all SQL statements executed by the current application (the sqls part of the output) and the completed SQL statements with the longest execution time (the longestDurationOfCompletedSQL part of the output). Table 3 describes the parameters in the response.

      Table 3 Parameter description

      | Parameter | Description |
      | --- | --- |
      | id | SQL statement ID. |
      | status | Execution status of the SQL statement: running, completed, or failed. |
      | runningJobs | IDs of the jobs generated by the SQL statement that are still running. |
      | successedJobs | IDs of the jobs generated by the SQL statement that completed successfully. |
      | failedJobs | IDs of the jobs generated by the SQL statement that failed. |

  • JDBC server-related: obtaining the number of sessions, the number of running SQL statements, and information about all sessions and SQL statements.
    • Command:
      curl "http://192.168.195.232:8088/proxy/application_1476947670799_0053/api/v1/applications/application_1476947670799_0053/sqlserver?mode=monitoring" --insecure

      "192.168.195.232" indicates the service IP address of the master node of the ResourceManager; "8088" indicates the port number of the ResourceManager; "application_1476947670799_0053" indicates the application ID in YARN.

    • Command output:
      {
        "sessionNum" : 1,
        "runningSqlNum" : 0,
        "sessions" : [ {
          "user" : "spark",
          "ip" : "192.168.169.84",
          "sessionId" : "9dfec575-48b4-4187-876a-71711d3d7a97",
          "startTime" : "2016/10/29 15:21:10",
          "finishTime" : "",
          "duration" : "1 minute 50 seconds",
          "totalExecute" : 1
        } ],
        "sqls" : [ {
          "user" : "spark",
          "jobId" : [ ],
          "groupId" : "e49ff81a-230f-4892-a209-a48abea2d969",
          "startTime" : "2016/10/29 15:21:13",
          "finishTime" : "2016/10/29 15:21:14",
          "duration" : "555 ms",
          "statement" : "show tables",
          "state" : "FINISHED",
          "detail" : "== Parsed Logical Plan ==\nShowTablesCommand None\n\n== Analyzed Logical Plan ==\ntableName: string, isTemporary: boolean\nShowTablesCommand None\n\n== Cached Logical Plan ==\nShowTablesCommand None\n\n== Optimized Logical Plan ==\nShowTablesCommand None\n\n== Physical Plan ==\nExecutedCommand ShowTablesCommand None\n\nCode Generation: true"
        } ]
      }
    • Analysis:
      After running this command, you can query the number of sessions in the current JDBC server application, the number of running SQL statements, and information about all sessions and SQL statements. Table 4 describes the session parameters; Table 5 describes the SQL statement parameters.

      Table 4 Session parameter description

      | Parameter | Description |
      | --- | --- |
      | user | User connected through the session. |
      | ip | IP address of the node from which the session was established. |
      | sessionId | Session ID. |
      | startTime | Time when the session connected. |
      | finishTime | Time when the session disconnected (empty while the session is still open). |
      | duration | Connection duration of the session. |
      | totalExecute | Number of SQL statements executed in the session. |

      Table 5 SQL parameter description

      | Parameter | Description |
      | --- | --- |
      | user | User who executed the SQL statement. |
      | jobId | IDs of the jobs generated by the SQL statement. |
      | groupId | ID of the job group to which the SQL statement belongs. |
      | startTime | Time when the SQL statement started executing. |
      | finishTime | Time when the SQL statement finished executing. |
      | duration | Execution duration of the SQL statement. |
      | statement | SQL statement text. |
      | detail | Logical and physical plans of the SQL statement. |

  • Streaming-related: obtaining the average input rate, average scheduling delay, average processing time, and average total delay.
    • Command:
      curl "http://192.168.195.232:8088/proxy/application_1477722033672_0008/api/v1/applications/application_1477722033672_0008/streaming/statistics?mode=monitoring" --insecure

      "192.168.195.232" indicates the service IP address of the master node of the ResourceManager; "8088" indicates the port number of the ResourceManager; "application_1477722033672_0008" indicates the application ID in YARN.

    • Command output:
      {
        "startTime" : "2018-12-25T08:58:10.836GMT",
        "batchDuration" : 1000,
        "numReceivers" : 1,
        "numActiveReceivers" : 1,
        "numInactiveReceivers" : 0,
        "numTotalCompletedBatches" : 373,
        "numRetainedCompletedBatches" : 373,
        "numActiveBatches" : 0,
        "numProcessedRecords" : 1,
        "numReceivedRecords" : 1,
        "avgInputRate" : 0.002680965147453083,
        "avgSchedulingDelay" : 14,
        "avgProcessingTime" : 47,
        "avgTotalDelay" : 62
      }
    • Analysis:

      After running this command, you can query the following averages for the current Streaming application: input rate (avgInputRate, in events/s), scheduling delay (avgSchedulingDelay, in ms), processing time (avgProcessingTime, in ms), and total delay (avgTotalDelay, in ms).
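A common rule of thumb from the Spark Streaming tuning guidance is that a stream keeps up only if the average processing time per batch stays below the batch interval; otherwise batches queue up and the scheduling delay grows. The following minimal Python sketch applies that check to the statistics response above. It assumes batchDuration is in milliseconds (1000 matching a 1-second batch interval), and the function name and output keys are hypothetical.

```python
import json

# Sample statistics response copied from the command output above.
STATS_JSON = """{
  "startTime" : "2018-12-25T08:58:10.836GMT",
  "batchDuration" : 1000,
  "numReceivers" : 1,
  "numActiveReceivers" : 1,
  "numInactiveReceivers" : 0,
  "numTotalCompletedBatches" : 373,
  "numRetainedCompletedBatches" : 373,
  "numActiveBatches" : 0,
  "numProcessedRecords" : 1,
  "numReceivedRecords" : 1,
  "avgInputRate" : 0.002680965147453083,
  "avgSchedulingDelay" : 14,
  "avgProcessingTime" : 47,
  "avgTotalDelay" : 62
}"""


def streaming_health(stats):
    """Rule-of-thumb check: average processing time should stay below the batch interval."""
    batch_ms = stats["batchDuration"]  # assumed to be milliseconds
    # Assumption: the average may be missing or null before any batch completes.
    proc_ms = stats.get("avgProcessingTime")
    ratio = None if proc_ms is None else proc_ms / batch_ms
    return {
        "processingToBatchRatio": ratio,
        "keepingUp": None if ratio is None else ratio < 1.0,
        "inactiveReceivers": stats["numInactiveReceivers"],
        "activeBatches": stats["numActiveBatches"],
    }


stats = json.loads(STATS_JSON)
print(streaming_health(stats))
```

A ratio that stays close to or above 1.0, together with a rising avgSchedulingDelay, would suggest that the batch interval is too short for the workload.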