REST API
Function Description
The Spark REST API exposes some web UI metrics in JSON format, giving users a simpler way to build new visualization and monitoring tools. It can be used to query information about running and historical applications. The open-source Spark REST API allows users to query information about Jobs, Stages, Storage, Environment, and Executors. The FusionInsight version adds REST APIs for querying SQL, JDBC server, and Streaming information. For more information about the open-source REST API, see https://spark.apache.org/docs/3.1.1/monitoring.html#rest-api.
Preparing the Running Environment
Install the FusionInsight client on the node, for example, in the /opt/client directory.
REST API
You can use the following commands to bypass the REST API filter and obtain the application information directly:
- In security mode, JobHistory supports only HTTPS. Therefore, use HTTPS in the URLs of the following commands.
- In security mode, you need to set spark.ui.customErrorPage to false and restart Spark2x. (Change the value of this parameter for the JobHistory2x, JDBCServer2x, and SparkResource2x instances.)
HTTPS-based access differs from HTTP-based access. Because SSL encryption is used, when you access JobHistory of Spark2x over HTTPS, ensure that the cluster supports the SSL protocol that the curl command uses. If it does not, use either of the following methods:
- Modify the SSL protocol configured for the cluster. For example, if the curl command supports only the TLSv1 protocol (TLSv1 has security vulnerabilities and must be used with caution), perform the following steps:
- Log in to FusionInsight Manager and choose Cluster > Name of the desired cluster > Services > Spark2x > Configurations > All Configurations.
- Search for ssl in the search box. Check whether the value of spark.ssl.historyServer.protocol for JobHistory contains TLSv1. If it does not, add TLSv1 to the value.
- Clear the value of the spark.ssl.historyServer.enabledAlgorithms parameter for JobHistory.
- Click Save Configuration and then click OK. Restart the Spark2x service or JobHistory instance.
- Perform the following steps to upgrade the curl version on the node:
- Download the curl installation package from the following website: http://curl.haxx.se/download/
- Run the following command to decompress the installation package:
- Run the following command to overwrite the old curl version with the new one:
./configure
make
make install
- Run the following command to update the dynamic link library of curl:
- After the installation is successful, log in to the node again and run the following command to check whether the curl version is successfully updated:
- Obtaining information about all applications on the JobHistory node:
- Command:
curl -k -i --negotiate -u: "https://192.168.227.16:4040/api/v1/applications"
"192.168.227.16" indicates the service IP address of the JobHistory node; "4040" indicates the port number of the JobHistory node.
- Command output:
[ { "id" : "application_1517290848707_0008", "name" : "Spark Pi", "attempts" : [ { "startTime" : "2018-01-30T15:05:37.433CST", "endTime" : "2018-01-30T15:06:04.625CST", "lastUpdated" : "2018-01-30T15:06:04.848CST", "duration" : 27192, "sparkUser" : "sparkuser", "completed" : true, "startTimeEpoch" : 1517295937433, "endTimeEpoch" : 1517295964625, "lastUpdatedEpoch" : 1517295964848 } ] }, { "id" : "application_1517290848707_0145", "name" : "Spark shell", "attempts" : [ { "startTime" : "2018-01-31T15:20:31.286CST", "endTime" : "1970-01-01T07:59:59.999CST", "lastUpdated" : "2018-01-31T15:20:47.086CST", "duration" : 0, "sparkUser" : "admintest", "completed" : false, "startTimeEpoch" : 1517383231286, "endTimeEpoch" : -1, "lastUpdatedEpoch" : 1517383247086 } ] }]
- Analysis:
After running this command, you can query information about all Spark applications in the current cluster. Table 1 describes the parameters in response to this command.
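The response body is a JSON array with one entry per application, so it can be post-processed with any JSON library. The following is a minimal Python sketch, not part of the product, that groups application IDs by completion state. It assumes the HTTP headers printed by -i have already been removed, and the sample is abbreviated from the command output above. The function name split_by_completion is hypothetical.

```python
import json

# Abbreviated from the sample command output above (headers from -i removed).
SAMPLE = '''
[ { "id" : "application_1517290848707_0008", "name" : "Spark Pi",
    "attempts" : [ { "duration" : 27192, "sparkUser" : "sparkuser",
                     "completed" : true, "endTimeEpoch" : 1517295964625 } ] },
  { "id" : "application_1517290848707_0145", "name" : "Spark shell",
    "attempts" : [ { "duration" : 0, "sparkUser" : "admintest",
                     "completed" : false, "endTimeEpoch" : -1 } ] } ]
'''


def split_by_completion(apps):
    """Return (completed_ids, running_ids) for a parsed applications list.

    An application counts as completed only if every listed attempt has
    "completed" set to true; otherwise it is reported as running.
    """
    done, running = [], []
    for app in apps:
        if all(attempt["completed"] for attempt in app["attempts"]):
            done.append(app["id"])
        else:
            running.append(app["id"])
    return done, running


done, running = split_by_completion(json.loads(SAMPLE))
print("completed:", done)
print("running:", running)
```

In the sample, the application that is still running reports "completed" : false together with "endTimeEpoch" : -1.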
- Obtaining information about a specific application on the JobHistory node:
- Command:
curl -k -i --negotiate -u: "https://192.168.227.16:4040/api/v1/applications/application_1517290848707_0008"
"192.168.227.16" indicates the service IP address of the JobHistory node; "4040" indicates the port number of the JobHistory node; "application_1517290848707_0008" indicates the application ID.
- Command output:
{ "id" : "application_1517290848707_0008", "name" : "Spark Pi", "attempts" : [ { "startTime" : "2018-01-30T15:05:37.433CST", "endTime" : "2018-01-30T15:06:04.625CST", "lastUpdated" : "2018-01-30T15:06:04.848CST", "duration" : 27192, "sparkUser" : "sparkuser", "completed" : true, "startTimeEpoch" : 1517295937433, "endTimeEpoch" : 1517295964625, "lastUpdatedEpoch" : 1517295964848 } ] }
- Analysis:
After running this command, you can query the information about a Spark application. For the description of parameters in response to this command, see Table 1.
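Because the commands in this section pass -i, curl prints the HTTP response headers before the JSON body, and with --negotiate it may print more than one response (for example, an authentication challenge followed by the final reply). The following Python sketch shows one way to isolate the JSON body from such output. The helper name extract_json_body and the header lines in the sample are made up for illustration; only the JSON fragment comes from the command output above.

```python
import json


def extract_json_body(raw):
    """Parse the JSON body of the last HTTP response in `curl -i` output."""
    text = raw.replace("\r\n", "\n")
    # Start at the last status line, in case several responses were printed.
    start = text.rfind("\nHTTP/")
    start = 0 if start == -1 else start + 1
    # Headers end at the first blank line after that status line.
    header_end = text.find("\n\n", start)
    if header_end == -1:
        raise ValueError("no blank line separating headers from body")
    return json.loads(text[header_end + 2:])


# Illustrative output: the header lines are invented for this example.
RAW = (
    "HTTP/1.1 401 Unauthorized\r\n"
    "WWW-Authenticate: Negotiate\r\n"
    "\r\n"
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: application/json\r\n"
    "\r\n"
    '{ "id" : "application_1517290848707_0008", "name" : "Spark Pi" }'
)

app = extract_json_body(RAW)
print(app["id"], app["name"])
```

Alternatively, dropping -i from the curl command leaves only the body in the output, which avoids this step.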
- Obtaining information about the executors of a running application:
- Command for listing alive executors:
curl -k -i --negotiate -u: "https://192.168.169.84:8090/proxy/application_1478570725074_0046/api/v1/applications/application_1478570725074_0046/executors"
- Command for listing all executors (alive and dead):
curl -k -i --negotiate -u: "https://192.168.169.84:8090/proxy/application_1478570725074_0046/api/v1/applications/application_1478570725074_0046/allexecutors"
"192.168.169.84" indicates the service IP address of the master node of the ResourceManager; "8090" indicates the port number of the ResourceManager; "application_1478570725074_0046" indicates the application ID in YARN.
- Command output:
[{ "id" : "driver", "hostPort" : "192.168.169.84:23886", "isActive" : true, "rddBlocks" : 0, "memoryUsed" : 0, "diskUsed" : 0, "activeTasks" : 0, "failedTasks" : 0, "completedTasks" : 0, "totalTasks" : 0, "totalDuration" : 0, "totalInputBytes" : 0, "totalShuffleRead" : 0, "totalShuffleWrite" : 0, "maxMemory" : 278019440, "executorLogs" : { } }, { "id" : "1", "hostPort" : "192.168.169.84:23902", "isActive" : true, "rddBlocks" : 0, "memoryUsed" : 0, "diskUsed" : 0, "totalCores" : 1, "maxTasks" : 1, "activeTasks" : 0, "failedTasks" : 0, "completedTasks" : 0, "totalTasks" : 0, "totalDuration" : 0, "totalGCTime" : 139, "totalInputBytes" : 0, "totalShuffleRead" : 0, "totalShuffleWrite" : 0, "maxMemory" : 555755765, "executorLogs" : { "stdout" : "https://XTJ-224:26010/node/containerlogs/container_1478570725074_0049_01_000002/admin/stdout?start=-4096", "stderr" : "https://XTJ-224:26010/node/containerlogs/container_1478570725074_0049_01_000002/admin/stderr?start=-4096" } } ]
- Analysis:
After running this command, you can query information about all Executors (including the driver) of the current application. Table 2 describes the parameters in response to this command.
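As an illustration of how the executor list might be consumed, the following Python sketch totals a few fields across executors while treating the entry whose id is "driver" separately. The sample is abbreviated from the command output above, and summarize_executors is a hypothetical helper name, not part of the REST API.

```python
import json

# Abbreviated from the sample command output above (headers from -i removed).
SAMPLE = '''
[ { "id" : "driver", "hostPort" : "192.168.169.84:23886", "isActive" : true,
    "activeTasks" : 0, "failedTasks" : 0, "maxMemory" : 278019440 },
  { "id" : "1", "hostPort" : "192.168.169.84:23902", "isActive" : true,
    "totalCores" : 1, "activeTasks" : 0, "failedTasks" : 0,
    "maxMemory" : 555755765 } ]
'''


def summarize_executors(executors):
    """Aggregate a parsed executor list, excluding the driver from capacity."""
    workers = [e for e in executors if e["id"] != "driver"]
    return {
        "active_executors": sum(1 for e in workers if e["isActive"]),
        # The driver entry in the sample has no "totalCores", hence .get().
        "total_cores": sum(e.get("totalCores", 0) for e in workers),
        "failed_tasks": sum(e["failedTasks"] for e in executors),
        "max_memory_bytes": sum(e["maxMemory"] for e in workers),
    }


summary = summarize_executors(json.loads(SAMPLE))
print(summary)
```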
Enhanced REST API
- SQL related: obtaining all the SQL statements and those with the longest execution time.
- SparkUI command:
curl -k -i --negotiate -u: "https://192.168.195.232:8090/proxy/application_1476947670799_0053/api/v1/applications/application_1476947670799_0053/SQL"
"192.168.195.232" indicates the service IP address of the master node of the ResourceManager; "8090" indicates the port number of the ResourceManager; "application_1476947670799_0053" indicates the application ID in YARN.
You can append parameters to the URL in the command to filter the SQL statements that are returned.
For example, run the following command to view 100 SQL statements:
curl -k -i --negotiate -u: "https://192.168.195.232:8090/proxy/application_1476947670799_0053/api/v1/applications/application_1476947670799_0053/SQL?limit=100"
Run the following command to view the SQL statements that are being executed:
curl -k -i --negotiate -u: "https://192.168.195.232:8090/proxy/application_1476947670799_0053/api/v1/applications/application_1476947670799_0053/SQL?completed=false"
- JobHistory command:
curl -k -i --negotiate -u: "https://192.168.227.16:4040/api/v1/applications/application_1478570725074_0004/SQL"
"192.168.227.16" indicates the service IP address of the JobHistory node; "4040" indicates the port number of the JobHistory node; "application_1478570725074_0004" indicates the application ID.
- Command output:
The command output of the SparkUI and JobHistory commands is as follows:
{ "longestDurationOfCompletedSQL" : [ { "id" : 0, "status" : "COMPLETED", "description" : "getCallSite at SQLExecution.scala:48", "submissionTime" : "2016/11/08 15:39:00", "duration" : "2 s", "runningJobs" : [ ], "successedJobs" : [ 0 ], "failedJobs" : [ ] } ], "sqls" : [ { "id" : 0, "status" : "COMPLETED", "description" : "getCallSite at SQLExecution.scala:48", "submissionTime" : "2016/11/08 15:39:00", "duration" : "2 s", "runningJobs" : [ ], "successedJobs" : [ 0 ], "failedJobs" : [ ] }] }
- Analysis:
After running this command, you can obtain all the SQL statements executed by the current application (the sqls part of the command output) and the SQL statements with the longest execution time (the longestDurationOfCompletedSQL part of the command output). Table 3 describes the parameters in response to this command.
Table 3 Parameter description

| Parameter | Description |
| --- | --- |
| id | SQL statement ID. |
| status | Execution status of the SQL statement, which can be running, completed, or failed. |
| runningJobs | Jobs generated by the SQL statement that are being executed. |
| successedJobs | Jobs generated by the SQL statement that are successfully executed. |
| failedJobs | Jobs generated by the SQL statement that fail to be executed. |
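The SQL endpoint and its query parameters can also be assembled programmatically. The following Python sketch builds the URLs used in the commands above and scans a response for statements that have failed jobs. The helper names sql_url and statements_with_failed_jobs are hypothetical, and the sample is abbreviated from the command output above.

```python
import json
from urllib.parse import urlencode


def sql_url(base, app_id, **params):
    """Build the SQL endpoint URL; `base` is the JobHistory or proxy prefix."""
    url = f"{base}/api/v1/applications/{app_id}/SQL"
    return f"{url}?{urlencode(params)}" if params else url


def statements_with_failed_jobs(response):
    """Return IDs of SQL statements whose failedJobs list is not empty."""
    return [s["id"] for s in response["sqls"] if s["failedJobs"]]


APP = "application_1476947670799_0053"
PROXY = f"https://192.168.195.232:8090/proxy/{APP}"

limit_url = sql_url(PROXY, APP, limit=100)
# Pass "false" as a string: Python's False would be encoded as "False".
running_url = sql_url(PROXY, APP, completed="false")
print(limit_url)
print(running_url)

# Abbreviated from the sample command output above.
SAMPLE = '''
{ "longestDurationOfCompletedSQL" : [ { "id" : 0, "status" : "COMPLETED",
    "duration" : "2 s", "runningJobs" : [ ], "successedJobs" : [ 0 ],
    "failedJobs" : [ ] } ],
  "sqls" : [ { "id" : 0, "status" : "COMPLETED", "duration" : "2 s",
    "runningJobs" : [ ], "successedJobs" : [ 0 ], "failedJobs" : [ ] } ] }
'''
failed = statements_with_failed_jobs(json.loads(SAMPLE))
print("statements with failed jobs:", failed)
```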
- JDBC server related: obtaining the number of sessions, the number of SQL statements being executed, information about all sessions, and information about SQL statements.
- Command:
curl -k -i --negotiate -u: "https://192.168.195.232:8090/proxy/application_1476947670799_0053/api/v1/applications/application_1476947670799_0053/sqlserver"
"192.168.195.232" indicates the service IP address of the master node of the ResourceManager; "8090" indicates the port number of the ResourceManager; "application_1476947670799_0053" indicates the application ID in YARN.
- Command output:
{ "sessionNum" : 1, "runningSqlNum" : 0, "sessions" : [ { "user" : "spark", "ip" : "192.168.169.84", "sessionId" : "9dfec575-48b4-4187-876a-71711d3d7a97", "startTime" : "2016/10/29 15:21:10", "finishTime" : "", "duration" : "1 minute 50 seconds", "totalExecute" : 1 } ], "sqls" : [ { "user" : "spark", "jobId" : [ ], "groupId" : "e49ff81a-230f-4892-a209-a48abea2d969", "startTime" : "2016/10/29 15:21:13", "finishTime" : "2016/10/29 15:21:14", "duration" : "555 ms", "statement" : "show tables", "state" : "FINISHED", "detail" : "== Parsed Logical Plan ==\nShowTablesCommand None\n\n== Analyzed Logical Plan ==\ntableName: string, isTemporary: boolean\nShowTablesCommand None\n\n== Cached Logical Plan ==\nShowTablesCommand None\n\n== Optimized Logical Plan ==\nShowTablesCommand None\n\n== Physical Plan ==\nExecutedCommand ShowTablesCommand None\n\nCode Generation: true" } ] }
- Analysis:
After running this command, you can query the number of sessions in the current JDBC application, the number of SQL statements being executed, and information about all sessions and SQL statements. Table 4 describes the parameters in the session information; Table 5 describes the parameters in the SQL statement information.
Table 4 Session parameter description

| Parameter | Description |
| --- | --- |
| user | User who connects to the session. |
| ip | IP address of the node where the session resides. |
| sessionId | Session ID. |
| startTime | Time when the session starts the connection. |
| finishTime | Time when the session ends the connection. |
| duration | Connection duration of the session. |
| totalExecute | Number of SQL statements executed by the session. |
Table 5 SQL parameter description

| Parameter | Description |
| --- | --- |
| user | User who executes the SQL statement. |
| jobId | IDs of jobs contained in the SQL statement. |
| groupId | ID of the group where the SQL statement resides. |
| startTime | Start time. |
| finishTime | End time. |
| duration | SQL statement execution duration. |
| statement | SQL statement. |
| detail | Logical/Physical plan. |
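As a rough illustration of working with the session and SQL statement lists, the following Python sketch reports sessions that have no finish time and counts statements per user. It assumes, based only on the sample output above, that an empty finishTime marks a session that is still connected. The sample is abbreviated (the detail field is omitted), and the helper names are hypothetical.

```python
import json
from collections import Counter

# Abbreviated from the sample command output above ("detail" omitted).
SAMPLE = '''
{ "sessionNum" : 1, "runningSqlNum" : 0,
  "sessions" : [ { "user" : "spark", "ip" : "192.168.169.84",
    "sessionId" : "9dfec575-48b4-4187-876a-71711d3d7a97",
    "startTime" : "2016/10/29 15:21:10", "finishTime" : "",
    "duration" : "1 minute 50 seconds", "totalExecute" : 1 } ],
  "sqls" : [ { "user" : "spark", "jobId" : [ ],
    "groupId" : "e49ff81a-230f-4892-a209-a48abea2d969",
    "startTime" : "2016/10/29 15:21:13", "finishTime" : "2016/10/29 15:21:14",
    "duration" : "555 ms", "statement" : "show tables",
    "state" : "FINISHED" } ] }
'''


def open_session_ids(response):
    """Session IDs with an empty finishTime (assumed to be still connected)."""
    return [s["sessionId"] for s in response["sessions"] if not s["finishTime"]]


def statements_per_user(response):
    """Count the SQL statements in the response, grouped by user."""
    return Counter(sql["user"] for sql in response["sqls"])


data = json.loads(SAMPLE)
open_ids = open_session_ids(data)
per_user = statements_per_user(data)
print("open sessions:", open_ids)
print("statements per user:", dict(per_user))
```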
- JDBC API enhancement: canceling an SQL statement that is being executed, using the execution ID obtained from beeline.
- Commands for the operation:
curl -k -i --negotiate -X PUT -u: "https://192.168.195.232:8090/proxy/application_1477722033672_0008/api/v1/applications/application_1477722033672_0008/cancel/execution?executionId=8"
- Command output:
- Remarks:
Run the SQL statement in spark-beeline. If the SQL statement generates a Spark task, the execution ID of the SQL statement will be printed in beeline. To cancel the execution of the SQL statement, run the preceding command.
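The cancel URL differs from the query URLs only in its path, its executionId parameter, and the PUT method. The following Python sketch builds such a request object without sending it. The function name cancel_request is hypothetical, and actually sending the request in security mode would additionally require Kerberos (SPNEGO) authentication, which the Python standard library does not provide, so the curl command above remains the reference.

```python
from urllib.parse import urlencode
from urllib.request import Request


def cancel_request(proxy_base, app_id, execution_id):
    """Build (but do not send) the PUT request that cancels an execution."""
    query = urlencode({"executionId": execution_id})
    url = f"{proxy_base}/api/v1/applications/{app_id}/cancel/execution?{query}"
    return Request(url, method="PUT")


APP = "application_1477722033672_0008"
req = cancel_request(f"https://192.168.195.232:8090/proxy/{APP}", APP, 8)
print(req.get_method(), req.full_url)
```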
- Streaming related: obtaining the average input frequency, average scheduling delay, average execution duration, and average value of the overall delay.
- Command:
curl -k -i --negotiate -u: "https://192.168.195.232:8090/proxy/application_1477722033672_0008/api/v1/applications/application_1477722033672_0008/streaming/statistics"
"192.168.195.232" indicates the service IP address of the master node of the ResourceManager; "8090" indicates the port number of the ResourceManager; "application_1477722033672_0008" indicates the application ID in YARN.
- Command output:
{ "startTime" : "2018-12-25T08:58:10.836GMT", "batchDuration" : 1000, "numReceivers" : 1, "numActiveReceivers" : 1, "numInactiveReceivers" : 0, "numTotalCompletedBatches" : 373, "numRetainedCompletedBatches" : 373, "numActiveBatches" : 0, "numProcessedRecords" : 1, "numReceivedRecords" : 1, "avgInputRate" : 0.002680965147453083, "avgSchedulingDelay" : 14, "avgProcessingTime" : 47, "avgTotalDelay" : 62 }
- Analysis:
After running this command, you can query the average input frequency (unit: events/sec), average scheduling delay (unit: ms), average execution time (unit: ms), and average value of the total delay (unit: ms) of the current Streaming application.
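One possible use of these statistics is a simple health check. The following Python sketch compares the average processing time and total delay with the batch interval. It assumes that batchDuration is expressed in milliseconds, like the delay fields, and the helper name assess_streaming is hypothetical. The sample is the command output above.

```python
import json

# Sample command output from above (headers from -i removed).
SAMPLE = '''
{ "startTime" : "2018-12-25T08:58:10.836GMT", "batchDuration" : 1000,
  "numReceivers" : 1, "numActiveReceivers" : 1, "numInactiveReceivers" : 0,
  "numTotalCompletedBatches" : 373, "numRetainedCompletedBatches" : 373,
  "numActiveBatches" : 0, "numProcessedRecords" : 1, "numReceivedRecords" : 1,
  "avgInputRate" : 0.002680965147453083, "avgSchedulingDelay" : 14,
  "avgProcessingTime" : 47, "avgTotalDelay" : 62 }
'''


def assess_streaming(stats):
    """Derive a few health indicators from the streaming statistics."""
    batch_ms = stats["batchDuration"]  # assumed to be in milliseconds
    return {
        # Batches that take longer than the interval to process will queue up.
        "processing_fits_in_batch": stats["avgProcessingTime"] <= batch_ms,
        "total_delay_ratio": stats["avgTotalDelay"] / batch_ms,
        "inactive_receivers": stats["numInactiveReceivers"],
    }


report = assess_streaming(json.loads(SAMPLE))
print(report)
```

A total_delay_ratio well below 1, as in the sample, suggests that batches finish long before the next interval elapses.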