
Spark REST APIs

Updated on 2024-10-23 GMT+08:00

Function Description

The Spark REST API exposes some web UI metrics in JSON format, giving users a simpler way to build visualization and monitoring tools. It can be used to query information about both running and historical applications. The open-source Spark REST API supports queries on Jobs, Stages, Storage, Environment, and Executors. The FusionInsight version adds REST APIs for querying SQL, JDBC server, and Streaming information. For more information about the open-source REST API, see https://archive.apache.org/dist/spark/docs/3.3.1/monitoring.html#rest-api.

Preparing the Running Environment

Install the FusionInsight client on the node, for example, in the /opt/client directory.

REST API

You can use the following commands to bypass the REST API filter and obtain application information directly:

NOTICE:
  • In security mode, the JobHistory supports only the HTTPS protocol. Therefore, use the HTTPS protocol in the URL of the following command.
  • In security mode, you need to set spark.ui.customErrorPage to false and restart Spark2x. Change the value of this parameter for the JobHistory2x, JDBCServer2x, and SparkResource2x instances.
NOTE:
Perform the following steps to upgrade the curl version on the node:
  1. Download the curl installation package from the following website: http://curl.haxx.se/download/
  2. Run the following command to decompress the installation package:

    tar -xzvf curl-x.x.x.tar.gz

  3. Run the following commands to overwrite the old curl version with the new one:

    cd curl-x.x.x

    ./configure

    make

    make install

  4. Run the following command to update the dynamic link library of curl:

    ldconfig

  5. After the installation is successful, log in to the node again and run the following command to check whether the curl version is successfully updated:

    curl --version
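The commands in this section pass --negotiate, which curl can honor only when it was built with GSS-API/SPNEGO support, so it can be worth checking the Features line printed by curl --version after the upgrade. The following is a minimal Python sketch of that check, not part of the product. The helper names curl_features and supports_negotiate are hypothetical, and the sample text is illustrative rather than captured from a real node.

```python
def curl_features(version_output):
    """Return the feature names listed on the 'Features:' line of `curl --version`."""
    for line in version_output.splitlines():
        if line.startswith("Features:"):
            return set(line.split(":", 1)[1].split())
    return set()


def supports_negotiate(version_output):
    # Assumption: a build usable with --negotiate advertises SPNEGO or GSS-API.
    return bool({"SPNEGO", "GSS-API"} & curl_features(version_output))


# Illustrative text only; paste the real output of `curl --version` here.
sample_output = (
    "curl x.x.x (x86_64-pc-linux-gnu)\n"
    "Protocols: http https\n"
    "Features: GSS-API IPv6 Kerberos SPNEGO SSL\n"
)
print(supports_negotiate(sample_output))
```

If the check fails, the curl build probably lacks Kerberos support, and the ./configure step may need additional options. Which options apply depends on the curl version and the node, so check the curl build documentation.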

  • Obtaining information about all applications on the JobHistory node:
    • Command:
      curl -k -i --negotiate -u: "https://192.168.227.16:18080/api/v1/applications"

      192.168.227.16 indicates the service IP address of the JobHistory node and 18080 indicates the port number of the JobHistory node.

    • Command output:
      [ {
        "id" : "application_1517290848707_0008",
        "name" : "Spark Pi",
        "attempts" : [ {
          "startTime" : "2018-01-30T15:05:37.433CST",
          "endTime" : "2018-01-30T15:06:04.625CST",
          "lastUpdated" : "2018-01-30T15:06:04.848CST",
          "duration" : 27192,
          "sparkUser" : "sparkuser",
          "completed" : true,
          "startTimeEpoch" : 1517295937433,
          "endTimeEpoch" : 1517295964625,
          "lastUpdatedEpoch" : 1517295964848
        } ]
      }, {
        "id" : "application_1517290848707_0145",
        "name" : "Spark shell",
        "attempts" : [ {
          "startTime" : "2018-01-31T15:20:31.286CST",
          "endTime" : "1970-01-01T07:59:59.999CST",
          "lastUpdated" : "2018-01-31T15:20:47.086CST",
          "duration" : 0,
          "sparkUser" : "admintest",
          "completed" : false,
          "startTimeEpoch" : 1517383231286,
          "endTimeEpoch" : -1,
          "lastUpdatedEpoch" : 1517383247086
        } ]
      }]
    • Analysis:
      With this command, you can query information about all Spark applications in the current cluster, including both running and completed applications. Table 1 describes the information returned for each application.
      Table 1 Parameter description

      Parameter   Description
      id          Application ID.
      name        Application name.
      attempts    Attempts executed by the application, including the attempt start time, attempt end time, user who initiates the attempts, and status indicating whether the attempts are completed.
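The fields in Table 1 can be post-processed with a few lines of script. The following Python sketch parses a trimmed copy of the sample output above and prints one row per attempt. The function name summarize_applications is hypothetical, and treating "completed" : false as "still running" is an assumption based on the sample, where the unfinished attempt has endTimeEpoch set to -1.

```python
import json

# Trimmed copy of the sample command output shown above.
response_text = """
[ {
  "id" : "application_1517290848707_0008",
  "name" : "Spark Pi",
  "attempts" : [ {
    "duration" : 27192,
    "sparkUser" : "sparkuser",
    "completed" : true,
    "startTimeEpoch" : 1517295937433,
    "endTimeEpoch" : 1517295964625
  } ]
}, {
  "id" : "application_1517290848707_0145",
  "name" : "Spark shell",
  "attempts" : [ {
    "duration" : 0,
    "sparkUser" : "admintest",
    "completed" : false,
    "startTimeEpoch" : 1517383231286,
    "endTimeEpoch" : -1
  } ]
} ]
"""


def summarize_applications(apps):
    """Return one (id, name, user, state, duration_ms) tuple per attempt."""
    rows = []
    for app in apps:
        for attempt in app["attempts"]:
            if attempt["completed"]:
                state = "completed"
                # Recompute the duration from the epoch fields (milliseconds).
                duration_ms = attempt["endTimeEpoch"] - attempt["startTimeEpoch"]
            else:
                state = "running"
                duration_ms = None  # endTimeEpoch is -1 for unfinished attempts
            rows.append((app["id"], app["name"], attempt["sparkUser"], state, duration_ms))
    return rows


rows = summarize_applications(json.loads(response_text))
for row in rows:
    print(row)
```

For the completed attempt, the difference between endTimeEpoch and startTimeEpoch equals the reported duration of 27192 ms.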

  • Obtaining information about a specific application on the JobHistory node:
    • Command:
      curl -k -i --negotiate -u: "https://192.168.227.16:18080/api/v1/applications/application_1517290848707_0008"

      192.168.227.16 indicates the service IP address of the JobHistory node, 18080 indicates the port number of the JobHistory node, and application_1517290848707_0008 indicates the application ID.

    • Command output:
      {
        "id" : "application_1517290848707_0008",
        "name" : "Spark Pi",
        "attempts" : [ {
          "startTime" : "2018-01-30T15:05:37.433CST",
          "endTime" : "2018-01-30T15:06:04.625CST",
          "lastUpdated" : "2018-01-30T15:06:04.848CST",
          "duration" : 27192,
          "sparkUser" : "sparkuser",
          "completed" : true,
          "startTimeEpoch" : 1517295937433,
          "endTimeEpoch" : 1517295964625,
          "lastUpdatedEpoch" : 1517295964848
        } ]
      }
    • Analysis:

      With this command, you can query information about a Spark application. Table 1 provides information about the application.
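Because the commands above pass -i, curl prints the HTTP response headers before the JSON body, and with --negotiate the headers of the initial authentication challenge may be printed as well. The following Python sketch shows one way to strip those header blocks before parsing. The function name extract_json_body is hypothetical, the header lines in the sample are illustrative, and the approach assumes the JSON body itself never contains a line starting with "HTTP/".

```python
import json


def extract_json_body(raw):
    """Parse the JSON body from `curl -i` output, skipping all header blocks."""
    text = raw.replace("\r\n", "\n")
    # Find the last status line; earlier blocks belong to the auth challenge.
    last = text.rfind("\nHTTP/")
    start = last + 1 if last != -1 else 0
    if text.startswith("HTTP/", start):
        _headers, separator, body = text[start:].partition("\n\n")
        if not separator:
            raise ValueError("response has headers but no body")
    else:
        body = text  # output captured without -i
    return json.loads(body)


# Illustrative capture: header values are assumptions, the body is from the sample above.
raw_output = (
    "HTTP/1.1 401 Authentication required\r\n"
    "WWW-Authenticate: Negotiate\r\n"
    "\r\n"
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: application/json\r\n"
    "\r\n"
    '{ "id" : "application_1517290848707_0008", "name" : "Spark Pi" }'
)
application = extract_json_body(raw_output)
print(application["id"], application["name"])
```

Dropping -i from the curl command avoids the problem entirely when the headers are not needed.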

  • Obtaining information about the executor of a running application:
    • Command for listing alive executors:
      curl -k -i --negotiate -u: "https://192.168.169.84:8090/proxy/application_1478570725074_0046/api/v1/applications/application_1478570725074_0046/executors"
    • Command for listing all executors (alive and dead):
      curl -k -i --negotiate -u: "https://192.168.169.84:8090/proxy/application_1478570725074_0046/api/v1/applications/application_1478570725074_0046/allexecutors"

      192.168.169.84 indicates the service IP address of the master node of the ResourceManager, 8090 indicates the port number of the ResourceManager, and application_1478570725074_0046 indicates the application ID in YARN.

    • Command output:
      [{
        "id" : "driver",
        "hostPort" : "192.168.169.84:23886",
        "isActive" : true,
        "rddBlocks" : 0,
        "memoryUsed" : 0,
        "diskUsed" : 0,
        "activeTasks" : 0,
        "failedTasks" : 0,
        "completedTasks" : 0,
        "totalTasks" : 0,
        "totalDuration" : 0,
        "totalInputBytes" : 0,
        "totalShuffleRead" : 0,
        "totalShuffleWrite" : 0,
        "maxMemory" : 278019440,
        "executorLogs" : { }
      }, {
        "id" : "1",
        "hostPort" : "192.168.169.84:23902",
        "isActive" : true,
        "rddBlocks" : 0,
        "memoryUsed" : 0,
        "diskUsed" : 0,
        "totalCores" : 1,
        "maxTasks" : 1,
        "activeTasks" : 0,
        "failedTasks" : 0,
        "completedTasks" : 0,
        "totalTasks" : 0,
        "totalDuration" : 0,
        "totalGCTime" : 139,
        "totalInputBytes" : 0,
        "totalShuffleRead" : 0,
        "totalShuffleWrite" : 0,
        "maxMemory" : 555755765,
        "executorLogs" : {
          "stdout" : "https://XTJ-224:8044/node/containerlogs/container_1478570725074_0049_01_000002/admin/stdout?start=-4096",
          "stderr" : "https://XTJ-224:8044/node/containerlogs/container_1478570725074_0049_01_000002/admin/stderr?start=-4096"
        }
      } ]
    • Analysis:
      With this command, you can query information about all executors of the current application, including the driver. Table 2 describes the basic information returned for each executor.
      Table 2 Parameter description

      Parameter      Description
      id             Executor ID.
      hostPort       IP address and port number of the node where the Executor resides. Format: IP address:port number.
      executorLogs   Path to Executor logs.
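As an illustration of how the executor fields can be used, the following Python sketch collects the log URLs of active executors from a trimmed copy of the sample output above, skipping the driver entry. The function name active_executor_logs is hypothetical.

```python
import json

# Trimmed copy of the sample executor output shown above.
executors_text = """
[ {
  "id" : "driver",
  "hostPort" : "192.168.169.84:23886",
  "isActive" : true,
  "executorLogs" : { }
}, {
  "id" : "1",
  "hostPort" : "192.168.169.84:23902",
  "isActive" : true,
  "executorLogs" : {
    "stdout" : "https://XTJ-224:8044/node/containerlogs/container_1478570725074_0049_01_000002/admin/stdout?start=-4096",
    "stderr" : "https://XTJ-224:8044/node/containerlogs/container_1478570725074_0049_01_000002/admin/stderr?start=-4096"
  }
} ]
"""


def active_executor_logs(executors):
    """Map executor ID to its log URLs for active executors, excluding the driver."""
    return {
        executor["id"]: executor["executorLogs"]
        for executor in executors
        if executor["id"] != "driver" and executor["isActive"]
    }


logs = active_executor_logs(json.loads(executors_text))
for executor_id, urls in logs.items():
    print(executor_id, urls.get("stderr"))
```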

Enhanced REST API

  • SQL related: obtaining all the SQL statements and those with the longest execution time.
    • SparkUI command:
      curl -k -i --negotiate -u: "https://192.168.195.232:8090/proxy/application_1476947670799_0053/api/v1/applications/application_1476947670799_0053/SQL"

      192.168.195.232 indicates the service IP address of the master node of the ResourceManager, 8090 indicates the port number of the ResourceManager, and application_1476947670799_0053 indicates the application ID in YARN.

      NOTE:

      You can append query parameters to the URL in the command to filter the SQL statements that are returned.

      For example, run the following command to view 100 SQL statements:

      curl -k -i --negotiate -u: "https://192.168.195.232:8090/proxy/application_1476947670799_0053/api/v1/applications/application_1476947670799_0053/SQL?limit=100"

      Run the following command to view the SQL statements that have not completed:

      curl -k -i --negotiate -u: "https://192.168.195.232:8090/proxy/application_1476947670799_0053/api/v1/applications/application_1476947670799_0053/SQL?completed=false"
    • JobHistory command:
      curl -k -i --negotiate -u: "https://192.168.227.16:18080/api/v1/applications/application_1478570725074_0004/SQL"

      192.168.227.16 indicates the service IP address of the JobHistory node, 18080 indicates the port number of the JobHistory node, and application_1478570725074_0004 indicates the application ID.

    • Command output:

      The command output of the SparkUI and JobHistory commands is as follows:

      {
        "longestDurationOfCompletedSQL" : [ {
          "id" : 0,
          "status" : "COMPLETED",
          "description" : "getCallSite at SQLExecution.scala:48",
          "submissionTime" : "2016/11/08 15:39:00",
          "duration" : "2 s",
          "runningJobs" : [ ],
          "successedJobs" : [ 0 ],
          "failedJobs" : [ ]
        } ],
        "sqls" : [ {
          "id" : 0,
          "status" : "COMPLETED",
          "description" : "getCallSite at SQLExecution.scala:48",
          "submissionTime" : "2016/11/08 15:39:00",
          "duration" : "2 s",
          "runningJobs" : [ ],
          "successedJobs" : [ 0 ],
          "failedJobs" : [ ]
        }]
      }
    • Analysis:
      After running this command, you can obtain all the SQL statements executed by the current application (the sqls part of the command output) and the SQL statements with the longest execution time (the longestDurationOfCompletedSQL part of the command output). The information about each SQL statement is listed in Table 3.
      Table 3 Parameter description

      Parameter       Description
      id              SQL statement ID.
      status          Execution status of the SQL statement: running, completed, or failed.
      runningJobs     Jobs generated by the SQL statement that are currently running.
      successedJobs   Jobs generated by the SQL statement that were executed successfully.
      failedJobs      Jobs generated by the SQL statement that failed.
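The SQL endpoint URLs above differ only in the host, the optional ResourceManager proxy prefix, and the query parameters. The following Python sketch assembles them. The function name sql_url and its via_proxy argument are hypothetical, and only the limit and completed parameters shown in this section are assumed to exist.

```python
from urllib.parse import urlencode


def sql_url(base, app_id, via_proxy=True, **params):
    """Build the SQL endpoint URL for the SparkUI (proxy) or JobHistory form."""
    prefix = f"/proxy/{app_id}" if via_proxy else ""
    url = f"{base}{prefix}/api/v1/applications/{app_id}/SQL"
    if params:
        # Render booleans as lowercase "true"/"false", as in the commands above.
        rendered = {
            key: str(value).lower() if isinstance(value, bool) else value
            for key, value in params.items()
        }
        url += "?" + urlencode(rendered)
    return url


rm = "https://192.168.195.232:8090"
print(sql_url(rm, "application_1476947670799_0053", limit=100))
print(sql_url(rm, "application_1476947670799_0053", completed=False))
print(sql_url("https://192.168.227.16:18080", "application_1478570725074_0004", via_proxy=False))
```

The three printed URLs correspond to the limit, completed, and JobHistory commands shown in this section.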

  • JDBC server related: obtaining the number of sessions, number of running SQL statements, information about all sessions, and information about SQL statements.
    • Command:
      curl -k -i --negotiate -u: "https://192.168.195.232:8090/proxy/application_1476947670799_0053/api/v1/applications/application_1476947670799_0053/sqlserver"

      192.168.195.232 indicates the service IP address of the master node of the ResourceManager, 8090 indicates the port number of the ResourceManager, and application_1476947670799_0053 indicates the application ID in YARN.

    • Command output:
      {
        "sessionNum" : 1,
        "runningSqlNum" : 0,
        "sessions" : [ {
          "user" : "spark",
          "ip" : "192.168.169.84",
          "sessionId" : "9dfec575-48b4-4187-876a-71711d3d7a97",
          "startTime" : "2016/10/29 15:21:10",
          "finishTime" : "",
          "duration" : "1 minute 50 seconds",
          "totalExecute" : 1
        } ],
        "sqls" : [ {
          "user" : "spark",
          "jobId" : [ ],
          "groupId" : "e49ff81a-230f-4892-a209-a48abea2d969",
          "startTime" : "2016/10/29 15:21:13",
          "finishTime" : "2016/10/29 15:21:14",
          "duration" : "555 ms",
          "statement" : "show tables",
          "state" : "FINISHED",
          "detail" : "== Parsed Logical Plan ==\nShowTablesCommand None\n\n== Analyzed Logical Plan ==\ntableName: string, isTemporary: boolean\nShowTablesCommand None\n\n== Cached Logical Plan ==\nShowTablesCommand None\n\n== Optimized Logical Plan ==\nShowTablesCommand None\n\n== Physical Plan ==\nExecutedCommand ShowTablesCommand None\n\nCode Generation: true"
        } ]
      }
    • Analysis:
      After running this command, you can query the number of sessions in the current JDBC application, the number of running SQL statements, and information about all sessions and SQL statements. The information about each session is listed in Table 4, and the information about each SQL statement is listed in Table 5:
      Table 4 Session parameter description

      Parameter      Description
      user           User connected in the session.
      ip             IP address of the node where the session resides.
      sessionId      Session ID.
      startTime      Time when the session connection starts.
      finishTime     Time when the session connection ends.
      duration       Connection duration of the session.
      totalExecute   Number of SQL statements executed by the session.

      Table 5 SQL parameter description

      Parameter    Description
      user         User who executes the SQL statement.
      jobId        IDs of jobs contained in the SQL statement.
      groupId      ID of the group where the SQL statement resides.
      startTime    Start time.
      finishTime   End time.
      duration     SQL statement execution duration.
      statement    SQL statement.
      detail       Logical/Physical plan.
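The session and SQL fields in Tables 4 and 5 lend themselves to simple monitoring checks. The following Python sketch works on a trimmed copy of the sample sqlserver output above. The function names are hypothetical, and treating an empty finishTime as "session still open" is an assumption drawn from the sample, not a documented rule.

```python
import json
from collections import Counter

# Trimmed copy of the sample sqlserver output shown above.
sqlserver_text = """
{
  "sessionNum" : 1,
  "runningSqlNum" : 0,
  "sessions" : [ {
    "user" : "spark",
    "sessionId" : "9dfec575-48b4-4187-876a-71711d3d7a97",
    "finishTime" : "",
    "totalExecute" : 1
  } ],
  "sqls" : [ {
    "user" : "spark",
    "statement" : "show tables",
    "state" : "FINISHED",
    "duration" : "555 ms"
  } ]
}
"""


def open_session_ids(response):
    # Assumption: a session without a finishTime has not been closed yet.
    return [s["sessionId"] for s in response["sessions"] if not s["finishTime"]]


def statements_by_state(response):
    """Count SQL statements per value of the "state" field."""
    return Counter(sql["state"] for sql in response["sqls"])


response = json.loads(sqlserver_text)
print("open sessions:", open_session_ids(response))
print("statements by state:", dict(statements_by_state(response)))
```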

  • JDBC server related (enhancement): canceling a running SQL statement by using the execution ID obtained from beeline.
    • Command:
      curl -k -i --negotiate -X PUT -u: "https://192.168.195.232:8090/proxy/application_1477722033672_0008/api/v1/applications/application_1477722033672_0008/cancel/execution?executionId=8"
    • Command output:

      Cancel the job whose execution ID is 8.

    • Remarks:

      Run the SQL statement in spark-beeline. If the SQL statement generates a Spark task, the execution ID of the SQL statement will be printed in beeline. To cancel the execution of the SQL statement, run the preceding command.
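When the cancellation is scripted, the execution ID printed in beeline has to be substituted into the URL. The following Python sketch builds the argument list for the curl command shown above. The function name cancel_command is hypothetical, and the sketch only prints the command rather than running it.

```python
import shlex


def cancel_command(rm_base, app_id, execution_id):
    """Return the curl argument list that cancels one SQL execution."""
    url = (
        f"{rm_base}/proxy/{app_id}/api/v1/applications/{app_id}"
        f"/cancel/execution?executionId={execution_id}"
    )
    # Same options as the documented command: -X PUT selects the HTTP method.
    return ["curl", "-k", "-i", "--negotiate", "-X", "PUT", "-u:", url]


argv = cancel_command("https://192.168.195.232:8090", "application_1477722033672_0008", 8)
print(shlex.join(argv))
```

On a node with a valid Kerberos ticket, the list could be passed to subprocess.run(argv). Passing a list avoids shell quoting problems with the ? and = characters in the URL.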

  • Streaming related: obtaining the average input frequency, average scheduling delay, average execution duration, and average value of the overall delay.
    • Command:
      curl -k -i --negotiate -u: "https://192.168.195.232:8090/proxy/application_1477722033672_0008/api/v1/applications/application_1477722033672_0008/streaming/statistics"

      192.168.195.232 indicates the service IP address of the master node of the ResourceManager, 8090 indicates the port number of the ResourceManager, and application_1477722033672_0008 indicates the application ID in YARN.

    • Command output:
      {
      "startTime" : "2018-12-25T08:58:10.836GMT",  
      "batchDuration" : 1000,  
      "numReceivers" : 1,  
      "numActiveReceivers" : 1,  
      "numInactiveReceivers" : 0,  
      "numTotalCompletedBatches" : 373,  
      "numRetainedCompletedBatches" : 373,  
      "numActiveBatches" : 0,  
      "numProcessedRecords" : 1,  
      "numReceivedRecords" : 1,  
      "avgInputRate" : 0.002680965147453083,  
      "avgSchedulingDelay" : 14,  
      "avgProcessingTime" : 47,  
      "avgTotalDelay" : 62
      }
    • Analysis:

      After running this command, you can query the average input rate (avgInputRate, unit: events/sec), average scheduling delay (avgSchedulingDelay, unit: ms), average execution time (avgProcessingTime, unit: ms), and average total delay (avgTotalDelay, unit: ms) of the current Streaming application.
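A common use of these statistics is to check whether a Streaming application keeps up with its input, which generally requires the average processing time to stay below the batch interval. The following Python sketch applies that rule of thumb to a trimmed copy of the sample output above. The function name streaming_health is hypothetical, and the sketch assumes batchDuration is expressed in milliseconds, like the delay fields.

```python
import json

# Trimmed copy of the sample streaming statistics shown above.
stats_text = """
{
  "batchDuration" : 1000,
  "numReceivers" : 1,
  "numActiveReceivers" : 1,
  "numInactiveReceivers" : 0,
  "avgInputRate" : 0.002680965147453083,
  "avgSchedulingDelay" : 14,
  "avgProcessingTime" : 47,
  "avgTotalDelay" : 62
}
"""


def streaming_health(stats):
    """Derive simple health indicators from the statistics response."""
    batch_ms = stats["batchDuration"]  # assumed to be milliseconds
    return {
        # Fraction of each batch interval spent processing, on average.
        "processing_ratio": stats["avgProcessingTime"] / batch_ms,
        # Rule of thumb: batches must finish faster than they arrive.
        "keeping_up": stats["avgProcessingTime"] <= batch_ms,
        "inactive_receivers": stats["numInactiveReceivers"],
    }


health = streaming_health(json.loads(stats_text))
print(health)
```

In the sample, processing takes 47 ms of each 1000 ms batch on average, so the application has ample headroom.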
