Updated on 2024-11-29 GMT+08:00

ALM-45635 FlinkServer Job Execution Failure

Alarm Description

Every 10 seconds, the system checks whether any FlinkServer job has failed. This alarm is generated when a FlinkServer job fails and is cleared when the job is successfully restarted.
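The generate-and-clear behavior described above can be pictured as a small per-job state tracker. The following is a minimal sketch for illustration only, not FlinkServer's actual implementation: the class name `JobAlarmTracker`, the method `observe`, and the state strings `"FAILED"` and `"RUNNING"` are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Check period stated in the alarm description.
POLL_INTERVAL_SECONDS = 10


@dataclass
class JobAlarmTracker:
    """Hypothetical tracker that mirrors the documented alarm life cycle."""

    # Names of jobs that currently have an active alarm.
    active: Set[str] = field(default_factory=set)

    def observe(self, job_name: str, state: str) -> Optional[str]:
        """Process one periodic check and return "raise", "clear", or None.

        The state strings are assumed values, not FlinkServer constants.
        """
        if state == "FAILED" and job_name not in self.active:
            # First failed observation: the alarm is generated.
            self.active.add(job_name)
            return "raise"
        if state == "RUNNING" and job_name in self.active:
            # The job was restarted successfully: the alarm is cleared.
            self.active.discard(job_name)
            return "clear"
        # No change in alarm status.
        return None
```

In this sketch, a job that stays failed across several checks produces one alarm only, and a job that was never in alarm produces no clearance event.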

Alarm Attributes

| Alarm ID | Alarm Severity | Alarm Type | Service Type | Auto Cleared |
|---|---|---|---|---|
| 45635 | Major | Quality of service | Flink | Yes |

Alarm Parameters

| Type | Parameter | Description |
|---|---|---|
| Location Information | Source | Specifies the cluster for which the alarm was generated. |
| Location Information | ServiceName | Specifies the service for which the alarm was generated. |
| Location Information | ApplicationName | Specifies the name of the application for which the alarm was generated. |
| Location Information | JobName | Specifies the job for which the alarm was generated. |
| Location Information | UserName | Specifies the username for which the alarm was generated. |

Impact on the System

This alarm is a job-level alarm and does not affect FlinkServer. You need to view Flink job logs to find out the failure cause.

Possible Causes

The cause depends on the job. Check the JobManager and TaskManager logs of the failed job to identify it.

Handling Procedure

  1. Log in to Manager as a user who has the FlinkServer management permission.
  2. Choose Cluster > Services > Yarn and click the link next to ResourceManager WebUI to go to the native Yarn page.
  3. Locate the failed job by the job name shown in the alarm's location information, search for and record the application ID of the job, and check whether the job logs are available on the native Yarn page.

    Figure 1 Application ID of a job
    • If yes, go to 4.
    • If no, go to 6.

  4. Click the application ID of the failed job to go to the job page.

    1. Click Logs in the Logs column to view JobManager logs.
      Figure 2 Clicking Logs
    2. Click the ID in the Attempt ID column and click Logs in the Logs column to view TaskManager logs.
      Figure 3 Clicking the ID in the Attempt ID column
      Figure 4 Clicking Logs

      You can also log in to Manager as a user who has the FlinkServer management permission. Choose Cluster > Services > Flink, and click the link next to Flink WebUI. On the displayed Flink web UI, click Job Management, click More in the Operation column, and select Job Monitoring to view TaskManager logs.

  5. View the logs of the failed job to rectify the fault, or contact the O&M engineers and send the collected fault logs. No further action is required.
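The lookup in step 3 can also be done on data retrieved from the Yarn ResourceManager REST API (`/ws/v1/cluster/apps`), which returns the application list as JSON. The following is a minimal sketch that filters an already fetched and parsed response. It assumes that the Yarn application name equals the job name shown in the alarm, which may not hold in every deployment. The function name `find_failed_applications` is hypothetical, and fetching the response (including any authentication the cluster requires) is left out.

```python
from typing import Any, Dict, List


def find_failed_applications(apps_response: Dict[str, Any], job_name: str) -> List[str]:
    """Return the IDs of failed Yarn applications whose name matches job_name.

    apps_response is the parsed JSON body of GET /ws/v1/cluster/apps.
    Matching on the application name is an assumption of this sketch.
    """
    # The "apps" value is null when the cluster has no applications.
    apps = (apps_response.get("apps") or {}).get("app") or []
    return [
        app["id"]
        for app in apps
        if app.get("name") == job_name and app.get("finalStatus") == "FAILED"
    ]
```

The returned IDs are the values to record in step 3 and to look up on the native Yarn page in step 4.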

If logs are unavailable on the Yarn page, download logs from HDFS.

  6. On Manager, choose Cluster > Services > HDFS and click the link next to NameNode WebUI to go to the HDFS page. Choose Utilities > Browse the file system and download the logs from the /tmp/logs/<Username>/logs/<Application ID of the failed job> directory.
  7. View the logs of the failed job to rectify the fault, or contact the O&M engineers and send the collected fault logs.
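The HDFS directory to browse can be derived from the username and the application ID recorded earlier. The following is a minimal sketch of that path construction. The function name `aggregated_log_dir` is hypothetical, the `/tmp/logs` root is taken from the procedure above and may differ if the cluster is configured otherwise, and the ID check assumes the usual `application_<timestamp>_<sequence>` form.

```python
import posixpath
import re

# Usual shape of a Yarn application ID, for example application_1700000000000_0001.
APP_ID_PATTERN = re.compile(r"^application_\d+_\d+$")


def aggregated_log_dir(username: str, application_id: str, root: str = "/tmp/logs") -> str:
    """Build the HDFS directory that holds the logs of one application.

    Follows the layout given in the procedure:
    <root>/<Username>/logs/<Application ID>.
    """
    if not username or "/" in username:
        raise ValueError("username must be a non-empty name without '/'")
    if not APP_ID_PATTERN.match(application_id):
        raise ValueError("unexpected application ID format: %r" % application_id)
    # HDFS paths always use forward slashes, so posixpath is used on every platform.
    return posixpath.join(root, username, "logs", application_id)
```

Rejecting malformed input early avoids browsing to a directory that cannot exist, for example when a job name is pasted in place of the application ID.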

Alarm Clearance

This alarm is automatically cleared after the job is successfully restarted.

Related Information

None.